Observability and OpenTelemetry in Azure

Observability is an important component of cloud architecture. It is the ability to understand the current state of a system by examining the data it outputs in the form of logs, metrics and traces. This data can be massively helpful for security, resilience, reliability, cost-effectiveness and technical decision-making as you continually review and refine your cloud workloads.

OpenTelemetry has become the go-to way to enable observability in the cloud. It’s an open source, vendor-neutral framework with software and an API (defined in the OTLP specification) for transmitting, collecting, transforming and accessing telemetry data. Many popular programming languages are supported for sending telemetry from your apps. It is an incubating-stage project with the Cloud Native Computing Foundation, with over 40 vendors using it. It is supported by the major cloud providers and by other major alerting, monitoring, and analysis platforms including Prometheus, Grafana, DataDog, NewRelic, Dynatrace, Splunk, LightStep and many more.

The key concepts you need to know before reading on are:

Collector – A way to receive, process, and export telemetry data (think proxy), made up of the components below (there are more, but these are the base concepts)
Receiver – Receives data via pull (eg the Azure Monitor receiver) or push (endpoints like HTTP or gRPC). There can be multiple receivers
Exporter – Exports telemetry data to another location (eg the Azure Monitor exporter, Prometheus, etc). There can also be multiple exporters
Language API Exporter – The exporter in app code (eg Python, .NET etc) that sends to the OpenTelemetry collector
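
To make the relationship between these pieces concrete, here is a minimal sketch of a collector configuration, written as a Python dict so that it can be checked in code. The collector itself reads YAML, and since JSON is a subset of YAML the printed output could be saved and used as a starting point. The choice of exporters (the contrib azuremonitor exporter and the debug exporter) and the environment variable holding the connection string are assumptions for illustration, and the undefined_components helper is a hypothetical check, not part of any OpenTelemetry library.

```python
import json

# A collector config: receivers and exporters are defined once, then wired
# together per signal under service.pipelines.
collector_config = {
    "receivers": {
        # Push receiver: apps send OTLP over gRPC (4317) or HTTP (4318).
        "otlp": {
            "protocols": {
                "grpc": {"endpoint": "0.0.0.0:4317"},
                "http": {"endpoint": "0.0.0.0:4318"},
            }
        }
    },
    "exporters": {
        # Assumption: the contrib "azuremonitor" exporter, reading its
        # connection string from an environment variable.
        "azuremonitor": {
            "connection_string": "${env:APPLICATIONINSIGHTS_CONNECTION_STRING}"
        },
        # A second exporter that writes telemetry to the collector's console.
        "debug": {},
    },
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["azuremonitor", "debug"]},
            "logs": {"receivers": ["otlp"], "exporters": ["azuremonitor"]},
        }
    },
}


def undefined_components(config):
    """Hypothetical helper: list pipeline references that have no definition."""
    missing = []
    for pipeline, spec in config["service"]["pipelines"].items():
        for kind in ("receivers", "exporters"):
            for name in spec.get(kind, []):
                if name not in config.get(kind, {}):
                    missing.append((pipeline, kind, name))
    return missing


if __name__ == "__main__":
    print(json.dumps(collector_config, indent=2))
    print("undefined references:", undefined_components(collector_config))
```

Note how one receiver feeds two pipelines and the traces pipeline fans out to two exporters. That fan-out is what makes it cheap to add or swap a destination later.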

Observability in Azure

In the past, Azure has enabled observability using Application Insights and Azure Monitor, which have their own proprietary telemetry format, SDK, API, alerting system, analysis system, query language (Kusto Query Language, KQL), and storage.

In my experience monitoring many applications and resources in Azure, the convenience of just flicking the switch on AppInsights autoinstrumentation and/or adding a few lines of code for extra logging, metrics, and traces makes observability a breeze to set up. It is similarly easy to enable telemetry agents via VM Insights in VMs and Container Insights in AKS.

Azure Monitor has plenty of excellent features that I use daily, and they make for a great observability platform. There is autoinstrumentation available for many Azure services. So if observability is already available under this proprietary system, and easy to enable, why switch to OpenTelemetry?

Why use OpenTelemetry?

Microsoft has made a significant commitment to adding OpenTelemetry support to Azure. They have listed the reasons why they are investing in OpenTelemetry; the notable points are that it is vendor neutral and more performant. I’ll drill into some of these points and other benefits (and drawbacks) of making the switch to OpenTelemetry.

Benefits

Vendor neutral

Using OpenTelemetry makes your applications, containers, VMs and so on easier to migrate to another cloud provider and easy to reconfigure to send telemetry to another location. You can use Microsoft’s libraries to send telemetry (only) to Azure Monitor/AppInsights, or the vanilla OpenTelemetry libraries to be truly vendor neutral. Note that you only get support from Microsoft if you are using their libraries (in the Azure SDK), not the open source OpenTelemetry libraries or components.

Cost efficiency

Azure log workspaces can get expensive quickly. I’ve seen firsthand the need to alter code, change retention time, and switch on sampling as the costs were getting quite high. If you encounter this situation and the above doesn’t give you the cost reduction you wanted, you could easily switch to a telemetry collector other than Azure Monitor.

You could even switch to a custom AKS cluster running all the open source OpenTelemetry tools (eg structured similarly to the Grafana LGTM stack – not intended for PROD). OpenTelemetry supports gRPC and is more performant at scale than the AppInsights SDK, so you’ll likely see savings in compute, storage and data egress.
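
Since sampling is one of the main cost levers mentioned above, here is a minimal sketch of how ratio-based trace sampling can work. It illustrates the general idea behind trace-ID ratio samplers and is not a copy of any SDK’s implementation. The function name and the 10% ratio are assumptions for illustration.

```python
import random


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below ratio * 2**64.

    The decision depends only on the trace ID, so every service that sees
    the same trace makes the same keep/drop decision without coordination.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    bound = round(ratio * (1 << 64))
    return (trace_id & 0xFFFFFFFFFFFFFFFF) < bound


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is repeatable
    trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample(t, 0.10) for t in trace_ids)
    print(f"kept {kept} of {len(trace_ids)} traces")
```

With a ratio of 0.10, roughly one trace in ten is kept, so the volume of trace data you ingest (and pay for) drops by about 90%. The trade-off is that rare errors may fall in the dropped portion.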

Interoperability

OpenTelemetry is considered the industry standard for observability. Using this standard opens up your choices, making it easy to use other tools (or preferred combinations) to analyse your telemetry data beyond what’s offered in Azure (currently Azure Monitor, Managed Grafana, and Managed Prometheus).

Centralised Observability

If you are running a hybrid or multi-cloud solution, you can handle all your monitoring, alerting, and logging in one place (and it doesn’t have to be in Azure Monitor or use Azure Arc). Using OpenTelemetry would also make it less disruptive if you stopped using Azure altogether and wanted to shift cloud provider in a multi-cloud solution (as long as you were using the vanilla OpenTelemetry libraries in your code).

Drawbacks

Time and Cost

For existing environments with lots of apps and infrastructure, the time and effort of switching code and infrastructure configuration over would be the most important thing to calculate. Switching is going to come at a cost, with no real visible gain except flexibility (and maybe some cost reductions from OpenTelemetry’s performance).

If you plan to use a monitoring/alerting platform other than Azure Monitor and send data there, this time and cost factor may not be so important. You’d also have to calculate the costs of that third-party platform and the costs of sending data there (a common way is via Event Hubs).

For new environments, or new infrastructure and apps in an existing environment, there’s no real time-cost argument (nor any other) against using OpenTelemetry. AppInsights supports OTLP with the same connection string as used for its existing API to collect telemetry. Both the Azure Monitor distro exporter and the OpenTelemetry dotnet exporter take only a few lines of code to add (the same as you would have with AppInsights). Autoinstrumentation is also quite simple to add.

Still a work in progress

Azure doesn’t (currently) have 100% coverage of all its services for OpenTelemetry, and the dust hasn’t settled on the raft of changes going on; many are in preview. Microsoft recommends using Azure Monitor OpenTelemetry for application instrumentation in .NET instead of what they now label as the Application Insights SDK (Classic API). The catch is that it’s not currently 100% backward compatible with the ‘Classic AppInsights API’ features, so it’s best to check the feature table if you are moving existing apps to it.

AKS autoinstrumentation using OpenTelemetry is in private preview, and work is underway for VM OpenTelemetry autoinstrumentation in the Azure Monitor Agent. In other areas, like .NET Aspire, the Azure SDK and the services mentioned later, OpenTelemetry support is quite feature-rich.

There’s more information on the new features coming soon in this roadmap blog post by Matt McCleary.

No web browser telemetry like AppInsights

Web browser OpenTelemetry is not currently supported in the way it is for the existing Application Insights JavaScript SDK, so you would need to find or implement an alternative if this is valuable to you.

Azure Monitor bells and whistles have a condition attached

Your app code will be bound to Azure Monitor if you want to take advantage of the additional features in the Azure Monitor OpenTelemetry distro (see the next section) to send telemetry data. That would result in extra cost and time for code changes if you migrate to a different cloud. If the benefits provided by this distro aren’t important to you, you can always use the standard OpenTelemetry distro.

If you have lots of projects (or future ones), it’d be very important to decide early which distro you wish to use as it affects your app portability.

Enabling Instrumentation and the Azure Monitor OpenTelemetry Distribution

The term distribution in OpenTelemetry may not mean what you think. It’s a customization of an OpenTelemetry component (collectors, exporters etc), usually adding features specific to the vendor. Microsoft has created the Azure Monitor OpenTelemetry distribution to support features either found in the AppInsights SDK or specific to the Azure ecosystem.

Here are some of the more important additional features (Microsoft notes more) as reasons to use it over the standard distro:

Sampling compatible with the ‘Classic API’
Microsoft Entra auth (to AppInsights, I assume)
Autoretry and offline storage
Live Metrics

There’s a great overview of the observability available for .NET on the MSLearn – .NET Observability with OpenTelemetry page, and of Azure’s OpenTelemetry support on MSLearn – OpenTelemetry on Azure, although a lot of it is specific to Azure Monitor.

Using the OpenTelemetry distribution

If you choose the OpenTelemetry distro over the Azure Monitor one (perhaps for portability, or to send somewhere other than Azure Monitor) you will have to set up a few extra things. Where you send data is also quite important: you want the lowest possible latency, as you have lost the offline storage/retry feature of Azure Monitor’s distro. You can use sidecar containers (which also help with portability) running the OpenTelemetry collector, which then sends on to Azure Monitor or to a custom-built central collector (ideally in Kubernetes). These sidecars are quite easy to set up on several platforms, as mentioned later, and Microsoft has guides to using them in some services.
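
To show what you give up without a retry and offline storage feature, here is a toy sketch of the idea: when a send fails, the batch is written to a local spool file and replayed, in order, before the next successful send. This is my own illustration and not how the Azure Monitor distro implements it. The SpoolingSender class and the spool file format are hypothetical, and a real version would also need a size limit and expiry for spooled data.

```python
import json
from pathlib import Path
from typing import Callable


class SpoolingSender:
    """Hypothetical wrapper: spool batches to disk on failure, replay later."""

    def __init__(self, send: Callable[[dict], None], spool_file: Path):
        self._send = send          # function that raises OSError when the network fails
        self._spool = spool_file   # JSON-lines file holding unsent batches

    def export(self, batch: dict) -> bool:
        """Return True if the batch (and any backlog) was sent, False if spooled."""
        try:
            self._flush_spool()    # older batches go first to preserve order
            self._send(batch)
            return True
        except OSError:
            with self._spool.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(batch) + "\n")
            return False

    def _flush_spool(self) -> None:
        if not self._spool.exists():
            return
        lines = self._spool.read_text(encoding="utf-8").splitlines()
        pending = [json.loads(line) for line in lines if line]
        for i, old_batch in enumerate(pending):
            try:
                self._send(old_batch)
            except OSError:
                # Keep only what is still unsent, then let export() handle it.
                remaining = "".join(json.dumps(b) + "\n" for b in pending[i:])
                self._spool.write_text(remaining, encoding="utf-8")
                raise
        self._spool.unlink()
```

A sidecar collector on the same host shrinks the window in which this matters, because the app’s send is a local hop and the collector can take on retrying the longer network hop.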

Manual Instrumentation

For manual instrumentation (ie in app code), Microsoft has added distro items to the OpenTelemetry registry for exporting telemetry using OTLP to Azure Monitor in ASP.NET Core, .NET Core, Python, Java, and Node. These components allow quick, low-code (one line for ASP.NET Core) enablement of sending OTLP data to Azure Monitor; see the guide here. For more information on how it integrates with .NET, see Matt McCleary’s distro announcement.

If you don’t want to send telemetry to Azure, don’t want vendor lock-in (from using the Azure Monitor distro), or want to send it to an extra location, you can use the standard OpenTelemetry distro exporters for .NET, Java, Node, PHP, Go and Python, such as the one in the GitHub opentelemetry-dotnet repo.
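
To make concrete what a vanilla exporter puts on the wire, here is a minimal sketch that builds an OTLP/HTTP trace payload in its JSON encoding using only the Python standard library. In practice you would use the OpenTelemetry SDK rather than hand-building payloads. The function names, the scope name and the example service and span names are assumptions for illustration, and the request is built but deliberately not sent.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_trace_payload(service_name: str, span_name: str, duration_ms: int) -> dict:
    """Build a one-span OTLP trace payload (JSON encoding)."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 16 bytes, hex encoded
                    "spanId": secrets.token_hex(8),    # 8 bytes, hex encoded
                    "name": span_name,
                    "kind": 1,                         # SPAN_KIND_INTERNAL
                    # 64-bit timestamps are carried as decimal strings in JSON.
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Wrap the payload in an HTTP POST aimed at a collector's traces endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    payload = build_otlp_trace_payload("checkout", "charge-card", 25)
    request = build_request("http://localhost:4318/v1/traces", payload)
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would send it to a collector listening there.
```

Nothing in the payload names a vendor. The same request could go to a sidecar collector, a central collector, or any backend that accepts OTLP, which is the portability argument in a nutshell.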

Autoinstrumentation

For autoinstrumentation, Azure’s turnkey autoinstrumentation across its service offerings is noted here. However, it is unclear whether they have all switched to OpenTelemetry, and for most services you do not have the option to point to a collector other than Azure Monitor. To reduce your vendor lock-in, or to reduce the data sent to Azure Monitor, it makes sense to use the OpenTelemetry standard autoinstrumentation (called zero-code instrumentation) via a startup hook for .NET, Java, Node, PHP, Go, and Python. You can install it on platforms to send to Azure Monitor or other collectors.

Server/Platform instrumentation

It is possible to set up VM instrumentation by installing the OpenTelemetry collector on the VM and exporting to Azure Monitor or elsewhere (see Azure VMs later). Work is underway to make this seamless via the Azure Monitor Agent. For other compute services, the exporting you can configure is mostly limited; some of them are covered below.

Should I make the switch?

For new projects and infrastructure, given the maturity of the OpenTelemetry libraries and the Azure Monitor OpenTelemetry distro, I can think of no reason why you shouldn’t be using OpenTelemetry. It’s important to take care in deciding which distro you want to use though.

For existing environments, a decision to migrate could be based on the benefits and drawbacks mentioned earlier. It would also be best to check for any gaps over AppInsights based on your workloads.

For multi-cloud or hybrid environments, there would likely be fewer arguments against switching as you’d likely want a centralised observability platform and that may not be Azure Monitor.

Using OpenTelemetry in Azure

I’ll list what’s available in the way of OpenTelemetry support for various common Azure services and the Azure SDK client libraries that use them. For the client libraries you would combine their settings with an OpenTelemetry exporter in code. Given the changes that are going on, there may be more features within a few months.

Azure services (platform) telemetry data

There are no settings in Azure Monitor diagnostics to send your resource telemetry directly to an external destination, except to one of the Azure Native ISV Services (Datadog, Dynatrace and more) that you have an account with. So for any of the Azure resources below that don’t mention it, sending platform telemetry is not available. None of the resource types I’ve investigated below that support OpenTelemetry (currently) support configuring Azure platform telemetry export.

You have the option to configure Azure Monitor diagnostics to go to storage or Event Hubs and forward them from there (there are plenty of guides from third parties doing this with Event Hubs). Another option is to pull the data, for example with the Grafana Azure Connector (there are connectors for many other third-party tools).

There’s also the (unsupported by Microsoft) Azure Monitor receiver for the OpenTelemetry collector, which you’d need to host (likely in AKS) to scrape all the platform telemetry. With all these options there are additional costs (such as egress, storage costs, Event Hub throughput units, Azure Monitor API calls and so on).

Azure App Service

Linux App Service sidecars went GA in November 2024, and setting up an OpenTelemetry collector in a sidecar is a piece of cake as per this MSLearn guide. Even with the extra overhead of setting this up compared to app code exporting directly to the main collector, it opens up a lot of flexibility to relocate your app or change your OTEL configuration (eg export to a collector other than Azure Monitor) without disrupting your app. For Windows App Service, you could choose which distro exporter to use in code depending on where you want to send telemetry (Azure Monitor or elsewhere).

Azure Container Apps

It’s possible to enable OpenTelemetry (in preview) using what’s termed an ‘OpenTelemetry agent’. It is very opaque what the agent actually is (I can’t find the source code), but I believe it comes from the same codebase as the in-progress Azure Monitor Agent work for VMs and AKS (see the previously mentioned roadmap), whose source I also can’t find. The ACA OpenTelemetry agent configuration looks very similar to OpenTelemetry’s collector config, so they may be building on top of it.

You can enable OpenTelemetry in the Azure Portal OTEL blade of ACA to send telemetry to Azure Monitor (logs, traces), DataDog (metrics, traces) or any OTLP collector (logs, metrics, traces) following the MSLearn – ACA OpenTelemetry configuration guide. Unfortunately it won’t include Azure resource telemetry like system logs or the Container Apps standard metrics.

Azure CosmosDB

CosmosClient allows OpenTelemetry distributed tracing (emitting System.Diagnostics.Activity data; see this MSLearn guide), and you can use any OTLP exporter (eg Azure Monitor or vanilla OpenTelemetry) to export the traces.

Azure Storage

Under Azure SDK conventions, any client whose options class inherits from ClientOptions can turn on distributed tracing in your code, eg BlobClientOptions.Diagnostics.IsDistributedTracingEnabled. You would then configure your exporter as normal to export traces.

Azure ServiceBus

The ServiceBusClient ClientOptions allows distributed tracing in experimental mode in your app code.

Azure EventHub

As with ServiceBusClient, you can send distributed traces (when experimental mode is on) and logs from the EventHubClient to your OpenTelemetry exporter.

Azure SQL

I found no mention of OpenTelemetry exporting features across any of the Azure SQL service offerings or in the Microsoft.Data.SqlClient lib. There is a ticket open for implementing tracing in SqlClient; you can use this instrumentation lib, but with extreme caution! If you are running SQL Server in a VM, there is the (unsupported) SQL Server OpenTelemetry receiver to use with an OpenTelemetry collector, provided you have access to install a collector.

Azure Kubernetes Service

There are no options to export to anywhere other than Azure Monitor (except metrics to Azure Managed Prometheus). The roadmap (see links at end) indicates that turnkey OpenTelemetry autoinstrumentation is in preview. It will use the Azure Monitor Agent, which you can configure (via data collection rules) to send to Azure storage, Log workspaces or Event Hubs, but not to an OpenTelemetry collector. So again, if you want to export elsewhere you’d have to manually set up a data workflow as mentioned in the platform telemetry section above, or follow OpenTelemetry’s guide for monitoring Kubernetes, but still without the platform metrics.

Azure Virtual Machines

OpenTelemetry is not currently available as a feature, nor in any private/public preview, but in future OpenTelemetry autoinstrumentation will be possible via an update to the existing Azure Monitor Agent (AMA) used by VMs for telemetry. Some older VM implementations may not be using AMA currently so upgrading to that would be required.

Alternatively you can just roll your own OpenTelemetry collector installed on the VM. There are receivers available for Windows Event logs or Linux syslog as well as metrics on both.

.NET Aspire

Aspire is quite feature-rich for OpenTelemetry, with support in many of its integration libraries from RabbitMQ to SqlClient. There would be limitations to platform telemetry based on where it’s deployed, of course. You can view all the telemetry and logs in its own built-in dashboard, in your own cloud-hosted or self-hosted OTEL services, or in Azure Monitor. Check out this article for a good overview of .NET Aspire OpenTelemetry – MSLearn .NET Aspire OpenTelemetry

Azure Functions

OpenTelemetry configuration is available (currently in preview) in all the languages of the Azure SDK except (at the time of writing) Java, and it doesn’t support C# isolated apps. See the guide here: MSLearn – Azure Functions OpenTelemetry. The good news is that you can configure both host and code for OpenTelemetry, and for flexibility you can set the OTEL_EXPORTER_OTLP_ENDPOINT env var to send the data to a location other than AppInsights.

One major limitation on having any real platform telemetry is that metrics are not sent, only logs and traces. Another is that Functions Core Tools (for local dev) doesn’t support OpenTelemetry (I assume telemetry from code is fine), so you have to verify telemetry settings in Azure.
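
As an aside on the OTEL_EXPORTER_OTLP_ENDPOINT setting mentioned above, here is a minimal sketch of how an OTLP/HTTP exporter turns that variable into a per-signal URL, as I understand the OpenTelemetry exporter specification: a signal-specific variable is used as-is, otherwise the signal path is appended to the base endpoint. The function name and the collector host names in the usage lines are hypothetical, and this shows the general rule rather than what the Functions host does internally.

```python
import os
from typing import Mapping, Optional

SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}
DEFAULT_HTTP_ENDPOINT = "http://localhost:4318"


def resolve_otlp_http_url(signal: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the URL an OTLP/HTTP exporter would post a given signal to."""
    if signal not in SIGNAL_PATHS:
        raise ValueError(f"unknown signal {signal!r}")
    env = os.environ if env is None else env
    # A signal-specific endpoint wins and is used exactly as given.
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific
    # Otherwise the signal path is appended to the shared base endpoint.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_HTTP_ENDPOINT)
    return base.rstrip("/") + "/" + SIGNAL_PATHS[signal]


if __name__ == "__main__":
    print(resolve_otlp_http_url("traces", {}))
    print(resolve_otlp_http_url("logs", {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://my-collector:4318"}))
```

Because the destination is plain configuration, pointing an app at a different collector is an app setting change rather than a code change.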

Using your OpenTelemetry data

There are a myriad of platforms for setting up dashboards, alerts etc with your telemetry. The first thing people would go to is Azure Monitor; however, Azure also offers Managed Grafana and Managed Prometheus, which are very easy to configure and integrate with your Azure Monitor data and Log workspaces.

You could also use cloud-hosted third-party services (at extra cost for the service and for getting the data to it), as many allow consuming Azure’s OpenTelemetry data. As mentioned earlier there are several ways to deliver this data to those services.

Sending all your telemetry data outside of Azure

As mentioned in the drawbacks, OpenTelemetry support is not 100% complete for some popular services, and a lot of the resources you are using have no option to export resource telemetry directly to an OpenTelemetry collector other than Azure’s. It is possible (in several ways mentioned before) to get all your data out of Azure Monitor by either push or pull methods, but depending on what you choose, it could become very complex and potentially quite expensive, so planning and estimating before making the jump would be wise.

It’s highly unlikely you would be able to avoid Azure Monitor altogether, because platform metrics are tightly coupled to the Azure ecosystem. If platform metrics are the only thing being directed to Azure Monitor, with the default settings, the cost should be free or close to it.

Summary

The benefits of using OpenTelemetry in Azure are quite strong and expanding, so it makes sense to start using OpenTelemetry now, at least on new apps and infrastructure. While it is easy and convenient to use Microsoft’s OpenTelemetry distro, personally I’d avoid it unless it has some feature a customer wanted to use, and accept the cost of no support from Microsoft and some extra setup. The key reason is portability: using a vendor implementation of a vendor-neutral framework seems like a bit of a code smell. Perhaps, to aid portability for applications, a configuration switch for distro exporters (eg Azure, AWS, Vanilla) would be great.
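
As a rough idea of what such a configuration switch could look like, here is a minimal sketch that picks an exporter option from an environment variable. The TELEMETRY_DISTRO variable and the ExporterChoice type are hypothetical names of my own. The package names are the Python packages I would expect to install for each option, and an AWS entry could be added in the same way.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExporterChoice:
    package: str                 # package the app would install and initialise
    required_env: Optional[str]  # setting that must be present, if any


EXPORTERS = {
    "azure": ExporterChoice("azure-monitor-opentelemetry",
                            "APPLICATIONINSIGHTS_CONNECTION_STRING"),
    # The vanilla OTLP exporter falls back to a localhost default endpoint,
    # so no setting is strictly required.
    "vanilla": ExporterChoice("opentelemetry-exporter-otlp", None),
}


def choose_exporter(env: Optional[Mapping[str, str]] = None) -> ExporterChoice:
    """Pick the exporter named by TELEMETRY_DISTRO (hypothetical), default vanilla."""
    env = os.environ if env is None else env
    key = env.get("TELEMETRY_DISTRO", "vanilla").strip().lower()
    if key not in EXPORTERS:
        raise ValueError(f"unknown TELEMETRY_DISTRO {key!r}; expected one of {sorted(EXPORTERS)}")
    choice = EXPORTERS[key]
    if choice.required_env and not env.get(choice.required_env):
        raise RuntimeError(f"{choice.required_env} must be set for the {key!r} distro")
    return choice


if __name__ == "__main__":
    print(choose_exporter({}))
```

The app’s startup code would branch once on the returned choice and the rest of the instrumentation would stay on the vendor-neutral OpenTelemetry API, so switching destination stays a configuration change.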

If you have any feedback or thoughts please leave a comment. I hope this post has given you a good insight into what’s possible with OpenTelemetry in Azure!