What is OpenTelemetry? Traces, Metrics and Logs Explained
OpenTelemetry is a powerful observability framework designed to help developers instrument, generate, collect, and export telemetry data from their applications and infrastructure. This open-source project has rapidly become the industry standard for implementing observability in modern distributed systems and cloud-native applications.
Understanding OpenTelemetry
OpenTelemetry emerged from the merger of two prominent open-source projects: OpenCensus (from Google) and OpenTracing (a Cloud Native Computing Foundation project). This consolidation happened in 2019, creating a unified, vendor-neutral approach to observability instrumentation.
At its core, OpenTelemetry solves a critical problem: it standardizes how we collect and transmit telemetry data in distributed systems. Before OpenTelemetry, developers faced vendor lock-in when choosing observability tools. Each monitoring solution required its own specific instrumentation approach, making it difficult to switch providers or use multiple tools simultaneously.
OpenTelemetry breaks this dependency by providing a single set of APIs, libraries, agents, and instrumentation that capture distributed traces, metrics, and logs from your applications. The data can then be exported to various backends of your choice for analysis.
OpenTelemetry is now a CNCF incubating project with broad industry support. It's not just another monitoring tool; it's a standard for instrumenting code, backed by major cloud providers and observability vendors.
The Core Components of OpenTelemetry
OpenTelemetry's architecture consists of several key components that work together to provide a complete observability solution:
APIs and SDKs
The OpenTelemetry APIs define how to instrument code, while the SDKs implement these APIs for different programming languages. Currently, OpenTelemetry supports multiple languages including Java, Python, Go, JavaScript, .NET, Ruby, PHP, Erlang, and C++.
These APIs provide a standardized way to:
- Create and manage spans for distributed tracing
- Record metrics
- Capture logs
- Add context to telemetry data
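In practice, application and library code depends only on the lightweight API package, while the SDK configured at startup decides what actually happens to the data. A minimal Python sketch (the scope name is illustrative):

```python
from opentelemetry import trace, metrics

# Application and library code depends only on the API package.
# If no SDK is installed or configured, these calls return no-op
# implementations, so instrumentation is always safe to leave in place.
tracer = trace.get_tracer("payment-service")   # instrumentation scope name (illustrative)
meter = metrics.get_meter("payment-service")
```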
Collectors
The OpenTelemetry Collector is a vendor-agnostic implementation for receiving, processing, and exporting telemetry data. It serves as a single agent that can:
- Receive data in multiple formats (OTLP, Jaeger, Zipkin, Prometheus, etc.)
- Process and transform data with capabilities like filtering, batching, and attribute enrichment
- Export data to various backend systems
The collector comes in two deployment models:
- Agent: Runs alongside your application (as a sidecar or daemon)
- Gateway: Runs as a standalone service that receives data from multiple agents
Auto-instrumentation
One of OpenTelemetry's most powerful features is its ability to automatically instrument popular libraries and frameworks. This means you can often get valuable telemetry data with minimal code changes.
Auto-instrumentation typically works by:
- Intercepting method calls in common libraries
- Wrapping framework components
- Monitoring runtime metrics
While the level of support varies by language, most mainstream web frameworks, databases, and messaging systems have auto-instrumentation available.
Semantic Conventions
OpenTelemetry defines semantic conventions—standardized names and attributes for common concepts across different systems. These conventions ensure telemetry data is consistent and interoperable regardless of source.
For example, HTTP request spans always use the same attribute names (http.method, http.url, etc.), making it easier to correlate data across services and technologies.
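As a brief Python sketch, a hand-written span can use the same conventional keys that auto-instrumentation uses, so the data lines up across services (the span name and URL are made up for illustration; the opentelemetry-semantic-conventions package also exposes these keys as constants):

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Using the conventional attribute names keeps this span consistent
# with spans produced by auto-instrumentation in other services.
with tracer.start_as_current_span("GET /orders") as span:
    span.set_attribute("http.method", "GET")
    span.set_attribute("http.url", "https://shop.example.com/orders")  # example value
    span.set_attribute("http.status_code", 200)
```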
OpenTelemetry Data Types
OpenTelemetry captures three main types of telemetry data:
Traces
Traces track the journey of requests across services in a distributed system. A trace consists of spans—discrete operations within the request flow.
Each span includes:
- Name and unique identifier
- Start and end timestamps
- Parent span reference (for nested operations)
- Key-value attributes with metadata
- Events with timestamps
- Links to related spans
Traces help developers understand request flows, dependencies, and bottlenecks in complex systems.
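A short Python sketch of how those pieces look in code: each with-block opens a span, the inner span automatically records the outer one as its parent, and events add timestamped annotations (the operation and attribute names are illustrative):

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("checkout") as parent:
    parent.set_attribute("cart.items", 3)          # key-value attribute (illustrative)

    with tracer.start_as_current_span("charge-card") as child:
        child.add_event("payment.authorized")      # timestamped event
        # ... call the payment provider here ...
```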
Metrics
Metrics are numeric measurements collected over time. OpenTelemetry supports several metric types:
- Counters: Cumulative values that only increase (e.g., request count)
- Gauges: Values that can increase or decrease (e.g., memory usage)
- Histograms: Distributions of measured values (e.g., request duration percentiles)
Metrics provide aggregate views of system behavior and performance, useful for monitoring, alerting, and capacity planning.
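A rough Python sketch of these instrument types (instrument names, units, and values are illustrative; gauges are usually registered as observable instruments that report through a callback at collection time):

```python
from opentelemetry import metrics
from opentelemetry.metrics import Observation

meter = metrics.get_meter("inventory-service")   # illustrative scope name

# Counter: cumulative value that only increases
request_counter = meter.create_counter("http.requests", unit="1")

# Histogram: distribution of measured values
latency_histogram = meter.create_histogram("http.duration", unit="ms")

# Observable gauge: sampled via a callback each time metrics are collected
def read_queue_depth(options):
    return [Observation(value=42)]   # illustrative value

queue_gauge = meter.create_observable_gauge("queue.depth", callbacks=[read_queue_depth])

request_counter.add(1, {"route": "/orders"})
latency_histogram.record(87.5, {"route": "/orders"})
```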
Logs
Logs are timestamped text records of discrete events. While OpenTelemetry initially focused on traces and metrics, the project now includes a logs specification that allows:
- Correlation of logs with traces and metrics
- Structured logging with consistent metadata
- Common export pipeline for all telemetry types
The integration of logs with traces and metrics creates a more complete observability solution, letting developers switch between different telemetry types when investigating issues.
How OpenTelemetry Works
The typical data flow in an OpenTelemetry-instrumented system follows these steps:
1. Instrumentation: Your code creates telemetry data using OpenTelemetry APIs, either manually or through auto-instrumentation.
2. Processing: The SDK processes this data by applying samplers, batch processors, and other configurations.
3. Export: Data is sent to the OpenTelemetry Collector or directly to a backend system.
4. Collection: If using the Collector, it receives data from multiple sources, processes it, and forwards it to your chosen backends.
5. Storage and Analysis: Backend systems store and visualize the data for monitoring and troubleshooting.
This architecture provides flexibility in how you collect and route telemetry data. You can start simple with direct export from your applications, then add collectors for advanced processing as your needs grow.
Context Propagation
A key aspect of OpenTelemetry is how it propagates context across service boundaries. This context typically includes:
- Trace identifiers
- Span identifiers
- Baggage (arbitrary key-value pairs)
When a service makes a request to another service, it includes this context in request headers. The receiving service extracts this context and uses it to correlate its telemetry with the upstream service.
OpenTelemetry supports multiple context propagation formats, including W3C TraceContext and Baggage, with the ability to add custom propagators.
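A minimal Python sketch of both sides of a service boundary, using the globally configured propagators (W3C TraceContext and Baggage by default); the HTTP client and web framework are omitted, and plain dictionaries stand in for real request headers:

```python
from opentelemetry import trace
from opentelemetry.propagate import inject, extract

tracer = trace.get_tracer(__name__)

# Client side: copy the current trace context into outgoing headers
headers = {}
inject(headers)   # adds e.g. the W3C "traceparent" header
# http_client.get("https://downstream.example.com/api", headers=headers)  # illustrative call

# Server side: restore the context from incoming headers and continue the trace
def handle_request(incoming_headers):
    ctx = extract(incoming_headers)
    with tracer.start_as_current_span("handle-request", context=ctx):
        pass  # this span is now a child of the upstream caller's span
```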
Implementing OpenTelemetry
Integrating OpenTelemetry into your applications involves several steps:
Step 1: Choose Your Approach
You have three main options for instrumentation:
- Auto-instrumentation: Minimal code changes but less control
- Manual instrumentation: More work but more precise control
- Hybrid approach: Use auto-instrumentation as a base and add manual instrumentation for critical paths
The right choice depends on your requirements, resources, and existing codebase.
Step 2: Set Up the SDK
Install the OpenTelemetry SDK for your language and configure its components:
- Providers for traces, metrics, and logs
- Processors for batching and filtering
- Exporters for your backend systems
- Samplers to control data volume
Here's a simplified example in Java: a minimal sketch that wires a batch span processor to an OTLP exporter using the OpenTelemetry Java SDK, with a placeholder endpoint for a local collector.
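```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public final class TelemetrySetup {

    public static OpenTelemetry initialize() {
        // Exporter: where the data goes (placeholder endpoint for a local collector)
        OtlpGrpcSpanExporter exporter = OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317")
                .build();

        // Provider: batches spans before export
        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
                .build();

        // Register globally so instrumentation can look it up
        return OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProvider)
                .buildAndRegisterGlobal();
    }
}
```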
Step 3: Deploy the Collector
While optional, the OpenTelemetry Collector offers several advantages:
- Unified data pipeline
- Protocol translation
- Advanced processing
- Buffering and retries
A basic collector configuration in YAML might look like the following sketch, which receives OTLP data, batches it, and forwards it to a placeholder backend endpoint:
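```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp:
    endpoint: "observability-backend.example.com:4317"   # placeholder backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```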
Step 4: Instrument Your Code
Apply auto-instrumentation or add manual spans to key parts of your code. For manual instrumentation, the pattern is typically:
- Start a span
- Set attributes
- Perform the operation
- End the span
Example in Python (a minimal sketch; the span name, attributes, and the do_work helper are illustrative):
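```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_order(order_id):
    # Start a span; the context manager ends it automatically,
    # even if the operation raises an exception
    with tracer.start_as_current_span("process-order") as span:
        # Set attributes that describe this operation
        span.set_attribute("order.id", order_id)

        # Perform the operation
        result = do_work(order_id)   # illustrative helper

        span.set_attribute("order.item_count", len(result))
        return result
```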
Step 5: Configure Sampling
As your system scales, you'll need to implement sampling to control telemetry volume. OpenTelemetry offers several sampling strategies:
- Always-on: Captures all data (good for development)
- Always-off: Captures no data (for emergency situations)
- Trace ID ratio: Samples a percentage of traces
- Parent-based: Uses the sampling decision from the parent span
- Rate limiting: Caps the number of samples per time period
Custom samplers can implement more complex logic based on trace attributes or system conditions.
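As a Python sketch, a common production setup combines the parent-based and ratio strategies so a service honors upstream sampling decisions and samples only a fraction of the traces it starts itself (the 10% ratio is an arbitrary example):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Respect the parent's sampling decision; sample 10% of root traces
sampler = ParentBased(root=TraceIdRatioBased(0.10))

trace.set_tracer_provider(TracerProvider(sampler=sampler))
```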
OpenTelemetry vs. Other Observability Solutions
How does OpenTelemetry compare to other monitoring approaches?
OpenTelemetry vs. Vendor-Specific SDKs
Most observability vendors (Datadog, New Relic, Dynatrace, etc.) offer their own SDKs. Compared to these:
- Vendor-specific SDKs:
  - Pros: Optimized for specific backends, may have vendor-exclusive features
  - Cons: Lock-in, inconsistent approaches across vendors, multiple agents required for multi-vendor setups
- OpenTelemetry:
  - Pros: Vendor-neutral, standardized approach, single instrumentation for multiple backends
  - Cons: May lack some vendor-specific optimizations, still maturing in some areas
The trend is clear—many vendors now accept OpenTelemetry data directly and contribute to the project, recognizing its value as a standard.
OpenTelemetry vs. Other Open Standards
OpenTelemetry isn't the first open standard for telemetry:
- Prometheus focuses primarily on metrics with a pull-based model
- Zipkin and Jaeger pioneered distributed tracing standards
- Fluentd and Fluent Bit standardized log collection
OpenTelemetry incorporates lessons from all these projects while providing unified APIs that span all telemetry types. It doesn't replace these tools but offers consistent instrumentation that can work with them.
Use Cases and Benefits
Organizations adopt OpenTelemetry for several critical use cases:
Microservice Debugging
In distributed systems, a single user request might touch dozens of services. Traditional debugging breaks down in this environment.
OpenTelemetry traces show the complete request journey, making it possible to:
- Identify which service caused a failure
- Pinpoint performance bottlenecks
- Understand service dependencies
- Correlate errors across services
Performance Optimization
The metrics and traces from OpenTelemetry help teams optimize performance by:
- Establishing performance baselines
- Identifying slow components and requests
- Measuring the impact of optimizations
- Detecting performance regressions
A common pattern is using traces to find problematic request patterns, then setting up metrics to monitor these patterns continuously.
Multi-Cloud Observability
For organizations running workloads across multiple cloud providers, OpenTelemetry offers consistent instrumentation regardless of environment. This allows:
- Unified monitoring across clouds
- Consistent context propagation
- Common visualization and alerting
Vendor Flexibility
Perhaps the biggest benefit is avoiding vendor lock-in. With OpenTelemetry:
- Switching observability vendors becomes a configuration change, not a re-instrumentation project
- Using multiple specialized tools for different telemetry types becomes feasible
- Testing new monitoring solutions alongside existing ones is straightforward
Cost Control
OpenTelemetry's sampling and filtering capabilities help control observability costs by:
- Reducing data volume through intelligent sampling
- Filtering out low-value telemetry
- Consolidating agents and collectors
Common Challenges and Solutions
Implementing OpenTelemetry isn't without challenges. Here are some common issues and how to address them:
Data Volume Management
Challenge: Telemetry data can grow explosively, increasing costs and overwhelming backends.
Solutions:
- Implement head-based sampling for high-volume services
- Use tail-based sampling in collectors for more intelligent filtering (see the sketch after this list)
- Configure attribute filtering to reduce cardinality
- Start with critical services rather than instrumenting everything at once
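For the tail-based option, the collector's tail sampling processor (shipped in the contrib distribution) can keep every trace that contains an error while sampling only a fraction of the rest. A rough sketch, with illustrative policy names and percentages:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s          # how long to buffer spans before deciding
    policies:
      - name: keep-errors       # keep any trace with an error status
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest   # sample 10% of everything else (illustrative)
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```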
Deployment Complexity
Challenge: Adding instrumentation to every service and managing collectors can be complex.
Solutions:
- Use auto-instrumentation where possible
- Deploy collectors as sidecars or per-node daemons
- Leverage service meshes for automatic context propagation
- Implement progressive instrumentation, starting with critical paths
Context Propagation Gaps
Challenge: Missing context between services breaks trace continuity.
Solutions:
- Standardize on W3C TraceContext headers across all services
- Use middleware for automatic header propagation
- Implement bridge propagators for legacy systems
- Test trace continuity across service boundaries
Maturity Concerns
Challenge: Some parts of OpenTelemetry are still evolving, particularly logs integration.
Solutions:
- Follow the stability guidance in the OpenTelemetry documentation
- Start with the most stable components (traces, then metrics)
- Join the community to stay informed about changes
- Consider vendor distributions that provide additional stability guarantees
The Future of OpenTelemetry
OpenTelemetry continues to evolve rapidly. Key trends to watch:
Continuous Improvement in Instrumentation
The project consistently expands auto-instrumentation coverage for frameworks and libraries. This means developers can expect:
- More out-of-the-box visibility
- Less manual instrumentation work
- Better coverage of edge cases
Advanced Sampling Techniques
The community is developing more sophisticated sampling approaches:
- Adaptive sampling based on system conditions
- ML-driven sampling that learns what's important
- Context-aware sampling that considers business value
Enhanced Profiling Integration
Profiling—collecting detailed execution data—is becoming a fourth pillar of observability alongside traces, metrics, and logs. OpenTelemetry is exploring integrations with profiling tools to provide deeper performance insights.
Unified Logs Pipeline
As the logs specification matures, expect tighter integration between logs and other telemetry types, making it easier to correlate all observability data.
eBPF Integration
Extended Berkeley Packet Filter (eBPF) technology enables kernel-level tracing with minimal overhead. OpenTelemetry projects are beginning to leverage eBPF for:
- Zero-code instrumentation
- Kernel-level visibility
- Lower overhead collection
How Odown Complements OpenTelemetry
While OpenTelemetry provides deep internal observability for your applications, Odown offers complementary external monitoring capabilities that complete your observability strategy.
External Verification
OpenTelemetry gives you visibility into your application's internal behavior, but it doesn't tell you how users experience your services from the outside. Odown provides this external perspective through:
- Regular uptime checks from multiple global locations
- Real-time alerts when services become unavailable
- Historical uptime analytics to track reliability over time
This external verification validates that your systems are not just running but actually accessible to users—something internal monitoring alone can't confirm.
SSL Certificate Monitoring
OpenTelemetry can track many aspects of your application health, but SSL certificate management remains a critical blind spot for many organizations. Odown fills this gap with dedicated SSL monitoring that:
- Tracks certificate expiration dates
- Monitors for certificate validity issues
- Alerts you before certificates expire
- Verifies proper certificate configuration
Certificate failures can render your services inaccessible even when they're running perfectly—making this monitoring complementary to your OpenTelemetry implementation.
Public Status Pages
When incidents do occur, communicating with users becomes just as important as resolving the technical issues. Odown's public status pages integrate with your monitoring data to:
- Automatically publish service status information
- Provide transparent incident communication
- Display historical uptime metrics
- Build trust through proactive communication
This creates a complete observability loop where internal telemetry drives technical responses while external monitoring informs user communication.
Unified Monitoring Strategy
The most effective monitoring strategies combine:
- Deep internal observability (OpenTelemetry)
- External verification (Odown uptime monitoring)
- Security monitoring (Odown SSL certificate checks)
- User communication (Odown status pages)
This comprehensive approach ensures you have both the technical data needed to maintain reliable services and the communication tools to build user trust.
By implementing OpenTelemetry alongside Odown's monitoring capabilities, you create a complete observability solution that addresses both the technical and communication aspects of modern service reliability.