How to Add OpenTelemetry Observability to AI Agents in Koog

Mar 25, 2026

Purpose

I deployed my first AI agent to production last month. It was handling customer support requests, and everything seemed fine—until my OpenAI bill arrived. $800 in one week. I had no idea which agent runs were expensive, which tools were being called, or where the bottlenecks were. The agent was a black box.

That’s when I realized: running AI agents without observability is like running a web service without logs. You’re flying blind.

The Problem with Black Box Agents

When an AI agent runs in production, you need answers to basic questions:

Which tools did the agent call?
How many tokens did each LLM request consume?
Where are the performance bottlenecks?
Why did costs spike on Tuesday?

Without instrumentation, you’re stuck guessing. I tried adding manual logging to my agent code, but that quickly became messy. Every tool call needed tracking. Every LLM request needed timing. Every response needed token counting. The logging code was longer than the agent logic itself.

Finding OpenTelemetry in Koog

I was already using Koog for building agents, so I checked if it had any observability features. Turns out, Koog has built-in OpenTelemetry integration. Once configured, it automatically captures:

Nested traces for nodes, tool calls, and LLM requests
Token counts per request
Cost calculations
Timing information

This is exactly what I needed—automatic instrumentation without polluting my agent code.

Setting Up OpenTelemetry

Before touching the agent code, I needed an OpenTelemetry collector running. I ran the standard OpenTelemetry Collector image with Docker Compose, using this docker-compose.yml:

services:
  otel-collector:
    image: otel/opentelemetry-collector:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver

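The collector reads its pipeline from otel-collector-config.yaml, the file the Compose service mounts into the container:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  # Prints received spans to the collector's stdout ("debug" replaced the removed "logging" exporter)
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]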

This basic setup receives traces and logs them to the console. For production, you’d export to a real backend like Langfuse, W&B Weave, Jaeger, or your existing observability stack.

Configuring the Koog Agent

With the collector running, I added OpenTelemetry to my agent. The key is installing the OpenTelemetry.Feature:

var observableAgent = AIAgent.builder()
    .promptExecutor(promptExecutor)
    .llmModel(OpenAIModels.Chat.GPT5_2)
    .systemPrompt("You're a banking assistant")
    .toolRegistry(toolRegistry)
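    // Install Koog's OpenTelemetry feature and send spans to the local collector
    .install(OpenTelemetry.Feature, config -> {
        // OTLP over gRPC, matching the collector's 4317 port
        config.addSpanExporter(OtlpGrpcSpanExporter.builder()
            .setEndpoint("http://localhost:4317")
            .build());
    })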
    .build();

// Run the agent - traces are generated automatically
observableAgent.run("Process my transaction");

That’s it. Now every agent run generates detailed traces automatically.
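Hardcoding http://localhost:4317 works on a laptop, but the same agent usually talks to a different collector in production. A minimal sketch, assuming you want to honor the standard OpenTelemetry environment variable OTEL_EXPORTER_OTLP_ENDPOINT and fall back to the local collector otherwise. The class and method names are hypothetical; the resolved value would be passed to setEndpoint(...) in the builder above.

```java
import java.util.Map;

// Hypothetical helper that picks the OTLP gRPC endpoint for the span exporter.
final class OtlpEndpoint {
    // Used when nothing is configured: the local collector's gRPC port.
    static final String DEFAULT = "http://localhost:4317";

    private OtlpEndpoint() {}

    // Reads OTEL_EXPORTER_OTLP_ENDPOINT from the given environment map;
    // a missing or blank value falls back to DEFAULT.
    static String resolve(Map<String, String> env) {
        String value = env.get("OTEL_EXPORTER_OTLP_ENDPOINT");
        return (value == null || value.isBlank()) ? DEFAULT : value.strip();
    }

    // Convenience overload that reads the real process environment.
    static String resolve() {
        return resolve(System.getenv());
    }
}
```

Taking the environment as a Map keeps the lookup testable without touching real environment variables.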

What the Traces Look Like

After running the agent, I checked the collector logs. Condensed into a tree, one run looked like this:

Agent Run [transaction-processing]
  ├── Node: Parse Request (0.2s)
  ├── Node: Validate Account (0.1s)
  ├── Tool Call: get_account_balance (0.3s)
  ├── LLM Request: GPT-5.2 (1.2s)
  │   ├── Input tokens: 245
  │   ├── Output tokens: 89
  │   └── Estimated cost: $0.002
  ├── Node: Execute Transfer (0.4s)
  ├── Tool Call: transfer_funds (0.6s)
  └── LLM Request: GPT-5.2 (0.8s)
      ├── Input tokens: 312
      ├── Output tokens: 156
      └── Estimated cost: $0.003

Each run shows me exactly what happened. I can see the slowest steps, the token counts, and the costs. No more guessing.
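The cost figures are derived from the token counts. A minimal sketch of that arithmetic, assuming per-million-token pricing; the prices in main are placeholders, not real GPT-5.2 rates, and LlmUsage and CostEstimator are illustrative names, not part of Koog's API. It totals the two LLM requests from the run above.

```java
import java.util.List;

// Illustrative token usage for one LLM request (not a Koog type).
record LlmUsage(long inputTokens, long outputTokens) {}

final class CostEstimator {
    private CostEstimator() {}

    // Cost of one request, with prices given in USD per million tokens.
    static double estimateUsd(LlmUsage usage, double inputPerMillion, double outputPerMillion) {
        return usage.inputTokens() * inputPerMillion / 1_000_000.0
                + usage.outputTokens() * outputPerMillion / 1_000_000.0;
    }

    // Total estimated cost of all LLM requests in one agent run.
    static double runTotalUsd(List<LlmUsage> requests, double inputPerMillion, double outputPerMillion) {
        return requests.stream()
                .mapToDouble(u -> estimateUsd(u, inputPerMillion, outputPerMillion))
                .sum();
    }

    public static void main(String[] args) {
        // Token counts from the two LLM requests in the sample trace.
        List<LlmUsage> run = List.of(new LlmUsage(245, 89), new LlmUsage(312, 156));
        double inputPrice = 2.00;  // placeholder USD per 1M input tokens
        double outputPrice = 8.00; // placeholder USD per 1M output tokens
        System.out.printf("Estimated run cost: $%.6f%n", runTotalUsd(run, inputPrice, outputPrice));
    }
}
```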

Connecting to a Visualization Backend

Console logs work for development, but for production I needed a proper dashboard. Koog supports Langfuse and W&B Weave out of the box.

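For Langfuse, I updated the collector config to export there. Langfuse's OTLP endpoint uses HTTP Basic auth, so the header carries the base64-encoded "public_key:secret_key" pair. I supply it through an environment variable (I named it LANGFUSE_AUTH), which also has to be passed into the collector container:

exporters:
  otlphttp:
    endpoint: https://cloud.langfuse.com/api/public/otel
    headers:
      Authorization: "Basic ${env:LANGFUSE_AUTH}"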

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]
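Langfuse's OTLP endpoint expects HTTP Basic auth: the Authorization header is "Basic " followed by the base64 encoding of "public_key:secret_key". Any base64 tool can produce that value; here is a minimal JDK-only sketch. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the Basic auth header value for Langfuse's OTLP endpoint.
final class LangfuseAuth {
    private LangfuseAuth() {}

    // Returns "Basic " + base64("publicKey:secretKey").
    static String headerValue(String publicKey, String secretKey) {
        String credentials = publicKey + ":" + secretKey;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

Only the part after "Basic " belongs in the environment variable if the header template already contains the "Basic " prefix.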

Now I have a full dashboard showing agent runs over time, cost trends, and the ability to drill into any specific run.

What I Learned

The first thing I noticed was that one particular tool—a document parser—was being called way more often than expected. The agent was retrying failed parses repeatedly. I added better error handling to the tool, and my costs dropped by 40%.

Without traces, I never would have found that issue.
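Spotting a retry loop like this comes down to counting how often each tool shows up in a run. A minimal sketch, assuming you have already pulled span names out of your backend in the "Tool Call: <name>" form of the trace view above; that naming, the threshold, and the tool name parse_document in the usage are illustrative, not Koog's actual span names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ToolCallCounter {
    static final String PREFIX = "Tool Call: ";

    private ToolCallCounter() {}

    // Counts invocations per tool name; spans without the tool prefix are ignored.
    static Map<String, Integer> countByTool(List<String> spanNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : spanNames) {
            if (name.startsWith(PREFIX)) {
                counts.merge(name.substring(PREFIX.length()).strip(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the tools called more than maxCallsPerRun times in one run.
    static List<String> suspicious(Map<String, Integer> counts, int maxCallsPerRun) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > maxCallsPerRun)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

Usage: feed it one run's span names, e.g. three "Tool Call: parse_document" spans and one "Tool Call: get_account_balance", and suspicious(counts, 2) flags parse_document.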

Another surprise: the LLM calls were faster than I thought. The bottlenecks were actually in my tool implementations, not the model. I spent time optimizing the wrong things before I had visibility.
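To check whether time goes to the model or to your own code, sum span durations by kind. A minimal sketch, assuming span names use the "Node:", "Tool Call:" and "LLM Request:" prefixes from the trace view above and durations are already exported in milliseconds; both are simplifications for illustration. If your spans are nested, child time is also counted inside the parent, so compare these sums within one kind rather than as shares of wall-clock time.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class TimeByKind {
    enum Kind { NODE, TOOL, LLM, OTHER }

    // Illustrative span summary: name plus duration in milliseconds.
    record Span(String name, long durationMs) {}

    private TimeByKind() {}

    // Classifies a span by its name prefix.
    static Kind kindOf(String name) {
        if (name.startsWith("Tool Call:")) return Kind.TOOL;
        if (name.startsWith("LLM Request:")) return Kind.LLM;
        if (name.startsWith("Node:")) return Kind.NODE;
        return Kind.OTHER;
    }

    // Sums span durations per kind; kinds with no spans are absent from the map.
    static Map<Kind, Long> totals(List<Span> spans) {
        Map<Kind, Long> totals = new EnumMap<>(Kind.class);
        for (Span s : spans) {
            totals.merge(kindOf(s.name()), s.durationMs(), Long::sum);
        }
        return totals;
    }
}
```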

Common Mistakes to Avoid

I made a few mistakes along the way:

Not starting the collector first. I added the OpenTelemetry feature but forgot to start the collector. The agent ran fine, but traces went nowhere. Always verify the collector is running before testing.
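One way to catch this early is to confirm the collector's OTLP ports accept connections before starting the agent. A minimal sketch, assuming the collector runs on localhost with the 4317/4318 port mappings from the Compose file above; the class name is hypothetical. An open port only shows that something is listening, not that the pipeline exports anywhere.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class CollectorCheck {
    private CollectorCheck() {}

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Check both OTLP receivers exposed by the Compose file.
        for (int port : new int[] {4317, 4318}) {
            boolean up = isListening("localhost", port, 500);
            System.out.println("localhost:" + port + (up ? " is reachable" : " is NOT reachable"));
        }
    }
}
```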

Ignoring cost spikes. I saw the token counts but didn’t act on them until the bill arrived. Set up alerts for unusual cost patterns early.
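An alert does not need to be sophisticated to be useful. A minimal sketch, assuming you can export a per-run cost series from your tracing backend: flag any run whose cost exceeds a multiple of the median of the preceding runs. The window size, the multiplier, and all names are illustrative choices, not something Koog or the backends provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CostSpikeDetector {
    private CostSpikeDetector() {}

    // Median of a non-empty array; works on a sorted copy so the input is untouched.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Indices of runs costing more than multiplier x the median of the previous
    // `window` runs. The first `window` runs have no baseline and are never flagged.
    static List<Integer> spikes(double[] runCosts, int window, double multiplier) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1");
        }
        List<Integer> flagged = new ArrayList<>();
        for (int i = window; i < runCosts.length; i++) {
            double baseline = median(Arrays.copyOfRange(runCosts, i - window, i));
            if (runCosts[i] > multiplier * baseline) {
                flagged.add(i);
            }
        }
        return flagged;
    }
}
```

A median baseline is less distorted by a single earlier spike than a mean would be, so one bad run does not hide the next one.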

Not reviewing traces regularly. Traces are only useful if you look at them. I set up a weekly review to check the slowest runs and most expensive operations.

When This Approach Works Best

OpenTelemetry with Koog is ideal when:

You have multiple agents running in production
Costs are unpredictable or rising
You need to debug why an agent made a particular decision
Compliance requires an audit trail of agent actions

For a simple prototype running locally, it might be overkill. But once you’re in production, you’ll want this visibility.

Final Words + More Resources

My intention with this article was to share my knowledge and experience to help others. If you'd like to get in touch, you can reach me by email.

Here are the most important links from this article, along with some further resources on this topic:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!