How to Add OpenTelemetry Observability to AI Agents in Koog
Purpose
I deployed my first AI agent to production last month. It was handling customer support requests, and everything seemed fine—until my OpenAI bill arrived. $800 in one week. I had no idea which agent runs were expensive, which tools were being called, or where the bottlenecks were. The agent was a black box.
That’s when I realized: running AI agents without observability is like running a web service without logs. You’re flying blind.
The Problem with Black Box Agents
When an AI agent runs in production, you need answers to basic questions:
- Which tools did the agent call?
- How many tokens did each LLM request consume?
- Where are the performance bottlenecks?
- Why did costs spike on Tuesday?
Without instrumentation, you’re stuck guessing. I tried adding manual logging to my agent code, but that quickly became messy. Every tool call needed tracking. Every LLM request needed timing. Every response needed token counting. The logging code was longer than the agent logic itself.
Finding OpenTelemetry in Koog
I was already using Koog for building agents, so I checked if it had any observability features. Turns out, Koog has built-in OpenTelemetry integration. Once configured, it automatically captures:
- Nested traces for nodes, tool calls, and LLM requests
- Token counts per request
- Cost calculations
- Timing information
This is exactly what I needed—automatic instrumentation without polluting my agent code.
Setting Up OpenTelemetry
Before touching the agent code, I needed an OpenTelemetry collector running. I used the standard OpenTelemetry collector with Docker:
    # docker-compose.yml
    version: '3.8'
    services:
      otel-collector:
        image: otel/opentelemetry-collector:latest
        command: ["--config=/etc/otel-collector-config.yaml"]
        volumes:
          - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
        ports:
          - "4317:4317" # OTLP gRPC receiver
          - "4318:4318" # OTLP HTTP receiver

    # otel-collector-config.yaml
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    exporters:
      # "debug" replaces the older "logging" exporter, which recent collector releases no longer ship
      debug:
        verbosity: detailed

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]

This basic setup receives traces and logs them to the console. For production, you’d export to a real backend like Langfuse, W&B Weave, Jaeger, or your existing observability stack.
Configuring the Koog Agent
With the collector running, I added OpenTelemetry to my agent. The key is installing the OpenTelemetry.Feature:
    var observableAgent = AIAgent.builder()
        .promptExecutor(promptExecutor)
        .llmModel(OpenAIModels.Chat.GPT5_2)
        .systemPrompt("You're a banking assistant")
        .toolRegistry(toolRegistry)
        .install(OpenTelemetry.Feature, config -> {
            // Send spans to the local collector's OTLP gRPC receiver
            config.addSpanExporter(OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317")
                .build());
        })
        .build();

    // Run the agent - traces are generated automatically
    observableAgent.run("Process my transaction");

That’s it. Now every agent run generates detailed traces automatically.
What the Traces Look Like
After running the agent, I checked the collector logs. Here’s what I saw:
    Agent Run [transaction-processing]
    └── Node: Parse Request (0.2s)
    └── Node: Validate Account (0.1s)
        └── Tool Call: get_account_balance (0.3s)
        └── LLM Request: GPT-5.2 (1.2s)
            ├── Input tokens: 245
            ├── Output tokens: 89
            └── Estimated cost: $0.002
    └── Node: Execute Transfer (0.4s)
        └── Tool Call: transfer_funds (0.6s)
        └── LLM Request: GPT-5.2 (0.8s)
            ├── Input tokens: 312
            ├── Output tokens: 156
            └── Estimated cost: $0.003

Each run shows me exactly what happened. I can see the slowest steps, the token counts, and the costs. No more guessing.
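The estimated cost on an LLM span is just arithmetic over the two token counts. Here is a minimal sketch of that calculation. The per-million-token prices are hypothetical placeholders (not real GPT-5.2 pricing), and `CostEstimator` is my own illustration, not necessarily how Koog or your backend computes the figure:

```java
final class CostEstimator {
    // Hypothetical prices in USD per one million tokens; replace with your provider's real rates.
    static final double INPUT_PRICE_PER_MILLION = 2.00;
    static final double OUTPUT_PRICE_PER_MILLION = 8.00;

    /** Estimates the cost of one LLM request from its input and output token counts. */
    static double estimateCostUsd(long inputTokens, long outputTokens) {
        return inputTokens / 1_000_000.0 * INPUT_PRICE_PER_MILLION
             + outputTokens / 1_000_000.0 * OUTPUT_PRICE_PER_MILLION;
    }

    public static void main(String[] args) {
        // Token counts taken from the first LLM request in the trace above.
        System.out.println(estimateCostUsd(245, 89));
    }
}
```

Input and output tokens are usually priced differently, which is why the trace reports them separately.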
Connecting to a Visualization Backend
Console logs work for development, but for production I needed a proper dashboard. Koog supports Langfuse and W&B Weave out of the box.
For Langfuse, I updated the collector config to export there:
    exporters:
      otlphttp:
        endpoint: https://cloud.langfuse.com/api/public/otel
        headers:
          # Langfuse's OTLP endpoint uses HTTP Basic auth, not a Bearer token.
          # LANGFUSE_AUTH_STRING is an assumed variable name holding
          # base64("<public key>:<secret key>").
          Authorization: "Basic ${env:LANGFUSE_AUTH_STRING}"

    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlphttp]

Now I have a full dashboard showing agent runs over time and cost trends, with the ability to drill into any specific run.
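One detail worth checking is the Authorization header. If your Langfuse endpoint expects HTTP Basic authentication (public key as the username, secret key as the password), the header value is the Base64 encoding of the two keys joined by a colon. A small sketch of that encoding, where `LangfuseAuth` and the example keys are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class LangfuseAuth {
    /** Builds an HTTP Basic Authorization header value from a public/secret key pair. */
    static String basicAuthHeader(String publicKey, String secretKey) {
        String credentials = publicKey + ":" + secretKey;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder keys for illustration only; read real keys from the environment.
        System.out.println(basicAuthHeader("pk-example", "sk-example"));
    }
}
```

You would compute this once and hand the result to the collector through an environment variable rather than committing it to the config file.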
What I Learned
The first thing I noticed was that one particular tool—a document parser—was being called way more often than expected. The agent was retrying failed parses repeatedly. I added better error handling to the tool, and my costs dropped by 40%.
Without traces, I never would have found that issue.
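A retry loop like this shows up in a trace as the same tool name repeating within a single run. Here is a rough sketch of how you could flag it once you have the tool-call span names for a run. The `ToolCallAudit` class, the `maxExpected` threshold, and the tool names are all hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ToolCallAudit {
    /** Counts how often each tool name appears in one agent run. */
    static Map<String, Integer> countCalls(List<String> toolCallNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : toolCallNames) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns tools called more than maxExpected times, in first-seen order. */
    static List<String> suspectedRetryLoops(List<String> toolCallNames, int maxExpected) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : countCalls(toolCallNames).entrySet()) {
            if (entry.getValue() > maxExpected) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }
}
```

For a run whose tool calls were `parse_document`, `parse_document`, `get_account_balance`, `parse_document`, a threshold of 2 would flag only `parse_document`.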
Another surprise: the LLM calls were faster than I thought. The bottlenecks were actually in my tool implementations, not the model. I spent time optimizing the wrong things before I had visibility.
Common Mistakes to Avoid
I made a few mistakes along the way:
Not starting the collector first. I added the OpenTelemetry feature but forgot to start the collector. The agent ran fine, but traces went nowhere. Always verify the collector is running before testing.
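A quick way to catch this is a startup check that the collector port accepts connections. A minimal sketch using only the JDK, where `CollectorCheck` is a hypothetical helper. Note that it only confirms something is listening on the port, not that it is a healthy OTLP receiver:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class CollectorCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 4317 is the OTLP gRPC port exposed by the collector setup above.
        if (!isReachable("localhost", 4317, 1000)) {
            System.err.println("Nothing is listening on localhost:4317; traces will go nowhere.");
        }
    }
}
```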
Ignoring cost spikes. I saw the token counts but didn’t act on them until the bill arrived. Set up alerts for unusual cost patterns early.
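An alert does not need to be sophisticated to be useful. One simple rule, sketched below, is to compare today's cost against the average of the previous days and fire when it exceeds some multiple. The `CostSpikeAlert` class and the `factor` parameter are hypothetical, and most observability backends offer their own alerting that you would normally use instead:

```java
import java.util.List;

final class CostSpikeAlert {
    /** True when today's cost exceeds the average of the previous days by the given factor. */
    static boolean isSpike(List<Double> previousDailyCostsUsd, double todayCostUsd, double factor) {
        if (previousDailyCostsUsd.isEmpty()) {
            return false; // no baseline yet, so nothing to compare against
        }
        double sum = 0;
        for (double cost : previousDailyCostsUsd) {
            sum += cost;
        }
        double average = sum / previousDailyCostsUsd.size();
        return todayCostUsd > average * factor;
    }
}
```

With previous daily costs of 10, 12, and 11 dollars and a factor of 2, a 40-dollar day would trigger the alert and a 15-dollar day would not.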
Not reviewing traces regularly. Traces are only useful if you look at them. I set up a weekly review to check the slowest runs and most expensive operations.
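For the review itself, the core operation is just ranking spans by duration. A small sketch, assuming you have exported span names and durations into a map. `SlowestSpans` is a hypothetical helper, and in practice your dashboard can probably sort this for you:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SlowestSpans {
    /** Returns the names of the n longest spans, slowest first. */
    static List<String> topSlowest(Map<String, Double> durationSecondsBySpan, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(durationSecondsBySpan.entrySet());
        // Sort by duration, longest first.
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> names = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            names.add(entries.get(i).getKey());
        }
        return names;
    }
}
```

The same ranking works for cost if you pass estimated cost per span instead of duration.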
When This Approach Works Best
OpenTelemetry with Koog is ideal when:
- You have multiple agents running in production
- Costs are unpredictable or rising
- You need to debug why an agent made a particular decision
- Compliance requires an audit trail of agent actions
For a simple prototype running locally, it might be overkill. But once you’re in production, you’ll want this visibility.
Final Words + More Resources
My intention with this article was to share my knowledge and experience to help others. If you want to get in touch, you can reach me by email.