Observability
Agents are non-deterministic. Without strong observability you cannot debug failures, understand decisions, monitor cost, or improve behavior.
Why observability is non-optional for agents
Traditional applications are largely deterministic: the same input produces the same output. Agents are not. They make decisions, choose tools, and take multi-step actions. Without tracing you cannot answer even simple questions such as "why did the agent call this tool with these arguments?" or "where did the $400 in token spend come from?"
The four metrics that matter most:
- Tool success/failure rates per package
- Latency per tool call (p50, p95, p99)
- Token and dollar cost per session
- Agent decision paths (which tools were chosen, in what order, why)
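As a rough illustration of how these four metrics could be derived from exported span data, the sketch below aggregates a list of span records. The record fields (session, package, tool, ok, latency_ms, cost_usd) are assumptions made for this example, not the Gateway's actual span schema, and the decision path here captures only tool order, not the reasoning behind each choice.

```python
from collections import defaultdict

def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]

def summarize(spans):
    """Aggregate hypothetical span records (assumed to be in time order)."""
    calls = defaultdict(lambda: [0, 0])   # package -> [successes, total]
    latencies = defaultdict(list)         # tool -> latencies in ms
    cost = defaultdict(float)             # session -> dollars
    paths = defaultdict(list)             # session -> ordered tool names
    for span in spans:
        calls[span["package"]][0] += int(span["ok"])
        calls[span["package"]][1] += 1
        latencies[span["tool"]].append(span["latency_ms"])
        cost[span["session"]] += span["cost_usd"]
        paths[span["session"]].append(span["tool"])
    return {
        "success_rate": {pkg: ok / total for pkg, (ok, total) in calls.items()},
        "latency_ms": {
            tool: {f"p{p}": percentile(vals, p) for p in (50, 95, 99)}
            for tool, vals in latencies.items()
        },
        "cost_usd": dict(cost),
        "decision_path": dict(paths),
    }

# Illustrative records only; the field names are assumptions for this sketch.
SPANS = [
    {"session": "s1", "package": "github", "tool": "create_issue",
     "ok": True, "latency_ms": 100, "cost_usd": 0.01},
    {"session": "s1", "package": "github", "tool": "create_issue",
     "ok": False, "latency_ms": 300, "cost_usd": 0.02},
    {"session": "s1", "package": "slack", "tool": "post_message",
     "ok": True, "latency_ms": 50, "cost_usd": 0.005},
    {"session": "s2", "package": "github", "tool": "create_issue",
     "ok": True, "latency_ms": 200, "cost_usd": 0.01},
]

if __name__ == "__main__":
    print(summarize(SPANS))
```

In practice most observability platforms compute these aggregates for you; a script like this is mainly useful for spot checks against raw exports.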
Recommended stack
The MCP Gateway emits an OpenTelemetry span for every tool call. Export these spans to the observability platform of your choice. APAI ships pre-configured integrations with the major options:
| Platform | Best for | Open source? |
|---|---|---|
| Langfuse | Production LLM apps, cost tracking, user feedback collection | Yes |
| LangSmith | Teams already using LangChain / LangGraph | No (managed by LangChain) |
| Arize Phoenix | Evaluation-heavy workflows, both research and production | Yes |
| Helicone | Lightweight LLM API observability, fast integration | Yes |
| W&B, PromptLayer, HoneyHive | Experiment tracking + prompt management | Mixed |
Configuration
Configure the Gateway to export traces to your platform of choice:
    # Langfuse
    APAI_OBSERVABILITY_PROVIDER=langfuse
    APAI_LANGFUSE_PUBLIC_KEY=pk_...
    APAI_LANGFUSE_SECRET_KEY=sk_...
    APAI_LANGFUSE_HOST=https://cloud.langfuse.com

    # LangSmith
    APAI_OBSERVABILITY_PROVIDER=langsmith
    APAI_LANGSMITH_API_KEY=ls_...
    APAI_LANGSMITH_PROJECT=my-agent

    # Arize Phoenix
    APAI_OBSERVABILITY_PROVIDER=phoenix
    APAI_PHOENIX_ENDPOINT=https://app.phoenix.arize.com
    APAI_PHOENIX_API_KEY=phx_...

    # Generic OpenTelemetry (any OTel-compatible backend)
    APAI_OBSERVABILITY_PROVIDER=otel
    APAI_OTEL_ENDPOINT=https://otel-collector.example.com:4317
    APAI_OTEL_AUTH=Bearer your-token
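Because each provider needs a different set of variables, a small preflight check can catch a missing key before the Gateway starts. The sketch below is hypothetical and not part of APAI: the variable names are copied from the listing above, but treating every listed variable as required is an assumption.

```python
import os

# Variables per provider, copied from the configuration listing.
# Assumption: all of them are required; the source does not say which are optional.
REQUIRED = {
    "langfuse": ["APAI_LANGFUSE_PUBLIC_KEY", "APAI_LANGFUSE_SECRET_KEY",
                 "APAI_LANGFUSE_HOST"],
    "langsmith": ["APAI_LANGSMITH_API_KEY", "APAI_LANGSMITH_PROJECT"],
    "phoenix": ["APAI_PHOENIX_ENDPOINT", "APAI_PHOENIX_API_KEY"],
    "otel": ["APAI_OTEL_ENDPOINT", "APAI_OTEL_AUTH"],
}

def missing_settings(env):
    """Return the names of unset variables for the selected provider."""
    provider = env.get("APAI_OBSERVABILITY_PROVIDER", "")
    if provider not in REQUIRED:
        raise ValueError(f"unknown observability provider: {provider!r}")
    return [name for name in REQUIRED[provider] if not env.get(name)]

if __name__ == "__main__":
    try:
        missing = missing_settings(os.environ)
    except ValueError as err:
        print(f"Configuration error: {err}")
    else:
        print("Missing settings:", ", ".join(missing) if missing else "none")
```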
What to trace
The MCP Gateway emits these spans automatically. Configure your platform to alert on outliers in each:
- apai.tool.call: one span per tool invocation. Attributes: tool name, arguments hash, agent identity, workspace, package, package version, decision result (allow / block / require_approval), latency.
- apai.install: one span per install event. Attributes: package, version, source, target, install mode, permissions requested, permissions granted, scanner findings, status.
- apai.policy.decision: one span per policy evaluation. Attributes: policy slug, rule id, action, on_match outcome.
- apai.passport.read: one span per time a Capability Passport is consumed by an agent.
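The tool-call span carries an arguments hash rather than the raw arguments. How the Gateway computes that hash is not documented here; the sketch below shows one plausible approach (SHA-256 over canonical JSON) so that identical arguments always map to the same value regardless of key order. The function names and attribute keys are hypothetical.

```python
import hashlib
import json

DECISIONS = {"allow", "block", "require_approval"}

def arguments_hash(arguments):
    """Stable hash of JSON-serializable arguments (assumed scheme, not APAI's)."""
    # Sorted keys and fixed separators make the encoding order-independent.
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def tool_call_attributes(tool, arguments, decision, latency_ms):
    """Build an attribute dict shaped like the tool-call span described above."""
    if decision not in DECISIONS:
        raise ValueError(f"unexpected decision result: {decision!r}")
    return {
        "tool.name": tool,                        # hypothetical attribute keys
        "tool.arguments_hash": arguments_hash(arguments),
        "decision.result": decision,
        "latency_ms": latency_ms,
    }

if __name__ == "__main__":
    attrs = tool_call_attributes(
        "create_issue", {"repo": "docs", "title": "Fix typo"}, "allow", 120
    )
    print(attrs)
```

Hashing keeps traces comparable (you can group calls that used identical arguments) without shipping the argument values themselves.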
Privacy and redaction
Tool call arguments may contain sensitive data. Configure redaction at the Gateway before traces leave the workspace:
    # .apai/observability.yaml
    redact:
      - "Authorization"
      - "X-API-Key"
      - "/secrets/.*"
      - "ssn"
      - "credit_card"
    sample_rate: 1.0  # 1.0 = trace every call. Drop to 0.1 for high-volume workspaces.
Audit logs (kept locally) always include full data for compliance. Traces (shipped to your observability platform) are redacted per this config.
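To make the effect of the redaction config concrete, here is a minimal sketch of how such rules might be applied to a span's attributes before export. The matching semantics are an assumption: each redact entry is treated as a regular expression matched case-insensitively against whole keys and whole string values. The Gateway's real behavior may differ, and the hash-based sampling shown is likewise only one possible way to honor sample_rate.

```python
import hashlib
import re

REDACT = ["Authorization", "X-API-Key", "/secrets/.*", "ssn", "credit_card"]
SAMPLE_RATE = 1.0
MASK = "[REDACTED]"

_PATTERNS = [re.compile(p, re.IGNORECASE) for p in REDACT]

def _matches(text):
    # Assumed semantics: an entry must match the entire key or value.
    return any(p.fullmatch(text) for p in _PATTERNS)

def redact(value, key=""):
    """Recursively mask values whose key, or whose own text, matches a rule."""
    if isinstance(key, str) and key and _matches(key):
        return MASK
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str) and _matches(value):
        return MASK
    return value

def should_sample(trace_id, rate=SAMPLE_RATE):
    """Deterministic sampling: map the trace id to [0, 1) and compare to rate."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < rate

if __name__ == "__main__":
    span = {
        "headers": {"authorization": "Bearer abc", "Accept": "application/json"},
        "path": "/secrets/db.env",
        "ssn": "000-00-0000",
    }
    if should_sample("trace-1"):
        print(redact(span))
```

Under these assumptions a free-text value that merely mentions "ssn" is not masked, because only whole-key and whole-value matches count; stricter substring matching would trade more false positives for fewer leaks.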