Observe, evaluate, and govern AI agents
Capture every thought, action, and LLM call. Enforce quality policies, detect safety violations, and audit your entire agent fleet — all in real time, no cloud required.
Capture everything your agent does
From first thought to final output, AgentTrace records every span with full fidelity. Never wonder what your agent actually did again.
- ✓ Thoughts and reasoning chains, including thinking_text
- ✓ LLM calls — model, token count, cost, latency
- ✓ Tool invocations with full input/output payloads
- ✓ Agent handoffs and sub-agent delegation
- ✓ Error recovery, retries, and quality scores
AI advisory. No LLM required.
Six specialist advisors analyze your traces in milliseconds using pure statistical heuristics — no API calls, no latency, no external dependencies.
Every finding links directly to the offending trace and span. P99 advisory latency <10 ms.
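As an illustration of the heuristic approach, a latency advisor can be sketched in a few lines of plain Python. The span schema and function name below are assumptions for the example, not AgentTrace internals:

```python
from statistics import mean, stdev

def latency_findings(spans, multiplier=3.0):
    """Flag spans whose latency exceeds mean + multiplier * stddev.

    A purely statistical heuristic -- no LLM call, no network I/O.
    `spans` is a list of dicts with 'id' and 'latency_ms' keys
    (an illustrative schema, not the actual AgentTrace span format).
    """
    latencies = [s["latency_ms"] for s in spans]
    if len(latencies) < 2:
        return []  # not enough data for a meaningful baseline
    threshold = mean(latencies) + multiplier * stdev(latencies)
    return [
        {"span_id": s["id"], "latency_ms": s["latency_ms"], "threshold": threshold}
        for s in spans
        if s["latency_ms"] > threshold
    ]
```

Because the check is a single pass over in-memory numbers, sub-10 ms advisory latency is plausible even on large trace sets.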
Zero setup. Works with Copilot.
Add one block to your VS Code MCP config and every Copilot Agent session is automatically traced — no code changes in your agent needed.
{
"servers": {
"agenttrace": {
"type": "stdio",
"command": "uv",
"args": ["run", "python",
"enterprise/mcp/server.py"]
}
}
}
- ✓ 8 MCP tools — thoughts, actions, handoffs, errors
- ✓ Works with VS Code Copilot and Claude CLI
- ✓ Python SDK for custom instrumentation
- ✓ @trace_span decorator for zero-boilerplate wrapping
Metrics and trends across your fleet
Track cost, quality, tokens, and latency over time. Spot regressions before they become incidents. Understand which agents deliver value and which don't.
- ✓ Time-series bar charts for cost, spans, tokens, latency
- ✓ Cross-trace session analysis and timeline
- ✓ Per-model token and cost breakdown
- ✓ Quality trends — only meaningful spans counted
From monitoring to control
The capabilities that turn agent observability into a compliance-ready governance framework
Role-Based Access Control
Plan-gated capability matrix (Community / Pro / Enterprise) enforced on every API call. Org-scoped data isolation prevents cross-tenant leakage.
Complete Audit Logging
Every API invocation — read, write, delete, login, org switch — is persisted with actor identity, timestamp, resource, and outcome. Admin-only viewer with cursor-paginated export.
Automated Quality & Safety Scoring
Every span is evaluated by an SLM-as-a-Judge against a rubric of conciseness, accuracy, and safety. Violations — hallucinations, PII leaks, policy breaches — are flagged immediately with scores and evidence.
Configurable Advisory Policies
Per-org and per-project threshold tuning for every advisor: latency multipliers, cost concentration limits, quality score floors, error rate ceilings. Changes take effect instantly without redeployment.
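A minimal sketch of what such a tunable policy could look like. The field names and check function are illustrative assumptions, not the actual AgentTrace schema:

```python
from dataclasses import dataclass

@dataclass
class AdvisoryPolicy:
    """Per-org thresholds; defaults shown are made up for the example."""
    latency_multiplier: float = 3.0   # flag spans > N x baseline latency
    cost_concentration: float = 0.5   # max share of spend one agent may hold
    quality_floor: float = 3.5        # minimum acceptable quality score
    error_rate_ceiling: float = 0.05  # max tolerated error rate

def violations(policy, metrics):
    """Return names of breached thresholds for one metrics snapshot."""
    out = []
    if metrics["top_agent_cost_share"] > policy.cost_concentration:
        out.append("cost_concentration")
    if metrics["avg_quality"] < policy.quality_floor:
        out.append("quality_floor")
    if metrics["error_rate"] > policy.error_rate_ceiling:
        out.append("error_rate_ceiling")
    return out
```

Because the policy is plain data, swapping in new per-org values changes behavior on the next check — no redeployment involved.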
Human Feedback Integration
Operators can attach thumbs-up/down ratings and free-text comments to any trace. Feedback is persisted alongside automated quality scores, creating a labelled dataset for model improvement.
Security Watchdog Advisor
One of six automated advisors exclusively focused on safety posture: detects prohibited-content patterns, data-leak indicators, and safety-policy violations across the entire trace corpus.
Policy Enforcement & Circuit Breakers
Block or halt agents in real time when a policy threshold is breached — not just alert after the fact. Configurable enforcement modes: warn, throttle, or reject.
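The three enforcement modes can be sketched as a single gate function. This is an illustrative shape, not the AgentTrace API:

```python
import time

def enforce(mode, breach, send_alert=print, delay_s=1.0):
    """Gate an agent call against a policy breach.

    Returns True if the call may proceed, False if rejected.
    `send_alert` and `delay_s` are assumed hooks for the example.
    """
    if not breach:
        return True
    if mode == "warn":
        send_alert("policy breach: warning only")
        return True
    if mode == "throttle":
        send_alert("policy breach: throttling")
        time.sleep(delay_s)  # apply back-pressure, then allow the call
        return True
    if mode == "reject":
        send_alert("policy breach: call rejected")
        return False
    raise ValueError(f"unknown enforcement mode: {mode}")
```

The key design point is that the gate runs before the agent acts — rejection stops the call, rather than reporting it afterwards.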
Human-in-the-Loop Approval Gates
Define checkpoints where an agent must pause and await human approval before proceeding. Approval decisions are audit-logged with identity and rationale.
Real-Time Alert Integrations
Push safety violations and policy breaches to Slack, PagerDuty, or any webhook endpoint the moment they fire — not on the next dashboard refresh.
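Pushing to a generic webhook needs only the standard library. The payload fields below are illustrative — Slack and PagerDuty each expect their own schema in practice:

```python
import json
import urllib.request

def build_alert(violation):
    """Shape a policy-breach event as a generic webhook payload
    (field names are assumptions for the example)."""
    return {
        "event": "policy_breach",
        "advisor": violation["advisor"],
        "trace_id": violation["trace_id"],
        "severity": violation.get("severity", "warning"),
    }

def post_alert(url, violation, timeout=5):
    """POST the payload the moment the violation fires."""
    data = json.dumps(build_alert(violation)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```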
SSO & Enterprise IdP
SAML 2.0 and OIDC integration with your corporate identity provider. Groups-to-roles mapping, JIT provisioning, and automated de-provisioning.
Up and running in minutes
Three steps from installation to production insights
Install
Clone the repo, start the FastAPI backend, and launch the dashboard dev server. No cloud account or API keys required.
git clone … AgentTrace
uv run uvicorn backend.main:app
npm run dev
Instrument
Add the MCP server to your editor config for zero-code tracing, or use the Python SDK for custom control.
agenttrace run my_agent.py
# or configure .vscode/mcp.json
Analyze
Open the dashboard. View DAGs, timelines, advisory findings, and metrics — live, with no page refresh needed.
open http://localhost:5173
# Dashboard + Advisory Board
Simple, transparent tiers
Self-host for free or unlock advanced analytics for your team
Community
- Unlimited traces & spans
- Execution log
- Metrics summary
- Python SDK + MCP server
- OpenTelemetry export
- Trace & session visualization
- Advisory board
- InsightBot chat
- Audit logs
Pro
- Everything in Community
- DAG + Timeline visualization
- Advisory board (read)
- InsightBot chat
- Advanced metrics
- Session cross-trace analysis
- Advisory policy tuning
- Audit logs
Enterprise
- Everything in Pro
- Advisory policy tuning
- Complete audit logging
- Multi-tenant deployment
- SLM judge integration
- Causal discovery graph (Neo4j)
- Priority support + custom SLA
Start in under 2 minutes
{
"servers": {
"agenttrace": {
"type": "stdio",
"command": "uv",
"args": ["run", "python",
"enterprise/mcp/server.py"]
}
}
}
Restart VS Code — Copilot tracks every agent session automatically. Open http://localhost:5173 to view the live dashboard.
from agenttrace import start_trace, start_span, end_span, end_trace

# Wrap one agent run as a trace
trace = start_trace("research-agent", session_name="Q&A Session")

# Record a span
span = start_span(trace, "reasoning", agent="planner", span_type="thought")
# ... your agent code ...
end_span(span, status="success", result_summary="Plan created")

end_trace(trace)
Or use the @trace_span decorator for zero-boilerplate wrapping of any function.
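To make the idea concrete, here is a sketch of what such a decorator might do internally — open a span, run the function, record status and duration. The real @trace_span signature and span format may differ:

```python
import functools
import time

RECORDED = []  # stand-in for the SDK's span buffer

def trace_span(name=None, span_type="action"):
    """Illustrative decorator: wraps a function call as one span."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            span = {"name": name or fn.__name__,
                    "type": span_type,
                    "start": time.time()}
            try:
                result = fn(*args, **kwargs)
                span["status"] = "success"
                return result
            except Exception:
                span["status"] = "error"
                raise
            finally:
                span["duration_s"] = time.time() - span["start"]
                RECORDED.append(span)  # stand-in for flushing to the backend
        return inner
    return wrap

@trace_span(span_type="thought")
def plan(goal):
    return f"plan for {goal}"
```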
# Zero code changes — wrap any script
agenttrace run my_agent.py
# With custom session grouping
AGENTTRACE_SESSION_NAME="nightly" \
agenttrace run pipeline.py "Summarise Q1 results"
# Sample output
→ Trace started: t_3fa91c2b
→ Auto-instrumented: openai, anthropic, litellm
→ 7 spans flushed · $0.014 · quality 4.1/5
Monkey-patches OpenAI, Anthropic, and LiteLLM at process start — no changes to your agent code.
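The patching technique itself is simple to illustrate with a stand-in client class — replace a method on the class so every call is recorded before the result is returned (a generic sketch, not the actual AgentTrace code):

```python
import functools
import time

def patch_method(cls, method_name, record):
    """Wrap a client method so every call is recorded as a span --
    the same idea applied to the openai/anthropic/litellm clients
    at process start."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        start = time.time()
        result = original(self, *args, **kwargs)
        record({"call": method_name, "latency_s": time.time() - start})
        return result

    setattr(cls, method_name, traced)

# Demo with a fake client standing in for a real SDK class
class FakeClient:
    def complete(self, prompt):
        return f"echo: {prompt}"

spans = []
patch_method(FakeClient, "complete", spans.append)
```

Because the patch replaces the method on the class, every instance created anywhere in the process is traced — which is why no agent-code changes are needed.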
Start observing your agents today
Open source · Self-hosted · Production-ready