Core Concepts

This page explains AgentCost's data model and how cost tracking works.

Traces

A trace records a single LLM API call. Every time your application calls OpenAI, Anthropic, or any other provider through an AgentCost-wrapped client, a trace is recorded with the following fields:

Field	Description
`trace_id`	Unique identifier
`project`	Logical grouping (e.g., "customer-support")
`model`	Model name (e.g., "gpt-4o")
`provider`	Provider name (e.g., "openai")
`input_tokens`	Tokens sent to the model
`output_tokens`	Tokens received from the model
`cost`	Computed cost in USD
`latency_ms`	Round-trip time in milliseconds
`status`	`success` or `error`
`agent_id`	Optional: which agent made the call
`session_id`	Optional: session grouping
`timestamp`	When the call was made
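
As a rough illustration of the shape of one trace, the fields above can be mirrored in a small dataclass. This is a sketch only: `TraceRecord` is a hypothetical name, not a class exported by AgentCost, and the types and defaults are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass
class TraceRecord:
    """Hypothetical mirror of the trace fields listed above (not AgentCost's own class)."""
    project: str
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    cost: float                       # USD
    latency_ms: float
    status: str = "success"           # "success" or "error"
    agent_id: Optional[str] = None    # optional
    session_id: Optional[str] = None  # optional
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Sample values are illustrative only.
example = TraceRecord(
    project="customer-support",
    model="gpt-4o",
    provider="openai",
    input_tokens=1200,
    output_tokens=300,
    cost=0.006,
    latency_ms=840.0,
)
```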

Projects

A project is a logical namespace for traces. Use projects to separate different applications, environments, or teams:

# Different projects for different use cases.
# `trace` is AgentCost's client wrapper; import it from the AgentCost SDK.
from openai import OpenAI

support_client = trace(OpenAI(), project="customer-support")
pipeline_client = trace(OpenAI(), project="data-pipeline")
research_client = trace(OpenAI(), project="research")

Agents & Sessions

Agents are identifiers for specific AI components within a project:

client = trace(OpenAI(), project="support", agent_id="ticket-classifier")

Sessions group multiple calls into a conversation or workflow:

client = trace(OpenAI(), project="support", session_id="conv-12345")
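
To show why these identifiers are useful, the sketch below sums cost per session and per agent over a handful of trace rows. It assumes traces are available as plain dicts keyed by the field names from the Traces table; the rows, the `reply-writer` agent name, and the `total_cost_by` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical trace rows using the field names from the Traces table.
traces = [
    {"session_id": "conv-12345", "agent_id": "ticket-classifier", "cost": 0.0020},
    {"session_id": "conv-12345", "agent_id": "reply-writer", "cost": 0.0100},
    {"session_id": "conv-67890", "agent_id": "ticket-classifier", "cost": 0.0015},
]


def total_cost_by(rows, key):
    """Sum the cost of all rows that share the same value for `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["cost"]
    return dict(totals)


by_session = total_cost_by(traces, "session_id")  # cost of each conversation
by_agent = total_cost_by(traces, "agent_id")      # cost of each agent
```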

Cost Calculation

AgentCost calculates costs from a vendored pricing database of 2,610+ models from 40+ providers, sourced from LiteLLM's community-maintained dataset and synced weekly. With prices expressed in USD per 1M tokens, the cost of a call is:

cost = (input_tokens × input_price + output_tokens × output_price) / 1,000,000

The vendored data lives in `agentcost/cost/model_prices.json` and is the single source of truth. Custom pricing overrides can be added via `overrides.json` or at runtime with `register_model()`. Cache-aware pricing is supported for Anthropic prompt caching and OpenAI cached tokens.
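
The formula above can be worked through in a few lines. This is a standalone sketch: `compute_cost` is a hypothetical helper rather than an AgentCost function, and the prices are illustrative values, not figures read from the pricing database. It also ignores cache-aware pricing.

```python
def compute_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the cost in USD; both prices are USD per 1,000,000 tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Illustrative prices only: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
cost = compute_cost(input_tokens=1_200, output_tokens=300,
                    input_price=2.50, output_price=10.00)
# (1,200 x 2.50 + 300 x 10.00) / 1,000,000 = 6,000 / 1,000,000 = 0.006 USD
```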

Cost Tiers

Every model is automatically classified into a cost tier based on input pricing:

Tier	Price Range (per 1M input tokens)	Examples
Economy	> $0.00 and < $0.50	gpt-4o-mini, Claude 3 Haiku
Standard	$0.50 – $5.00	gpt-4o, Claude Sonnet
Premium	> $5.00	o1, Claude Opus
Free	$0.00	Ollama/local models

Tiers integrate with the policy engine (restrict agents to specific tiers), budget gates (block premium when budget is low), and the complexity router.
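
The tier boundaries in the table can be expressed as a small classifier. This is a sketch under stated assumptions: `classify_tier` is a hypothetical name, and treating exactly $0.50 and exactly $5.00 as Standard is one reading of the table, not confirmed behavior.

```python
def classify_tier(input_price_per_million: float) -> str:
    """Map an input price (USD per 1M tokens) to a cost tier name."""
    if input_price_per_million <= 0:
        return "free"       # $0.00, e.g. local models
    if input_price_per_million < 0.50:
        return "economy"
    if input_price_per_million <= 5.00:
        return "standard"   # assumption: both boundaries count as Standard
    return "premium"


# Illustrative input prices, not values taken from the pricing database.
tiers = [classify_tier(p) for p in (0.0, 0.15, 2.50, 15.00)]
```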

Complexity Router

The complexity router automatically classifies each prompt and routes it to the appropriate cost tier:

Level	Routes To	Triggers
SIMPLE	Economy	Short factual questions, yes/no, lookups
MEDIUM	Standard	Summarization, moderate generation
COMPLEX	Standard	Code review, architecture design, analysis
REASONING	Premium	Mathematical proofs, chain-of-thought, logic
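
The routing table above amounts to a lookup from complexity level to cost tier. The sketch below covers only that lookup; how a prompt is classified into a level is not described here, so it is left out. `Level`, `ROUTES`, and `tier_for` are hypothetical names.

```python
from enum import Enum


class Level(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"
    REASONING = "reasoning"


# Level -> cost tier, copied from the routing table above.
ROUTES = {
    Level.SIMPLE: "economy",
    Level.MEDIUM: "standard",
    Level.COMPLEX: "standard",   # MEDIUM and COMPLEX share the Standard tier
    Level.REASONING: "premium",
}


def tier_for(level: Level) -> str:
    """Return the cost tier a prompt of the given level is routed to."""
    return ROUTES[level]
```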

Budget Gates

Budget gates run a pre-execution budget check at each workflow step and return one of four decisions, based on the share of the budget already used:

ALLOW — Budget is healthy, proceed
WARN (80%) — Budget warning, proceed but emit alert
DOWNGRADE (90%) — Auto-switch to cheaper model (e.g., gpt-4o → gpt-4o-mini)
BLOCK (100%) — Budget exhausted, deny the call
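
The four decisions can be sketched as a threshold check on the fraction of budget spent. This is an illustration, not AgentCost's implementation: `budget_gate` is a hypothetical function, and treating each threshold as inclusive (for example, exactly 80% triggers WARN) is an assumption.

```python
def budget_gate(spent: float, budget: float) -> str:
    """Return ALLOW, WARN, DOWNGRADE, or BLOCK for the current spend."""
    if budget <= 0:
        return "BLOCK"            # no budget configured means nothing to spend
    used = spent / budget
    if used >= 1.00:
        return "BLOCK"            # budget exhausted, deny the call
    if used >= 0.90:
        return "DOWNGRADE"        # switch to a cheaper model
    if used >= 0.80:
        return "WARN"             # proceed, but emit an alert
    return "ALLOW"


decisions = [budget_gate(spent, 100.0) for spent in (50.0, 85.0, 95.0, 100.0)]
```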

Cost Intelligence

AgentCost provides five intelligence modules on top of raw trace data:

Forecasting — Predicts future costs using linear regression, exponential moving average (EMA), and ensemble methods. Includes budget exhaustion prediction.

Optimizer — Analyzes your usage patterns and recommends cheaper models that could handle the same workloads. Shows estimated savings.

Analytics — Breakdowns by model, project, agent, and time. Token efficiency metrics and chargeback reports.

Estimator — Pre-call cost estimation. Before making an expensive LLM call, estimate what it will cost across 2,610+ models.

Token Analyzer — Context efficiency scoring (0–100). Detects wasteful patterns: excessive system prompts, under-utilized context windows, and low output ratios.
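
To make the Forecasting module's techniques concrete, here is a minimal sketch of a least-squares linear trend, an exponential moving average, and a simple ensemble over daily cost totals. All function names are hypothetical, the smoothing factor `alpha=0.5` is arbitrary, and averaging the two forecasts is an assumed way of combining them; the module's actual method may differ.

```python
def linear_forecast(daily_costs, days_ahead=1):
    """Fit y = intercept + slope * day by least squares and extrapolate."""
    n = len(daily_costs)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_costs) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_costs))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + days_ahead)


def ema(daily_costs, alpha=0.5):
    """Exponential moving average; recent days weigh more than older ones."""
    value = daily_costs[0]
    for cost in daily_costs[1:]:
        value = alpha * cost + (1 - alpha) * value
    return value


def ensemble_forecast(daily_costs):
    """Assumed ensemble: the plain average of the two forecasts."""
    return (linear_forecast(daily_costs) + ema(daily_costs)) / 2


def days_until_exhausted(remaining_budget, forecast_daily_cost):
    """Estimate how many days the remaining budget lasts at the forecast rate."""
    if forecast_daily_cost <= 0:
        return float("inf")
    return remaining_budget / forecast_daily_cost


history = [10.0, 12.0, 14.0, 16.0]     # illustrative daily costs in USD
next_day = ensemble_forecast(history)  # linear gives 18.0, EMA gives 14.25
```

For a steadily rising series like this one, the linear trend runs ahead of the EMA, which lags behind recent growth; averaging the two tempers both.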

Plugin Architecture

AgentCost uses an 8-slot plugin system. Every integration point is swappable:

Slot	Plugin Class	Purpose
1. Notifier	`NotifierPlugin`	Alerts (Slack, email, webhook, PagerDuty)
2. Policy	`PolicyPlugin`	Custom policy evaluation rules
3. Exporter	`ExporterPlugin`	Export traces (S3, Snowflake, Datadog)
4. Provider	`ProviderPlugin`	Cost calculation for custom LLM providers
5. Tracker	`TrackerPlugin`	Cost tracking backends (in-memory, DB, Langfuse)
6. Reactor	`ReactorPlugin`	Custom reaction action handlers
7. Runtime	`RuntimePlugin`	Model routing, rate limiting, feature flags
8. Agent	`AgentPlugin`	Agent lifecycle management, workspace config

The following plugins are built in: four notifiers, `InMemoryTracker`, `AgentLifecycle`, and `PagerDutyReactor`.
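
The idea of swappable slots can be illustrated with a generic registry. This sketch does not use AgentCost's plugin base classes, whose interfaces are not shown here: `Notifier`, `ListNotifier`, `PluginRegistry`, and the `send` method are hypothetical stand-ins for the pattern only.

```python
from abc import ABC, abstractmethod


class Notifier(ABC):
    """Stand-in for a notifier-slot plugin (not AgentCost's NotifierPlugin)."""

    @abstractmethod
    def send(self, message: str) -> None:
        ...


class ListNotifier(Notifier):
    """Collects alerts in memory; a real plugin might post to Slack or email."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, message: str) -> None:
        self.sent.append(message)


class PluginRegistry:
    """Maps a slot name to the plugins registered for it."""

    def __init__(self) -> None:
        self._slots = {}

    def register(self, slot: str, plugin) -> None:
        self._slots.setdefault(slot, []).append(plugin)

    def get(self, slot: str) -> list:
        return list(self._slots.get(slot, []))


registry = PluginRegistry()
collector = ListNotifier()
registry.register("notifier", collector)

# Fan an alert out to every plugin in the notifier slot.
for notifier in registry.get("notifier"):
    notifier.send("Budget at 80%")
```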

Data Storage

Community edition: SQLite database (zero configuration, file-based).

Enterprise edition: PostgreSQL with connection pooling for production workloads.

The database contains the `trace_events` and `benchmark_runs` tables. The Enterprise edition adds `orgs`, `users`, `cost_centers`, `policies`, `approval_requests`, and more.
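
As a sketch of what file-based trace storage could look like, the example below creates a `trace_events` table in SQLite and aggregates cost by project. The column layout is an assumption derived from the trace fields listed under Traces, not AgentCost's actual schema, and the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory for the example; use a file path to persist
conn.execute(
    """
    CREATE TABLE trace_events (
        trace_id      TEXT PRIMARY KEY,
        project       TEXT NOT NULL,
        model         TEXT NOT NULL,
        provider      TEXT NOT NULL,
        input_tokens  INTEGER NOT NULL,
        output_tokens INTEGER NOT NULL,
        cost          REAL NOT NULL,
        latency_ms    REAL,
        status        TEXT CHECK (status IN ('success', 'error')),
        agent_id      TEXT,
        session_id    TEXT,
        timestamp     TEXT NOT NULL
    )
    """
)

# Sample rows (illustrative values only).
sample_rows = [
    ("t1", "customer-support", "gpt-4o", "openai", 1200, 300, 0.006, 840.0,
     "success", "ticket-classifier", "conv-12345", "2025-01-01T00:00:00Z"),
    ("t2", "customer-support", "gpt-4o", "openai", 800, 200, 0.004, 610.0,
     "success", "ticket-classifier", "conv-12345", "2025-01-01T00:00:05Z"),
    ("t3", "research", "gpt-4o", "openai", 400, 100, 0.002, 450.0,
     "success", None, None, "2025-01-01T00:01:00Z"),
]
conn.executemany(
    "INSERT INTO trace_events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    sample_rows,
)

# Total cost per project, the kind of breakdown the Analytics module reports.
cost_by_project = conn.execute(
    "SELECT project, SUM(cost) FROM trace_events GROUP BY project ORDER BY project"
).fetchall()
```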

Editions

Feature	Community (MIT)	Enterprise (BSL 1.1)
Tracing SDK	✅	✅
Dashboard + Models Explorer	✅	✅
2,610+ Model Pricing	✅	✅
Cost Tiers & Complexity Router	✅	✅
Budget Gates	✅	✅
Token Analyzer	✅	✅
Forecasting	✅	✅
Optimizer	✅	✅
Analytics	✅	✅
Estimator	✅	✅
8-Slot Plugin System	✅	✅
Reactions Engine (YAML)	✅	✅
CLI	✅	✅
OTel/Prometheus	✅	✅
SSO (any OIDC/SAML provider)	—	✅
Organizations	—	✅
Budget Enforcement	—	✅
Policy Engine	—	✅
Approval Workflows	—	✅
Notifications	—	✅
Anomaly Detection	—	✅
AI Gateway	—	✅