Cost Intelligence
AgentCost's intelligence layer provides cost-aware decision making for AI workloads. It classifies models into tiers, analyzes token efficiency, gates budget overruns, and auto-routes prompts to the right cost tier.
Cost Tiers
Each of the 2,610+ models in the vendored pricing database is automatically classified into a tier:
from agentcost.intelligence import get_tier_registry
reg = get_tier_registry()
print(reg.classify("gpt-4o")) # CostTier.STANDARD
print(reg.classify("gpt-4o-mini")) # CostTier.ECONOMY
print(reg.classify("o1")) # CostTier.PREMIUM
print(reg.tier_summary()) # e.g. {'economy': 949, 'standard': 925, 'premium': 160, 'free': 229}
Tier Thresholds
| Tier | Input Cost (USD per 1M tokens) |
|---|---|
| Free | $0.00 |
| Economy | > $0.00 and < $0.50 |
| Standard | $0.50 – $5.00 |
| Premium | > $5.00 |
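The thresholds in the table can be read as a simple price-to-tier function. The following is a minimal standalone sketch, not AgentCost's implementation: the `Tier` enum and `classify_by_input_cost` are hypothetical names, and treating exactly $0.50 and $5.00 as Standard is an assumption, since the table does not say which side the boundaries fall on.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical stand-in for the library's CostTier enum."""
    FREE = "free"
    ECONOMY = "economy"
    STANDARD = "standard"
    PREMIUM = "premium"


def classify_by_input_cost(usd_per_million_tokens: float) -> Tier:
    """Map an input price (USD per 1M tokens) to a tier using the table's thresholds."""
    if usd_per_million_tokens < 0:
        raise ValueError("price must be non-negative")
    if usd_per_million_tokens == 0:
        return Tier.FREE
    if usd_per_million_tokens < 0.50:
        return Tier.ECONOMY
    # Assumption: both Standard bounds ($0.50 and $5.00) are inclusive.
    if usd_per_million_tokens <= 5.00:
        return Tier.STANDARD
    return Tier.PREMIUM


if __name__ == "__main__":
    for price in (0.0, 0.15, 2.50, 15.0):
        print(price, classify_by_input_cost(price).value)
```

Checking the free case first keeps a $0.00 model from falling into Economy, which "< $0.50" alone would otherwise allow.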
Tier Policies
Restrict agents to specific tiers:
result = reg.check_tier_policy("o1", allowed_tiers=["economy", "standard"])
# {'allowed': False, 'tier': 'premium', 'suggested_alternative': 'gpt-4o-mini'}
Complexity Router
Auto-classify prompts and route to the appropriate cost tier:
from agentcost.intelligence import ComplexityRouter
router = ComplexityRouter()
# Simple question → economy tier
result = router.classify("What is the capital of France?")
# level=SIMPLE, tier=economy, model=gpt-4o-mini
# Complex reasoning → premium tier
result = router.classify("Prove that sqrt(2) is irrational by contradiction")
# level=REASONING, tier=premium, model=o1
# One-shot routing
model = router.route("Summarize this report", provider="anthropic")
# "claude-3-5-sonnet-20241022"
Classification Levels
| Level | Routes To | Triggers |
|---|---|---|
| SIMPLE | Economy | Short questions, lookups, yes/no |
| MEDIUM | Standard | Summarization, moderate generation |
| COMPLEX | Standard | Code review, architecture, analysis |
| REASONING | Premium | Proofs, chain-of-thought, math |
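Once a prompt has a level, routing reduces to two lookups: level to tier, then tier to a model for the chosen provider. The sketch below illustrates that idea only. `Level`, `LEVEL_TO_TIER`, `OPENAI_EXAMPLE`, and `pick_model` are hypothetical names, and the per-tier models are taken from the tier examples earlier on this page rather than from the router's actual configuration. How the router detects a level from prompt text is not covered here.

```python
from enum import Enum
from typing import Mapping


class Level(Enum):
    """Hypothetical stand-in for the router's classification levels."""
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"
    REASONING = "reasoning"


# Level -> tier name, copied from the Classification Levels table.
LEVEL_TO_TIER = {
    Level.SIMPLE: "economy",
    Level.MEDIUM: "standard",
    Level.COMPLEX: "standard",
    Level.REASONING: "premium",
}

# Illustrative tier -> model map, using the models classified in the Cost Tiers example.
OPENAI_EXAMPLE = {"economy": "gpt-4o-mini", "standard": "gpt-4o", "premium": "o1"}


def pick_model(level: Level, models_by_tier: Mapping[str, str]) -> str:
    """Resolve a classification level to a concrete model name."""
    tier = LEVEL_TO_TIER[level]
    return models_by_tier[tier]


if __name__ == "__main__":
    for level in Level:
        print(level.name, "->", pick_model(level, OPENAI_EXAMPLE))
```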
Budget Gates
Pre-execution budget checks that automatically downgrade or block expensive calls:
from agentcost.intelligence import BudgetGate
gate = BudgetGate(budget=10.00)
# Fresh budget → allow
decision = gate.check("gpt-4o", estimated_tokens=5000)
# action=allow, model=gpt-4o
# Record spend
gate.record_spend(8.50) # 85% used
# Now warns
decision = gate.check("gpt-4o")
# action=warn, reason="Budget warning: 85.0% used"
# At 95% → auto-downgrade
gate.spent = 9.50
decision = gate.check("gpt-4o", provider="openai")
# action=downgrade, model=gpt-4o-mini
# At 100% → block
gate.spent = 10.00
decision = gate.check("gpt-4o")
# action=block, reason="Budget exhausted"
Downgrade Chains
| Provider | Chain |
|---|---|
| OpenAI | gpt-4o → gpt-4o-mini → gpt-3.5-turbo |
| Anthropic | claude-3-5-sonnet → claude-3-haiku |
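The gate's behavior can be approximated as a threshold check on the fraction of budget used, plus a lookup in the provider's downgrade chain. This is a standalone sketch under stated assumptions, not the library's code: `Decision`, `next_cheaper`, and `check` are hypothetical, the 95% and 100% thresholds come from the example above, and the 80% warning threshold is a guess (the example only shows that 85% produces a warning).

```python
from dataclasses import dataclass
from typing import Optional

# Downgrade chains copied from the table above.
DOWNGRADE_CHAINS = {
    "openai": ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"],
    "anthropic": ["claude-3-5-sonnet", "claude-3-haiku"],
}

WARN_AT = 0.80       # assumption: the page only shows a warning at 85% used
DOWNGRADE_AT = 0.95  # from the example: 95% used triggers a downgrade
BLOCK_AT = 1.00      # from the example: 100% used blocks the call


@dataclass
class Decision:
    action: str           # "allow", "warn", "downgrade", or "block"
    model: Optional[str]  # model to use, or None when the call is blocked


def next_cheaper(model: str, provider: str) -> Optional[str]:
    """Return the next model in the provider's chain, or None if there is none."""
    chain = DOWNGRADE_CHAINS.get(provider, [])
    if model in chain:
        i = chain.index(model)
        if i + 1 < len(chain):
            return chain[i + 1]
    return None


def check(model: str, spent: float, budget: float, provider: str = "openai") -> Decision:
    """Decide what to do with a call given current spend against the budget."""
    used = spent / budget if budget > 0 else 1.0
    if used >= BLOCK_AT:
        return Decision("block", None)
    if used >= DOWNGRADE_AT:
        cheaper = next_cheaper(model, provider)
        if cheaper is not None:
            return Decision("downgrade", cheaper)
        # Assumption: with nothing cheaper available, fall back to a warning.
        return Decision("warn", model)
    if used >= WARN_AT:
        return Decision("warn", model)
    return Decision("allow", model)


if __name__ == "__main__":
    for spent in (0.0, 8.50, 9.50, 10.00):
        print(spent, check("gpt-4o", spent, budget=10.00))
```

Checking the strictest threshold first means each spend level maps to exactly one action, which mirrors the allow, warn, downgrade, block progression in the example.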
Token Analyzer
Measure how efficiently agents use their context windows:
from agentcost.intelligence import TokenAnalyzer
analyzer = TokenAnalyzer()
# Record LLM calls
analyzer.record_call(
    model="gpt-4o", input_tokens=50000, output_tokens=200,
    max_context=128000, system_tokens=40000, project="my-app",
)
# Get efficiency report
report = analyzer.analyze("my-app")
print(report.efficiency_score) # 0-100
print(report.warnings) # ["System prompts average 80% of input tokens"]
print(report.recommendations) # ["Consider shortening system prompts..."]
What It Detects
| Pattern | Threshold | Recommendation |
|---|---|---|
| Excessive system prompts | > 30% of input | Shorten prompts, use few-shot selectively |
| Under-utilized context | < 5% of window | Use smaller/cheaper model |
| Near context limit | > 90% of window | Summarize context or use larger model |
| Low output ratio | < 2% of total | Review if all input context is necessary |
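To make the thresholds concrete, here is a standalone sketch that applies the four checks to a single call. It is not the analyzer's implementation: `detect_patterns` and the finding labels are hypothetical, and the denominators are assumptions (system share is measured against input tokens, context use is input tokens over the context window, and output ratio is output over input plus output).

```python
from typing import List


def detect_patterns(input_tokens: int, output_tokens: int,
                    max_context: int, system_tokens: int) -> List[str]:
    """Return labels for the table's patterns that a single call triggers."""
    if input_tokens <= 0 or max_context <= 0:
        raise ValueError("input_tokens and max_context must be positive")

    system_share = system_tokens / input_tokens        # share of input that is system prompt
    context_use = input_tokens / max_context           # share of the context window used
    output_ratio = output_tokens / (input_tokens + output_tokens)

    findings = []
    if system_share > 0.30:
        findings.append("excessive_system_prompt")
    if context_use < 0.05:
        findings.append("under_utilized_context")
    if context_use > 0.90:
        findings.append("near_context_limit")
    if output_ratio < 0.02:
        findings.append("low_output_ratio")
    return findings


if __name__ == "__main__":
    # Same numbers as the record_call example: 40,000 of 50,000 input tokens are system prompt.
    print(detect_patterns(input_tokens=50000, output_tokens=200,
                          max_context=128000, system_tokens=40000))
```

With the numbers from the `record_call` example, the system prompt is 80% of the input and the output is about 0.4% of total tokens, so both of those patterns fire, while context use (about 39% of the window) triggers neither context check.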
Combining Components
The intelligence components work together:
from agentcost.intelligence import ComplexityRouter, TierRegistry, BudgetGate
router = ComplexityRouter()
tiers = TierRegistry()
gate = BudgetGate(budget=50.00)
# 1. Classify the prompt
prompt = "Analyze our Q3 revenue trends"
result = router.classify(prompt)
model = router.route(prompt, provider="openai")
# 2. Check tier policy
policy = tiers.check_tier_policy(model, allowed_tiers=["economy", "standard"])
if not policy["allowed"]:
    model = policy["suggested_alternative"]
# 3. Budget gate
decision = gate.check(model, estimated_tokens=5000, provider="openai")
if decision.action == "downgrade":
    model = decision.model
elif decision.action == "block":
    raise RuntimeError("Budget exhausted")
# 4. Make the call with the approved model
print(f"Using: {model}")