Cascade routing vs. direct pricing

Cascade routing starts every request at the cheapest provider. If the response quality is low, it escalates to the next tier. You only pay premium prices when the cheap answer was genuinely not good enough.
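The loop can be sketched in a few lines. Everything here is a stand-in: `call_provider` and `quality_score` are hypothetical placeholders for the real provider call and the quality check described later, and the provider names are illustrative only.

```python
# Sketch of a cascade router. `providers` is ordered cheapest-first.
def call_provider(name, prompt):
    # Placeholder: in practice, an API call to the named provider.
    return f"{name} answer to: {prompt}"

def quality_score(response):
    # Placeholder: in practice, a weighted check of quality signals.
    return 0.9 if "answer" in response else 0.2

def cascade(prompt, providers, threshold=0.7, max_escalations=3):
    for tier, name in enumerate(providers[:max_escalations + 1]):
        response = call_provider(name, prompt)
        if quality_score(response) >= threshold:
            return tier + 1, response  # accept at this tier
    return tier + 1, response  # out of escalations: keep the last answer

tier, answer = cascade("What is 2+2?", ["groq", "gemini", "anthropic", "openai"])
```

Most requests never leave the first iteration, which is where the cost savings come from.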


Provider costs per 1K tokens

| tier | provider  | model            | cost / 1K tokens |
|------|-----------|------------------|------------------|
| 1    | Groq      | Llama 3.3 70B    | $0.0003          |
| 2    | Gemini    | Gemini 2.0 Flash | $0.0005          |
| 3    | Anthropic | Claude Sonnet    | $0.0030          |
| 4    | OpenAI    | GPT-4o           | $0.0050          |

Tier 4 costs roughly 17x more per token than tier 1 ($0.0050 vs. $0.0003). Cascade routing sends 70-80% of requests to tier 1.


1,000 requests: cascade vs. direct

Simulated distribution based on typical agent workloads (~800 tokens/request average). Costs count only the tier that produced the accepted response.

| tier | provider   | requests | % of total | cost    |
|------|------------|----------|------------|---------|
| 1    | Groq       | 750      | 75%        | $0.1800 |
| 2    | Gemini     | 150      | 15%        | $0.0600 |
| 3    | Anthropic  | 70       | 7%         | $0.1680 |
| 4    | OpenAI     | 30       | 3%         | $0.1200 |
|      | CASCADE    | 1,000    | 100%       | $0.5280 |
|      | ALL GPT-4o |          |            | $4.0000 (cascade saves 87%) |
|      | ALL CLAUDE |          |            | $2.4000 (cascade saves 78%) |
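These totals are just a weighted sum over the table, assuming the stated ~800 tokens per request:

```python
PRICE = {1: 0.0003, 2: 0.0005, 3: 0.0030, 4: 0.0050}  # $ per 1K tokens
REQUESTS = {1: 750, 2: 150, 3: 70, 4: 30}
TOKENS_PER_REQ = 800

# Cost = requests x (tokens / 1000) x price-per-1K, summed over tiers
cascade = sum(REQUESTS[t] * TOKENS_PER_REQ / 1000 * PRICE[t] for t in PRICE)
all_gpt4o = 1000 * TOKENS_PER_REQ / 1000 * PRICE[4]

print(f"${cascade:.4f}")                       # $0.5280
print(f"saves {1 - cascade / all_gpt4o:.0%}")  # saves 87%
```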

How cascade routing decides

After each tier responds, four signals determine whether to accept or escalate.

| signal            | weight | what it checks |
|-------------------|--------|----------------|
| Refusal detection | 30%    | "As an AI...", "I cannot help with...", content policy triggers |
| Completeness      | 30%    | Did the response actually address the question? |
| Hedging language  | 25%    | "It depends", "I'm not sure", "generally speaking" |
| Length ratio      | 15%    | Response length vs. expected for the task complexity |

Combined score above 0.7 = accept. Below 0.7 = escalate to the next tier. Maximum 3 escalations.
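A minimal sketch of the weighted decision, assuming each signal has already been normalized to a 0..1 score (1 = good); the detectors that produce those scores are hypothetical and not shown:

```python
WEIGHTS = {
    "refusal": 0.30,       # 1.0 when no refusal phrases detected
    "completeness": 0.30,  # 1.0 when the question is fully addressed
    "hedging": 0.25,       # 1.0 when no hedging language detected
    "length_ratio": 0.15,  # 1.0 when length matches task complexity
}

def combined_score(signals):
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

def decide(signals, threshold=0.7):
    return "accept" if combined_score(signals) > threshold else "escalate"

# A confident, complete answer passes; a hedged partial refusal escalates.
good = {"refusal": 1.0, "completeness": 0.9, "hedging": 1.0, "length_ratio": 0.8}
bad = {"refusal": 0.0, "completeness": 0.4, "hedging": 0.2, "length_ratio": 0.6}
print(decide(good), decide(bad))  # accept escalate
```

Because refusal and completeness together carry 60% of the weight, a response can hedge a little and still be accepted, but an outright refusal almost always escalates.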


Try it

One env var change. OpenAI SDK compatible.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.pura.xyz/v1",
    api_key="pura_YOUR_KEY"  # free for 5,000 requests
)

response = client.chat.completions.create(
    model="auto",  # cascade routing picks the model
    messages=[{"role": "user", "content": "What is 2+2?"}]
)

# Check response headers for cost info:
# X-Pura-Model, X-Pura-Cost, X-Pura-Tier
```
```shell
curl https://api.pura.xyz/v1/chat/completions \
  -H "Authorization: Bearer pura_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"What is 2+2?"}]}'
```
get an API key →
quickstart docs →