Cascade routing vs. direct pricing
Cascade routing starts every request at the cheapest provider. If the response quality is low, it escalates to the next tier. You only pay premium prices when the cheap answer was genuinely not good enough.
Provider costs per 1K tokens
| tier | provider | model | cost / 1K tokens |
|---|---|---|---|
| 1 | Groq | Llama 3.3 70B | $0.0003 |
| 2 | Gemini | Gemini 2.0 Flash | $0.0005 |
| 3 | Anthropic | Claude Sonnet | $0.0030 |
| 4 | OpenAI | GPT-4o | $0.0050 |
Tier 1 costs 17x less than tier 4. Cascade routing sends 70-80% of requests to tier 1.
1,000 requests: cascade vs. direct
Simulated distribution based on typical agent workloads (~800 tokens/request average).
| tier | provider | requests | % of total | cost |
|---|---|---|---|---|
| 1 | Groq | 750 | 75% | $0.1800 |
| 2 | Gemini | 150 | 15% | $0.0600 |
| 3 | Anthropic | 70 | 7% | $0.1680 |
| 4 | OpenAI | 30 | 3% | $0.1200 |
CASCADE$0.5280
ALL GPT-4o$4.000087% more expensive
ALL CLAUDE$2.400078% more expensive
How cascade routing decides
After each tier responds, four signals determine whether to accept or escalate.
| signal | weight | what it checks |
|---|---|---|
| Refusal detection | 30% | "As an AI...", "I cannot help with...", content policy triggers |
| Completeness | 30% | Did the response actually address the question? |
| Hedging language | 25% | "It depends", "I'm not sure", "generally speaking" |
| Length ratio | 15% | Response length vs. expected for the task complexity |
Combined score above 0.7 = accept. Below 0.7 = escalate to the next tier. Maximum 3 escalations.
Try it
One env var change. OpenAI SDK compatible.
from openai import OpenAI
client = OpenAI(
base_url="https://api.pura.xyz/v1",
api_key="pura_YOUR_KEY" # free for 5,000 requests
)
response = client.chat.completions.create(
model="auto", # cascade routing picks the model
messages=[{"role": "user", "content": "What is 2+2?"}]
)
# Check response headers for cost info:
# X-Pura-Model, X-Pura-Cost, X-Pura-Tiercurl https://api.pura.xyz/v1/chat/completions \
-H "Authorization: Bearer pura_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"What is 2+2?"}]}'