The Pura gateway (api.pura.xyz) is an OpenAI-compatible HTTP endpoint that routes inference requests across providers using on-chain capacity weights. Swap your baseURL and everything else stays the same.
https://api.pura.xyz/v1
All requests require a pura_ prefixed API key in the Authorization header:
Authorization: Bearer pura_abc123...
Generate keys at pura.xyz/gateway.
Standard OpenAI chat completion. Supports streaming.
curl https://api.pura.xyz/v1/chat/completions \
-H "Authorization: Bearer pura_your_key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [{"role": "user", "content": "hello"}],
"stream": true
}'
Request body follows the OpenAI API spec. The model field selects the downstream model. The gateway routes to whichever provider has spare capacity for that model.
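With "stream": true, an OpenAI-compatible endpoint normally answers with server-sent events: each data: line carries one JSON chunk, and the stream ends with data: [DONE]. The sketch below assumes the gateway follows that framing. The helper name extractDeltas and its return shape are illustrative, not part of any Pura SDK.

```typescript
// Sketch: pull text deltas out of buffered OpenAI-style SSE data.
// Assumes the standard `data: {...}` / `data: [DONE]` framing.

interface StreamChunk {
  choices?: { delta?: { content?: string | null } }[];
}

interface ParsedEvents {
  text: string;  // concatenated delta content from complete lines
  done: boolean; // true once the [DONE] sentinel has been seen
  rest: string;  // trailing partial line, to prepend to the next network chunk
}

function extractDeltas(buffer: string): ParsedEvents {
  const lines = buffer.split("\n");
  // The last element is an incomplete line unless the buffer ended with "\n".
  const rest = lines.pop() ?? "";
  let text = "";
  let done = false;

  for (const raw of lines) {
    const line = raw.trim();
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const chunk = JSON.parse(payload) as StreamChunk;
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return { text, done, rest };
}
```

Feed each decoded network chunk through extractDeltas, prepending the rest value from the previous call, and stop reading once done is true.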
Pass an optional routing object in the request body to influence provider selection:
{
"messages": [{"role": "user", "content": "Explain backpressure routing"}],
"routing": {
"quality": "high",
"prefer": "anthropic"
}
}
| Field | Type | Description |
|---|---|---|
| quality | "low" / "balanced" / "high" | Shifts the complexity tier up or down. "high" bumps cheap tasks to mid-tier models and mid tasks to premium. "low" does the reverse. Default: "balanced" (no shift). |
| prefer | string | Soft preference for a provider name (e.g. "anthropic"). Doubles that provider's capacity weight during selection. Not a hard lock; use model for that. |
| maxCost | number | Experimental. Max cost per 1K tokens in USD. Filters out providers above this rate. |
| maxLatency | number | Experimental. Max average latency in ms (5-minute window). Filters out providers above this threshold. |
| excludeProviders | string[] | Experimental. Provider names to exclude from routing. |
Experimental fields are accepted and honored. The response includes an X-Pura-Experimental header listing which experimental fields were used.
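For typed clients, the routing object can be modeled directly from the field table. The following is a minimal sketch: the field names come from the table, while the type names, the buildChatRequest helper, and the client-side range checks are assumptions made for illustration. The model field is left optional here only because the routing example above omits it.

```typescript
// Sketch: typed request body carrying the optional `routing` object.
// Validation rules below are assumptions, not documented gateway behavior.

type Quality = "low" | "balanced" | "high";

interface RoutingOptions {
  quality?: Quality;
  prefer?: string;
  maxCost?: number;            // experimental: USD per 1K tokens
  maxLatency?: number;         // experimental: ms, 5-minute average
  excludeProviders?: string[]; // experimental
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model?: string;
  messages: ChatMessage[];
  stream?: boolean;
  routing?: RoutingOptions;
}

function buildChatRequest(
  messages: ChatMessage[],
  routing: RoutingOptions = {},
  model?: string,
): ChatRequest {
  // Reject obviously meaningless limits before sending anything.
  if (routing.maxCost !== undefined && routing.maxCost <= 0) {
    throw new RangeError("maxCost must be a positive USD amount");
  }
  if (routing.maxLatency !== undefined && routing.maxLatency <= 0) {
    throw new RangeError("maxLatency must be a positive number of ms");
  }
  const body: ChatRequest = { messages };
  if (model !== undefined) body.model = model;
  // Only attach `routing` when at least one hint is set.
  if (Object.keys(routing).length > 0) body.routing = routing;
  return body;
}
```

The result can be passed through JSON.stringify and sent as the body of a POST to /v1/chat/completions.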
Returns gateway health and available providers.
{
"status": "ok",
"service": "pura-gateway",
"version": "0.1.0",
"chain": "base-sepolia",
"timestamp": "2026-07-01T00:00:00.000Z"
}
Every response includes routing metadata:
| Header | Description |
|---|---|
| X-Pura-Provider | Which provider handled the request (e.g. openai, anthropic) |
| X-Pura-Request-Id | Unique request identifier for on-chain receipt lookup |
| X-Pura-Tier | Complexity tier used for routing (cheap, mid, premium) |
| X-Pura-Cost | Estimated request cost in USD |
| X-Pura-Budget-Remaining | Daily budget remaining in USD |
| X-Pura-Quality | Quality bias applied, if any (low, balanced, high) |
| X-Pura-Explored | true when the router explored a non-preferred provider |
| X-Pura-Experimental | Comma-separated list of experimental routing fields used |
| X-RateLimit-Remaining | Requests remaining in the current rate limit window |
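A client that wants these values in one place can read them off the response headers. The sketch below assumes the cost, budget, and rate limit headers are plain decimal strings, which the table does not state explicitly. The RoutingMetadata shape and the readRoutingMetadata name are illustrative.

```typescript
// Sketch: collect the routing headers into one object.
// Works with any object exposing get(name), such as fetch's response.headers.

interface HeaderReader {
  get(name: string): string | null;
}

interface RoutingMetadata {
  provider: string | null;
  requestId: string | null;
  tier: string | null;
  costUsd: number | null;
  budgetRemainingUsd: number | null;
  quality: string | null;
  explored: boolean;
  experimental: string[];
  rateLimitRemaining: number | null;
}

// Parse a numeric header; missing or malformed values become null.
function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

function readRoutingMetadata(headers: HeaderReader): RoutingMetadata {
  const experimental = (headers.get("X-Pura-Experimental") ?? "")
    .split(",")
    .map((field) => field.trim())
    .filter((field) => field.length > 0);

  return {
    provider: headers.get("X-Pura-Provider"),
    requestId: headers.get("X-Pura-Request-Id"),
    tier: headers.get("X-Pura-Tier"),
    costUsd: toNumber(headers.get("X-Pura-Cost")),
    budgetRemainingUsd: toNumber(headers.get("X-Pura-Budget-Remaining")),
    quality: headers.get("X-Pura-Quality"),
    explored: headers.get("X-Pura-Explored") === "true",
    experimental,
    rateLimitRemaining: toNumber(headers.get("X-RateLimit-Remaining")),
  };
}
```

With fetch, this would be called as readRoutingMetadata(response.headers).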
On rate limit (429), the response includes a Retry-After header with seconds until the window resets.
The rate limit is 30 requests per minute per API key, measured over a sliding window.
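One way to handle a 429 on the client side is to wait for the number of seconds given in Retry-After and then resend. This is a sketch under assumptions: the helper names, the three-attempt cap, and the 2-second fallback delay are all illustrative choices, not gateway requirements.

```typescript
// Sketch: honor Retry-After on HTTP 429. Helper names are hypothetical.

interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Returns the wait in ms for a 429, or null when no retry is needed.
function retryDelayMs(
  status: number,
  retryAfter: string | null,
  fallbackMs = 2000, // assumed default when the header is missing or unparseable
): number | null {
  if (status !== 429) return null;
  if (retryAfter === null || retryAfter.trim() === "") return fallbackMs;
  const seconds = Number(retryAfter);
  return Number.isFinite(seconds) && seconds >= 0
    ? Math.ceil(seconds * 1000)
    : fallbackMs;
}

// Calls `send` up to maxAttempts times, sleeping between rate-limited attempts.
async function withRateLimitRetry<T extends MinimalResponse>(
  send: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  let response = await send();
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    const delay = retryDelayMs(response.status, response.headers.get("Retry-After"));
    if (delay === null) break; // not rate limited, return as-is
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
    response = await send();
  }
  return response;
}
```

A fetch call can be wrapped as withRateLimitRetry(() => fetch(url, init)). The last response is returned even if it is still a 429, so callers should check status afterwards.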
The gateway scores each request for complexity (cheap/mid/premium) and reads capacity weights from the on-chain CapacityRegistry. Provider selection combines three signals:
The routing.quality hint shifts the complexity tier up or down. The routing.prefer hint doubles a named provider's weight. Experimental filters (maxCost, maxLatency, excludeProviders) trim the candidate set before selection.
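The effect of these hints can be sketched as three small functions. This is not the gateway's implementation: the Candidate shape, the clamping at the cheapest and most expensive tiers, and the weighted random pick are assumptions chosen to match the description above, and the weights are plain numbers rather than values read from the CapacityRegistry.

```typescript
// Sketch of how the routing hints could act on a candidate set (hypothetical).

type Tier = "cheap" | "mid" | "premium";
type QualityHint = "low" | "balanced" | "high";

const TIERS: Tier[] = ["cheap", "mid", "premium"];

// "high" moves one tier up, "low" one tier down; clamped at both ends (assumed).
function shiftTier(tier: Tier, quality: QualityHint = "balanced"): Tier {
  const step = quality === "high" ? 1 : quality === "low" ? -1 : 0;
  const index = Math.min(TIERS.length - 1, Math.max(0, TIERS.indexOf(tier) + step));
  return TIERS[index] ?? tier;
}

interface Candidate {
  name: string;
  weight: number;       // capacity weight
  costPer1kUsd: number;
  avgLatencyMs: number; // 5-minute average
}

interface Hints {
  prefer?: string;
  maxCost?: number;
  maxLatency?: number;
  excludeProviders?: string[];
}

// Apply the experimental filters first, then double the preferred provider's weight.
function effectiveWeights(candidates: Candidate[], hints: Hints = {}): { name: string; weight: number }[] {
  const excluded = hints.excludeProviders ?? [];
  return candidates
    .filter((c) => excluded.indexOf(c.name) === -1)
    .filter((c) => hints.maxCost === undefined || c.costPer1kUsd <= hints.maxCost)
    .filter((c) => hints.maxLatency === undefined || c.avgLatencyMs <= hints.maxLatency)
    .map((c) => ({ name: c.name, weight: c.name === hints.prefer ? c.weight * 2 : c.weight }));
}

// Weighted random pick; `rand` is injectable so the choice can be made deterministic.
function pickWeighted(entries: { name: string; weight: number }[], rand: () => number = Math.random): string | null {
  const total = entries.reduce((sum, e) => sum + e.weight, 0);
  if (entries.length === 0 || total <= 0) return null;
  let r = rand() * total;
  for (const e of entries) {
    r -= e.weight;
    if (r < 0) return e.name;
  }
  return entries[entries.length - 1]?.name ?? null;
}
```

Because prefer only doubles a weight, a preferred provider with low capacity can still lose the draw, which is why the field is described as a soft preference.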
Adaptive exploration sends ~5% of requests to a non-preferred provider to discover performance changes. The exploration rate doubles when any provider shows degraded performance (>20% error rate or >5s latency in the 5-minute window).
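The exploration rule can be written down in a few lines. The thresholds below are the ones stated above; the ProviderStats shape and the function names are assumptions, and the sketch treats "~5%" as exactly 0.05.

```typescript
// Sketch of the adaptive exploration rate (names are hypothetical).

interface ProviderStats {
  errorRate: number;    // fraction of failed requests in the 5-minute window
  avgLatencyMs: number; // average latency in the same window
}

const BASE_EXPLORATION_RATE = 0.05;

// Degraded means strictly more than 20% errors or strictly more than 5 s latency.
function isDegraded(stats: ProviderStats): boolean {
  return stats.errorRate > 0.2 || stats.avgLatencyMs > 5000;
}

// The rate doubles when any provider is degraded.
function explorationRate(allStats: ProviderStats[]): number {
  return allStats.some(isDegraded) ? BASE_EXPLORATION_RATE * 2 : BASE_EXPLORATION_RATE;
}

// `rand` is injectable so the decision can be reproduced in tests.
function shouldExplore(allStats: ProviderStats[], rand: () => number = Math.random): boolean {
  return rand() < explorationRate(allStats);
}
```

With one degraded provider in the set, roughly one request in ten would be sent to a non-preferred provider instead of one in twenty.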
If the selected provider fails, the gateway falls back to the next available provider. Each completion is recorded on-chain via the CompletionLedger.
import { route } from '@puraxyz/sdk'
const result = await route({
apiKey: 'pura_your_key',
messages: [{ role: 'user', content: 'What is BPE?' }],
})
console.log(result.content)
console.log(result.provider) // "openai" or "anthropic"
console.log(result.requestId) // on-chain receipt id
Or use the OpenAI SDK directly:
import OpenAI from 'openai'
const client = new OpenAI({ baseURL: 'https://api.pura.xyz/v1', apiKey: 'pura_your_key' })
const completion = await client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{ role: 'user', content: 'hello' }],
})
The gateway supports CORS for browser-based clients. Allowed origins, methods, and headers are configured to support standard fetch API usage.