Error handling
Stable error envelope + full retry matrix
Every SkyAIApp error maps to a stable code that does not change across versions, so clients can handle each failure precisely. Each code is labeled with retry semantics. SDKs map codes to typed Error classes; see the idiomatic handlers below.
Error envelope
Every error response uses this shape (returned whenever the HTTP status is 400 or higher):
{
  "error": {
    "code": "router.budget_exceeded", // stable across versions
    "type": "validation_error", // category for SDK class mapping
    "message": "All candidate models exceed budget.maxCostUsd=0.001",
    "request_id": "req_01JFGYZ7K8M2N3P4Q5R6S7T8U9", // for support tickets
    "trace_id": "tr_01JFGYZ7K8M2N3P4Q5R6S7T8U9", // open in console
    "detail": {
      // shape varies by code; see Detail shapes below
      "rejected_candidates": [
        { "model": "gpt-5.5-pro", "estimated_cost_usd": 0.012 },
        { "model": "claude-opus-4.7", "estimated_cost_usd": 0.015 }
      ],
      "suggestion": "Increase budget.maxCostUsd or include cheaper models in the policy."
    }
  }
}

Hard rule: trust code, never parse message. The message is human-readable and may be localized or adjusted across versions.
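For clients calling REST directly, the rule above could look roughly like the sketch below. It assumes the envelope fields shown in the example; `SkyErrorEnvelope`, `parseErrorEnvelope`, and `describeAction` are hypothetical names for illustration, not SDK exports.

```typescript
// Hypothetical type for the error envelope shown above; not part of the SDK.
interface SkyErrorEnvelope {
  error: {
    code: string;
    type: string;
    message: string;
    request_id: string;
    trace_id: string;
    detail?: Record<string, unknown>;
  };
}

// Returns the envelope if the body has the expected shape, otherwise null.
function parseErrorEnvelope(body: string): SkyErrorEnvelope | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null; // not JSON at all (for example, a proxy's HTML error page)
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const err = (parsed as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return null;
  if (typeof (err as Record<string, unknown>).code !== "string") return null;
  return parsed as SkyErrorEnvelope;
}

// Branch on the stable code, never on the message text.
function describeAction(env: SkyErrorEnvelope): string {
  switch (env.error.code) {
    case "router.budget_exceeded":
      return "raise budget or add cheaper models";
    case "auth.invalid_key":
      return "rotate the API key";
    default:
      // Unknown codes still carry ids that are useful for support.
      return `escalate with request_id=${env.error.request_id}`;
  }
}
```

Returning `null` for bodies that are not an envelope keeps the caller from treating an intermediary's error page as a SkyAIApp error.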
Retry matrix
| HTTP | Code | Cause | Fix | Retry |
|---|---|---|---|---|
| 400 | request.invalid_body | Body fails JSON schema validation. | Inspect error.detail.field to pinpoint the field. | Do not retry |
| 400 | request.unsupported_modality | Model doesn't support that modality (e.g. text-only model got an image). | Change models whitelist or pick a multimodal model. | Do not retry |
| 401 | auth.missing_key | No Authorization header. | Send Authorization: Bearer $KEY. | Do not retry |
| 401 | auth.invalid_key | Key doesn't exist or was revoked. | Check key status in console; rotate. | Do not retry |
| 402 | billing.insufficient_balance | Account balance is exhausted or monthly quota hit. | Top up or upgrade plan in console. | Do not retry |
| 403 | policy.blocked_by_rbac | User/team lacks route:write permission. | Grant the permission to that role in console. | Do not retry |
| 403 | policy.model_not_in_allowlist | Requested model isn't in the policy allowlist. | Update policy or drop models to let the router pick. | Do not retry |
| 422 | router.budget_exceeded | All candidates exceed budget.maxCostUsd. | Raise budget, soften quality goal, or add cheaper models. | Do not retry |
| 422 | router.no_candidates | All candidates filtered out by modality + budget + RBAC. | Relax the policy or check model registration. | Do not retry |
| 429 | rate_limit.account | Account-level RPM/TPM/concurrency hit. | Wait Retry-After; upgrade plan or self-throttle. | Retry w/ backoff |
| 429 | rate_limit.key | Key-level throttle (set in console). | Lower concurrency or raise the key limit. | Retry w/ backoff |
| 429 | rate_limit.upstream_provider | Upstream provider throttled SkyAIApp (rare). | Router auto-falls back; safe to retry. | Retry w/ backoff |
| 499 | client.canceled | Client disconnected before response (incl. explicit abort). | Increase timeout; prefetch; consider streaming. | Conditional |
| 500 | router.internal_error | Router internal exception (paged on our side). | Retry; if persistent, email founders. | Retry w/ backoff |
| 502 | upstream.provider_failed | Primary and all fallbacks returned 5xx. | Back off + retry; check status page. | Retry w/ backoff |
| 504 | router.timeout | End-to-end timeout (timeout_ms). | Increase timeout_ms or pick a faster model. | Retry now |
| 409 | idempotency.key_reused_with_diff_body | Same idempotency key with a different body. | Generate a new idempotency key. | Do not retry |
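The matrix can be encoded as a small lookup for clients that call REST directly. This is a sketch under assumptions: `RetryPolicy` and `retryPolicyFor` are hypothetical names, and treating any code not listed here as "never" is a conservative choice of this example, not documented behavior.

```typescript
type RetryPolicy = "never" | "backoff" | "immediate" | "conditional";

// Retryable codes copied from the matrix above. Everything else defaults to "never".
const RETRY_BY_CODE = new Map<string, RetryPolicy>([
  ["client.canceled", "conditional"],
  ["router.internal_error", "backoff"],
  ["upstream.provider_failed", "backoff"],
  ["router.timeout", "immediate"],
]);

function retryPolicyFor(code: string): RetryPolicy {
  // All rate_limit.* rows in the matrix share the same semantics.
  if (code.startsWith("rate_limit.")) return "backoff";
  return RETRY_BY_CODE.get(code) ?? "never";
}
```

Keying on the code rather than the HTTP status keeps the decision stable even where several codes share one status.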
Recommended retry algorithm
For errors marked "Retry w/ backoff" in the matrix, use truncated exponential backoff with decorrelated jitter. The SDKs implement this by default; if you're calling REST yourself, do the same.
// TypeScript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
// Random integer in [lo, hi).
const randInt = (lo: number, hi: number) => lo + Math.floor(Math.random() * (hi - lo));
// Hard "do not retry" set; see the retry matrix above.
const NO_RETRY = new Set([400, 401, 402, 403, 409, 422]);

async function callWithRetry<T>(fn: () => Promise<T>, opts = { maxAttempts: 4, baseMs: 200, capMs: 8000 }): Promise<T> {
  let last: unknown;
  // Tracked per call so concurrent callers do not share backoff state.
  let prevSleep = opts.baseMs;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      last = err;
      const sky = err as { status?: number; retryAfterMs?: number; code?: string };
      if (NO_RETRY.has(sky.status ?? 0)) throw err;
      // Out of attempts: give up without sleeping first.
      if (attempt === opts.maxAttempts) break;
      // Server told us how long to wait?
      if (sky.retryAfterMs) {
        await sleep(sky.retryAfterMs);
        continue;
      }
      // Exponential with decorrelated jitter (AWS pattern).
      // sleep ∈ [baseMs, min(capMs, prevSleep * 3))
      prevSleep = Math.min(opts.capMs, randInt(opts.baseMs, prevSleep * 3));
      await sleep(prevSleep);
    }
  }
  throw last;
}

Why decorrelated jitter (not a fixed exponential schedule): it spreads concurrent retrying clients across time to avoid thundering herds. AWS's 'Exponential Backoff and Jitter' post is the classic background read.
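When calling REST directly, the wait time arrives as a `Retry-After` header rather than the SDK's `retryAfterMs` field. The sketch below converts it to milliseconds, assuming the header follows the standard HTTP forms (a whole number of seconds or an HTTP-date); `retryAfterToMs` is a hypothetical helper name.

```typescript
// Converts a Retry-After header value to milliseconds, or undefined if it is
// missing or unparseable. nowMs is injectable to make the date form testable.
function retryAfterToMs(header: string | null, nowMs: number = Date.now()): number | undefined {
  if (header === null) return undefined;
  const value = header.trim();
  // delay-seconds form, e.g. "2"
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means "retry immediately".
  return Math.max(0, dateMs - nowMs);
}
```

A caller could attach the result to the thrown error as `retryAfterMs` so that the retry loop picks it up unchanged.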
SDK error class mapping
SDKs map error.code to typed Error subclasses. Catch by class, not by string.
import {
  SkyAI,
  RouterError,           // base class: all SkyAIApp errors extend this
  RouterTimeoutError,    // 504 router.timeout
  RouterBudgetError,     // 422 router.budget_exceeded
  AuthError,             // 401/403 auth.* / policy.*
  RateLimitError,        // 429 rate_limit.*, exposes .retryAfterMs
  UpstreamProviderError, // 502 upstream.provider_failed, exposes .upstreamProvider
  isRouterError,         // type-guard for unknown errors
} from "@skyaiapp/sdk";

try {
  const res = await sky.route({ /* ... */ });
} catch (err) {
  if (err instanceof RateLimitError) {
    metrics.inc("router.rate_limit", { tier: err.tier });
    await sleep(err.retryAfterMs ?? 1000);
    return retry();
  }
  if (err instanceof RouterBudgetError) {
    log.warn("budget exhausted", { suggestion: err.detail.suggestion });
    return fallbackToCheaperPolicy();
  }
  if (err instanceof RouterTimeoutError) {
    return null; // user-facing graceful degradation
  }
  if (err instanceof UpstreamProviderError) {
    pager.notify("Upstream " + err.upstreamProvider + " failed", err.traceId);
    throw err;
  }
  if (isRouterError(err)) {
    log.error("Unhandled SkyAI error", { code: err.code, traceId: err.traceId });
  }
  throw err;
}

Debugging tips
- Start with the trace_id. Search it in the console — you'll see candidate scores, cache events, and fallback reasons in one view.
- When opening tickets, attach the request_id. It lets our on-call engineer locate your specific request immediately.
- Capture error.detail. The code tells you what; detail tells you why and what to do.
- Reproduce locally with sk_test_. Sandbox mirrors production semantics without billing. Configure a mock-model to reproduce 5xx offline.
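One way to follow the tips above is to log the identifiers and the detail together in a single structured line. The sketch below assumes a caught error exposing `code`, `requestId`, `traceId`, and `detail`; the `CapturedError` shape and `toLogLine` helper are hypothetical, and `requestId` in particular is an assumed property name.

```typescript
// Hypothetical shape: the fields the tips above say to capture.
interface CapturedError {
  code: string;
  requestId: string;
  traceId: string;
  detail?: unknown;
}

// Builds one structured log line so request_id, trace_id, and detail travel together.
function toLogLine(err: CapturedError): string {
  return JSON.stringify({
    code: err.code,
    request_id: err.requestId,
    trace_id: err.traceId,
    detail: err.detail ?? null,
  });
}
```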
See also
- Rate limits: Plan caps + response headers
- Troubleshooting: Symptom-first triage
- Webhooks: router.error_burst and other alert events