Error handling

Stable error envelope + full retry matrix

Every SkyAIApp error maps to a stable code (it won't change across versions), so clients can branch on it precisely. Each code is labeled with retry semantics, and the SDKs map codes to typed Error classes — see the idiomatic handlers below.

Error envelope

Every error uses this shape (returned for any HTTP status of 400 or above):

{
  "error": {
    "code":       "router.budget_exceeded",          // stable across versions
    "type":       "validation_error",                // category for SDK class mapping
    "message":    "All candidate models exceed budget.maxCostUsd=0.001",
    "request_id": "req_01JFGYZ7K8M2N3P4Q5R6S7T8U9", // for support tickets
    "trace_id":   "tr_01JFGYZ7K8M2N3P4Q5R6S7T8U9",   // open in console
    "detail": {
      // shape varies by code — see Detail shapes below
      "rejected_candidates": [
        { "model": "gpt-5.5-pro",     "estimated_cost_usd": 0.012 },
        { "model": "claude-opus-4.7", "estimated_cost_usd": 0.015 }
      ],
      "suggestion": "Increase budget.maxCostUsd or include cheaper models in the policy."
    }
  }
}

Hard rule: trust code, never parse message. The message is human-readable and may be localized or adjusted across versions.
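If you're calling REST directly rather than through an SDK, that rule translates to branching on the stable code — usually its first segment. A minimal sketch; the ErrorEnvelope interface and classify helper are our own illustrative names, not part of any SkyAIApp SDK:

```typescript
// Shape of the envelope above; only `code` should drive control flow.
interface ErrorEnvelope {
  error: {
    code: string;
    type: string;
    message: string;
    request_id: string;
    trace_id: string;
    detail?: Record<string, unknown>;
  };
}

// Decide what to do from the stable `code`, never from `message`.
function classify(body: ErrorEnvelope): "retry" | "reauth" | "fail" {
  const [domain] = body.error.code.split(".");
  switch (domain) {
    case "rate_limit":
    case "upstream":
      return "retry"; // with backoff — see the retry matrix
    case "auth":
      return "reauth"; // fix or rotate the key first
    default:
      return "fail"; // request/policy/billing errors won't heal on retry
  }
}
```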

Retry matrix

| HTTP | Code | Cause | Fix | Retry |
|------|------|-------|-----|-------|
| 400 | request.invalid_body | Body fails JSON schema validation. | Inspect error.detail.field to pinpoint the field. | Do not retry |
| 400 | request.unsupported_modality | Model doesn't support that modality (e.g. a text-only model got an image). | Change the models whitelist or pick a multimodal model. | Do not retry |
| 401 | auth.missing_key | No Authorization header. | Send Authorization: Bearer $KEY. | Do not retry |
| 401 | auth.invalid_key | Key doesn't exist or was revoked. | Check key status in the console; rotate. | Do not retry |
| 402 | billing.insufficient_balance | Account balance is exhausted or the monthly quota is hit. | Top up or upgrade your plan in the console. | Do not retry |
| 403 | policy.blocked_by_rbac | User/team lacks the route:write permission. | Grant the permission to that role in the console. | Do not retry |
| 403 | policy.model_not_in_allowlist | Requested model isn't in the policy allowlist. | Update the policy or drop models to let the router pick. | Do not retry |
| 409 | idempotency.key_reused_with_diff_body | Same idempotency key sent with a different body. | Generate a new idempotency key. | Do not retry |
| 422 | router.budget_exceeded | All candidates exceed budget.maxCostUsd. | Raise the budget, soften the quality goal, or add cheaper models. | Do not retry |
| 422 | router.no_candidates | All candidates filtered out by modality + budget + RBAC. | Relax the policy or check model registration. | Do not retry |
| 429 | rate_limit.account | Account-level RPM/TPM/concurrency limit hit. | Wait Retry-After; upgrade plan or self-throttle. | Retry w/ backoff |
| 429 | rate_limit.key | Key-level throttle (set in the console). | Lower concurrency or raise the key limit. | Retry w/ backoff |
| 429 | rate_limit.upstream_provider | Upstream provider throttled SkyAIApp (rare). | Router auto-falls back; safe to retry. | Retry w/ backoff |
| 499 | client.canceled | Client disconnected before the response (incl. explicit abort). | Increase timeout; prefetch; consider streaming. | Conditional |
| 500 | router.internal_error | Router internal exception (paged on our side). | Retry; if persistent, email the founders. | Retry w/ backoff |
| 502 | upstream.provider_failed | Primary and all fallbacks returned 5xx. | Back off + retry; check the status page. | Retry w/ backoff |
| 504 | router.timeout | End-to-end timeout (timeout_ms). | Increase timeout_ms or pick a faster model. | Retry now |

Recommended retry algorithm

For errors marked "Retry w/ backoff", use truncated exponential backoff with decorrelated jitter. The SDKs implement this by default; if you're calling REST yourself, do the same.

// TypeScript
async function callWithRetry<T>(
  fn: () => Promise<T>,
  opts = { maxAttempts: 4, baseMs: 200, capMs: 8000 },
): Promise<T> {
  let prevSleep = opts.baseMs; // per-call state — not shared across concurrent calls

  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const sky = err as { status?: number; retryAfterMs?: number; code?: string };
      const status = sky.status ?? 0;

      // Hard "do not retry" set — see retry matrix above.
      const noRetry = new Set([400, 401, 402, 403, 409, 422]);
      if (noRetry.has(status)) throw err;

      // Out of attempts? Surface the last error instead of sleeping again.
      if (attempt === opts.maxAttempts) throw err;

      // Server told us how long to wait?
      if (sky.retryAfterMs) {
        await sleep(sky.retryAfterMs);
        continue;
      }

      // Exponential with decorrelated jitter (AWS pattern).
      // next sleep ∈ [baseMs, min(capMs, prevSleep * 3))
      prevSleep = Math.min(opts.capMs, randInt(opts.baseMs, prevSleep * 3));
      await sleep(prevSleep);
    }
  }
  throw new Error("unreachable"); // the loop always returns or throws above
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
const randInt = (lo: number, hi: number) => lo + Math.floor(Math.random() * (hi - lo));

Why decorrelated jitter (not plain random): it spreads concurrent retrying clients across time to avoid thundering herds. AWS's 'Exponential Backoff and Jitter' post is the classic background read.
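To make the sleep bounds concrete, here is a tiny, self-contained simulation of the sequence. The decorrelatedSleeps helper and its injectable rand parameter are our own additions (rand is injected so the output is reproducible for testing; production code would use Math.random):

```typescript
// Simulates the decorrelated-jitter sleep sequence:
// each sleep is drawn from [baseMs, prev * 3), truncated at capMs.
function decorrelatedSleeps(
  attempts: number,
  baseMs: number,
  capMs: number,
  rand: () => number, // returns a value in [0, 1)
): number[] {
  const out: number[] = [];
  let prev = baseMs;
  for (let i = 0; i < attempts; i++) {
    prev = Math.min(capMs, baseMs + Math.floor(rand() * (prev * 3 - baseMs)));
    out.push(prev);
  }
  return out;
}
```

With rand pinned low the sequence stays at baseMs; pinned high it grows roughly 3x per attempt until it hits capMs — and with real randomness each client lands somewhere in between, which is exactly the spreading effect described above.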

SDK error class mapping

SDKs map error.code to typed Error subclasses. Catch by class, not by string.

import {
  SkyAI,
  RouterError,                  // base class — all SkyAIApp errors extend this
  RouterTimeoutError,           // 504 router.timeout
  RouterBudgetError,            // 422 router.budget_exceeded
  AuthError,                    // 401/403 auth.* / policy.*
  RateLimitError,               // 429 rate_limit.* — exposes .retryAfterMs
  UpstreamProviderError,        // 502 upstream.provider_failed — exposes .upstreamProvider
  isRouterError,                // type-guard for unknown errors
} from "@skyaiapp/sdk";

try {
  const res = await sky.route({ /* ... */ });
} catch (err) {
  if (err instanceof RateLimitError) {
    metrics.inc("router.rate_limit", { tier: err.tier });
    await sleep(err.retryAfterMs ?? 1000);
    return retry();
  }
  if (err instanceof RouterBudgetError) {
    log.warn("budget exhausted", { suggestion: err.detail.suggestion });
    return fallbackToCheaperPolicy();
  }
  if (err instanceof RouterTimeoutError) {
    return null; // user-facing graceful degradation
  }
  if (err instanceof UpstreamProviderError) {
    pager.notify("Upstream " + err.upstreamProvider + " failed", err.traceId);
    throw err;
  }
  if (isRouterError(err)) {
    log.error("Unhandled SkyAI error", { code: err.code, traceId: err.traceId });
  }
  throw err;
}

Debugging tips

  1. Start with the trace_id. Search it in the console — you'll see candidate scores, cache events, and fallback reasons in one view.
  2. When opening tickets, attach request_id. It lets our oncall locate your specific request immediately.
  3. Capture error.detail. The code tells you what; detail tells you why and what to do.
  4. Reproduce locally with sk_test_. The sandbox mirrors production semantics without billing; configure a mock model to reproduce 5xx responses offline.
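Tips 2 and 3 can be folded into one small helper. A sketch under stated assumptions: SDK errors expose code, traceId, and detail as in the handler example above, while requestId is our assumption — check the fields on your SDK version:

```typescript
// Our own illustrative helper (not part of the SDK): collects the fields
// worth pasting into a support ticket — the stable code, request_id for
// oncall lookup, trace_id for the console, and detail for the suggested fix.
function ticketBody(err: {
  code?: string;
  requestId?: string;
  traceId?: string;
  detail?: Record<string, unknown>;
}): string {
  return [
    `code: ${err.code ?? "unknown"}`,
    `request_id: ${err.requestId ?? "unknown"}`,
    `trace_id: ${err.traceId ?? "unknown"}`,
    `detail: ${JSON.stringify(err.detail ?? {})}`,
  ].join("\n");
}
```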

See also

Need a human? Email the founders.
