API Reference

REST API v1 · interface may receive non-breaking adjustments before public beta

Complete REST API reference: parameter tables, request/response schemas, status codes, error patterns, rate limits, idempotency, and pagination for every endpoint. All examples use cURL — equivalent TypeScript / Python / Go versions live under SDKs.

Base information

Base URL (production)https://api.skyaiapp.com

Base URL (sandbox)https://api-sandbox.skyaiapp.com

API versionv1

AuthenticationBearer token

Request formatapplication/json; UTF-8

Timestamp formatRFC 3339 (ISO 8601)

EncodingUTF-8 only

TLSTLS 1.2+ required

Authentication Model routing Agent runtime Traces Analytics Webhooks Rate limits Error codes

Authentication

Every request needs a Bearer token. Keys are created in the console with live / test prefixes.

Authorization: Bearer sk_live_01JEXAMPLE...
# or for sandbox / CI:
Authorization: Bearer sk_test_01JEXAMPLE...

sk_live_…

Production keys — billed, real model calls, real trace retention.

sk_test_…

Sandbox keys — hit api-sandbox.skyaiapp.com, never billed, returns canned model output. Use these in CI.

If a key leaks

Revoke in the console — propagation < 5s globally. All traces from that key over the last 24h are flagged as compromised. Already-billed requests are not auto-refunded; file a ticket for manual review.

Full flow: Security guide · key rotation。

Model routing

POST/v1/route

Route one request. The router picks a primary and fallback from 50+ models, executes the call, and returns the response plus a decision trace.

Request body

Field	Type	Required	Description
`messages`	`Message[]`	required	OpenAI-compatible chat messages (system / user / assistant / tool).
`goal`	`string`	optional	cost / quality / stability. Defaults to cost under balanced strategy.
`strategy`	`string`	optional	balanced (default) / cost-optimized / quality-first / latency-optimized.
`policy_id`	`string`	optional	Pre-created policy ID from console. Overrides inline goal/strategy when present.
`models`	`string[]`	optional	Constrain candidates to this whitelist.
`fallback`	`Fallback`	optional	{ models: string[], maxRetries: number }. Overrides the default fallback chain.
`budget`	`Budget`	optional	{ maxCostUsd?, maxTokens? }. Hard constraint — candidates exceeding it are rejected.
`cache`	`boolean \| CacheOpts`	optional	true enables default semantic cache; CacheOpts customizes threshold + TTL.
`stream`	`boolean`	optional	true returns an SSE stream; defaults to false.
`tools`	`Tool[]`	optional	Tool/function declarations (OpenAI tools compatible). Chat context only.
`tool_choice`	`string \| object`	optional	auto / none / required / specific tool name.
`metadata`	`object`	optional	Arbitrary K/V tags, indexed into traces and billing reports.
`timeout_ms`	`number`	optional	End-to-end timeout (default 60s); on timeout returns 504 + RouterTimeoutError.
`idempotency_key`	`string`	optional	Idempotency key (≤ 64 bytes). Repeat requests within 24h return the same result with no double-billing.

Example

curl https://api.skyaiapp.com/v1/route \
  -H "Authorization: Bearer $SKYAIAPP_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: req_user42_2026-05-14T10:30Z" \
  -d '{
    "goal": "quality",
    "strategy": "quality-first",
    "messages": [
      { "role": "system", "content": "You are a senior travel planner." },
      { "role": "user",   "content": "Plan a 7-day European city-hopping trip in October." }
    ],
    "fallback": {
      "models": ["claude-opus-4.7", "gemini-3.1-pro"],
      "maxRetries": 2
    },
    "budget":     { "maxCostUsd": 0.05 },
    "cache":      true,
    "timeout_ms": 30000,
    "metadata":   { "tenant": "acme-corp", "workflow": "trip-planner", "env": "prod" }
  }'

Response

{
  "trace_id": "tr_01JFGYZ7K8M2N3P4Q5R6S7T8U9",
  "output": "Day 1: Land in Lisbon...",
  "routing": {
    "selected_model":  "gpt-5.5-pro",
    "fallback_chain":  ["claude-opus-4.7", "gemini-3.1-pro"],
    "policy_version":  "policy_prod_v3@2026-05-12",
    "decision_reason": "quality-first + quality goal: gpt-5.5-pro wins on benchmark composite within budget.",
    "cost_usd":        0.0421,
    "latency_ms":      1820,
    "cache_hit":       false,
    "tokens": {
      "input":  256,
      "output": 1840,
      "cached": 0
    }
  },
  "usage": {
    "rate_limit_remaining":  4998,
    "rate_limit_reset_unix": 1715680800
  }
}

Status codes

Code	Meaning	Retry?
200	Success (including fallback-rescued)	—
400	Request validation failed — see error.detail	No (fix request)
401	Missing or invalid key	No (fix key)
402	Insufficient balance / quota exhausted	No (top up)
403	Blocked by policy or RBAC	No (change policy)
422	All candidates filtered by budget/constraints	No (relax budget)
429	Rate-limited — see Retry-After header	Yes (per Retry-After)
500	Router internal error	Yes (exp backoff)
502	Upstream provider failed — fallback also failed	Yes (exp backoff + jitter)
504	End-to-end timeout (timeout_ms)	Yes (increase timeout or pick faster model)

Full error envelope and SDK class mapping: Error handling。

Available models

GET/v1/models

List currently routable models. Each entry has capability, context window, pricing, and provider availability status.

curl https://api.skyaiapp.com/v1/models \
  -H "Authorization: Bearer $SKYAIAPP_API_KEY"

{
  "models": [
    {
      "id": "gpt-5.5-pro",
      "provider": "openai",
      "context_window": 1000000,
      "modalities": ["text", "image", "tool"],
      "pricing_per_1m": { "input_usd": 5.00, "output_usd": 30.00 },
      "availability": { "status": "available", "regions": ["us", "eu"] },
      "deprecated_at": null
    },
    { "id": "claude-opus-4.7", "provider": "anthropic", "...": "..." },
    { "id": "gemini-3.1-pro", "provider": "google",    "...": "..." }
  ]
}

Agent runtime

POST/v1/agents/run

Execute a multi-step agent: it picks tools, maintains state, with timeouts and retries handled by the runtime.

Request body (key fields)

Field	Type	Description
`task`	`string`	Natural-language task description (required).
`tools`	`Tool[]`	Tools the agent may call — built-in (web_search, calculator, code_exec) + custom.
`max_steps`	`number`	Hard cap (default 8). On overrun returns partial result + max_steps_reached.
`per_step_timeout_ms`	`number`	Per-step timeout (default 30s).
`total_budget_usd`	`number`	Cumulative cost cap for the whole run.
`model_strategy`	`object`	Underlying LLM calls reuse the /v1/route goal/strategy semantics.
`stream_steps`	`boolean`	true streams per-step events as SSE.

curl https://api.skyaiapp.com/v1/agents/run \
  -H "Authorization: Bearer $SKYAIAPP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "task": "Find this month overdue invoices and email a polite reminder to each.",
    "tools": [
      "web_search",
      {
        "name": "lookup_invoice",
        "description": "Fetch invoice by ID from internal billing system.",
        "parameters": { "type": "object", "properties": { "id": { "type": "string" } }, "required": ["id"] }
      }
    ],
    "max_steps": 10,
    "per_step_timeout_ms": 30000,
    "total_budget_usd": 0.50,
    "model_strategy": { "goal": "quality", "strategy": "balanced" },
    "metadata": { "tenant": "acme-corp", "workflow": "billing-followup" }
  }'

Response (excerpt)

{
  "trace_id": "tr_01JFAGENT2N3P4Q5R6S7T8U9",
  "status": "completed",                       // completed | partial | failed | budget_exceeded
  "output": "Sent reminders to 7 customers; logged failures for 2.",
  "steps": [
    {
      "number": 1,
      "action": "tool_call",
      "tool":   "lookup_invoice",
      "input":  { "id": "inv_2901" },
      "output": { "status": "overdue", "amount_usd": 1820 },
      "duration_ms": 220,
      "cost_usd": 0
    },
    { "number": 2, "action": "llm_call",  "...": "..." }
  ],
  "summary": {
    "total_steps":     9,
    "total_cost_usd":  0.0421,
    "total_latency_ms": 18420,
    "tools_used":      ["lookup_invoice", "send_email"]
  }
}

Traces

GET/v1/traces/{trace_id}

Fetch the full span tree of a single trace (with sub-traces, cache events, guardrail hits).

GET/v1/traces?from=…&to=…&filter=…&limit=100&cursor=…

Filter traces by time / metadata / error class. Cursor-based pagination — see pagination section below.

# Filter on metadata.tenant + only failed traces in the last hour
curl -G https://api.skyaiapp.com/v1/traces \
  -H "Authorization: Bearer $SKYAIAPP_API_KEY" \
  --data-urlencode "from=2026-05-14T09:00:00Z" \
  --data-urlencode "to=2026-05-14T10:00:00Z" \
  --data-urlencode "filter=metadata.tenant=acme-corp,status=failed" \
  --data-urlencode "limit=100"

Analytics

Low-cardinality aggregate endpoints for dashboards. For raw data use /v1/traces instead.

GET /v1/analytics/usageUsage, cost, token counts; by time-bucket + groupBy (model / tenant / workflow).

GET /v1/analytics/cacheHit rate, dollars saved, TTL distribution.

GET /v1/analytics/fallbackFallback trigger counts, reason histogram, per-candidate success rate.

GET /v1/analytics/latencyP50 / P95 / P99 latency by model and strategy.

curl -G https://api.skyaiapp.com/v1/analytics/usage \
  -H "Authorization: Bearer $SKYAIAPP_API_KEY" \
  --data-urlencode "start_date=2026-05-01" \
  --data-urlencode "end_date=2026-05-14" \
  --data-urlencode "group_by=model" \
  --data-urlencode "bucket=day"

Webhooks

Events like router.fallback_triggered, router.budget_exceeded, agent.run_completed, trace.error_burst can be pushed to your endpoint. Full event catalog, signature verification, retry policy, and replay tools live on the dedicated page.

Full Webhooks docs

Rate limits

Two-tier limiting: account-level (per plan) + key-level (configured in console). Both use sliding windows (not fixed buckets) — smoother and harder to spike.

Plan	RPM (req/min)	TPM (tok/min)	Concurrency
Free	60	60k	5
Pro	600	600k	25
Team	3,000	3M	100
Enterprise	Per contract	Per contract	Per contract

Response headers

X-RateLimit-Limit:        600
X-RateLimit-Remaining:    582
X-RateLimit-Reset:        1715680800     # unix epoch
X-RateLimit-Resource:     requests       # or "tokens"
Retry-After:              4              # seconds; only on 429

Best practice: self-throttle from Remaining; on 429, back off per Retry-After (do not retry immediately). SDKs do this automatically.

Idempotency

For side-effectful endpoints (route / agents/run), send an Idempotency-Key header. Repeating the same key within 24h returns the original result with no double-billing. Lets your client safely retry on network blips.

Idempotency-Key: req_<your-id>_<timestamp>
# Length ≤ 64 bytes; recommended format: <client>_<userId>_<utc-millis>
# Stored 24h then evicted.

Reusing a key with a different body returns 409 idempotency_key_reused_with_different_body.

Pagination (cursor)

All list endpoints use cursor pagination. First page omits cursor; if next_cursor is non-null in the response, there's another page. Cursors are valid for 24h.

# First page
GET /v1/traces?limit=100

# Subsequent
GET /v1/traces?limit=100&cursor=eyJ0Ijoi...

# Response
{
  "data": [ ... ],
  "next_cursor": "eyJ0IjoiMjAyNi0wNS0xNFQwOToyMDoxMVoifQ==",
  "has_more": true
}

Error codes & handling

Errors share a stable envelope and code field. SDKs map each code to a typed Error class for precise handling. Full list and retry matrix on the dedicated page.

{
  "error": {
    "code":     "router.budget_exceeded",
    "message":  "All candidate models exceed budget.maxCostUsd=0.001",
    "type":     "validation_error",
    "request_id": "req_01JFGYZ7K8M2N3P4Q5R6S7T8U9",
    "trace_id":   "tr_01JFGYZ7K8M2N3P4Q5R6S7T8U9",
    "detail": {
      "rejected_candidates": [
        { "model": "gpt-5.5-pro",       "estimated_cost_usd": 0.012 },
        { "model": "claude-opus-4.7",   "estimated_cost_usd": 0.015 }
      ],
      "suggestion": "Increase budget.maxCostUsd or include cheaper models in the policy."
    }
  }
}

Full error code reference + retry matrix

Was this page helpful?

Let us know how we can improve