Claude Fable 5 / Sonnet 5 · GPT-5.6 Sol/Terra/Luna · Opus 4.8 · DeepSeek

Agent orchestration · hundreds of AI models

Harness orchestration · Nexus routing · session budgets & compaction · transparent channel pricing

ModelAPI is an agent & multi-model orchestration cloud — routing, budgets, compaction, and cache. You pay for orchestration software with tiered AKL / Rsc / Bedrock / direct channel pricing.

Get API key Try Playground Compare models →

Python · OpenAI SDK1-line migration

# Before (OpenAI official)

client = OpenAI(base_url="https://api.openai.com/v1")

# After — ModelAPI, same SDK, every top model

client = OpenAI(base_url="https://api.aimodelapi.ai/v1")

✓ Harness Control✓ Nexus routing✓ OpenAI-compatible

400+

AI Models

60+

Providers

Harness

Agent orchestration

Top-up service fee

Connecting you to the world's leading AI providers

OpenAIGPT-5.6 Sol/Terra/Luna, GPT-5.5 AnthropicSonnet 5, Opus 4.8 (Fable 5 限制许可)GoogleGemini 3.1 Pro/Flash DeepSeekV4 Pro, V4 Flash, R1 xAIGrok 4.3, Grok 4.1 Fast MetaLlama 4 Maverick/Scout MistralLarge, Nemo, Codestral AlibabaQwen3.7 Max/Plus, Qwen3.6 Plus MoonshotKimi K2.7 Code (K3 待开通)ZhipuGLM-5.1 / GLM-5.2 TencentHy3 Preview (Hunyuan 3)ByteDanceSeedance 2.0 (CN + International)PerplexitySonar Pro, Sonar

Trending This Week

Models ranked by usage

Full rankings

#ModelInput /MContextWeekly

Claude Sonnet 5Latest Sonnet

Anthropic

$1.901M

+51%

GPT-5.6 SolGPT-5.6 Flagship

OpenAI

$5.001M

+47%

Claude Opus 4.8Flagship

Anthropic

$4.751M

+38%

GPT-5.6 TerraBalanced

OpenAI

$2.501M

+33%

DeepSeek V4 FlashHottest

DeepSeek

$0.141M

+344%

Gemini 3.1 Pro1M context

Google

$2.001.048576M

+12%

GPT-5.6 LunaFast GPT-5.6

OpenAI

$1.001M

+28%

DeepSeek V4 ProBest Value

DeepSeek

$0.501M

+29%

Claude Fable 5Restricted

Anthropic

$10.001M

—

Based on real request volume over the past 7 days · updated hourly

Harness · claude-code-optimal

Claude Code + server-side hybrid routing~$2.20/M output in typical agents

Harness sends tests, lint, and file scans to nexus/cheapest-west while plan/reasoning stays on frontier tiers. Verify via X-Harness-Task — see /harness.

Explore Harness Claude Code setup

Cost per 1M output tokens

All-Opus

Claude Opus 4.8 · 4.7 live

$25.00

baseline

Hybrid Agent

Opus 5% + nexus/cheapest-west 95%

$2.20

−91%

Savings91%

Output quality: near full-Opus

Sound familiar?

3 blockers that stop most developers

Account friction, cross-border routing, and billing — we offer a compliant SaaS path (see /compliance).

🚫

Account rejected or banned

Overseas phone + credit card required. Accounts banned without warning; balance lost.

✓ Our solution

📶

Unstable network & timeouts

Cross-border API calls over long routes suffer latency spikes and timeouts — hard to meet production SLAs.

✓ Our solution

Multi-region PoPs and smart routing. Primary endpoint in US East; regional BASE_URLs as PoPs ship.

💸

Payment walls & no invoices

Foreign card only. No RMB payment, no VAT invoice, no corporate billing.

✓ Our solution

WeChat Pay, Alipay, bank transfer in RMB. VAT invoices available. Corporate billing supported.

Problems we solve

Ten keys, inflated bills, flaky routing — condensed into one control plane.

Sound familiar?

10 provider accounts · 10 API keys · 10 dashboards

OpenAI, Anthropic, Google, DeepSeek — each with its own account, billing, key rotation, and quota tracking.

ModelAPI

One key. 50+ models. Unified billing. Central control.

Sound familiar?

API feels 10× more expensive · hidden token inflation

Per-token API billing adds up. Some gateways add markup, inflate counts, or swap models.

ModelAPI

0% per-token markup. Official prices. What you call is what you get.

Sound familiar?

Cheap models underperform · flagship costs · manual switching hurts flow

Trading cost vs quality manually breaks unattended workflows.

ModelAPI

nexus/* routes by task; standard IDs always hit the named model.

Sound familiar?

Complex docs · hard to debug · slow support

Non-developers get stuck; ticket cycles are long.

ModelAPI

Docs tested with non-experts; AI support available 7×24.

Frontier paths and savings in one envelope

Same credential surface for experimentation and scaled traffic.

Frontier models

Claude Fable 5 / Sonnet 5, GPT-5.6 Sol/Terra/Luna, Opus 4.8, Gemini 3.1 Pro — same integration surface.

Claude Fable 5Claude Sonnet 5GPT-5.6 SolGPT-5.6 TerraOpus 4.8Gemini 3.1 Pro

High-value models

DeepSeek V4 Flash, Qwen, Kimi, GLM — built for volume and everyday workloads.

DeepSeek V4 FlashQwen3.6 PlusKimi K2.6GLM-5 FlashGemini 3.1 Flash-Lite

Watch a request end-to-end

Representative routing: code pane, streamed body, latency and credits.

request.ts

Anthropic

Loading...

response.json

Awaiting response…

Ecosystem

Works where you already build

Point Cursor, Windsurf, VS Code, Claude Code, or Codex CLI at ModelAPI.

All guides

Cursor

Custom OpenAI endpoint

Windsurf

Custom model config

VS Code

via Continue extension

Trae

OpenAI-compatible mode

Codex CLI

OPENAI_BASE_URL override

Universal config — works in every tool

Base URL

https://api.aimodelapi.ai/v1

API Key

sk-ama-YOUR_KEY

View All Integration Guides

Resilience

Backup hosts when routes change

Reuse the exact same key — swap hostname only.

GLGlobal (Primary gateway)PRIMARY

api.aimodelapi.ai/v1

Primary · nearest PoP

USVirginiaSoon

us.api.aimodelapi.ai/v1

US East

SGSingapore

sg.api.aimodelapi.ai/v1

Asia Pacific PoP

JPTokyoSoon

jp.api.aimodelapi.ai/v1

Japan PoP

THBangkokSoon

th.api.aimodelapi.ai/v1

Southeast Asia

EUFrankfurtSoon

eu.api.aimodelapi.ai/v1

Europe PoP

BKAlt Primary DomainSoon

api2.aimodelapi.ai/v1

Backup if .ai is blocked

Switching endpoints takes one config change

Replace api.aimodelapi.ai with any backup host in your BASE_URL. Same API key. Same code.

Live Status

Catalog

Featured models

50+ frontier and value tiers · one unified API envelope

Global / frontierCN / value tiers

View all

Anthropic

Capabilities

What sits behind one endpoint

Harness, Nexus, and failover algorithms — one endpoint, not another passthrough shop.

Harness Control

Server-side agent orchestration: tiered routing, session budgets, auto-compaction, and cache tuning via HTTP headers — proprietary algorithm layer.

Nexus Smart Routing

nexus/* aliases resolve to the best model by cost, latency, or capability. Software routing with fallbacks — not blind passthrough.

Universal Model Access

GPT-5.5, Claude Opus 4.8 / 4.7, Gemini 3.1, DeepSeek V4, Grok 4.3 and 50+ more — all through one API key, no restrictions.

Drop-in OpenAI SDK

Change 2 lines of code. Keep your existing SDK, prompts, and tools — zero refactoring.

Global Edge Network

US · Singapore · Tokyo · Frankfurt. Regional PoPs roll out after US primary launch.

Lowest Total Cost

Official API prices, zero markup. 5% infrastructure fee at top-up only — below industry average. Bring Your Own Key (BYOK) at $0.50/M for expensive models.

Enterprise Security

Key isolation · RBAC · Audit logs · Spend caps · SOC 2 compliant infrastructure.

Agent cloud · not a token relay

Harness Control: orchestration for agents

ModelAPI sells an agent service: tiered task routing, session budgets, compaction, and cache optimization. Model calls are billed at upstream channel rates — you buy orchestration, not a simple token hop.

Harness Control

Server-side agent orchestration: tiered task routing (plan / bulk / compress), session USD budgets, auto context compaction, and prompt-cache optimization.

Nexus Routing Engine

Cost-, latency-, and capability-aware model selection via nexus/* aliases. Live catalog resolution — your code stays stable as frontier SKUs update.

Multi-tier Fallback

Profile-driven fallbacks (e.g. Opus → Sonnet → nexus/cheapest-west) with provider health checks — intelligent agent routing.

Response cache & Memory API

Exact-match response cache for repeated non-stream calls; transient memory consolidate API for Hermes/OpenClaw agent workflows.

Harness request exampleAlgorithm-layer headers

curl https://api.aimodelapi.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ama-..." \
  -H "x-harness-profile: claude-code-optimal" \
  -H "x-harness-session-id: my-project-1" \
  -H "x-harness-auto-compact: true" \
  -d '{"model":"nexus/best-coding","messages":[...]}'

Response headers X-Harness-Profile / X-Harness-Model / X-Harness-Task expose routing decisions. See /docs#harness and integrations.

Harness Control docs vs Claude Code

For Claude Code power users

What Anthropic direct cannot do — Harness does server-side

Claude Code is a great terminal agent. ModelAPI Harness is orchestration software on our gateway — it does not replace Claude Code, but auto-picks models, enforces budgets, and compacts long context. All features below are live and verifiable via response headers.

You keep Claude Code

Set ANTHROPIC_BASE_URL to api.aimodelapi.ai and add x-harness-profile: claude-code-optimal.

Harness reads each request

"Run all tests" → bulk tier → nexus/cheapest-west. Architecture question → plan tier → nexus/best-reasoning.

You see the decision

X-Harness-Task and X-Harness-Model tell you which model ran — auditable, billable at official rates.

Capability	Anthropic direct	Claude Code + Harness
Multi-vendor models in one agent session Opus for plan, DeepSeek Flash for bulk — automatic via tiers	No	Yes
Server-side task tier routing (plan / bulk / compress) Rules + optional hybrid LLM classifier on every request	No	Yes
Session USD spend cap with HTTP 429 Redis-backed; claude-code-optimal default $50/session	No	Yes
Auto context compaction before expensive calls Server summarizes older turns; non-streaming requests today	Partial	Yes
Live model aliases (nexus/*, claude-sonnet-latest)	No	Yes
Provider prompt-cache optimization (Bedrock) Dashboard shows cacheRead / cacheSavingsUsd	Partial	Yes
Exact-match response cache Non-stream, temp ≤ 0.1, no tools; X-ModelAPI-Cache header	No	Yes
Agent memory consolidate API POST /v1/harness/memory/consolidate	No	Yes
Transparent routing in response headers X-Harness-Model, X-Harness-Task, X-Harness-Classifier	No	Yes
Native Claude Code UX (queue, steer, IDE) Keep Claude Code — point ANTHROPIC_BASE_URL at ModelAPI + Harness headers	Yes	Partial

Honest notes: Harness does not change Claude Code's client UI. Auto-compact and response cache apply mainly to non-streaming calls today. "Response cache" is exact payload match, not embedding similarity. Hybrid cost ($2.20/M output at 5% Opus / 95% cheap) is a typical agent workload estimate — your mix may differ.

Claude Code + Harness setup Docs & curl verification Get free API key

Start in under 60 seconds

Three steps to your first call

🔐

Create account

💳

Add credits

Top up from $5. Supports WeChat, Alipay, Stripe, Apple Pay, crypto.

View payment options

⚡

Get your API key

Create a key, swap the base URL. Your existing OpenAI code works as-is.

Read quickstart

Change 2 lines — everything else staysPython

# Before (OpenAI direct)

base_url = "https://api.openai.com/v1"

api_key = "sk-openai-..."

# After (ModelAPI — same SDK, any model)

base_url = "https://api.aimodelapi.ai/v1"

api_key = "sk-ama-YOUR_KEY"

client = OpenAI(base_url=base_url, api_key=api_key)

resp = client.chat.completions.create(

model="claude-opus-latest", # or AKL-claude-opus-4-8

messages=[{"role": "user", "content": "Hello"}]

)

Pricing

Usage-based tiers without guesswork

Upstream model passes through with a spelled-out infra surcharge — zero surprise fees.

Free

$0/month

For exploration and testing

20 req/min · 50K req/day
10+ free models
Community support
Basic analytics

Start Free

Pay-as-you-go

5% infra fee

Official API prices · zero markup · min $5 top-up

500 req/min · 50K req/day
All 50+ models
0% per-token markup
5% infra fee (below industry avg)
Bring Your Own Key · $0.50/M
Navigator 1 AI support · 7×24
WeChat · Alipay · Stripe · Apple Pay

Get Started

Enterprise

Custom

For production workloads

Unlimited rate limits
Reduced service fee (negotiated)
Dedicated infrastructure
SLA guarantee
24/7 dedicated support
VAT invoice available

Contact Sales

Accepted payment methods worldwide

WeChat Pay

CNY

Alipay

CNY

Visa

Card

Mastercard

Card

Apple Pay

Wallet

Google Pay

Wallet

PayPal

Global

Amex

Card

256-bit TLS encryptionPCI DSS compliant via StripeCredits never expireMulti-currency support

Loved by developers

Real users, real results

"Used to juggle 7–8 API accounts. Now one ModelAPI key covers everything, billing is crystal clear, and switching between Claude Opus and DeepSeek takes zero effort."

Chen Yuxiang

AI indie developer

"The Singapore PoP is legitimately fast. I was skeptical about a gateway adding latency, but median response time is actually lower than hitting the OpenAI API directly from SEA. Solid routing."

Jason K.

Full-stack engineer, Singapore

"We needed one gateway for multiple model providers. ModelAPI unified billing and routing; WeChat Pay works for our eligible accounts. Deployed to production in one afternoon."

Lin Xiaowen

Startup CTO

"The PPP pricing for India is real — I'm paying ~40% less than teammates in the US for the same models. Switching my existing codebase took literally 2 lines of code."

Priya M.

ML Engineer, Bangalore

"Looked at a lot of gateways — ModelAPI is one of the few with truly 0% per-token markup and a transparent fee structure. The nexus/cheapest route cut our batch inference cost substantially."

Wang Boyuan

Algorithm engineer

"Building multi-tenant where users have different model preferences. The Compare page saved hours of manual benchmarking. BYOK at $0.50/M flat is the most cost-effective routing I've found."

Alex T.

SaaS founder, London

Recent Updates

What we've shipped lately

Full changelog

2026-05-29

Live

Claude Opus 4.8 — live via ArkLin (OEM + AKL)

Use claude-opus-latest or AKL-claude-opus-4-8. Save cost with AKL-*-rsc. Hermes dropdown "Opus 4 7" may fail — see docs.

2026-05-14

New Model

DeepSeek V4 Flash added — 344% weekly surge

Now routing via nexus/cheapest. $0.14/M input, 1M context.

2026-05-08

Feature

Human-in-the-loop tool calls now supported

Pause agentic workflows mid-execution and inject human decisions via the API.

2026-04-29

Performance

nexus/cheapest-west routing now 40% faster with edge caching

Prompt-level cache hits on repeated context now reduce latency by ~40% at no extra cost.

Access every top AI. Right now.

Create Free Account Read Docs