Product Updates & Announcements
New models, features, and performance improvements — all in one place.
DeepSeek V4 Flash is now live — 344% weekly surge
DeepSeek's fastest and cheapest V4-series model is now available at $0.16/M input and $0.32/M output with a 1M context window. It's automatically included in the nexus/cheapest routing group and has become the #2 model by volume within days of launch.
View modelHuman-in-the-loop tool calls for agentic workflows
Pause agentic workflows mid-execution and inject human decisions via the API. Set `human_in_the_loop: true` in your tool definition and we'll hold the execution context until you respond — up to 10 minutes.
Read docsConsistent web search and fetch across all models
We've unified the web_search and web_fetch tools across every model in our catalog. Any model — including those without native tool support — can now use these via our middleware layer. No SDK changes needed.
nexus/cheapest-west routing is now 40% faster with prompt caching
Prompt-level cache hits on repeated context reduce latency by ~40% and are not charged. This is enabled automatically for all requests going through nexus/cheapest-west, nexus/cheapest, and nexus/best-reasoning routing groups.
Claude Opus 4.7 — now on ModelAPI with Batch API support
The latest flagship from Anthropic is available at $5.75/M input. Batch API is supported — submit large jobs at 50% off the regular price. Extended thinking mode is available via the `thinking` parameter.
View modelTeam spend caps and per-member API key scoping
Team owners can now set monthly spend caps per member and restrict each key to specific model groups. Finance exports now include per-member breakdowns, making invoice reconciliation easier.
Manage teamGPT-5.5 added — OpenAI's latest, at pass-through pricing
GPT-5.5 is now available at $5.75/M input and $34.50/M output — the same pricing as the OpenAI platform with just our 5% infrastructure fee applied at top-up time.
Zero-retention mode for enterprise and HIPAA workloads
Enterprise customers can now opt into zero-retention mode: prompts and completions are never logged, cached, or used for any downstream purpose. Available with a signed DPA.
Learn moreBangkok (TH) edge node online — Southeast Asia P50 latency down 45%
A new PoP in Bangkok serves Southeast Asian traffic. Users in Thailand, Vietnam, Indonesia, Malaysia, and Singapore now get requests routed locally, reducing P50 latency from 180ms to ~98ms.
BYOK now at $0.50/M flat — unchanged as models get more expensive
Bring-Your-Own-Key routing is $0.50/M tokens processed — a flat fee that doesn't change regardless of which provider your key is for. As frontier models like GPT-5.5 Pro raise prices, your BYOK cost stays fixed.
Streaming response reliability fix for Claude models
A bug causing occasional dropped chunks in streaming responses for Claude models under high load has been fixed. Affected <0.02% of requests but caused noticeable truncation in long outputs.