Product Updates & Announcements

New models, features, and performance improvements — all in one place.

Stay updated
Get notified when new models drop
May 14, 2026
New Model

DeepSeek V4 Flash is now live — 344% weekly surge

DeepSeek's fastest and cheapest V4-series model is now available at $0.16/M input and $0.32/M output with a 1M context window. It's automatically included in the nexus/cheapest routing group and has become the #2 model by volume within days of launch.

View model
May 8, 2026
Feature

Human-in-the-loop tool calls for agentic workflows

Pause agentic workflows mid-execution and inject human decisions via the API. Set `human_in_the_loop: true` in your tool definition and we'll hold the execution context until you respond — up to 10 minutes.

Read docs
May 7, 2026
Feature

Consistent web search and fetch across all models

We've unified the web_search and web_fetch tools across every model in our catalog. Any model — including those without native tool support — can now use these via our middleware layer. No SDK changes needed.

Apr 29, 2026
Performance

nexus/cheapest-west routing is now 40% faster with prompt caching

Prompt-level cache hits on repeated context reduce latency by ~40% and are not charged. This is enabled automatically for all requests going through nexus/cheapest-west, nexus/cheapest, and nexus/best-reasoning routing groups.

Apr 22, 2026
New Model

Claude Opus 4.7 — now on ModelAPI with Batch API support

The latest flagship from Anthropic is available at $5.75/M input. Batch API is supported — submit large jobs at 50% off the regular price. Extended thinking mode is available via the `thinking` parameter.

View model
Apr 15, 2026
Feature

Team spend caps and per-member API key scoping

Team owners can now set monthly spend caps per member and restrict each key to specific model groups. Finance exports now include per-member breakdowns, making invoice reconciliation easier.

Manage team
Apr 8, 2026
New Model

GPT-5.5 added — OpenAI's latest, at pass-through pricing

GPT-5.5 is now available at $5.75/M input and $34.50/M output — the same pricing as the OpenAI platform with just our 5% infrastructure fee applied at top-up time.

Mar 30, 2026
Security

Zero-retention mode for enterprise and HIPAA workloads

Enterprise customers can now opt into zero-retention mode: prompts and completions are never logged, cached, or used for any downstream purpose. Available with a signed DPA.

Learn more
Mar 20, 2026
Performance

Bangkok (TH) edge node online — Southeast Asia P50 latency down 45%

A new PoP in Bangkok serves Southeast Asian traffic. Users in Thailand, Vietnam, Indonesia, Malaysia, and Singapore now get requests routed locally, reducing P50 latency from 180ms to ~98ms.

Mar 10, 2026
Pricing

BYOK now at $0.50/M flat — unchanged as models get more expensive

Bring-Your-Own-Key routing is $0.50/M tokens processed — a flat fee that doesn't change regardless of which provider your key is for. As frontier models like GPT-5.5 Pro raise prices, your BYOK cost stays fixed.

Feb 28, 2026
Bug Fix

Streaming response reliability fix for Claude models

A bug causing occasional dropped chunks in streaming responses for Claude models under high load has been fixed. Affected <0.02% of requests but caused noticeable truncation in long outputs.