Blog
The 100 ms auction: why private SLMs are replacing frontier APIs in programmatic advertising
IAB OpenRTB allows 100 milliseconds end-to-end. A 200 ms frontier API round-trip is a non-starter. Here is why the adtech bid-path is the first major commercial surface for small language models.
The IAB OpenRTB specification is the clock that governs the internet's commercial layer. Its `tmax` field caps the round trip for a bid request (from an exchange, through a bidder, and back with a response), and exchanges commonly set it at 100 milliseconds. Inside that window, a Demand-Side Platform (DSP) has roughly 50 milliseconds to decide what a user is worth to a brand.
If you are building an AI-powered bidder, the arithmetic of general-purpose frontier APIs is a non-starter.
Typical frontier API p50 latency
~200 ms
Network round-trip plus inference. Inherently incompatible with real-time bidding.
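The arithmetic is worth spelling out. A minimal sketch, using illustrative figures rather than measured ones (the 30 ms network and 20 ms bidder-overhead numbers are assumptions; the 100 ms `tmax` and ~200 ms frontier p50 come from the figures above):

```python
# Hypothetical latency budget for a single OpenRTB bid request.
# All component figures are illustrative, not measured.
TMAX_MS = 100            # exchange-imposed round-trip deadline (tmax)
EXCHANGE_RTT_MS = 30     # network hops exchange <-> bidder, both directions
BIDDER_OVERHEAD_MS = 20  # parsing, targeting, auction logic, serialization

decision_budget_ms = TMAX_MS - EXCHANGE_RTT_MS - BIDDER_OVERHEAD_MS
print(decision_budget_ms)  # 50 ms left for any model inference

FRONTIER_API_P50_MS = 200  # typical remote frontier-API round trip
print(FRONTIER_API_P50_MS > decision_budget_ms)  # True: over budget before any variance
```

The frontier call blows the budget at the median, before tail latency even enters the picture.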
The latency wall
A frontier model like Claude 3.5 Sonnet or GPT-4o is a miracle of general reasoning, but it is an architectural misfit for the bid-stream. Even with the fastest inference providers, the network round-trip alone often consumes the entire decision budget.
When you add the variance of a multi-tenant API — where a "cold" request or a noisy neighbour can spike latency to 500 ms — the result is a high rate of bid timeouts. In adtech, a timeout is not just a slow experience; it is a lost opportunity and wasted infrastructure cost.
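The effect of that variance is easy to simulate. The sketch below models remote-API latency as a lognormal with a ~200 ms median plus occasional spikes; the distribution parameters and the 2% spike rate are assumptions for illustration, not measurements of any provider:

```python
import random

random.seed(0)
BUDGET_MS = 50.0  # decision budget left inside the 100 ms tmax window

def remote_call_latency_ms():
    # Illustrative model of a multi-tenant API: lognormal body with a
    # heavy tail standing in for cold starts and noisy neighbours.
    base = random.lognormvariate(mu=5.3, sigma=0.35)  # ~200 ms median
    if random.random() < 0.02:                        # 2% pathological spikes
        base += random.uniform(200, 400)
    return base

samples = [remote_call_latency_ms() for _ in range(10_000)]
timeout_rate = sum(s > BUDGET_MS for s in samples) / len(samples)
print(f"timeout rate at {BUDGET_MS:.0f} ms budget: {timeout_rate:.1%}")
```

With a 200 ms median against a 50 ms budget, effectively every request times out; the variance only decides how badly.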
Why SLMs fit the envelope
This is why the adtech bid-path is moving to Small Language Models (SLMs). A 1B to 3B parameter model, post-trained on proprietary bidstream data and domain-specific evals, can perform narrow reasoning tasks that general models cannot deliver inside the latency envelope.
- Brand Safety in <10ms: Instead of generic policy checks, a specialist model trained on your brand safety labels can classify a URL or page context in single-digit milliseconds on consumer-grade GPUs.
- Bid Shading with Nuance: Moving beyond linear regression to a model that understands the relationship between contextual signals and win-probability.
- MFA Filtering: Detecting "Made-for-Advertising" sites by reasoning about layout, ad-density, and content-originality in real-time.
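The shape of the first of these, an in-process brand-safety check with a hard latency assertion, can be sketched as follows. Everything here is hypothetical: `score_page` stands in for a 1B-class specialist already resident in local GPU or CPU memory, and is stubbed with keyword matching so the sketch runs anywhere; the label set is illustrative.

```python
import time

UNSAFE_LABELS = {"weapons", "adult", "hate"}

def score_page(url: str, context: str) -> dict:
    # Stub: a real 1B specialist would return calibrated per-label scores
    # from an in-memory forward pass, not keyword matching.
    lowered = context.lower()
    return {label: (1.0 if label in lowered else 0.0) for label in UNSAFE_LABELS}

def is_brand_safe(url: str, context: str, threshold: float = 0.5) -> bool:
    start = time.perf_counter()
    scores = score_page(url, context)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # The whole point of co-located inference: the check must stay
    # in single-digit milliseconds or it is worthless to the bidder.
    assert elapsed_ms < 10, "brand-safety check exceeded its latency budget"
    return all(score < threshold for score in scores.values())

print(is_brand_safe("https://example.com/news", "local election coverage"))
```

The pattern, not the stub, is the point: score, check the clock, and answer inside the budget or not at all.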
Owned infrastructure is the prerequisite
To hit the 100 ms target, you cannot leave the building. The model must run on the same local network (or the same machine) as the bidder. This requires owned or customer-controlled infrastructure.
Modelsmith is designed to facilitate this transition. By connecting private benchmarks to an automated post-training loop, we allow adtech teams to:
- Evaluate: Prove that a 1B specialist matches the accuracy of a 70B baseline on your specific labels.
- Train: Iterate on adapters inside your security boundary.
- Deploy: Export a production-ready artifact to your inference substrate (vLLM, TensorRT-LLM) without external API dependencies.
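The evaluate step reduces to an acceptance gate. A minimal sketch with toy data (the labels, predictions, and 2-point tolerance are all illustrative, not output from any real model):

```python
# Illustrative acceptance gate: promote the 1B specialist only if it
# matches the 70B baseline on your private labels, within a tolerance.
labels        = ["safe", "unsafe", "safe", "safe",   "unsafe", "safe"]
baseline_70b  = ["safe", "unsafe", "safe", "unsafe", "unsafe", "safe"]
specialist_1b = ["safe", "unsafe", "safe", "safe",   "unsafe", "safe"]

def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)

acc_70b = accuracy(baseline_70b, labels)   # baseline agreement with labels
acc_1b = accuracy(specialist_1b, labels)   # specialist agreement with labels

TOLERANCE = 0.02
promote = acc_1b >= acc_70b - TOLERANCE
print(f"70B: {acc_70b:.3f}  1B: {acc_1b:.3f}  promote: {promote}")
```

The gate turns "good enough to replace the big model" from a judgment call into a number computed on your own labels.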
The future of the bid-stream is not a single "god model" in the cloud. It is a fleet of governed, private specialists running exactly where the decision happens.
If you are an adtech operator facing the 100 ms wall, apply to the design-partner programme to start building your specialist fleet.