System · Accepted state
Roadmap · docs/plans/AGENTIC_PLATFORM_ROADMAP.md

From wedge to fleet. In six phases.

Narrow proof before horizontal expansion. Discipline over ambition. Compounding moats over static ones.

We do not build a broad platform before proving a single wedge. That is the most common failure mode of enterprise AI companies. The sequence below is the one we actually follow. Status is live and updated with every merged pull request.

01

Prove one wedge first

We do not promise you a fleet of specialists. We prove the loop on one narrow, strategic use case.

02

Compounding over static

When forced to choose, we favour work that compounds — failure analysis, safety-net accumulation, fleet intelligence — over static sophistication.

03

The moat lives above the substrate

We do not try to out-Groq Groq. We sit above the substrate where infrastructure improvements benefit us without commoditising us.

Phase 0

Proof discipline

Complete

Goal

Convert promising results into evidence you can trust.

Deliverables

  • Canonical head-to-head benchmark suite for the chosen wedge
  • Production-like latency and cost benchmarks on the target serving substrate
  • Rubric review for leakage, overfitting, and narrow-task distortion
  • Explicit baseline set: frontier flagship, frontier lightweight, current internal
  • Competitive benchmark framing vs Groq, Fireworks, Cerebras
  • Explicit build-vs-buy framing

Success measures

  • You can explain exactly why the specialist wins
  • Benchmark results reproducible by someone other than the author
  • Clear separation between interesting experiment and validated signal

Phase 1

Wedge selection and data readiness

Complete

Goal

Choose one specialist that is commercially important and operationally feasible.

Deliverables

  • Ranked shortlist of candidate specialists with explicit selection criteria
  • Data-readiness review for the winning wedge
  • Initial deployment plan in your product flow
  • Substrate strategy: owned training, hybrid, or vendor-backed execution
  • Primary agent workflow: what the coding agent sees, decides, triggers
  • Internal build-vs-buy memo for the wedge
  • Adtech domain knowledge base with authoritative sources

Success measures

  • One narrow wedge chosen and defended
  • Training and eval data sources identified and accessible
  • Product owner and technical owner aligned on the same use case
  • Adtech KB seeded with ≥500 documents from authoritative sources

Phase 2

First specialist to production

In progress

Goal

Ship your first specialist as a real product capability, not an isolated model demo.

Deliverables

  • Retraining loop for the selected specialist
  • Accepted eval and promotion gates
  • Explicit promotion ladder from candidate to production-accepted
  • Stable serving interface
  • Monitoring for latency, failure modes, and quality drift
  • Fallback path to existing model/runtime
  • Substrate abstraction separating training and production inference
  • Operating surface for your coding agents: resources, commands, policy boundaries
  • Human-in-the-loop approval points for judgement gates

Success measures

  • Specialist beats the agreed frontier baseline on trusted evals
  • Latency low enough to unlock the target workflow
  • Specialist actually used in a bounded production or pilot environment
  • Promotion decisions explicit, auditable, tied to evidence artefacts
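The promotion gates above can be made concrete as code. The sketch below is illustrative, not the actual implementation: the field names, thresholds, and `promotion_decision` helper are hypothetical, and real gates would cover more failure modes than win rate and latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalEvidence:
    """An evidence artefact a promotion decision is tied to (fields illustrative)."""
    candidate_id: str
    win_rate_vs_baseline: float  # head-to-head win rate against the agreed frontier baseline
    p95_latency_ms: float        # production-like latency on the target serving substrate


def promotion_decision(ev: EvalEvidence,
                       min_win_rate: float = 0.55,
                       max_p95_latency_ms: float = 400.0) -> dict:
    """Apply explicit gates and return an auditable record, not just a boolean."""
    checks = {
        "beats_baseline": ev.win_rate_vs_baseline >= min_win_rate,
        "latency_unlocks_workflow": ev.p95_latency_ms <= max_p95_latency_ms,
    }
    return {
        "candidate_id": ev.candidate_id,
        "checks": checks,           # each gate is visible in the record
        "promote": all(checks.values()),
    }
```

The point of returning the full record rather than a yes/no is that the decision stays auditable: anyone reading the artefact can see which gate passed or failed.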

Phase 3

Commercial validation

Upcoming

Goal

Prove that the specialist creates real business value.

Deliverables

  • A/B or shadow-mode measurement plan
  • Feedback loop for your operators
  • Commercial KPI mapping for the wedge
  • Failure review process that turns misses into training data
  • Dataset, rubric, and policy promotion process
  • Value narrative legible to stakeholders, not just builders
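Shadow-mode measurement, as listed above, can be sketched in a few lines. This is a hedged illustration, not the planned implementation: `shadow_run` and its record shape are invented for this example, and a real system would write the record to an evidence store rather than return it.

```python
def shadow_run(request, production_model, specialist):
    """Serve the production model's answer unchanged; record the specialist's
    answer alongside it as structured evidence for offline comparison."""
    served = production_model(request)   # what the user actually receives
    shadow = specialist(request)         # never shown to the user
    return {
        "request": request,
        "served": served,
        "shadow": shadow,
        "agrees": served == shadow,      # one headline signal for the review
    }
```

Because the specialist's output never reaches the user, shadow mode lets the team accumulate structured evidence before any A/B exposure carries product risk.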

Success measures

  • Measurable improvement in a business-relevant KPI
  • Acceptable operational burden
  • Feedback captured as structured evidence, not anecdote
  • Leadership confidence that the specialist is more than a research asset

Phase 4

Control plane

Upcoming

Goal

Turn your first specialist workflow into a repeatable platform capability.

Deliverables

  • Typed orchestration for eval, training, promotion, rollback, recovery
  • Accepted model lineage and evidence history
  • Dataset and reward provenance
  • Dashboards and machine-readable health surfaces
  • Policy boundaries for autonomous operation
  • Runtime abstraction separating control logic from substrate
  • Stable MCP and agent-tool interfaces
  • Coherent access modes: CLI, API, MCP, dashboard

Success measures

  • You can create another specialist without reinventing the operating model
  • Promotion is explicit, auditable, reversible
  • Key decisions no longer depend on undocumented human memory
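"Typed orchestration" with explicit, reversible promotion can be pictured as a small state machine. The stages, transitions, and `Ladder` class below are hypothetical, a sketch of the idea rather than the control plane's actual types.

```python
from enum import Enum


class Stage(Enum):
    CANDIDATE = "candidate"
    EVALUATED = "evaluated"
    PRODUCTION = "production-accepted"
    ROLLED_BACK = "rolled-back"


# Explicit transitions only: promotion, rollback, and recovery are all first-class moves.
ALLOWED = {
    (Stage.CANDIDATE, Stage.EVALUATED),
    (Stage.EVALUATED, Stage.PRODUCTION),
    (Stage.PRODUCTION, Stage.ROLLED_BACK),
    (Stage.ROLLED_BACK, Stage.CANDIDATE),
}


class Ladder:
    def __init__(self, model_id: str):
        self.model_id = model_id
        self.stage = Stage.CANDIDATE
        self.history = []  # auditable lineage: (stage, evidence reference)

    def move(self, target: Stage, evidence_ref: str) -> None:
        """Refuse undocumented jumps; every move must cite an evidence artefact."""
        if (self.stage, target) not in ALLOWED:
            raise ValueError(f"illegal transition {self.stage} -> {target}")
        self.history.append((target.value, evidence_ref))
        self.stage = target
```

Encoding the ladder as types rather than convention is what removes the dependence on undocumented human memory: an illegal jump fails loudly instead of silently succeeding.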

Phase 5

Specialist fleet expansion

Upcoming

Goal

Expand from one proven wedge to a coordinated set of specialists.

Deliverables

  • Expansion plan across adjacent specialist contexts
  • Specialist registry and deterministic routing logic
  • Shared interfaces for invocation, fallback, composition
  • Domain-specific eval packs for each new specialist
  • Partner/substrate strategy for external inference when useful

Success measures

  • At least two additional specialists reach the same proof standard
  • Specialists compose cleanly in larger agent workflows
  • Infrastructure remains manageable as fleet size grows
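A specialist registry with deterministic routing can be as plain as a lookup table with a fallback. The task kinds and model identifiers below are invented for illustration; only the shape of the mechanism is the point.

```python
# Registry maps task kinds to specialist ids; routing is a pure lookup.
REGISTRY = {
    "bid-analysis": "adtech-bid-specialist",
    "creative-review": "adtech-creative-specialist",
}
FALLBACK = "frontier-generalist"


def route(task_kind: str) -> str:
    """Deterministic: the same input always routes to the same model,
    and unknown task kinds fall back to the generalist."""
    return REGISTRY.get(task_kind, FALLBACK)
```

Keeping routing deterministic (no learned router, no sampling) is what keeps the fleet debuggable as it grows: a misrouted task is a data problem in the registry, not a model behaviour to reverse-engineer.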

Decision rule

How we choose what to build next.

When choosing between roadmap items, we prefer the one that most improves, in order:

  1. Trusted evidence that the moat is real
  2. Time to first production specialist
  3. Repeatability of retraining and promotion
  4. Institutionalisation of knowledge
  5. Expansion readiness for a specialist fleet

If a task makes the system more sophisticated but does not improve those five, it is probably not a priority.
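The ordered preference above is lexicographic: a gain on a higher dimension outweighs any gain on the ones below it. A minimal sketch, assuming each candidate item is scored per dimension (the dimension names and scores are illustrative):

```python
# The five dimensions, in strict priority order.
PRIORITIES = (
    "trusted_evidence",
    "time_to_production",
    "repeatability",
    "institutional_knowledge",
    "expansion_readiness",
)


def preference_key(scores: dict) -> tuple:
    """Lexicographic key: tuples compare element by element, so a higher
    dimension dominates everything after it."""
    return tuple(scores.get(p, 0) for p in PRIORITIES)


def choose(items: dict) -> str:
    """Pick the roadmap item whose score tuple is lexicographically greatest."""
    return max(items, key=lambda name: preference_key(items[name]))
```

The tuple comparison encodes the rule directly: an item that improves trusted evidence beats one that only shortens time to production, no matter the margin.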

Prioritise compounding moats over static moats.
Roadmap principle 8