System · Accepted state
Roadmap · docs/plans/AGENTIC_PLATFORM_ROADMAP.md

From wedge to fleet. In six phases.

Narrow proof before horizontal expansion. Discipline over ambition. Compounding moats over static ones.

We do not build a broad platform before proving a single wedge. That is the most common failure mode of enterprise AI companies. The sequence below is the one we actually follow. Status is live and updated with every merged pull request.

01

Prove one wedge first

We do not promise you a fleet of specialists. We prove the loop on one narrow, strategic use case.

02

Compounding over static

When forced to choose, we favour work that compounds — failure analysis, safety-net accumulation, fleet intelligence — over static sophistication.

03

The moat lives above the substrate

We do not try to out-Groq Groq. We sit above the substrate where infrastructure improvements benefit us without commoditising us.

Phase 0

Proof discipline

Complete

Goal

Convert promising results into evidence you can trust.

Deliverables

  • Canonical head-to-head benchmark suite for the chosen wedge
  • Production-like latency and cost benchmarks on the target serving substrate
  • Rubric review for leakage, overfitting, and narrow-task distortion
  • Explicit baseline set: frontier flagship, frontier lightweight, current internal
  • Competitive benchmark framing vs Groq, Fireworks, Cerebras
  • Explicit build-vs-buy framing

Success measures

  • You can explain exactly why the specialist wins
  • Benchmark results reproducible by someone other than the author
  • Clear separation between interesting experiment and validated signal

Phase 1

Wedge selection and data readiness

Complete

Goal

Choose one specialist that is commercially important and operationally feasible.

Deliverables

  • Ranked shortlist of candidate specialists with explicit selection criteria
  • Data-readiness review for the winning wedge
  • Initial deployment plan in your product flow
  • Substrate strategy: owned training, hybrid, or vendor-backed execution
  • Primary agent workflow: what the coding agent sees, decides, triggers
  • Internal build-vs-buy memo for the wedge
  • Adtech domain knowledge base with authoritative sources

Success measures

  • One narrow wedge chosen and defended
  • Training and eval data sources identified and accessible
  • Product owner and technical owner aligned on the same use case
  • Adtech KB seeded with ≥500 documents from authoritative sources

Phase 2

First specialist to production

In progress

Goal

Ship your first specialist as a real product capability, not an isolated model demo.

Deliverables

  • Retraining loop for the selected specialist
  • Accepted eval and promotion gates
  • Explicit promotion ladder from candidate to production-accepted
  • Stable serving interface
  • Monitoring for latency, failure modes, and quality drift
  • Fallback path to existing model/runtime
  • Substrate abstraction separating training and production inference
  • Operating surface for your coding agents: resources, commands, policy boundaries
  • Human-in-the-loop approval points for judgement gates

Success measures

  • Specialist beats the agreed frontier baseline on trusted evals
  • Latency low enough to unlock the target workflow
  • Specialist actually used in a bounded production or pilot environment
  • Promotion decisions explicit, auditable, tied to evidence artefacts
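The promotion gates above can be made concrete as code. The sketch below is illustrative, not the actual implementation: the field names, thresholds, and `promotion_decision` helper are hypothetical, and real gates would cover more failure modes than win rate and latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalEvidence:
    """An evidence artefact a promotion decision is tied to (fields illustrative)."""
    candidate_id: str
    win_rate_vs_baseline: float  # head-to-head win rate against the agreed frontier baseline
    p95_latency_ms: float        # production-like latency on the target serving substrate


def promotion_decision(ev: EvalEvidence,
                       min_win_rate: float = 0.55,
                       max_p95_latency_ms: float = 400.0) -> dict:
    """Apply explicit gates and return an auditable record, not just a boolean."""
    checks = {
        "beats_baseline": ev.win_rate_vs_baseline >= min_win_rate,
        "latency_unlocks_workflow": ev.p95_latency_ms <= max_p95_latency_ms,
    }
    return {
        "candidate_id": ev.candidate_id,
        "checks": checks,           # each gate is visible in the record
        "promote": all(checks.values()),
    }
```

The point of returning the full record rather than a yes/no is that the decision stays auditable: anyone reading the artefact can see which gate passed or failed.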

Phase 3

Commercial validation

Upcoming

Goal

Prove that the specialist creates real business value.

Deliverables

  • A/B or shadow-mode measurement plan
  • Feedback loop for your operators
  • Commercial KPI mapping for the wedge
  • Failure review process that turns misses into training data
  • Dataset, rubric, and policy promotion process
  • Value narrative legible to stakeholders, not just builders
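Shadow-mode measurement, as listed above, can be sketched in a few lines. This is a hedged illustration, not the planned implementation: `shadow_run` and its record shape are invented for this example, and a real system would write the record to an evidence store rather than return it.

```python
def shadow_run(request, production_model, specialist):
    """Serve the production model's answer unchanged; record the specialist's
    answer alongside it as structured evidence for offline comparison."""
    served = production_model(request)   # what the user actually receives
    shadow = specialist(request)         # never shown to the user
    return {
        "request": request,
        "served": served,
        "shadow": shadow,
        "agrees": served == shadow,      # one headline signal for the review
    }
```

Because the specialist's output never reaches the user, shadow mode lets the team accumulate structured evidence before any A/B exposure carries product risk.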

Success measures

  • Measurable improvement in a business-relevant KPI
  • Acceptable operational burden
  • Feedback captured as structured evidence, not anecdote
  • Leadership confidence that the specialist is more than a research asset

Phase 4

Control plane

Upcoming

Goal

Turn your first specialist workflow into a repeatable platform capability.

Deliverables

  • Typed orchestration for eval, training, promotion, rollback, recovery
  • Accepted model lineage and evidence history
  • Dataset and reward provenance
  • Dashboards and machine-readable health surfaces
  • Policy boundaries for autonomous operation
  • Runtime abstraction separating control logic from substrate
  • Stable MCP and agent-tool interfaces
  • Coherent access modes: CLI, API, MCP, dashboard

Success measures

  • You can create another specialist without reinventing the operating model
  • Promotion is explicit, auditable, reversible
  • Key decisions no longer depend on undocumented human memory
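"Typed orchestration" with explicit, reversible promotion can be pictured as a small state machine. The stages, transitions, and `Ladder` class below are hypothetical, a sketch of the idea rather than the control plane's actual types.

```python
from enum import Enum


class Stage(Enum):
    CANDIDATE = "candidate"
    EVALUATED = "evaluated"
    PRODUCTION = "production-accepted"
    ROLLED_BACK = "rolled-back"


# Explicit transitions only: promotion, rollback, and recovery are all first-class moves.
ALLOWED = {
    (Stage.CANDIDATE, Stage.EVALUATED),
    (Stage.EVALUATED, Stage.PRODUCTION),
    (Stage.PRODUCTION, Stage.ROLLED_BACK),
    (Stage.ROLLED_BACK, Stage.CANDIDATE),
}


class Ladder:
    def __init__(self, model_id: str):
        self.model_id = model_id
        self.stage = Stage.CANDIDATE
        self.history = []  # auditable lineage: (stage, evidence reference)

    def move(self, target: Stage, evidence_ref: str) -> None:
        """Refuse undocumented jumps; every move must cite an evidence artefact."""
        if (self.stage, target) not in ALLOWED:
            raise ValueError(f"illegal transition {self.stage} -> {target}")
        self.history.append((target.value, evidence_ref))
        self.stage = target
```

Encoding the ladder as types rather than convention is what removes the dependence on undocumented human memory: an illegal jump fails loudly instead of silently succeeding.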

Phase 5

Specialist fleet expansion

Upcoming

Goal

Expand from one proven wedge to a coordinated set of specialists.

Deliverables

  • Expansion plan across adjacent specialist contexts
  • Specialist registry and deterministic routing logic
  • Shared interfaces for invocation, fallback, composition
  • Domain-specific eval packs for each new specialist
  • Partner/substrate strategy for external inference when useful

Success measures

  • At least two additional specialists reach the same proof standard
  • Specialists compose cleanly in larger agent workflows
  • Infrastructure remains manageable as fleet size grows
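A specialist registry with deterministic routing can be as plain as a lookup table with a fallback. The task kinds and model identifiers below are invented for illustration; only the shape of the mechanism is the point.

```python
# Registry maps task kinds to specialist ids; routing is a pure lookup.
REGISTRY = {
    "bid-analysis": "adtech-bid-specialist",
    "creative-review": "adtech-creative-specialist",
}
FALLBACK = "frontier-generalist"


def route(task_kind: str) -> str:
    """Deterministic: the same input always routes to the same model,
    and unknown task kinds fall back to the generalist."""
    return REGISTRY.get(task_kind, FALLBACK)
```

Keeping routing deterministic (no learned router, no sampling) is what keeps the fleet debuggable as it grows: a misrouted task is a data problem in the registry, not a model behaviour to reverse-engineer.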

Decision rule

How we choose what to build next.

When choosing between roadmap items, we prefer the one that most improves, in order:

  1. Trusted evidence that the moat is real
  2. Time to first production specialist
  3. Repeatability of retraining and promotion
  4. Institutionalisation of knowledge
  5. Expansion readiness for a specialist fleet

If a task makes the system more sophisticated but does not improve those five, it is probably not a priority.
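The ordered preference above is lexicographic: a gain on a higher dimension outweighs any gain on the ones below it. A minimal sketch, assuming each candidate item is scored per dimension (the dimension names and scores are illustrative):

```python
# The five dimensions, in strict priority order.
PRIORITIES = (
    "trusted_evidence",
    "time_to_production",
    "repeatability",
    "institutional_knowledge",
    "expansion_readiness",
)


def preference_key(scores: dict) -> tuple:
    """Lexicographic key: tuples compare element by element, so a higher
    dimension dominates everything after it."""
    return tuple(scores.get(p, 0) for p in PRIORITIES)


def choose(items: dict) -> str:
    """Pick the roadmap item whose score tuple is lexicographically greatest."""
    return max(items, key=lambda name: preference_key(items[name]))
```

The tuple comparison encodes the rule directly: an item that improves trusted evidence beats one that only shortens time to production, no matter the margin.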

Prioritise compounding moats over static moats.
Roadmap principle 8