Architecture Overview

How Modelsmith integrates with your stack as a specialisation control plane.

Modelsmith is the specialisation control plane that sits between your application logic and your compute substrate. It complements inference engines and agent frameworks rather than replacing them: inference engines run weights, agent frameworks orchestrate calls, and Modelsmith governs the specialist-model lifecycle those layers consume.

The three-layer stack

Application layer: Domain-specific product logic and agentic workflows. This layer consumes promoted specialist models.
Control plane: The governed Modelsmith layer. It manages benchmark authoring, iteration state, evidence bundles, promotion gates, lineage, and rollback posture.
Substrate layer: The customer-controlled training and inference environment. This may be on-premise, private cloud, or another approved deployment target.

Figure: The Modelsmith operating boundary.

The specialisation loop

Modelsmith runs a governed five-phase loop so that every model change is measured and reviewable.

Phase	Responsibility
Evaluate	Score the current model against the governed benchmark and held-out set.
Diagnose	Identify failure clusters, regressions, and weak scenario coverage.
Specialise	Train against the targeted failure evidence inside the customer boundary.
Assess	Decide whether to continue, halt, roll back, or stage a candidate.
Promote	Advance only when score, evidence, rollback posture, and approvals satisfy the gate.
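The Assess and Promote phases above can be sketched as two small decision functions. This is an illustrative sketch only, not Modelsmith's API: the names Decision, IterationResult, GateInputs, assess, and gate_passes are hypothetical, and the specific rules (regression triggers rollback, an iteration budget triggers halt) are assumptions chosen to show the shape of the logic.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The four outcomes the Assess phase can produce."""
    CONTINUE = "continue"
    HALT = "halt"
    ROLL_BACK = "roll_back"
    STAGE = "stage"


@dataclass
class IterationResult:
    score: float           # score on the governed benchmark
    previous_score: float  # score of the last accepted model
    held_out_score: float  # score on the held-out set


@dataclass
class GateInputs:
    score: float
    threshold: float
    evidence_complete: bool
    rollback_ready: bool
    approvals: int
    required_approvals: int


def assess(result: IterationResult, threshold: float, budget_left: int) -> Decision:
    """Hypothetical Assess policy; the ordering of rules is an assumption."""
    if result.score < result.previous_score:
        return Decision.ROLL_BACK  # regression against the last accepted model
    if result.score >= threshold and result.held_out_score >= threshold:
        return Decision.STAGE      # candidate is ready to face the gate
    if budget_left <= 0:
        return Decision.HALT       # no iterations left to spend
    return Decision.CONTINUE


def gate_passes(g: GateInputs) -> bool:
    """Promote only when score, evidence, rollback posture, and approvals all hold."""
    return (
        g.score >= g.threshold
        and g.evidence_complete
        and g.rollback_ready
        and g.approvals >= g.required_approvals
    )
```

Keeping the gate as a single conjunction means a strong score alone cannot promote a candidate: missing evidence, an unready rollback, or a missing approval each block promotion independently.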

Fleet boundary

Operators define workloads as clusters: independent specialisation paths with their own benchmark target, model family, execution boundary, and promotion threshold.
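A cluster definition of this kind might be modelled as an immutable record with one field per attribute named above. The sketch below is an assumption about shape, not Modelsmith's actual schema: Cluster, validate_fleet, the field names, and the convention that the promotion threshold is a score in the range 0 to 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One independent specialisation path (hypothetical schema)."""
    name: str
    benchmark_target: str
    model_family: str
    execution_boundary: str
    promotion_threshold: float  # assumed here to be a score in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.promotion_threshold <= 1.0:
            raise ValueError("promotion_threshold must be in [0, 1]")


def validate_fleet(clusters: list[Cluster]) -> dict[str, Cluster]:
    """Index clusters by name, rejecting duplicates so each path stays distinct."""
    fleet: dict[str, Cluster] = {}
    for cluster in clusters:
        if cluster.name in fleet:
            raise ValueError(f"duplicate cluster name: {cluster.name}")
        fleet[cluster.name] = cluster
    return fleet
```

Freezing the record reflects the idea that a cluster's target and threshold are governed settings: changing them should produce a new, reviewable definition instead of a silent in-place edit.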

The public surface should show capacity state, current phase, score movement, and blocked conditions. It should not expose raw host topology, private logs, or filesystem-level runtime details.
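One way to enforce this boundary is an allowlist: the public view copies only the four named fields and drops everything else by default. The sketch below assumes the internal state is a plain dictionary; PUBLIC_FIELDS, public_view, and the key names are hypothetical.

```python
# Hypothetical key names for the four items the public surface should show.
PUBLIC_FIELDS = (
    "capacity_state",
    "current_phase",
    "score_movement",
    "blocked_conditions",
)


def public_view(internal_state: dict) -> dict:
    """Return only allowlisted fields; anything unlisted is never exposed."""
    return {
        key: internal_state[key]
        for key in PUBLIC_FIELDS
        if key in internal_state
    }
```

An allowlist is safer than a blocklist here: a newly added internal field, such as a log path, stays private unless someone deliberately adds it to the public set.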

Functional separation

Evaluation harness: The public, reproducible layer for scoring scenarios against rubrics.
Modelsmith control plane: The commercial layer that acts on the evidence: iterate ledger, scenario expansion, promotion governance, rollback, and owner-safe evidence publication.
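To make "scoring scenarios against rubrics" concrete, here is a minimal sketch of a weighted rubric scorer. It is not the evaluation harness's real interface: Criterion, score_output, and the weighted pass/fail scheme are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One rubric line: a named, weighted pass/fail check on a model output."""
    name: str
    weight: float
    check: Callable[[str], bool]


def score_output(output: str, rubric: list[Criterion]) -> float:
    """Return the weighted fraction of rubric criteria the output satisfies."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total
```

Because the checks are deterministic functions of the output, the same scenario and rubric always give the same score, which is the property a reproducible scoring layer needs. The control plane would then act on these scores; that part is not shown.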