Architecture Overview

How Modelsmith integrates with your stack as a specialisation control plane.

Modelsmith is the specialisation control plane that sits between your application logic and your compute substrate. It complements inference engines and agent frameworks rather than replacing them: inference engines run weights, agent frameworks orchestrate calls, and Modelsmith governs the specialist-model lifecycle those layers consume.

The three-layer stack

  1. Application layer: Domain-specific product logic and agentic workflows. This layer consumes promoted specialist models.
  2. Control plane: The governed Modelsmith layer. It manages benchmark authoring, iteration state, evidence bundles, promotion gates, lineage, and rollback posture.
  3. Substrate layer: The customer-controlled training and inference environment. This may be on-premise, private cloud, or another approved deployment target.

Figure: The Modelsmith operating boundary.

The specialisation loop

Modelsmith runs a governed loop to ensure every model change is measured and reviewable.

Phase        Responsibility
Evaluate     Score the current model against the governed benchmark and held-out set.
Diagnose     Identify failure clusters, regressions, and weak scenario coverage.
Specialise   Train against the targeted failure evidence inside the customer boundary.
Assess       Decide whether to continue, halt, roll back, or stage a candidate.
Promote      Advance only when score, evidence, rollback posture, and approvals satisfy the gate.
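
The Assess and Promote phases can be pictured as two small decision functions. The sketch below is illustrative only and is not the Modelsmith API: `Decision`, `IterationResult`, `assess`, `gate_passes`, and every field and parameter name are hypothetical, and the ordering of the checks in `assess` is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The four Assess outcomes named in the table (hypothetical type)."""
    CONTINUE = "continue"
    HALT = "halt"
    ROLL_BACK = "roll_back"
    STAGE = "stage"


@dataclass(frozen=True)
class IterationResult:
    # Hypothetical fields mirroring the gate inputs: score, evidence,
    # rollback posture, and approvals.
    score: float            # candidate score on the governed benchmark
    previous_score: float   # score of the last accepted iteration
    evidence_complete: bool
    rollback_ready: bool
    approvals: int


def assess(result: IterationResult, threshold: float, iterations_left: int) -> Decision:
    """Assumed policy: regressions roll back, threshold hits are staged,
    an exhausted budget halts, anything else continues."""
    if result.score < result.previous_score:
        return Decision.ROLL_BACK
    if result.score >= threshold:
        return Decision.STAGE
    if iterations_left <= 0:
        return Decision.HALT
    return Decision.CONTINUE


def gate_passes(result: IterationResult, threshold: float, required_approvals: int) -> bool:
    """Promote only when every gate condition holds at the same time."""
    return (
        result.score >= threshold
        and result.evidence_complete
        and result.rollback_ready
        and result.approvals >= required_approvals
    )
```

Keeping `assess` and `gate_passes` separate reflects the table: a candidate can be staged on score alone, yet still be blocked from promotion until evidence, rollback posture, and approvals are in place.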

Fleet boundary

Operators define workloads as clusters: independent specialisation paths with their own benchmark target, model family, execution boundary, and promotion threshold.
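
As a rough illustration of what a cluster definition carries, the sketch below models the four properties listed above as a record. The `Cluster` class, its field names, the `build_fleet` helper, and the assumption that thresholds are normalised to the range 0 to 1 are all hypothetical and not taken from Modelsmith itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Cluster:
    # Hypothetical record: one independent specialisation path.
    name: str
    benchmark_target: str
    model_family: str
    execution_boundary: str
    promotion_threshold: float  # assumed to be normalised to [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.promotion_threshold <= 1.0:
            raise ValueError("promotion_threshold must be between 0 and 1")


def build_fleet(clusters: Iterable[Cluster]) -> Dict[str, Cluster]:
    """Index clusters by name, rejecting duplicates so paths stay independent."""
    fleet: Dict[str, Cluster] = {}
    for cluster in clusters:
        if cluster.name in fleet:
            raise ValueError(f"duplicate cluster name: {cluster.name}")
        fleet[cluster.name] = cluster
    return fleet
```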

The public surface should show capacity state, current phase, score movement, and blocked conditions. It should not expose raw host topology, private logs, or filesystem-level runtime details.
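
One way to enforce this boundary is an allow-list projection, so that new internal fields stay private by default. The following is a minimal sketch under that assumption; `PUBLIC_FIELDS`, `public_view`, and all dictionary keys are hypothetical names chosen for illustration.

```python
from typing import Any, Dict

# Hypothetical allow-list matching the four items the public surface should show.
PUBLIC_FIELDS = (
    "capacity_state",
    "current_phase",
    "score_movement",
    "blocked_conditions",
)


def public_view(internal_status: Dict[str, Any]) -> Dict[str, Any]:
    """Copy only allow-listed keys; everything else is dropped by default."""
    return {key: internal_status[key] for key in PUBLIC_FIELDS if key in internal_status}


# Usage: host topology and log paths never reach the public payload.
internal = {
    "capacity_state": "available",
    "current_phase": "Diagnose",
    "score_movement": 0.03,
    "blocked_conditions": [],
    "host_topology": ["node-a", "node-b"],   # private
    "log_path": "/var/log/example/run.log",  # private
}
public = public_view(internal)
```

An allow-list is used here rather than a deny-list because a deny-list would leak any private field that nobody remembered to exclude.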

Functional separation

  • Evaluation harness: The public, reproducible layer for scoring scenarios against rubrics.
  • Modelsmith control plane: The commercial layer that acts on the evidence: iterate ledger, scenario expansion, promotion governance, rollback, and owner-safe evidence publication.