Architecture Overview
How Modelsmith integrates with your stack as a specialisation control plane.
Modelsmith is the specialisation control plane that sits between your application logic and your compute substrate. It complements inference engines and agent frameworks rather than replacing them: inference engines run weights, agent frameworks orchestrate calls, and Modelsmith governs the specialist-model lifecycle those layers consume.
The three-layer stack
- Application layer: Domain-specific product logic and agentic workflows. This layer consumes promoted specialist models.
- Control plane: The governed Modelsmith layer. It manages benchmark authoring, iteration state, evidence bundles, promotion gates, lineage, and rollback posture.
- Substrate layer: The customer-controlled training and inference environment. This may be on-premise, private cloud, or another approved deployment target.
The specialisation loop
Modelsmith runs a governed loop to ensure every model change is measured and reviewable.
| Phase | Responsibility |
|---|---|
| Evaluate | Score the current model against the governed benchmark and held-out set. |
| Diagnose | Identify failure clusters, regressions, and weak scenario coverage. |
| Specialise | Train against the targeted failure evidence inside the customer boundary. |
| Assess | Decide whether to continue, halt, roll back, or stage a candidate. |
| Promote | Advance only when score, evidence, rollback posture, and approvals satisfy the gate. |
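The Assess and Promote phases can be illustrated with a minimal sketch. It is not Modelsmith's API: `Candidate`, `Decision`, `assess`, `gate_passes`, the 0..1 score scale, and the iteration budget are all hypothetical names and assumptions, chosen only to show that promotion needs every gate condition to hold at once.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """Possible outcomes of the Assess phase (names are illustrative)."""
    CONTINUE = "continue"
    HALT = "halt"
    ROLL_BACK = "roll_back"
    STAGE = "stage"


@dataclass(frozen=True)
class Candidate:
    score: float            # benchmark score, assumed to lie in 0..1
    previous_score: float   # score of the model this candidate would replace
    evidence_complete: bool
    rollback_ready: bool
    approvals: int


def assess(c: Candidate, threshold: float, iterations_left: int) -> Decision:
    """Hypothetical Assess rule: regress -> roll back, reach threshold -> stage."""
    if c.score < c.previous_score:
        return Decision.ROLL_BACK
    if c.score >= threshold:
        return Decision.STAGE
    if iterations_left <= 0:
        return Decision.HALT
    return Decision.CONTINUE


def gate_passes(c: Candidate, threshold: float, required_approvals: int) -> bool:
    """Hypothetical Promote gate: score, evidence, rollback posture, approvals."""
    return (
        c.score >= threshold
        and c.evidence_complete
        and c.rollback_ready
        and c.approvals >= required_approvals
    )


if __name__ == "__main__":
    candidate = Candidate(0.91, 0.88, True, True, approvals=2)
    print(assess(candidate, threshold=0.90, iterations_left=3))
    print(gate_passes(candidate, threshold=0.90, required_approvals=2))
```

Keeping `assess` and `gate_passes` separate mirrors the table: a candidate can be staged on score alone, yet still be blocked from promotion until evidence, rollback posture, and approvals are in place.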
Fleet boundary
Operators define workloads as clusters: independent specialisation paths with their own benchmark target, model family, execution boundary, and promotion threshold.
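As a rough sketch of what a cluster definition might carry, the record below holds the four attributes named above. The class name `Cluster`, the field names, the example values, and the assumption that the promotion threshold is a normalised score in 0..1 are all hypothetical and do not reflect Modelsmith's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One independent specialisation path (illustrative field names)."""
    name: str
    benchmark_target: str
    model_family: str
    execution_boundary: str
    promotion_threshold: float  # assumed to be a normalised score in 0..1

    def __post_init__(self) -> None:
        if not 0.0 <= self.promotion_threshold <= 1.0:
            raise ValueError("promotion_threshold must be between 0 and 1")


def build_fleet(clusters: list[Cluster]) -> dict[str, Cluster]:
    """Index clusters by name, rejecting duplicates so paths stay independent."""
    fleet: dict[str, Cluster] = {}
    for cluster in clusters:
        if cluster.name in fleet:
            raise ValueError(f"duplicate cluster name: {cluster.name}")
        fleet[cluster.name] = cluster
    return fleet


if __name__ == "__main__":
    # Placeholder values for illustration only.
    fleet = build_fleet([
        Cluster("claims-triage", "claims-bench-v1", "family-a", "on-premise", 0.90),
        Cluster("contract-review", "contracts-bench-v1", "family-b", "private-cloud", 0.85),
    ])
    print(sorted(fleet))
```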
The public surface should show capacity state, current phase, score movement, and blocked conditions. It should not expose raw host topology, private logs, or filesystem-level runtime details.
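One way to enforce that boundary is an allow-list projection, sketched below. The field names and the `public_view` helper are hypothetical; the point is that anything not explicitly allowed is dropped, so new internal fields stay private by default.

```python
# Fields the public surface may expose (illustrative names).
PUBLIC_FIELDS = (
    "capacity_state",
    "current_phase",
    "score_movement",
    "blocked_conditions",
)


def public_view(internal_status: dict) -> dict:
    """Return only allow-listed fields from an internal status record."""
    return {
        key: internal_status[key]
        for key in PUBLIC_FIELDS
        if key in internal_status
    }


if __name__ == "__main__":
    # Placeholder record; the private keys stand in for host topology and logs.
    internal = {
        "capacity_state": "available",
        "current_phase": "diagnose",
        "score_movement": 0.03,
        "blocked_conditions": [],
        "host_topology": {"node-1": "10.0.0.5"},
        "private_logs": ["..."],
    }
    print(public_view(internal))
```

An allow-list is used here instead of a deny-list because a deny-list would leak any runtime detail that nobody remembered to exclude.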
Functional separation
- Evaluation harness: The public, reproducible layer for scoring scenarios against rubrics.
- Modelsmith control plane: The commercial layer that acts on the evidence through the iteration ledger, scenario expansion, promotion governance, rollback, and owner-safe evidence publication.
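To make the split concrete, here is a minimal sketch of what "scoring scenarios against rubrics" could look like on the harness side. The rubric shape (weighted pass/fail checks), the `Criterion` type, and `score_output` are assumptions for illustration, not the harness's real interface. The function is deterministic, which is what makes a score reproducible.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One rubric line: a named pass/fail check with a weight (hypothetical)."""
    name: str
    weight: float
    check: Callable[[str], bool]


def score_output(output: str, rubric: list[Criterion]) -> float:
    """Weighted fraction of rubric criteria the model output satisfies."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


if __name__ == "__main__":
    rubric = [
        Criterion("mentions_policy", 1.0, lambda text: "policy" in text.lower()),
        Criterion("is_concise", 1.0, lambda text: len(text) <= 80),
    ]
    print(score_output("Refer to the refund policy.", rubric))
```

In this picture the harness only produces scores; deciding what to do with them (iterate, expand scenarios, promote, or roll back) stays with the control plane.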