Governance
Evidence bundles
What a Modelsmith promotion evidence bundle should contain before a specialised model is approved for production.
An evidence bundle is the review packet attached to a candidate model. It gives domain owners, engineering owners, and compliance reviewers the same factual record before a promotion decision is made.
Required contents
Benchmark record
The bundle should name the benchmark version, scenario coverage, outcome-type balance, held-out set, and scoring threshold used for the run.
Score movement
The bundle should show the candidate score, previous accepted score, frontier baseline comparison, regression count, and the failure clusters that changed most during the run.
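As a rough illustration of how the score movement fields could be summarised for reviewers, the sketch below derives the change against the previous accepted score and the gap to the frontier baseline. The class, field, and function names are hypothetical and are not part of any Modelsmith API. Treating the scoring threshold as a simple lower bound is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreMovement:
    # Hypothetical field names, chosen to mirror the items listed above.
    candidate_score: float
    previous_accepted_score: float
    frontier_baseline_score: float
    regression_count: int

    @property
    def delta_vs_previous(self) -> float:
        # Positive when the candidate beats the previous accepted model.
        return self.candidate_score - self.previous_accepted_score

    @property
    def gap_to_frontier(self) -> float:
        # Negative when the candidate trails the frontier baseline.
        return self.candidate_score - self.frontier_baseline_score


def summarise(movement: ScoreMovement, threshold: float) -> dict:
    """Build a reviewer-facing summary; threshold is assumed to be a lower bound."""
    return {
        "meets_threshold": movement.candidate_score >= threshold,
        "improved_on_previous": movement.delta_vs_previous > 0,
        "delta_vs_previous": round(movement.delta_vs_previous, 4),
        "gap_to_frontier": round(movement.gap_to_frontier, 4),
        "regression_count": movement.regression_count,
    }
```

A summary like this only reports the numbers. Whether a non-zero regression count blocks promotion remains a decision for the reviewers.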
Operational measurements
The bundle should include latency, cost-per-decision, serving target, fleet capacity notes, and any blocked or degraded infrastructure state that could change the deployment decision.
Lineage
The bundle should identify the base model, specialised candidate, benchmark version, training method, and approval chain. It should be possible to explain how the candidate was produced without exposing raw private artefacts.
Rollback posture
The bundle should state the previous accepted model, rollback trigger, rollback owner, and the expected recovery path if the candidate underperforms after promotion.
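The five content sections above lend themselves to a simple completeness check before a bundle is sent for review. The following is a minimal sketch that assumes the bundle is held as a nested dictionary. The section and field keys are hypothetical names derived from the items listed in this guide, not a documented Modelsmith schema.

```python
# Hypothetical schema: keys paraphrase the items named in each section above.
REQUIRED_FIELDS = {
    "benchmark_record": [
        "benchmark_version", "scenario_coverage", "outcome_type_balance",
        "held_out_set", "scoring_threshold",
    ],
    "score_movement": [
        "candidate_score", "previous_accepted_score",
        "frontier_baseline_comparison", "regression_count", "failure_clusters",
    ],
    "operational_measurements": [
        "latency", "cost_per_decision", "serving_target",
        "fleet_capacity_notes", "infrastructure_state",
    ],
    "lineage": [
        "base_model", "specialised_candidate", "benchmark_version",
        "training_method", "approval_chain",
    ],
    "rollback_posture": [
        "previous_accepted_model", "rollback_trigger",
        "rollback_owner", "recovery_path",
    ],
}


def missing_fields(bundle: dict) -> list[str]:
    """Return dotted names of required fields that are absent or empty."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        content = bundle.get(section) or {}
        for name in fields:
            # A value of 0 (for example, zero regressions) counts as present.
            if content.get(name) in (None, "", [], {}):
                missing.append(f"{section}.{name}")
    return missing
```

An empty result means every listed field has a value. It says nothing about whether those values are accurate, which is still the reviewers' job.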
Redaction boundary
Public and shared evidence should be limited to summaries, metrics, scorecards, and review labels. Raw datasets, prompts, completions, logs, filesystem paths, hostnames, and opaque artefact identifiers stay inside the customer-controlled execution boundary.
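One way to guard this boundary is a heuristic scan of evidence text before it is shared. The sketch below is an assumption-heavy illustration. The regular expressions, the internal hostname suffixes, and the 32-character hex rule for opaque identifiers are all hypothetical choices, and a pattern scan of this kind cannot detect raw prompts, completions, or dataset rows.

```python
import re

# Hypothetical heuristics; tune or replace these for a real environment.
PATTERNS = {
    # Unix-style or Windows-style paths with at least two components.
    "filesystem_path": re.compile(
        r"(?:^|\s)(?:/|~/|[A-Za-z]:\\)[\w.\-]+(?:[/\\][\w.\-]+)+"
    ),
    # Hostnames ending in an assumed set of internal-only suffixes.
    "hostname": re.compile(
        r"\b(?:[a-z0-9-]+\.)+(?:internal|local|corp|lan)\b", re.IGNORECASE
    ),
    # Long hex strings, assumed here to be opaque artefact identifiers.
    "opaque_identifier": re.compile(r"\b[0-9a-f]{32,}\b", re.IGNORECASE),
}


def find_boundary_violations(text: str) -> list[str]:
    """Return the sorted names of patterns that match the evidence text."""
    return sorted(
        name for name, pattern in PATTERNS.items() if pattern.search(text)
    )
```

A scan like this works best as a pre-share check that fails closed: any match sends the text back for manual redaction rather than attempting to rewrite it automatically.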