Governance

Evidence bundles

This page describes what a Modelsmith promotion evidence bundle should contain before a specialised model is approved for production.

An evidence bundle is the review packet attached to a candidate model. It gives domain owners, engineering owners, and compliance reviewers the same factual record before a promotion decision is made.

Required contents

Benchmark record

The bundle should name the benchmark version, scenario coverage, outcome-type balance, held-out set, and scoring threshold used for the run.

Score movement

The bundle should show the candidate score, previous accepted score, frontier baseline comparison, regression count, and the failure clusters that changed most during the run.
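As a rough illustration of how the score movement figures might be derived, the sketch below computes the candidate's deltas against the previous accepted score and the frontier baseline, and ranks failure clusters by how much their counts changed. The class name, field names, and the idea of representing failure clusters as per-cluster failure counts are assumptions made for this example, not part of a Modelsmith schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreMovement:
    """Hypothetical container for the scores named in the bundle."""
    candidate_score: float
    previous_accepted_score: float
    frontier_baseline_score: float
    regression_count: int

    @property
    def delta_vs_previous(self) -> float:
        # Positive means the candidate beats the previous accepted model.
        return self.candidate_score - self.previous_accepted_score

    @property
    def delta_vs_frontier(self) -> float:
        # Positive means the candidate beats the frontier baseline.
        return self.candidate_score - self.frontier_baseline_score


def top_changed_clusters(
    before: dict[str, int], after: dict[str, int], n: int = 3
) -> list[tuple[str, int]]:
    """Return the n clusters whose failure counts changed most.

    `before` and `after` map a cluster label to its failure count.
    A cluster missing from one side is treated as having zero failures.
    Ties are broken alphabetically so the output is deterministic.
    """
    names = set(before) | set(after)
    changes = {name: after.get(name, 0) - before.get(name, 0) for name in names}
    return sorted(changes.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:n]
```

Reporting signed changes keeps improvements (negative values) and new failures (positive values) distinguishable in the same list.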

Operational measurements

The bundle should include latency, cost-per-decision, serving target, fleet capacity notes, and any blocked or degraded infrastructure state that could change the deployment decision.

Lineage

The bundle should identify the base model, specialised candidate, benchmark version, training method, and approval chain. It should be possible to explain how the candidate was produced without exposing raw private artefacts.

Rollback posture

The bundle should state the previous accepted model, rollback trigger, rollback owner, and the expected recovery path if the candidate underperforms after promotion.
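The required contents listed above lend themselves to a simple completeness check before a bundle goes to reviewers. The following is a minimal sketch that assumes the bundle is available as a nested dictionary keyed by section; the section keys and field names are hypothetical renderings of the items named in this page, not a published Modelsmith format.

```python
# Hypothetical field names derived from the required contents on this page.
REQUIRED_FIELDS = {
    "benchmark_record": [
        "benchmark_version", "scenario_coverage", "outcome_type_balance",
        "held_out_set", "scoring_threshold",
    ],
    "score_movement": [
        "candidate_score", "previous_accepted_score",
        "frontier_baseline_comparison", "regression_count", "failure_clusters",
    ],
    "operational_measurements": [
        "latency", "cost_per_decision", "serving_target",
        "fleet_capacity_notes", "infrastructure_state",
    ],
    "lineage": [
        "base_model", "specialised_candidate", "benchmark_version",
        "training_method", "approval_chain",
    ],
    "rollback_posture": [
        "previous_accepted_model", "rollback_trigger",
        "rollback_owner", "recovery_path",
    ],
}


def missing_fields(bundle: dict) -> list[str]:
    """List every required 'section.field' that is absent or blank.

    Only None and the empty string count as missing, so legitimate
    values such as a regression count of 0 are accepted.
    """
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        data = bundle.get(section) or {}
        for field in fields:
            value = data.get(field)
            if value is None or value == "":
                missing.append(f"{section}.{field}")
    return missing
```

A reviewer tool could refuse to open a promotion decision while `missing_fields` returns a non-empty list, so that every reviewer starts from the same complete record.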

Redaction boundary

Public and shared evidence should contain only summaries, metrics, scorecards, and review labels. Raw datasets, prompts, completions, logs, filesystem paths, hostnames, and opaque artefact identifiers must stay inside the customer-controlled execution boundary.
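One way to guard this boundary is to scan summary text for obvious leaks before it is shared. The sketch below is a heuristic only: the regular expressions, the list of internal hostname suffixes, and the function name are assumptions chosen for illustration, and a pattern scan of this kind cannot detect raw prompts, completions, or opaque identifiers.

```python
import re

# Heuristic patterns (assumptions for this example, not an exhaustive policy).
LEAK_PATTERNS = {
    # POSIX-style absolute paths with at least two segments, e.g. /var/log/x.log
    "filesystem_path": re.compile(r"(?<![\w.])/(?:[\w.-]+/)+[\w.-]+"),
    # Windows-style paths, e.g. C:\data\run1
    "windows_path": re.compile(r"\b[A-Za-z]:\\\S+"),
    # Hostnames ending in a suffix commonly used for internal networks
    "hostname": re.compile(
        r"\b(?:[a-z0-9-]+\.)+(?:internal|local|corp|lan)\b", re.IGNORECASE
    ),
}


def find_leaks(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for anything that looks like a leak."""
    hits = []
    for kind, pattern in LEAK_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits
```

A check like this is best treated as a last-line safety net. Building shared evidence from summaries and metrics in the first place, rather than filtering raw records, remains the stronger control.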