Governance
Human approval built into every promotion path.
Modelsmith does not self-approve model promotions. The promotion state machine encodes which transitions require a human judgement gate. The harness enforces those gates. No agent, script, or automated process can bypass them.
Promotion state machine
Six states. Three require a human.
The state machine is typed and enforced by the harness. Transitions that require human approval cannot be triggered programmatically. Automated transitions cannot be blocked by a human holding the queue; they advance when the criteria are met.
Candidate
A model iteration that has completed a training run and is queued for evaluation. Not yet scored.
Eval accepted
Core composite, held-out composite, and regression count all cleared their thresholds in a single eval run.
Artifact exported
The adapter weights have been exported to the model registry in a format suitable for serving.
Customer deployed
The model is live in a customer-facing serving environment. Not yet production-validated against live traffic.
Production validated
The model has been validated against a live-traffic or held-out sample by the domain expert and the evidence bundle has been approved.
Deprecated
Superseded by a newer production-validated iteration or retired by the operator. Rollback procedure is attached to the evidence bundle.
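One way to picture the typed state machine is a lookup from each transition to the set of human roles that must approve it, where an empty set means the transition is automated. The sketch below is illustrative only: the names `State`, `REQUIRED_ROLES`, `GateError`, and `advance`, and the role strings, are assumptions and not Modelsmith's API. The transition into Deprecated is left out because the text above does not say which gate, if any, applies to it.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    CANDIDATE = "candidate"
    EVAL_ACCEPTED = "eval-accepted"
    ARTIFACT_EXPORTED = "artifact-exported"
    CUSTOMER_DEPLOYED = "customer-deployed"
    PRODUCTION_VALIDATED = "production-validated"
    DEPRECATED = "deprecated"


# Hypothetical transition table. An empty role set marks an automated
# transition; a non-empty set lists every role that must approve it.
REQUIRED_ROLES: dict[tuple[State, State], frozenset[str]] = {
    (State.CANDIDATE, State.EVAL_ACCEPTED): frozenset(),
    (State.EVAL_ACCEPTED, State.ARTIFACT_EXPORTED): frozenset(),
    (State.ARTIFACT_EXPORTED, State.CUSTOMER_DEPLOYED): frozenset({"operator"}),
    (State.CUSTOMER_DEPLOYED, State.PRODUCTION_VALIDATED): frozenset(
        {"domain-expert", "governance-approver"}
    ),
}


class GateError(Exception):
    """Raised when a transition is unknown or lacks a required approval."""


def advance(
    current: State, target: State, approvals: frozenset[str] = frozenset()
) -> State:
    """Return the target state, or raise if the transition is not allowed."""
    key = (current, target)
    if key not in REQUIRED_ROLES:
        raise GateError(f"no transition {current.value} -> {target.value}")
    missing = REQUIRED_ROLES[key] - approvals
    if missing:
        raise GateError(f"missing approvals: {sorted(missing)}")
    return target
```

Under these assumptions, `advance(State.CANDIDATE, State.EVAL_ACCEPTED)` succeeds with no approvals, while `advance(State.ARTIFACT_EXPORTED, State.CUSTOMER_DEPLOYED)` raises `GateError` until `"operator"` is in the approval set.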
HITL gates
What each gate requires.
Each human-in-the-loop gate specifies who is required to review, what they are reviewing, and what evidence the bundle must contain for the review to be valid. The harness refuses to advance the state without a logged approval from each required party.
The two gates on the production-validated transition can be satisfied by the same person only if your governance policy permits it. Tier C licences typically require separation of duties.
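The separation-of-duties rule for the two production-validated gates can be sketched as a check over logged approvals. This is a minimal illustration, not the harness's actual logic: the `Approval` record, the gate names, and the `separation_of_duties` flag are hypothetical stand-ins for whatever your governance policy configures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    gate: str       # hypothetical gate name, e.g. "domain-expert"
    reviewer: str   # identity of the person who approved
    reason: str     # logged justification
    timestamp: str  # ISO 8601 string


# Both gates sit on the customer-deployed -> production-validated transition.
PRODUCTION_GATES = ("domain-expert", "governance-approver")


def can_validate(approvals: list[Approval], separation_of_duties: bool) -> bool:
    """Check that both gates are approved and, if required, by different people."""
    by_gate = {a.gate: a.reviewer for a in approvals if a.gate in PRODUCTION_GATES}
    if set(by_gate) != set(PRODUCTION_GATES):
        return False  # at least one gate has no logged approval
    if separation_of_duties and len(set(by_gate.values())) < len(PRODUCTION_GATES):
        return False  # the same person approved both gates
    return True
```

With `separation_of_duties=True`, two approvals logged by the same reviewer are rejected; with it set to `False`, they pass as long as both gates are covered.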
Operator sign-off before deployment
artifact-exported → customer-deployed
- Reviewer: Platform operator or team lead
- What they review: Review the promotion record, eval scores, and model card. Confirm the adapter is the intended version. Acknowledge the rollback procedure.
- Required evidence: promotion-record.json, eval-transcripts/, model-card.md, rollback-procedure.md
Domain expert validation
customer-deployed → production-validated
- Reviewer: Domain expert with access to live traffic or held-out sample
- What they review: Validate the model against a representative sample of real or held-out inputs. Confirm that outputs meet the domain standard. The expert's decision is logged with a timestamp and reason.
- Required evidence: eval-transcripts/, rubric.json, approval-log.json
Governance approver
customer-deployed → production-validated
- Reviewer: Designated governance approver (separate from domain expert)
- What they review: Review the full evidence bundle for completeness and policy compliance. Confirm that the evaluation was conducted against the locked rubric and that no threshold was modified after training began.
- Required evidence: promotion-record.json, rubric.json, approval-log.json
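A completeness check on the evidence bundle could look like the sketch below. It reads the evidence lists above as separate files plus an `eval-transcripts/` directory, which is an interpretation of the page. The gate keys, the `REQUIRED_EVIDENCE` table, and the `missing_evidence` helper are hypothetical names introduced for illustration, and the on-disk bundle layout is an assumption.

```python
from __future__ import annotations

from pathlib import Path

# Evidence names are taken from the gate descriptions above; the gate keys
# and the flat bundle layout are assumptions made for this sketch.
REQUIRED_EVIDENCE: dict[str, list[str]] = {
    "operator-sign-off": [
        "promotion-record.json",
        "eval-transcripts/",
        "model-card.md",
        "rollback-procedure.md",
    ],
    "domain-expert": ["eval-transcripts/", "rubric.json", "approval-log.json"],
    "governance-approver": [
        "promotion-record.json",
        "rubric.json",
        "approval-log.json",
    ],
}


def missing_evidence(bundle: Path, gate: str) -> list[str]:
    """List the required evidence items that are absent from the bundle."""
    missing = []
    for name in REQUIRED_EVIDENCE[gate]:
        path = bundle / name
        # Names ending in "/" are expected to be directories, others files.
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

A review would count as valid in this sketch only when `missing_evidence(bundle, gate)` returns an empty list for the gate in question.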