Governance

Human approval built into every promotion path.

Modelsmith does not self-approve model promotions. The promotion state machine encodes which transitions require a human judgement gate. The harness enforces those gates. No agent, script, or automated process can bypass them.

Promotion state machine

Six states. Three require a human.

The state machine is typed and enforced by the harness. Transitions that require human approval cannot be triggered programmatically. Automated transitions cannot be blocked by a human holding the queue; they advance when the criteria are met.

Candidate (candidate)

A model iteration that has completed a training run and is queued for evaluation. Not yet scored.

Gate: No gate

Eval accepted (eval-accepted)

Core composite, held-out composite, and regression count all cleared their thresholds in a single eval run.

Gate: Automated: eval scores vs locked thresholds

Artifact exported (artifact-exported)

The adapter weights have been exported to the model registry in a format suitable for serving.

Gate: No gate

Customer deployed (customer-deployed)

The model is live in a customer-facing serving environment. Not yet production-validated against live traffic.

Gate: Human: operator sign-off before deployment

Production validated (production-validated)

The model has been validated against a live-traffic or held-out sample by the domain expert and the evidence bundle has been approved.

Gate: Human: domain expert + governance approver

Deprecated (deprecated)

Superseded by a newer production-validated iteration or retired by the operator. Rollback procedure is attached to the evidence bundle.

Gate: Human: explicit operator action

Legend: Human approval required / Automated transition
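
The six states and their gates can be sketched as a typed transition map. This is a minimal illustration, not Modelsmith's actual harness. The names State, Gate, GATES, ORDER, GateError, and advance are hypothetical, and the sketch assumes a linear promotion path in which each gate guards entry into the state it is listed under. A real harness would derive approval from a logged record rather than the boolean flag used here.

```python
from enum import Enum


class State(Enum):
    CANDIDATE = "candidate"
    EVAL_ACCEPTED = "eval-accepted"
    ARTIFACT_EXPORTED = "artifact-exported"
    CUSTOMER_DEPLOYED = "customer-deployed"
    PRODUCTION_VALIDATED = "production-validated"
    DEPRECATED = "deprecated"


class Gate(Enum):
    NONE = "no gate"
    AUTOMATED = "automated"
    HUMAN = "human"


# Gate guarding entry into each state, taken from the state list.
GATES = {
    State.CANDIDATE: Gate.NONE,
    State.EVAL_ACCEPTED: Gate.AUTOMATED,
    State.ARTIFACT_EXPORTED: Gate.NONE,
    State.CUSTOMER_DEPLOYED: Gate.HUMAN,
    State.PRODUCTION_VALIDATED: Gate.HUMAN,
    State.DEPRECATED: Gate.HUMAN,
}

# Assumption: states are visited in the order they are listed.
ORDER = list(State)


class GateError(Exception):
    """Raised when a transition's gate is not satisfied."""


def advance(current: State, *, human_approved: bool = False,
            criteria_met: bool = False) -> State:
    """Return the next state, or raise GateError if its gate is not satisfied."""
    idx = ORDER.index(current)
    if idx == len(ORDER) - 1:
        raise GateError(f"{current.value} has no successor state")
    target = ORDER[idx + 1]
    gate = GATES[target]
    if gate is Gate.HUMAN and not human_approved:
        raise GateError(f"{target.value} requires a logged human approval")
    if gate is Gate.AUTOMATED and not criteria_met:
        raise GateError(f"{target.value} criteria are not met")
    return target
```

In this sketch, three of the six entries in GATES are Gate.HUMAN, matching "Six states. Three require a human."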

HITL gates

What each gate requires.

Each human-in-the-loop gate specifies who is required to review, what they are reviewing, and what evidence the bundle must contain for the review to be valid. The harness refuses to advance the state without a logged approval from each required party.

The two gates on the production-validated transition can be satisfied by the same person only if your governance policy permits it. Tier C licences typically require separation of duties.
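
The approval check for the production-validated transition might look like the sketch below. It is an illustration under stated assumptions, not the harness's real logic: the Approval record, the gate labels "domain-expert" and "governance-approver", and the separation_of_duties flag are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approval:
    gate: str            # hypothetical gate label
    reviewer: str
    reason: str
    timestamp: datetime


# Hypothetical labels for the two gates on customer-deployed -> production-validated.
REQUIRED_GATES = ("domain-expert", "governance-approver")


def can_advance(approvals: list[Approval], *, separation_of_duties: bool) -> bool:
    """True only if every required gate has a logged approval.

    With separation_of_duties=True, one person cannot satisfy both gates.
    """
    by_gate = {a.gate: a for a in approvals if a.gate in REQUIRED_GATES}
    if set(by_gate) != set(REQUIRED_GATES):
        return False
    reviewers = {a.reviewer for a in by_gate.values()}
    if separation_of_duties and len(reviewers) < len(REQUIRED_GATES):
        return False
    return True


now = datetime.now(timezone.utc)
log = [
    Approval("domain-expert", "alice", "sample outputs meet the domain standard", now),
    Approval("governance-approver", "alice", "bundle complete", now),
]
print(can_advance(log, separation_of_duties=True))   # False: same person on both gates
print(can_advance(log, separation_of_duties=False))  # True: policy permits one reviewer
```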

Operator sign-off before deployment

artifact-exported → customer-deployed
Reviewer
Platform operator or team lead
What they review
Review the promotion record, eval scores, and model card. Confirm the adapter is the intended version. Acknowledge the rollback procedure.
Required evidence
promotion-record.json, eval-transcripts/, model-card.md, rollback-procedure.md

Domain expert validation

customer-deployed → production-validated
Reviewer
Domain expert with access to live traffic or held-out sample
What they review
Validate the model against a representative sample of real or held-out inputs. Confirm that outputs meet the domain standard. The expert's decision is logged with a timestamp and reason.
Required evidence
eval-transcripts/, rubric.json, approval-log.json

Governance approver

customer-deployed → production-validated
Reviewer
Designated governance approver (separate from the domain expert unless your governance policy permits one person to hold both roles)
What they review
Review the full evidence bundle for completeness and policy compliance. Confirm that the evaluation was conducted against the locked rubric and that no threshold was modified after training began.
Required evidence
promotion-record.json, rubric.json, approval-log.json
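
The required-evidence lists for the three gates can be checked mechanically before a review starts. The sketch below is one possible approach, not the harness's implementation. The gate labels, the REQUIRED_EVIDENCE table, and the missing_evidence helper are hypothetical, and it assumes eval-transcripts/ is a directory while the other entries are files in a single bundle directory.

```python
from pathlib import Path

# Required evidence per gate. Keys are hypothetical labels for the three gates.
REQUIRED_EVIDENCE = {
    "operator-sign-off": [
        "promotion-record.json",
        "eval-transcripts/",
        "model-card.md",
        "rollback-procedure.md",
    ],
    "domain-expert": ["eval-transcripts/", "rubric.json", "approval-log.json"],
    "governance-approver": ["promotion-record.json", "rubric.json", "approval-log.json"],
}


def missing_evidence(bundle_dir: Path, gate: str) -> list[str]:
    """Return required entries absent from the bundle.

    An empty list means the bundle is complete for that gate.
    """
    missing = []
    for name in REQUIRED_EVIDENCE[gate]:
        path = bundle_dir / name
        # A trailing slash marks a directory; everything else is a file.
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

A harness built this way would refuse to open a review while missing_evidence returns a non-empty list for the gate in question.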
