How are models promoted to production?

Modelsmith uses a six-state promotion path. Critical transitions (such as artifact-exported to customer-deployed) require human approval based on technical evidence.

What is HITL in AI governance?

Human-in-the-Loop (HITL) ensures that commercial and safety judgement remains with human operators while routine evaluation and training are automated.

Governance

Human approval built into every promotion path.

Modelsmith does not self-approve model promotions. The promotion state machine encodes which transitions require a human judgement gate. The harness enforces those gates. No agent, script, or automated process can bypass them.

Promotion state machine

Six states. Three require a human.

The state machine is designed to be typed and enforced by the harness. Human-gated transitions are intended to require sign-off rather than routine programmatic promotion. Automated transitions are intended to advance when their criteria are met, not when someone manually releases a queue.

candidate

Candidate

A model iteration that has completed a training run and is queued for evaluation. Not yet scored.

No gate

eval-accepted

Eval accepted

Core composite, held-out composite, and regression count all cleared their thresholds in a single eval run.

Automated: eval scores vs locked thresholds

artifact-exported

Artifact exported

The adapter weights have been exported to the model registry in a format suitable for serving.

No gate

customer-deployed

Customer deployed

The model is live in a customer-facing serving environment. Not yet production-validated against live traffic.

Human: operator sign-off before deployment

production-validated

Production validated

The domain expert has validated the model against a live-traffic or held-out sample, and the evidence bundle has been approved.

Human: domain expert + governance approver

deprecated

Deprecated

Superseded by a newer production-validated iteration or retired by the operator. Rollback procedure is attached to the evidence bundle.

Human: explicit operator action

● Human approval required
○ Automated transition
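As a rough illustration, the six states and their entry gates could be encoded as a small typed state machine. This is a minimal sketch, not Modelsmith's implementation: the names `State`, `Gate`, `ENTRY_GATE`, `GateError`, and `advance` are hypothetical, and the strictly linear ordering of transitions is an assumption (the page does not say which states can move directly to deprecated).

```python
from enum import Enum


class State(Enum):
    CANDIDATE = "candidate"
    EVAL_ACCEPTED = "eval-accepted"
    ARTIFACT_EXPORTED = "artifact-exported"
    CUSTOMER_DEPLOYED = "customer-deployed"
    PRODUCTION_VALIDATED = "production-validated"
    DEPRECATED = "deprecated"


class Gate(Enum):
    NONE = "none"
    AUTOMATED = "automated"
    HUMAN = "human"


# Gate that must be cleared to ENTER each state, as listed on this page.
ENTRY_GATE = {
    State.CANDIDATE: Gate.NONE,
    State.EVAL_ACCEPTED: Gate.AUTOMATED,
    State.ARTIFACT_EXPORTED: Gate.NONE,
    State.CUSTOMER_DEPLOYED: Gate.HUMAN,
    State.PRODUCTION_VALIDATED: Gate.HUMAN,
    State.DEPRECATED: Gate.HUMAN,
}

# Assumption: states are visited strictly in the order they are defined.
ORDER = list(State)


class GateError(Exception):
    """Raised when a transition is not allowed or its gate is not cleared."""


def advance(current, target, *, criteria_met=False, human_approved=False):
    """Return `target` if the move is legal and its entry gate is cleared."""
    idx = ORDER.index(current)
    if idx + 1 >= len(ORDER) or ORDER[idx + 1] is not target:
        raise GateError(f"{current.value} -> {target.value} is not a valid transition")
    gate = ENTRY_GATE[target]
    if gate is Gate.AUTOMATED and not criteria_met:
        raise GateError(f"{target.value}: automated criteria not met")
    if gate is Gate.HUMAN and not human_approved:
        raise GateError(f"{target.value}: human approval required")
    return target
```

In this sketch, `advance(State.ARTIFACT_EXPORTED, State.CUSTOMER_DEPLOYED)` raises `GateError` unless `human_approved=True` is passed, which mirrors the idea that a human-gated transition cannot be promoted programmatically.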

HITL gates

What each gate requires.

Each human-in-the-loop gate specifies who is required to review, what they are reviewing, and what evidence the bundle must contain for the review to be valid. The harness refuses to advance the state without a logged approval from each required party.

The two gates on the production-validated transition can be satisfied by the same person only if your governance policy permits it. Tier C licences typically require separation of duties.
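The approval rule described here can be sketched as a check over logged approvals. This is an illustrative sketch under stated assumptions: the `Approval` record, the role names, the transition keys, and the `separation_of_duties` flag are all hypothetical, and the separation rule used is a deliberately conservative one (no approver may hold more than one required role on the same transition).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    role: str        # hypothetical role name, e.g. "domain-expert"
    approver: str    # identity of the person who signed off
    reason: str      # logged reason for the decision
    timestamp: str   # ISO 8601 string, kept opaque in this sketch


# Assumed mapping from gated transition to the roles that must approve it.
REQUIRED_ROLES = {
    "artifact-exported -> customer-deployed": {"operator"},
    "customer-deployed -> production-validated": {
        "domain-expert",
        "governance-approver",
    },
}


def gate_satisfied(transition, approvals, *, separation_of_duties=True):
    """True if every required role has a logged approval for `transition`.

    With separation_of_duties=True, the gate also fails when one person
    has approved under more than one required role.
    """
    required = REQUIRED_ROLES[transition]
    relevant = [a for a in approvals if a.role in required]
    if {a.role for a in relevant} != required:
        return False
    if separation_of_duties:
        roles_by_approver = {}
        for a in relevant:
            roles_by_approver.setdefault(a.approver, set()).add(a.role)
        if any(len(roles) > 1 for roles in roles_by_approver.values()):
            return False
    return True
```

A policy that permits one person to satisfy both gates would call `gate_satisfied(..., separation_of_duties=False)`; a policy that requires separation would leave the default in place.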


Operator sign-off before deployment

artifact-exported → customer-deployed

Reviewer: Platform operator or team lead
What they review: Review the promotion record, eval scores, and model card. Confirm the adapter is the intended version. Acknowledge the rollback procedure.
Required evidence: promotion-record.json, eval-transcripts/, model-card.md, rollback-procedure.md

Domain expert validation

customer-deployed → production-validated

Reviewer: Domain expert with access to live traffic or held-out sample
What they review: Validate the model against a representative sample of real or held-out inputs. Confirm that outputs meet the domain standard. The expert's decision is logged with a timestamp and reason.
Required evidence: eval-transcripts/, rubric.json, approval-log.json

Governance approver

customer-deployed → production-validated

Reviewer: Designated governance approver (separate from the domain expert unless your governance policy permits otherwise)
What they review: Review the full evidence bundle for completeness and policy compliance. Confirm that the evaluation was conducted against the locked rubric and that no threshold was modified after training began.
Required evidence: promotion-record.json, rubric.json, approval-log.json
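The required-evidence lists for the three gates lend themselves to a simple completeness check before a review starts. The sketch below assumes the evidence bundle is a directory on disk with these files at its root, which the page does not state; the gate keys and the `missing_evidence` helper are hypothetical names.

```python
from pathlib import Path

# File and directory names are taken from the gate descriptions above.
# A trailing "/" marks an entry that is expected to be a directory.
REQUIRED_EVIDENCE = {
    "operator-sign-off": [
        "promotion-record.json",
        "eval-transcripts/",
        "model-card.md",
        "rollback-procedure.md",
    ],
    "domain-expert-validation": [
        "eval-transcripts/",
        "rubric.json",
        "approval-log.json",
    ],
    "governance-approver": [
        "promotion-record.json",
        "rubric.json",
        "approval-log.json",
    ],
}


def missing_evidence(bundle_dir, gate):
    """Return the required entries for `gate` that are absent from the bundle."""
    root = Path(bundle_dir)
    missing = []
    for name in REQUIRED_EVIDENCE[gate]:
        path = root / name
        present = path.is_dir() if name.endswith("/") else path.is_file()
        if not present:
            missing.append(name)
    return missing
```

An empty list means the bundle contains everything that gate lists; it says nothing about whether the contents are valid, which remains the reviewer's judgement.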

See the receipts

Read how Modelsmith-built specialists stack up against frontier APIs in your vertical.

Every Agentsia Labs benchmark is a real end-to-end run through Modelsmith: synthetic scenarios, eval harness, post-trained specialist, promotion record. Open methodology, published datasets, reproducible numbers.
