
Synthetic data, human judgement: why the training flywheel needs both

Synthetic data accelerates model training but introduces the risk of behavioral drift. Discover how the Modelsmith flywheel uses failure traces as seeds to synthesize high-precision training sets while maintaining quality through human-governed promotion gates.

Ammar Doosh · 5 May 2026 · 4 min read

The promise of synthetic data is infinite scale. For AI operators and ML engineers, the ability to generate millions of high-quality training tokens without manual labeling is the ultimate lever. However, synthetic data remains a double-edged sword. Without a grounding mechanism, models trained on purely synthetic distributions eventually drift away from real-world utility, creating a feedback loop of increasing hallucination and decreasing precision.

At Agentsia, we solve this through the Modelsmith flywheel: a closed-loop system that combines the speed of the Modelsmith Generator with the definitive anchor of human judgement.

Dataset synthesis speed

>10x

The Modelsmith Generator produces high-precision training sets an order of magnitude faster than manual curation.

The Forge: Synthesis from failure

The Modelsmith workflow begins at the "Forge." We do not believe in generating synthetic data from thin air. Instead, the Modelsmith Generator uses real-world failure traces (the p99 cases, the worst-performing tail of production traffic where your models regressed or failed) as the initial seed.

By focusing on these edge cases, the Generator synthesizes high-precision training sets targeted directly at the model's specific weaknesses. This is a surgical approach to post-training. We refine the weights where they are most brittle rather than simply increasing data volume.
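As a rough illustration of seeding from failures rather than from thin air, selection might look like the sketch below. The `Trace` fields and the `select_seeds` helper are hypothetical, not Modelsmith's API; "p99" here is modeled as the slowest 1% of traffic plus every outright failure.

```python
from dataclasses import dataclass

# Hypothetical sketch: picking failure traces and the p99 latency tail
# as synthesis seeds. Field names are illustrative, not Modelsmith's API.

@dataclass
class Trace:
    trace_id: str
    latency_ms: float
    passed: bool

def select_seeds(traces: list[Trace], p: float = 0.99) -> list[Trace]:
    """Return failed traces plus the slowest (1 - p) tail, deduplicated."""
    failed = [t for t in traces if not t.passed]
    ranked = sorted(traces, key=lambda t: t.latency_ms)
    slow_tail = ranked[int(len(ranked) * p):]
    seen, seeds = set(), []
    for t in failed + slow_tail:          # failures first, then the tail
        if t.trace_id not in seen:
            seen.add(t.trace_id)
            seeds.append(t)
    return seeds
```

The deduplication step mirrors the "signal ingestion and deduplication" phase described below: a trace that both failed and sat in the latency tail should seed only one synthesis run.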

The 13-phase state machine

The transition from a failure trace to a promoted specialist follows a rigorous 13-phase state machine. Key phases include:

  1. Signal ingestion and deduplication.
  2. Synthetic expansion of the failure mode.
  3. Automated rubric generation.
  4. Multi-stage evaluation.
  5. Final promotion evidence bundling.

Each phase is governed by the orchestrator, ensuring that every adapter update is traceable back to the specific production trace that triggered it.

Smith at the Forge, hammering out the edge cases identified in production traces.

The Flywheel: Closing the loop

The flywheel is the momentum that drives continuous improvement. When a model fails in production, that trace becomes the seed for the next iteration. The Generator expands that seed into a full training set, the state machine executes the training run, and the specialist is refined.
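One turn of the loop can be sketched as a pipeline of callables. None of these names (`expand`, `run`, `score`) come from the Modelsmith API; they simply stand in for the three stages the paragraph describes.

```python
# Hypothetical sketch of one flywheel turn: seed -> dataset -> specialist
# -> evidence. Each stage is an injected callable, not Modelsmith's API.

def flywheel_turn(failure_trace, generator, trainer, evaluator):
    dataset = generator.expand(failure_trace)   # seed becomes a full training set
    specialist = trainer.run(dataset)           # the state machine executes the run
    scores = evaluator.score(specialist)        # evidence for the human gate
    return specialist, scores
```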

This loop must be autonomous to be fast, but it must be governed to be safe. Synthetic data alone leads to drift because the model begins to optimize for the Generator's biases rather than the user's requirements.

The Anchor: Human judgement at the gates

Human-in-the-Loop (HITL) is the essential component that prevents the flywheel from spinning out of control. In the Modelsmith architecture, humans do not do the heavy lifting of data generation or training. Instead, they act as the "Anchor" at the promotion gates.

Every significant model update generates an evidence bundle. This bundle contains the original failure trace, the synthesized dataset lineage, and the resulting evaluation scores. An operator reviews this evidence to ensure the specialist has truly improved without regressing on core capabilities.
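A plausible shape for such a bundle, with field names invented for illustration from the three items listed above:

```python
from dataclasses import dataclass

# Hypothetical evidence-bundle shape; field names are illustrative,
# drawn from the three items the post lists, not Modelsmith's schema.

@dataclass
class EvidenceBundle:
    failure_trace_id: str          # the production trace that seeded the run
    dataset_lineage: list[str]     # ids of synthesized examples, traceable to the seed
    eval_scores: dict[str, float]  # e.g. target-task and core-regression metrics

def ready_for_review(bundle: EvidenceBundle) -> bool:
    """A bundle reaches the operator only if all three evidence pieces exist."""
    return bool(bundle.failure_trace_id
                and bundle.dataset_lineage
                and bundle.eval_scores)
```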

Smith inspecting the final blade before it is promoted to production.

Implementation: The Scenarios tab

The primary interface for this judgement is the Scenarios tab within the Owner Dashboard. This is where operators inspect the p99 cases and approve the rubrics used by the Generator. By governing the scenarios, you govern the model.

When an operator approves a scenario in the dashboard, they are effectively setting the ground truth for the next turn of the flywheel. This ensures the synthetic data remains aligned with human intent and business policy.
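In the simplest terms, the promotion decision might reduce to the check below; the tolerance value and function names are assumptions for illustration, not the dashboard's actual logic.

```python
# Sketch of a human-governed promotion gate: promote only when an operator
# approves AND no core capability regressed. Threshold is an assumption.

def approve_promotion(eval_scores, baseline_scores, operator_approved,
                      tolerance=0.01):
    if not operator_approved:
        return False  # human judgement is the hard gate, never bypassed
    # Reject if any core metric dropped more than the tolerance vs. baseline.
    for metric, baseline in baseline_scores.items():
        if eval_scores.get(metric, 0.0) < baseline - tolerance:
            return False
    return True
```

The key design point is the order of the checks: automated regression screening never overrides a human rejection, it can only veto a human approval.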

Scaling specialist quality

The goal of the training flywheel is to move beyond general-purpose models. We are building fleets of private specialists, each honed for a specific, high-value workflow. By combining synthetic scale with human oversight, we achieve a level of specialist quality that neither humans nor machines could reach alone.

The Modelsmith Generator handles the volume. The human handles the value. Together, they turn the forge.

Ready to start your own training flywheel? Explore the Modelsmith dashboard or apply to our design-partner programme.