Blog
Writing.
On domain-specialist language models, post-training, open-weights deployment, and the structural shift in enterprise AI away from frontier APIs.
From a health-nutrition agent to a specialisation platform: our pivot
The story of why we stopped building reclaimed.health and started building the Forge. When generalist models failed on clinical safety, we realised the problem wasn't the agent logic, but the lack of controlled model specialisation.
Read ›
Open weights or open source? Why the licence matters when you train on your own data
For enterprise teams, model weights are a core business asset. Understanding the distinction between Open Weights and Open Source is critical for maintaining sovereign control of your AI infrastructure.
Read ›
Synthetic data, human judgement: why the training flywheel needs both
Synthetic data accelerates model training but introduces the risk of behavioural drift. Discover how the Modelsmith flywheel uses failure traces as seeds to synthesise high-precision training sets while maintaining quality through human-governed promotion gates.
Read ›
When fine-tuned open-weights LLMs outperform frontier models on narrow workflows
Accuracy in narrow domains is about policy alignment and domain-specific knowledge, not parameter count. Here is how Qwen3-32B matched Claude Opus 4.6 on a 166-scenario adtech benchmark.
Read ›
Post-training at 3 a.m.: inside a closed-loop agent harness for open-weights models
Manual fine-tuning is a dead end for enterprise AI. We built Modelsmith as a closed-loop agent harness where evals, diagnosis, and post-training happen autonomously while humans stay at the gates.
Read ›
The 100 ms auction: why private SLMs are replacing frontier APIs in programmatic advertising
IAB OpenRTB allows 100 milliseconds end-to-end. A 200 ms frontier API round-trip is a non-starter. Here is why the adtech bid-path is the first major commercial surface for small language models.
Read ›
Fine-tuning vs RAG: combine them
The framing that a team must pick between fine-tuning an open-weights model and building a RAG pipeline is wrong. The two techniques do different jobs. Here is when you need each, and how they fit together in a real production workflow.
Read ›
Opening Agentsia Labs
Independent benchmarks for the commercial verticals no leaderboard tests. Why we are building a research surface, what it publishes, and the cadence we are promising.
Read ›
Why enterprise AI is moving from frontier LLMs to small language models
The benchmark story and the deployment story are moving in different directions. Here is why the models that matter in production are getting smaller.
Read ›
Subscribe via RSS.