Writing.

On domain-specialist language models, post-training, open-weights deployment, and the structural shift in enterprise AI away from frontier APIs.

7 May 2026 · 4 min read

From a health-nutrition agent to a specialisation platform: our pivot

The story of why we stopped building reclaimed.health and started building the Forge. When generalist models failed on clinical safety, we realised the problem wasn't the agent logic but the lack of controlled model specialisation.

6 May 2026 · 4 min read

Open weights or open source? Why the licence matters when you train on your own data

For enterprise teams, model weights are a core business asset. Understanding the distinction between open weights and open source is critical for maintaining sovereign control of your AI infrastructure.

5 May 2026 · 4 min read

Synthetic data, human judgement: why the training flywheel needs both

Synthetic data accelerates model training but introduces the risk of behavioural drift. Here is how the Modelsmith flywheel uses failure traces as seeds to synthesise high-precision training sets while maintaining quality through human-governed promotion gates.

4 May 2026 · 4 min read

When fine-tuned open-weights LLMs outperform frontier models on narrow workflows

Accuracy in narrow domains is about policy alignment and domain-specific knowledge, not parameter count. Here is how Qwen3-32B matched Claude Opus 4.6 on a 166-scenario adtech benchmark.

3 May 2026 · 7 min read

Post-training at 3 a.m.: inside a closed-loop agent harness for open-weights models

Manual fine-tuning is a dead end for enterprise AI. We built Modelsmith as a closed-loop agent harness where evals, diagnosis, and post-training happen autonomously while humans stay at the gates.

2 May 2026 · 3 min read

The 100 ms auction: why private SLMs are replacing frontier APIs in programmatic advertising

IAB OpenRTB allows 100 milliseconds end-to-end. A 200 ms frontier API round-trip is a non-starter. Here is why the adtech bid path is the first major commercial surface for small language models.

19 April 2026 · 6 min read

Fine-tuning vs RAG: combine them

The framing that a team must pick between fine-tuning an open-weights model and building a RAG pipeline is wrong. The two techniques do different jobs. Here is when you need each, and how they fit together in a real production workflow.

19 April 2026 · 6 min read

Opening Agentsia Labs

Independent benchmarks for the commercial verticals no leaderboard tests. Why we are building a research surface, what it publishes, and the cadence we are promising.

18 April 2026 · 6 min read

Why enterprise AI is moving from frontier LLMs to small language models

The benchmark story and the deployment story are moving in different directions. Here is why the models that matter in production are getting smaller.
