From a health-nutrition agent to a specialisation platform: our pivot

The story of why we stopped building reclaimed.health and started building the Forge. When generalist models failed on clinical safety, we realized the problem wasn't the agent logic, but the lack of controlled model specialisation.

Ammar Doosh · 7 May 2026 · 5 min read

Before Agentsia, there was reclaimed.health.

Our mission was simple: build a vertical agent that could navigate the high-stakes world of clinical nutrition and chronic disease management. We built Harold, an agent designed to act as a deeply knowledgeable health companion. Harold was supposed to be the "perfect" agent, powered by the latest frontier models and a sophisticated RAG (Retrieval-Augmented Generation) pipeline.

We hit a wall that no amount of prompt engineering or vector database scaling could overcome.

Hallucination rate on statin interactions: 42%

In early stress tests, generalist models consistently failed to maintain clinical accuracy on narrow pharmacological queries.

The Harold incident: when generalists fail

The turning point occurred during a benchmark test involving complex medication interactions. We asked Harold to reason about a patient on a specific high-dosage statin regimen who was considering a new metabolic supplement.

The frontier model we were using as the "brain" ran a general reasoning loop. It looked up the supplement. It looked up the statins. Then its clinical judgment failed catastrophically. In some runs, it was overly conservative and refused to give any usable answer. In others, it hallucinated a safety profile that contradicted established clinical guidelines.

We tried everything to fix it. We added more layers of "guardrail" prompts. We built a more complex "agentic" workflow with three different models checking each other. The result was a slower, more expensive system that still lacked the foundational "intuition" of a specialist.

The "Aha!" moment: logic vs. weights

We realized that we were trying to fix a problem in the model's weights with changes to the agent's logic.

Most people building with AI today are "wrapper" builders. They treat the model as a fixed, immutable black box and try to steer it using external code (loops, prompts, and tools). At reclaimed.health, we were the ultimate wrapper builders.

The problem was that the generalist models were fundamentally unsuited for the narrow, high-precision tasks of clinical safety. They were trained on the entire internet, which is full of conflicting health advice. No amount of "agent logic" can reliably override the deep-seated statistical biases of a model's weights.

Building the Forge (Modelsmith)

This realization led to the pivot. We stopped building the health app and started building the infrastructure required to create it. We went from being the builder of the "app" to being the builder of the "Forge."

We called this platform Modelsmith.

Instead of building one agent for one vertical, we built a system that allows any team to take an open-weights model and "smith" it into a specialist. We moved the complexity out of the fragile agent loops and into the models themselves through automated post-training.

Figure: The transition from fragile wrappers to governed specialists.

Validation: 100 iterations of the adtech loop

To prove this platform approach worked, we looked for the most demanding environment possible: the adtech bid-stream.

In adtech, you have under 100 ms to make a decision. You cannot afford complex agentic loops or slow frontier APIs. You need a model that is fast, private, and exceptionally accurate on a very narrow task.
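To make the latency constraint concrete, here is a minimal sketch of the budget arithmetic. The 100 ms deadline comes from the paragraph above; the per-call latencies (400 ms for a remote API call, 5 ms for a local inference call) are assumed round numbers chosen for illustration, not measurements from our system, and the function names are hypothetical.

```python
# Illustrative figures only: the per-call latencies below are assumptions.
BUDGET_MS = 100.0  # the bid-stream decision deadline described above


def sequential_latency_ms(step_latencies_ms):
    """Total latency of steps that must run one after another."""
    return sum(step_latencies_ms)


def fits_budget(step_latencies_ms, budget_ms=BUDGET_MS):
    """True if the whole chain finishes strictly inside the budget."""
    return sequential_latency_ms(step_latencies_ms) < budget_ms


# Hypothetical three-step agentic loop over a remote API (assumed 400 ms per call).
agentic_loop = [400.0, 400.0, 400.0]
# Hypothetical single call to a locally hosted specialist (assumed 5 ms).
local_specialist = [5.0]

print(fits_budget(agentic_loop))      # False
print(fits_budget(local_specialist))  # True
```

Because the steps of an agentic loop run sequentially, their latencies add up, so even one remote call at the assumed 400 ms would already miss the deadline.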

Using Modelsmith, we ran 100 iterations of an autonomous post-training loop for a brand-safety specialist. Each iteration used synthetic data and human judgment to refine the model's weights.
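The shape of such a loop can be sketched with a toy stand-in. This is not Modelsmith's API or our actual training code: the "model" here is a single decision threshold, `human_label` is a hypothetical stand-in for human judgment, and the acceptance gate (keep an update only if held-out accuracy does not regress) is an assumption added for illustration.

```python
import random
from typing import List, Tuple

Example = Tuple[float, int]  # (risk score in [0, 1), label: 1 = unsafe)


def synth_scores(rng: random.Random, n: int) -> List[float]:
    """Stand-in for synthetic data generation."""
    return [rng.random() for _ in range(n)]


def human_label(score: float) -> int:
    """Stand-in for human judgment (assumed true boundary at 0.7)."""
    return int(score >= 0.7)


def accuracy(threshold: float, data: List[Example]) -> float:
    """Fraction of examples the threshold classifier labels correctly."""
    return sum(int(x >= threshold) == y for x, y in data) / len(data)


def propose_update(threshold: float, batch: List[Example], lr: float = 0.05) -> float:
    """Nudge the threshold up on false positives, down on false negatives."""
    for x, y in batch:
        pred = int(x >= threshold)
        threshold += lr * (pred - y)
    return threshold


def run_loop(iterations: int = 100, seed: int = 0):
    rng = random.Random(seed)
    holdout = [(x, human_label(x)) for x in synth_scores(rng, 200)]
    threshold = 0.2  # deliberately poor starting "weights"
    history = []
    for i in range(iterations):
        batch = [(x, human_label(x)) for x in synth_scores(rng, 20)]
        candidate = propose_update(threshold, batch)
        before = accuracy(threshold, holdout)
        after = accuracy(candidate, holdout)
        accepted = after >= before  # assumed gate: never accept a regression
        if accepted:
            threshold = candidate
        history.append((i, before, after, accepted))
    return threshold, history


final_threshold, history = run_loop()
print(len(history), "iterations run")
```

With a gate like this, held-out accuracy can never decrease from one iteration to the next, which is what makes the loop safe to run unattended in this toy setting.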

The results validated our pivot:

  1. Performance: A 1B-parameter specialist outperformed a 70B-parameter generalist on specific brand-safety labels.
  2. Speed: The specialist ran in single-digit milliseconds locally, avoiding the "latency wall" of external APIs.
  3. Governance: We had an immutable evidence bundle for every weight change, something impossible with proprietary black-box models.
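One way to picture the governance point is a hash-chained log with one record per weight change. The sketch below shows that general technique under stated assumptions; it is not the actual format of our evidence bundles, and the `EvidenceRecord` fields and helper names are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import List


@dataclass(frozen=True)  # frozen: a record cannot be mutated in place
class EvidenceRecord:
    iteration: int
    weights_sha256: str  # fingerprint of the weights after this change
    eval_score: float
    prev_hash: str       # hash of the previous record, "" for the first

    def record_hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_record(chain: List[EvidenceRecord], iteration: int,
                  weights: bytes, eval_score: float) -> List[EvidenceRecord]:
    """Return a new chain with one more record linked to the last one."""
    prev = chain[-1].record_hash() if chain else ""
    record = EvidenceRecord(iteration, hashlib.sha256(weights).hexdigest(),
                            eval_score, prev)
    return chain + [record]


def verify_chain(chain: List[EvidenceRecord]) -> bool:
    """Check that every record points at the hash of its predecessor."""
    prev = ""
    for record in chain:
        if record.prev_hash != prev:
            return False
        prev = record.record_hash()
    return True


chain: List[EvidenceRecord] = []
for step, score in enumerate([0.81, 0.84, 0.88]):  # made-up scores
    chain = append_record(chain, step, f"weights-v{step}".encode(), score)

print(verify_chain(chain))  # True

# Rewriting an earlier record breaks the link held by its successor.
tampered = [replace(chain[0], eval_score=0.99)] + chain[1:]
print(verify_chain(tampered))  # False
```

A chain like this only detects edits to records that have a successor, so a real system would also need to anchor the latest hash somewhere outside the log.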

The world doesn't need more wrappers

The era of the thin "AI wrapper" is ending. The value in the AI stack is shifting from the code that calls the model to the process that governs the weights.

Agentsia exists to give builders the tools we wished we had when we were building Harold. We are no longer building a single health agent. We are building the platform that enables a thousand Harolds (and a thousand adtech bidders, and a thousand legal analysts) to exist, each with weights that are governed, private, and precise.

The pivot from reclaimed.health to Agentsia was difficult, but it was necessary. We learned that the "agent" is only as good as the weights that power it. It is time to stop fighting the generalists and start building the specialists.

Ammar Doosh, Founder & CEO, Agentsia