On-device

Sub-4B specialists for mobile, edge, and privacy-first inference.

On-device inference means the model runs on the end user's hardware: a mobile phone, a laptop, an embedded controller. Memory, compute, and battery are tightly constrained, but the payoff is substantial: no network call, no round-trip latency, no data leaving the device. Modelsmith trains and promotes specialists in the 1B to 4B parameter range that fit these constraints without sacrificing domain capability.
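To make the memory constraint concrete, the following is a rough back-of-envelope sketch rather than a Modelsmith tool. It estimates the memory needed for model weights alone at a given parameter count and numeric precision. The function name and the chosen precisions are illustrative assumptions, and the estimate deliberately ignores the KV cache, activations, and runtime overhead, so real footprints will be higher.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GiB.

    Assumption: every parameter is stored at the same precision.
    Excludes KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Parameter counts spanning the 1B to 4B range discussed above.
    for params in (1, 2, 4):
        # 16-bit weights, then 8-bit and 4-bit quantised weights.
        for bits in (16, 8, 4):
            size = weight_memory_gib(params, bits)
            print(f"{params}B parameters @ {bits}-bit: {size:.2f} GiB")
```

Under these assumptions, a 4B model at 16-bit precision needs about 8 GB for weights alone, while the same model at 4-bit precision needs about 2 GB, which is why precision matters as much as parameter count on a phone.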

Why a specialist model

Three reasons frontier models do not fit.

No data leaves the device: For consumer applications handling sensitive personal data, on-device inference is the only architecture that makes a genuine privacy claim. There is no API call, no transmission, no third-party processor. The model runs entirely on hardware the user controls.
No network dependency means no network failure mode: On-device models work offline and in low-connectivity environments, with no round-trip latency. For mobile applications where the experience depends on immediacy, this is a structural advantage over any cloud-backed approach.
Small parameter counts require domain focus to be useful: A 3B general-purpose model is mediocre across most tasks. A 3B specialist trained on a narrow domain can be competitive with models several times its size on that domain. The hardware constraint forces the specialisation that makes on-device inference worthwhile.

Use cases

Concrete workflows, not a category claim.

Each use case below maps to a real workflow a design-partner team would bring to Modelsmith. The specialist model is trained on your data, evaluated against your rubric, and promoted through your governance gate.

On-device document assistant
A sub-4B specialist trained to answer questions about and summarise local documents. Runs entirely in the application sandbox with no network call. Data stays on the device; nothing is sent to cloud storage or a third-party API.
Local content classification
Train a specialist to classify user-generated content against your application's taxonomy at input time. Enables real-time moderation and routing without sending content to a third-party API or incurring per-request cost.
Privacy-preserving personalisation
Fine-tune a small specialist on device-local behavioural signals to personalise recommendations without centralising user data. The personalisation model never leaves the device; only aggregated, anonymised signals are optionally shared.
Edge routing agent
A 1B to 2B specialist that classifies user intent and routes requests to the appropriate on-device handler or cloud service. Reduces unnecessary cloud calls by handling the majority of requests locally.
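As an illustration of the edge routing pattern, here is a minimal sketch, not Modelsmith's implementation. A keyword-based stub stands in for the on-device specialist, and requests go to a local handler only when the predicted intent is known and the confidence clears a threshold. Every name here (`classify_intent`, `LOCAL_HANDLERS`, `CONFIDENCE_THRESHOLD`, the handler names) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Route:
    target: str   # "local" or "cloud"
    handler: str  # name of the handler that should process the request


def classify_intent(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for the on-device specialist.

    Returns (intent, confidence). A real deployment would call the
    trained model here instead of matching keywords.
    """
    lowered = text.lower()
    if "timer" in lowered or "alarm" in lowered:
        return "set_timer", 0.95
    if "summarise" in lowered or "summarize" in lowered:
        return "summarise_document", 0.90
    return "unknown", 0.30


# Hypothetical mapping from intents to on-device handlers.
LOCAL_HANDLERS: Dict[str, str] = {
    "set_timer": "timer_handler",
    "summarise_document": "document_handler",
}

# Assumed cut-off; in practice this would be tuned against an evaluation set.
CONFIDENCE_THRESHOLD = 0.8


def route(
    text: str,
    classifier: Callable[[str], Tuple[str, float]] = classify_intent,
) -> Route:
    """Handle locally when the intent is known and confident, else use the cloud."""
    intent, confidence = classifier(text)
    if intent in LOCAL_HANDLERS and confidence >= CONFIDENCE_THRESHOLD:
        return Route(target="local", handler=LOCAL_HANDLERS[intent])
    return Route(target="cloud", handler="cloud_fallback")


if __name__ == "__main__":
    for request in ("Set a timer for ten minutes", "What is the weather in Lagos?"):
        print(request, "->", route(request))
```

The threshold is the main design lever in this sketch: raising it sends more borderline requests to the cloud, while lowering it keeps more traffic on the device at the cost of more misroutes.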

Get started

Bring an on-device workflow to the design-partner cohort.

Apply to the design-partner programme with your workflow in mind. We will scope the Synthetic POC together, run a complete specialisation cycle on synthetic domain scenarios, and hand you a validated model with a full evidence bundle before any licence commitment.

Apply as a design partner
Book a discovery call
