On-device
Sub-4B specialists for mobile, edge, and privacy-first inference.
On-device inference means the model runs on the end-user's hardware: a mobile phone, a laptop, an embedded controller. Memory, compute, and battery are tightly constrained, but the payoff is just as concrete: no network call, no round-trip latency, no data leaving the device. Modelsmith is designed to train and govern specialists in the 1B to 4B parameter range that fit these constraints without sacrificing domain capability.
Future candidate
On-device remains candidate scaffolding, adjacent to adtech gaming-SDK evidence but not a released vertical of its own.
Buyer constraint
The useful model must fit memory, battery, and local privacy constraints.
Claude, ChatGPT, and Gemini are strong remote baselines, but the product cannot depend on a network call or a per-use meter.
A sub-4B specialist could be trained for one narrow product workflow and exported to the customer-selected edge target once the edge-fit evidence path is active.
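As a rough illustration of the memory constraint, the weights of a model occupy about (parameter count × bits per weight ÷ 8) bytes. The sketch below applies that arithmetic to the 1B to 4B range. The function names and the 20% overhead allowance for runtime buffers are assumptions made for illustration, not Modelsmith figures, and a real fit check would be measured on the target device.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_budget(
    params_billions: float,
    bits_per_weight: int,
    budget_gib: float,
    overhead_fraction: float = 0.2,  # assumed allowance for caches and runtime buffers
) -> bool:
    """Return True if weights plus the assumed overhead fit within the budget."""
    needed = weight_memory_gib(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= budget_gib


if __name__ == "__main__":
    # Compare 16-bit and 4-bit weights across the 1B to 4B range.
    for params in (1, 2, 4):
        for bits in (16, 4):
            size = weight_memory_gib(params, bits)
            print(f"{params}B at {bits}-bit: {size:.2f} GiB of weights")
```

Under these assumptions, a 4B model at 16-bit weights does not fit a 4 GiB budget, while the same model at 4-bit weights does. Whether quantised weights preserve enough task quality is a separate question that only evaluation can answer.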
Edge deployment
User device
Signed specialist artefact runs locally.
Cloud dependency
The model must fit local memory, privacy, and battery constraints.
Evidence posture
Edge-fit candidate set
There is no released on-device leaderboard today. Future proof should measure task quality, model size, latency, memory, battery profile, and privacy boundary together.
Future candidate, not active evidence
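One way to read "measured together" is as a single record per candidate that has to clear every threshold at once, rather than a quality score reported on its own. The sketch below is hypothetical: the field names, units, and the use of energy per request as a stand-in for battery profile are assumptions, and no such rubric has been released.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EdgeFitResult:
    """One measured run of a candidate specialist (all field names hypothetical)."""
    task_quality: float           # rubric score, 0.0 to 1.0
    model_size_mb: float          # size of the artefact on disk
    p95_latency_ms: float         # 95th-percentile latency per request
    peak_memory_mb: float         # peak resident memory during inference
    energy_mj_per_request: float  # assumed stand-in for the battery profile
    network_calls: int            # network calls observed during inference


@dataclass(frozen=True)
class EdgeFitBudget:
    """Thresholds a customer might set for one edge target (hypothetical)."""
    min_task_quality: float
    max_model_size_mb: float
    max_p95_latency_ms: float
    max_peak_memory_mb: float
    max_energy_mj_per_request: float


def failed_checks(result: EdgeFitResult, budget: EdgeFitBudget) -> List[str]:
    """Return the names of every check the result fails; an empty list means it fits."""
    checks = {
        "task_quality": result.task_quality >= budget.min_task_quality,
        "model_size_mb": result.model_size_mb <= budget.max_model_size_mb,
        "p95_latency_ms": result.p95_latency_ms <= budget.max_p95_latency_ms,
        "peak_memory_mb": result.peak_memory_mb <= budget.max_peak_memory_mb,
        "energy_mj_per_request": result.energy_mj_per_request
        <= budget.max_energy_mj_per_request,
        # Privacy boundary: any network call during inference is a failure.
        "privacy_boundary": result.network_calls == 0,
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the list of failed checks, rather than a single pass or fail flag, keeps the trade-offs visible: a candidate that wins on quality but breaks the latency or privacy threshold is reported as such.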
Smith note
Smith is watching the evidence posture: adtech has the active public proof today. Candidate domains stay labelled as future work until their own rubrics, datasets, and results are ready.
Deployment topology
T2 is the default pattern for on-device.
- T1
Co-located training and inference
The same customer-controlled hardware can train the specialist and serve the bounded workflow.
- T2
Trained centrally, deployed to the edge
The signed artefact is trained centrally, then shipped to vehicle, mobile, embedded, or edge hardware.
Recommended for this vertical
- T3
Trained centrally, deployed to a commercial runtime
Designed for customer-hardware training with a signed specialist served through a managed inference vendor when the customer does not want to run its own inference layer.
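In the T2 pattern, the device should refuse to load an artefact whose signature does not check out. The page does not say which signing scheme Modelsmith uses, so the sketch below is an assumption throughout: it uses a shared-key HMAC-SHA256 tag from the Python standard library as a stand-in, and both function names are hypothetical. A production pipeline would more likely use an asymmetric signature, so that devices hold only a public verification key.

```python
import hashlib
import hmac


def sign_artefact(artefact: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the artefact bytes (stand-in for signing)."""
    return hmac.new(key, artefact, hashlib.sha256).hexdigest()


def verify_artefact(artefact: bytes, key: bytes, expected_tag: str) -> bool:
    """Return True only if the artefact matches the tag shipped alongside it."""
    actual_tag = sign_artefact(artefact, key)
    # compare_digest runs in constant time to avoid leaking tag contents.
    return hmac.compare_digest(actual_tag, expected_tag)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    artefact = b"placeholder model weights"
    tag = sign_artefact(artefact, key)          # done centrally after training
    print(verify_artefact(artefact, key, tag))  # done on the device before loading
```

The point is the order of operations: the check runs before the model is loaded, and a mismatch stops the load.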
Why a specialist model
Three reasons frontier models do not fit.
- No data leaves the device
- For consumer applications handling sensitive personal data, on-device inference keeps the privacy boundary simple. There is no inference API call and no transmission to a remote model endpoint. The model runs entirely on hardware the user controls.
- No network dependency means no network failure mode
- On-device models work offline, in low-connectivity environments, and without the latency of a round-trip. For mobile applications where experience depends on immediacy, this is a structural advantage over any cloud-backed approach.
- Small parameter counts require domain focus to be useful
- A 3B general-purpose model tends to be mediocre across most tasks. A 3B specialist trained on a narrow domain can be competitive with models several times its size on that domain. The hardware constraint forces the specialisation that makes on-device inference worthwhile.
Use cases
Concrete workflows, not a category claim.
Each use case below maps to a real workflow a design-partner team would bring to Modelsmith. The specialist model can be trained on your data, evaluated against your rubric, and promoted through your governance path where that operating loop is enabled.
On-device document assistant
A sub-4B specialist trained to answer questions about and summarise local documents. Runs entirely in the application sandbox with no network call. Data stays on the device. Nothing is sent to cloud storage or a third-party API.
Local content classification
Train a specialist to classify user-generated content against your application's taxonomy at input time. Enables real-time moderation and routing without sending content to a third-party API or incurring per-request cost.
Privacy-preserving personalisation
Adapt a small specialist to device-local behavioural signals to personalise recommendations without centralising user data. The personalisation path keeps user-level signals on the device. Only aggregated, anonymised signals are optionally shared.
Edge routing agent
A 1B to 2B specialist that classifies user intent and routes requests to the appropriate on-device handler or cloud service. Reduces unnecessary cloud calls by handling the majority of requests locally.
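The routing pattern in the edge routing agent can be sketched as follows. This is a minimal illustration, not a Modelsmith API: `toy_classify` is a keyword stub standing in for the 1B to 2B specialist, and the intent names, handlers, and 0.8 confidence threshold are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Classifier = Callable[[str], Tuple[str, float]]
Handler = Callable[[str], str]


def toy_classify(text: str) -> Tuple[str, float]:
    """Keyword stub standing in for the specialist; returns (intent, confidence)."""
    lowered = text.lower()
    if "timer" in lowered:
        return "set_timer", 0.95
    if "summar" in lowered:
        return "summarise_note", 0.90
    return "unknown", 0.30


def route(
    text: str,
    classify: Classifier,
    local_handlers: Dict[str, Handler],
    cloud_handler: Handler,
    min_confidence: float = 0.8,  # assumed threshold, to be tuned per workflow
) -> Tuple[str, str]:
    """Handle the request locally when the intent is known and confident, else escalate."""
    intent, confidence = classify(text)
    handler = local_handlers.get(intent)
    if handler is not None and confidence >= min_confidence:
        return "local", handler(text)
    return "cloud", cloud_handler(text)


LOCAL_HANDLERS: Dict[str, Handler] = {
    "set_timer": lambda text: "timer set",
    "summarise_note": lambda text: "summary ready",
}


def cloud_fallback(text: str) -> str:
    """Placeholder for a call to a cloud service; makes no network request here."""
    return "sent to cloud"


if __name__ == "__main__":
    print(route("Set a timer for five minutes", toy_classify, LOCAL_HANDLERS, cloud_fallback))
    print(route("What is the capital of Peru?", toy_classify, LOCAL_HANDLERS, cloud_fallback))
```

Escalating on low confidence, and not only on unknown intents, is the design choice that keeps the local path bounded: the router handles what it recognises and hands everything else to the cloud.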
Where it does not fit
A specialist is the wrong answer unless the workflow is bounded.
The strongest buyers know what they are trying to control: latency, data movement, auditability, model size, or edge deployment. If the work is broad, casual, or unconstrained, a frontier-lab model is usually the simpler answer.
A general-purpose mobile assistant expected to know every topic.
A product where cloud inference is acceptable and cheaper to operate.
An edge deployment without a bounded workflow and acceptance rubric.
Get started
Bring an on-device workflow to a future design-partner cohort.
Book a discovery call with your workflow in mind. We will scope whether a Synthetic POC or a later design-partner cohort is the right route.