On-device AI
for interactive toys
LLM, speech recognition & voice synthesis compiled to ONNX — running offline on edge hardware and consumer devices. Manufacturers pay nothing for inference.
Full inference pipeline.
Entirely offline.
Wake word to spoken response — everything runs on the device. No cloud, no API costs, no latency penalties.
ONNX-Native
Models compiled to ONNX IR with INT4/INT8 quantization. Single-file deployment with custom operator fusion for embedded targets.
Full Voice Pipeline
End-to-end ASR, LLM, and TTS on-device. Wake word detection, noise suppression, and child-friendly voice synthesis. Real-time streaming.
Personality Engine
Character-consistent dialogue with emotional state tracking. Persistent memory stored on local flash across sessions.
OTA Model Updates
Delta-compressed updates over WiFi or BLE. Hot-swap weights without device restart. Staged rollouts with automatic fallback.
COPPA Compliant
On-device content filtering. No audio or conversation data leaves the device. Privacy by architecture, not just policy.
Analytics SDK
Privacy-preserving insights via federated aggregation. Session metrics, engagement scoring, and crash telemetry.
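The INT8 weight quantization mentioned in the ONNX-Native card can be pictured with a toy symmetric-quantization sketch (illustrative only — the production pipeline uses the ONNX toolchain with per-channel scales and operator fusion):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: w ≈ q * scale, with q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values."""
    return [v * scale for v in q]

w = [0.02, -1.27, 0.5, 1.0]            # float32 weights: 4 bytes each
q, scale = quantize_int8(w)             # int8 weights: 1 byte each (~4x smaller)
w_hat = dequantize(q, scale)

# each reconstructed weight lands within half a quantization step
assert all(abs(a - b) <= scale / 2 for a, b in zip(w, w_hat))
```

The same idea extends to INT4 by shrinking the range to [-7, 7], trading a little reconstruction error for half the footprint again.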
from neohumans import EdgeRuntime
# fully offline — no cloud, no cost
rt = EdgeRuntime(
    format="onnx",
    device="auto",
    models=[
        "neo-llm",
        "neo-asr",
        "neo-tts",
    ],
)

# listen → think → speak
audio = rt.listen()
reply = rt.think(audio)
rt.speak(reply)
Your users' devices
become the compute
The ALIVE module connects to nearby iPhones, Android phones, or Macs over a local mesh. Model weights live on the consumer device. Inference happens there. The toy just talks.
iPhone / iPad
iOS App
Weights stored locally
Android
Android App
Weights stored locally
Mac
macOS App
Weights stored locally
ALIVE Module
Minimal compute
Mic + Speaker
Receives inference results
$0
Inference cost
per conversation
100%
Offline capable
no internet needed
3
Platforms
iOS, Android & macOS
Install the app
User downloads the NeoHumans companion app on their phone, tablet, or Mac.
Download weights
ONNX model weights are downloaded once to the device. Around 480 MB total.
Mesh connect
ALIVE module pairs over BLE or WiFi Direct. No internet needed.
Talk
Audio routes to the device for inference. Response streams back to the toy.
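The pairing step above can be sketched as a shared-key challenge-response: the module proves it holds a key provisioned at setup without ever sending it over the air. This is an illustrative handshake, not the shipping BLE protocol, and `PAIRING_KEY` is a hypothetical placeholder:

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret provisioned during first-time setup.
PAIRING_KEY = b"example-pairing-key"

def challenge() -> bytes:
    """Companion app sends a random nonce to the ALIVE module."""
    return secrets.token_bytes(16)

def respond(key: bytes, nonce: bytes) -> bytes:
    """Module answers with an HMAC of the nonce — the key never leaves flash."""
    return hmac.new(key, nonce, hashlib.sha256).digest()

def verify(key: bytes, nonce: bytes, response: bytes) -> bool:
    """App accepts the module only if the response matches its own HMAC."""
    return hmac.compare_digest(respond(key, nonce), response)

nonce = challenge()
assert verify(PAIRING_KEY, nonce, respond(PAIRING_KEY, nonce))
assert not verify(PAIRING_KEY, nonce, respond(b"wrong-key", nonce))
```

Once paired, audio frames stream to the device and synthesized audio streams back, with no internet hop in either direction.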
Small models.
Real conversations.
Each model ships as a single ONNX file — quantized and optimized for embedded deployment.
Language
neo-llm
Speech Recognition
neo-asr
Text to Speech
neo-tts
Total on-device footprint
~480 MB on disk
Peak RAM
~800 MB
E2E Latency
under 200 ms
Runs on
real hardware
Validated across production edge platforms. Plus consumer devices via the companion app.
Qualcomm
QCS6490 / QCS4490
Hexagon NPU
~30 tok/s
NVIDIA Jetson
Orin Nano / NX
CUDA + TensorRT
~55 tok/s
Raspberry Pi
CM4 / Pi 5
ARM NEON SIMD
~10 tok/s
Custom Silicon
Purpose-built ASIC
Designed for toy-grade AI
TBD
+ Consumer Device Offload
When the toy lacks compute, it offloads inference to the nearest device running the NeoHumans app.
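One way to picture that offload decision is a simple routing heuristic: run locally if the toy's silicon is strong enough, otherwise pick the nearest companion-app device by signal strength. The function name, TOPS threshold, and RSSI ranking below are illustrative assumptions, not the shipped logic:

```python
def pick_inference_target(local_tops, peers, min_tops=2.0):
    """Choose where inference runs.

    local_tops: compute available on the toy itself (TOPS).
    peers: list of (name, rssi_dbm) devices discovered on the local mesh.
    """
    if local_tops >= min_tops:
        return "local"                       # toy has enough compute
    if not peers:
        return None                          # wait until a device is in range
    # stronger signal (closer to 0 dBm) ≈ nearer device
    return max(peers, key=lambda p: p[1])[0]

print(pick_inference_target(0.5, [("iPhone", -48), ("Mac", -70)]))  # → iPhone
```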
Zero inference cost.
Infinite conversations.
Every conversation runs on hardware your customers already own. No servers. No per-token billing. No scaling nightmares. Ship the toy.
$0
per conversation
per device, forever
0 ms
network latency
everything is local
0 bytes
sent to the cloud
COPPA by architecture
Ready to ship
smarter toys?
We work with toy manufacturers and hardware partners to bring on-device AI to production. Let's talk.