On-device AI
for interactive toys
LLM, speech recognition & voice synthesis compiled to ONNX — running offline on edge hardware and consumer devices. Manufacturers pay nothing for inference.
Full inference pipeline.
Entirely offline.
Wake word to spoken response — everything runs on the device. No cloud, no API costs, no latency penalties.
ONNX-Native
Models compiled to ONNX IR with INT4/INT8 quantization. Single-file deployment with custom operator fusion for embedded targets.
Full Voice Pipeline
End-to-end ASR, LLM, and TTS on-device. Wake word detection, noise suppression, and child-friendly voice synthesis. Real-time streaming.
Personality Engine
Character-consistent dialogue with emotional state tracking. Persistent memory stored on local flash across sessions.
OTA Model Updates
Delta-compressed updates over WiFi or BLE. Hot-swap weights without device restart. Staged rollouts with automatic fallback.
COPPA Compliant
On-device content filtering. No audio or conversation data leaves the device. Privacy by architecture, not just policy.
Analytics SDK
Privacy-preserving insights via federated aggregation. Session metrics, engagement scoring, and crash telemetry.
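The INT8 weight quantization mentioned in the ONNX-Native card can be pictured with a toy symmetric-quantization sketch (illustrative only — the production pipeline uses the ONNX toolchain with per-channel scales and operator fusion):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: w ≈ q * scale, with q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values."""
    return [v * scale for v in q]

w = [0.02, -1.27, 0.5, 1.0]            # float32 weights: 4 bytes each
q, scale = quantize_int8(w)             # int8 weights: 1 byte each (~4x smaller)
w_hat = dequantize(q, scale)

# each reconstructed weight lands within half a quantization step
assert all(abs(a - b) <= scale / 2 for a, b in zip(w, w_hat))
```

The same idea extends to INT4 by shrinking the range to [-7, 7], trading a little reconstruction error for half the footprint again.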
from neohumans import EdgeRuntime
# fully offline — no cloud, no cost
rt = EdgeRuntime(
    format="onnx",
    device="auto",
    models=[
        "neo-llm",
        "neo-asr",
        "neo-tts",
    ],
)

# listen → think → speak
audio = rt.listen()
reply = rt.think(audio)
rt.speak(reply)
Your users' devices
become the compute
The ALIVE module connects to nearby iPhones, Android phones, or Macs over a local mesh. Model weights live on the consumer device. Inference happens there. The toy just talks.
iPhone / iPad
iOS App
Weights stored locally
Android
Android App
Weights stored locally
Mac
macOS App
Weights stored locally
ALIVE Module
Minimal compute
Mic + Speaker
Receives inference results
$0
Inference cost
per conversation
100%
Offline capable
no internet needed
3
Platforms
iOS, Android & macOS
Install the app
User downloads the NeoHumans companion app on their phone, tablet, or Mac.
Download weights
ONNX model weights are downloaded once to the device. Around 480 MB total.
Mesh connect
ALIVE module pairs over BLE or WiFi Direct. No internet needed.
Talk
Audio routes to the device for inference. Response streams back to the toy.
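The pairing step above can be sketched as a shared-key challenge-response: the module proves it holds a key provisioned at setup without ever sending it over the air. This is an illustrative handshake, not the shipping BLE protocol, and `PAIRING_KEY` is a hypothetical placeholder:

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret provisioned during first-time setup.
PAIRING_KEY = b"example-pairing-key"

def challenge() -> bytes:
    """Companion app sends a random nonce to the ALIVE module."""
    return secrets.token_bytes(16)

def respond(key: bytes, nonce: bytes) -> bytes:
    """Module answers with an HMAC of the nonce — the key never leaves flash."""
    return hmac.new(key, nonce, hashlib.sha256).digest()

def verify(key: bytes, nonce: bytes, response: bytes) -> bool:
    """App accepts the module only if the response matches its own HMAC."""
    return hmac.compare_digest(respond(key, nonce), response)

nonce = challenge()
assert verify(PAIRING_KEY, nonce, respond(PAIRING_KEY, nonce))
assert not verify(PAIRING_KEY, nonce, respond(b"wrong-key", nonce))
```

Once paired, audio frames stream to the device and synthesized audio streams back, with no internet hop in either direction.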
Small models.
Real conversations.
Each model ships as a single ONNX file — quantized and optimized for embedded deployment.
Language
neo-llm
Speech Recognition
neo-asr
Text to Speech
neo-tts
Total on-device footprint
~480 MB on disk
Peak RAM
~800 MB
E2E Latency
under 200 ms
Runs on
real hardware
Validated across production edge platforms. Plus consumer devices via the companion app.
Qualcomm
QCS6490 / QCS4490
Hexagon NPU
~30 tok/s
NVIDIA Jetson
Orin Nano / NX
CUDA + TensorRT
~55 tok/s
Raspberry Pi
CM4 / Pi 5
ARM NEON SIMD
~10 tok/s
Custom Silicon
Purpose-built ASIC
Designed for toy-grade AI
TBD
+ Consumer Device Offload
When the toy lacks compute, it offloads inference to the nearest device running the NeoHumans app.
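One way to picture that offload decision is a simple routing heuristic: run locally if the toy's silicon is strong enough, otherwise pick the nearest companion-app device by signal strength. The function name, TOPS threshold, and RSSI ranking below are illustrative assumptions, not the shipped logic:

```python
def pick_inference_target(local_tops, peers, min_tops=2.0):
    """Choose where inference runs.

    local_tops: compute available on the toy itself (TOPS).
    peers: list of (name, rssi_dbm) devices discovered on the local mesh.
    """
    if local_tops >= min_tops:
        return "local"                       # toy has enough compute
    if not peers:
        return None                          # wait until a device is in range
    # stronger signal (closer to 0 dBm) ≈ nearer device
    return max(peers, key=lambda p: p[1])[0]

print(pick_inference_target(0.5, [("iPhone", -48), ("Mac", -70)]))  # → iPhone
```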
Zero inference cost.
Infinite conversations.
Every conversation runs on hardware your customers already own. No servers. No per-token billing. No scaling nightmares. Ship the toy.
$0
per conversation
per device, forever
0 ms
network latency
everything is local
0 bytes
sent to the cloud
COPPA by architecture
Ready to ship
smarter toys?
We work with toy manufacturers and hardware partners to bring on-device AI to production. Let's talk.