TinyML · 4 min read

Building a Voice-Controlled Toy Car with TinyML

A voice-controlled toy car is a practical and engaging way to explore TinyML. By training a lightweight machine learning model to recognize specific commands and deploying it to a microcontroller, you can make a car respond to your voice without cloud services or external modules. This guide walks through the complete process from data collection to deployment, showing how to integrate AI, embedded hardware, and robotics into one project.

Introduction

TinyML makes it possible to embed artificial intelligence directly into microcontrollers, allowing devices to understand and respond to the world without an internet connection. One of the most exciting applications is voice control. By training a small machine learning model to recognize specific commands, you can make a toy car respond to spoken instructions like “go,” “stop,” or “reverse” — all running locally on a low-power microcontroller.

This project blends hardware, embedded programming, and AI, and offers a great opportunity to explore how keyword spotting works in constrained environments.

Concept Overview

The idea is simple: a microphone captures audio, the microcontroller processes it to extract features, and a TinyML model classifies the speech into one of several predefined commands. The microcontroller then drives the motors according to the recognized instruction.

Unlike using a pre-built voice recognition module, this approach allows you to choose the words, adapt to different languages, and fine-tune the system for your environment.

Hardware Requirements

A capable microcontroller is essential. The ESP32 is a strong choice, offering a dual-core processor, generous RAM for audio buffering, and I²S support for digital microphones. An INMP441 MEMS microphone provides clean audio input. A TB6612FNG motor driver efficiently controls the DC motors of a small 2WD or 4WD chassis. Power comes from a rechargeable Li-ion battery pack, with a voltage regulator if needed.
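To keep the wiring manageable across sketches, it helps to collect every pin assignment in one place. The mapping below is only an example layout; the GPIO numbers are arbitrary choices rather than requirements of the ESP32, the INMP441, or the TB6612FNG, so adjust them to your own wiring.

```cpp
// pins.h -- example pin map for the components listed above (Arduino-ESP32).
// All GPIO numbers are illustrative; change them to match your wiring.

// INMP441 I2S microphone
constexpr int PIN_I2S_SCK = 26;   // bit clock (SCK/BCLK)
constexpr int PIN_I2S_WS  = 25;   // word select (WS/LRCL)
constexpr int PIN_I2S_SD  = 33;   // data out of the microphone (SD)

// TB6612FNG motor driver, one channel per side of a 2WD chassis
constexpr int PIN_AIN1 = 16;      // left motor direction
constexpr int PIN_AIN2 = 17;
constexpr int PIN_PWMA = 4;       // left motor speed (PWM)
constexpr int PIN_BIN1 = 18;      // right motor direction
constexpr int PIN_BIN2 = 19;
constexpr int PIN_PWMB = 5;       // right motor speed (PWM)
constexpr int PIN_STBY = 23;      // driver standby -- must be driven HIGH to enable
```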

Having a stable and well-structured chassis makes the integration easier, so starting with a robot car kit is often a good idea.

Data Collection

Voice recognition begins with data. You will need multiple recordings for each command, such as “go,” “stop,” “left,” “right,” and “reverse.” Including background noise and unrelated speech samples improves robustness. Recordings can be made on a computer with a decent microphone, then downsampled to 16 kHz mono WAV files.

For better accuracy, capture voices from different speakers and in different environments. The more varied your dataset, the better your model will perform in real-world conditions.

Model Training

Training takes place in Python with TensorFlow, with TensorFlow Lite for Microcontrollers as the deployment target. The raw audio is split into short frames, typically 30–40 milliseconds long, and transformed into Mel-frequency cepstral coefficients (MFCCs), a compact representation of the speech features.

A small convolutional neural network (CNN) or depthwise-separable CNN (DS-CNN) is well-suited for this task. After training, the model should be quantized to INT8 to reduce memory usage and improve inference speed. The result is a .tflite file small enough to run on an ESP32 without exceeding RAM limits.

Model Deployment

Once the model is trained and quantized, it must be converted into a C array so it can be embedded directly in the firmware. The Arduino IDE or PlatformIO can be used to integrate the TensorFlow Lite for Microcontrollers library along with the model file.
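As a rough illustration, the fragment below shows how the exported model might be embedded and prepared for inference. It assumes the .tflite file was dumped to a C array (for example with `xxd -i model.tflite`, with the array renamed to `g_model` in a header called `model_data.h`); the exact TensorFlow Lite Micro headers, and whether the interpreter constructor still takes an error reporter, depend on the library version you install, so treat this as a sketch rather than a drop-in file.

```cpp
// Sketch of the TensorFlow Lite Micro setup on the ESP32 (Arduino IDE or PlatformIO).
// "model_data.h" and g_model are assumed names for the generated C array.
#include <Arduino.h>
#include "model_data.h"   // const unsigned char g_model[]; produced from the .tflite file

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Scratch memory for the model's tensors; tune the size until AllocateTensors() succeeds.
constexpr int kArenaSize = 40 * 1024;
static uint8_t tensor_arena[kArenaSize];

tflite::MicroInterpreter* interpreter = nullptr;

void setupModel() {
  const tflite::Model* model = tflite::GetModel(g_model);

  // Register only the operators the keyword-spotting network actually uses.
  static tflite::MicroMutableOpResolver<5> resolver;
  resolver.AddConv2D();
  resolver.AddDepthwiseConv2D();
  resolver.AddFullyConnected();
  resolver.AddReshape();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kArenaSize);
  interpreter = &static_interpreter;

  if (interpreter->AllocateTensors() != kTfLiteOk) {
    Serial.println("AllocateTensors() failed -- is the tensor arena large enough?");
  }
}
```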

The firmware continuously samples audio from the I²S microphone, performs feature extraction, and feeds the MFCCs to the model. The predicted label is then mapped to motor control actions, so that “go” moves the car forward, “left” turns it, and “stop” halts all motion.
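A simplified version of that loop might look like the sketch below. It reuses the `interpreter` from the setup fragment above, assumes the I²S driver and motor pins are configured in setup() (the microphone configuration is shown in the next section), and leans on a hypothetical `extractFeatures()` helper standing in for the MFCC front end; the label order is whatever your training pipeline produced.

```cpp
// Simplified recognition loop: capture a one-second window, classify it,
// and translate the winning label into a motor action.
#include <Arduino.h>
#include <driver/i2s.h>
#include "tensorflow/lite/micro/micro_interpreter.h"

extern tflite::MicroInterpreter* interpreter;                     // from the setup sketch above
void extractFeatures(const int16_t* pcm, int len, int8_t* out);   // hypothetical MFCC front end

constexpr int kSampleRate = 16000;
constexpr int kWindowLen  = kSampleRate;          // one second of audio
constexpr int kNumLabels  = 6;                    // e.g. go, stop, left, right, reverse, noise
static int32_t raw[kWindowLen];                   // INMP441 samples arrive in 32-bit frames
static int16_t audio[kWindowLen];

// Motor driver pins from the example map in the hardware section
constexpr int PIN_AIN1 = 16, PIN_AIN2 = 17, PIN_BIN1 = 18, PIN_BIN2 = 19;

enum Command { GO, STOP, LEFT, RIGHT, REVERSE, NOISE };

void drive(int label) {
  bool a1 = false, a2 = false, b1 = false, b2 = false;   // default: everything low = stop
  if (label == GO)      { a1 = true; b1 = true; }
  if (label == REVERSE) { a2 = true; b2 = true; }
  if (label == LEFT)    { b1 = true; }                   // right wheel only -> turn left
  if (label == RIGHT)   { a1 = true; }                   // left wheel only -> turn right
  digitalWrite(PIN_AIN1, a1); digitalWrite(PIN_AIN2, a2);
  digitalWrite(PIN_BIN1, b1); digitalWrite(PIN_BIN2, b2);
}

void setup() {
  // Set the motor pins as outputs, install the I2S driver (see the microphone
  // check in the next section) and call setupModel() from the previous sketch.
}

void loop() {
  size_t bytes_read = 0;
  i2s_read(I2S_NUM_0, raw, sizeof(raw), &bytes_read, portMAX_DELAY);
  for (int i = 0; i < kWindowLen; ++i) audio[i] = raw[i] >> 16;   // keep the top 16 bits

  extractFeatures(audio, kWindowLen, interpreter->input(0)->data.int8);
  interpreter->Invoke();

  // Act on the highest-scoring label.
  const int8_t* scores = interpreter->output(0)->data.int8;
  int best = 0;
  for (int i = 1; i < kNumLabels; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  drive(best);
}
```

In practice you would stream audio in small chunks into a ring buffer rather than blocking on a full one-second read, but the blocking version keeps the control flow easy to follow.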

Integration and Testing

It’s important to verify each component before combining them. Start by testing motor control manually to ensure the driver and motors are wired correctly. Next, verify the microphone input by printing the audio waveform or MFCC values to the serial monitor. Finally, run the TinyML model on live audio and check if the recognized words match your speech.
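The motor wiring can be exercised by calling a drive() helper like the one above with fixed labels. For the microphone, a small standalone sketch that prints the peak amplitude of each block of samples works well with the Arduino serial plotter. The sketch below uses the legacy ESP-IDF I²S driver as shipped with the Arduino-ESP32 core 2.x; the pin numbers follow the example map from the hardware section.

```cpp
// Standalone microphone check (Arduino-ESP32, legacy I2S driver).
// Prints the peak amplitude of each block of samples so you can watch the
// signal react to your voice in the serial plotter.
#include <Arduino.h>
#include <driver/i2s.h>

constexpr int PIN_I2S_SCK = 26, PIN_I2S_WS = 25, PIN_I2S_SD = 33;
constexpr int kSampleRate = 16000;
constexpr int kBlockLen   = 512;
static int32_t samples[kBlockLen];

void setup() {
  Serial.begin(115200);

  i2s_config_t cfg = {};
  cfg.mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX);
  cfg.sample_rate = kSampleRate;
  cfg.bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT;   // INMP441 sends 24-bit data in 32-bit frames
  cfg.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT;
  cfg.communication_format = I2S_COMM_FORMAT_STAND_I2S;
  cfg.intr_alloc_flags = 0;
  cfg.dma_buf_count = 4;
  cfg.dma_buf_len = 256;

  i2s_pin_config_t pins = {};
  pins.bck_io_num = PIN_I2S_SCK;
  pins.ws_io_num = PIN_I2S_WS;
  pins.data_out_num = I2S_PIN_NO_CHANGE;
  pins.data_in_num = PIN_I2S_SD;

  i2s_driver_install(I2S_NUM_0, &cfg, 0, nullptr);
  i2s_set_pin(I2S_NUM_0, &pins);
}

void loop() {
  size_t bytes_read = 0;
  i2s_read(I2S_NUM_0, samples, sizeof(samples), &bytes_read, portMAX_DELAY);

  int32_t peak = 0;
  for (size_t i = 0; i < bytes_read / sizeof(int32_t); ++i) {
    int32_t v = abs(samples[i] >> 16);   // keep the top 16 bits
    if (v > peak) peak = v;
  }
  Serial.println(peak);
}
```

Speak near the microphone and the printed peaks should jump well above the noise floor; if they stay flat, re-check the SCK, WS, and SD wiring and make sure the microphone's L/R pin matches the channel you configured.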

In early tests, work in a quiet environment to minimize false detections. Once the system works reliably indoors, test in noisier spaces and add more varied training data to improve performance.

Optimizations

Performance can be improved in several ways. Using a wake word, such as “car,” before issuing commands reduces accidental triggers. Adjusting the microphone gain helps balance sensitivity and noise resistance. Collecting extra data in the target environment allows the model to adapt to specific background sounds.
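One way to gate commands behind a wake word is a simple timing window: motion commands are only acted on for a few seconds after the wake word has been heard. The sketch below assumes a hypothetical `classifyAudio()` helper (capture plus inference, returning a label index) and a `drive()` helper like the one shown earlier.

```cpp
// Wake-word gating: motion commands are only accepted for a short window
// after the wake word has been heard.
#include <Arduino.h>

constexpr int kWakeLabel     = 0;      // index of "car" in your label list (assumed)
constexpr uint32_t kListenMs = 3000;   // how long to accept commands after the wake word

int classifyAudio();                   // hypothetical: capture + inference, returns a label index
void drive(int label);                 // maps a label index to a motor action (as shown earlier)

static uint32_t wakeDeadline = 0;

void setup() {}

void loop() {
  int label = classifyAudio();

  if (label == kWakeLabel) {
    wakeDeadline = millis() + kListenMs;   // open the command window
    return;
  }
  if (millis() < wakeDeadline) {
    drive(label);                          // only act while the window is open
  }
}
```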

On the motor side, pulse-width modulation (PWM) can be added to control speed smoothly, and obstacle sensors can prevent collisions.
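Speed control on the ESP32 is typically done with the LEDC peripheral driving the PWMA/PWMB inputs of the TB6612FNG. The snippet below uses the Arduino-ESP32 core 2.x API (ledcSetup/ledcAttachPin); core 3.x replaces these calls with ledcAttach, so adjust if your toolchain is newer.

```cpp
// Motor speed control with the ESP32 LEDC peripheral (Arduino-ESP32 core 2.x API).
#include <Arduino.h>

constexpr int PIN_PWMA = 4;      // PWMA input of the TB6612FNG (example pin)
constexpr int PWM_CH   = 0;      // LEDC channel
constexpr int PWM_FREQ = 20000;  // 20 kHz keeps motor whine above the audible range
constexpr int PWM_BITS = 8;      // duty range 0-255

void setup() {
  ledcSetup(PWM_CH, PWM_FREQ, PWM_BITS);
  ledcAttachPin(PIN_PWMA, PWM_CH);
}

void loop() {
  ledcWrite(PWM_CH, 150);   // roughly 60% duty for gentle starts
  delay(2000);
  ledcWrite(PWM_CH, 255);   // full speed
  delay(2000);
}
```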

Power Management

Since the ESP32 will be continuously listening, it consumes more power than a sleeping microcontroller. Strategies like light sleep between audio processing windows or an external wake-up trigger can help extend battery life. Choosing an efficient motor driver also reduces overall consumption.
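As a rough sketch of this duty-cycling idea, the loop below light-sleeps for a short interval after each processing window; `listenForCommand()` is a hypothetical helper wrapping capture, inference, and the motor action. Keep in mind that the I²S peripheral does not record while the chip sleeps, so the sleep interval is a trade-off between battery life and responsiveness.

```cpp
// Duty-cycled listening with ESP32 light sleep (Arduino-ESP32).
#include <Arduino.h>
#include <esp_sleep.h>

void listenForCommand();   // hypothetical: capture one window, run inference, drive the motors

void setup() {}

void loop() {
  listenForCommand();

  esp_sleep_enable_timer_wakeup(200 * 1000ULL);  // wake again after 200 ms (argument is in microseconds)
  esp_light_sleep_start();                       // CPU pauses here; RAM contents are preserved
}
```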

Conclusion

A TinyML-powered voice-controlled toy car is a rewarding project that combines embedded systems, machine learning, and robotics into one hands-on build. Beyond the fun factor, it demonstrates how AI can run entirely on small devices without relying on the cloud, enabling responsive, private, and portable solutions.

By mastering the process — from data collection to deployment — you not only create an interactive toy but also gain skills that can be applied to a wide range of edge AI applications. This is the essence of TinyML: bringing intelligence directly to the devices that interact with the physical world.

  • tinyml
  • voice-controlled-car
