Build XiaoZhi AI ESP32-S3 Voice Chatbot — Custom Wake Word, Face Animation & Full Tutorial 2026
ESP32-S3 · XiaoZhi AI Pro · 2026

Build Your Own XiaoZhi AI Voice Chatbot with Custom Wake Word

Build a fully featured AI voice assistant on ESP32-S3 — featuring animated OLED face expressions, custom voice wake word detection using the ESP-SR MultiNet model, NTP-synchronized time, OTA firmware updates, Wi-Fi diagnostics via MCP protocol, and remote reboot. No coding required. Flash from browser via ESP Web Tools in minutes. Supports Hindi, Hinglish, English and multilingual voice conversations powered by Qwen and DeepSeek LLMs on the xiaozhi.me platform.

~55 min build · No coding needed · Free firmware · Voice wake word · Face animation · Chrome / Edge

What's New in XiaoZhi AI Pro

This is a complete upgrade over the original. The core components stay the same apart from the controller: an ESP32-S3 replaces the standard ESP32 to enable on-device wake word detection. On top of that come eight major new capabilities spanning display, audio, and connectivity.

Face Animation · Custom Wake Word · NTP Clock Sync · Wi-Fi Diagnostics MCP · Remote Reboot MCP · Hide/Show Face Toggle · OTA Theme Flash · Voice LED Control

Introduction — What is XiaoZhi AI?

Open-source voice-first AI on ESP32 — fully customizable

XiaoZhi AI (小智) is an open-source ESP32 firmware project (MIT license, 26K+ GitHub stars) that turns an ESP32-S3 microcontroller into a cloud-connected AI voice assistant. The device captures your voice and streams it to the xiaozhi.me platform, which performs real-time speech recognition (ASR), generates replies with large language models such as Qwen and DeepSeek, and returns expressive text-to-speech (TTS) audio for the device to play. Unlike closed ecosystems such as Amazon Alexa or Google Assistant, XiaoZhi gives you complete control over your AI's personality, language, custom wake word, and behavior through an intuitive web console.

Version 2 of this tutorial moves to the ESP32-S3, whose vector instructions let Espressif's ESP-SR library run the MultiNet speech-command model on the device for custom wake word detection; the standard ESP32 cannot run the MultiNet6 models used here. Version 2 also adds an animated OLED face with blinking and talking expressions, NTP time synchronization, a voice-controlled face/text toggle, and two new MCP (Model Context Protocol) tools: live Wi-Fi diagnostics and remote device reboot. The result is a capable, fully open-source voice AI you can build at home for under $20 in components, well suited to students, makers, hobbyists, and IoT enthusiasts.

What's New in XiaoZhi AI Pro

Major upgrades over the previous ESP32 version

Version 2 is not just a component swap — it brings eight significant new features that make the experience far more natural and capable. Here's what changed and why it matters.

Animated Face on OLED

The 128×64 display now shows expressive animated eye sprites — blinking, talking, and idle states — giving your AI a physical personality.

Custom Wake Word

No more button press to activate. Just say your chosen wake phrase, such as "Hey Maxon", and the AI wakes instantly. Detection runs on the ESP32-S3 using Espressif's ESP-SR MultiNet model.

NTP Time Sync

The device synchronizes with internet time servers automatically. The AI now knows the correct time and date for every interaction.

Wi-Fi Diagnostics MCP

Ask "What's my Wi-Fi strength?" or "What's my IP address?" and the AI responds with live network details fetched via a custom MCP tool.

Remote Reboot MCP

Say "Reboot yourself" and the AI confirms, then triggers a firmware restart. No need to physically reach the device for a reset.

Face Toggle by Voice

Say "Hide your face and show text" or "Show your face only" — the display mode switches between face-only and text-only layouts on demand.

Voice-Activated LED Control

Say "Turn on the light" or "Set LED to blue" — control an external RGB LED strip or indicator via voice commands through a custom MCP GPIO tool.

OTA Theme Updates

Customize wake word, face animation sprites, and display themes wirelessly. Generate assets.bin in the console and flash over Wi-Fi — no USB cable required.

Features & Voice Commands

Everything your XiaoZhi AI V2 can do

Your XiaoZhi AI Pro responds to natural voice commands for nearly every function. Here is everything it can do, organized by capability.

Custom Wake Word

Say your phrase to activate — no button needed. Powered by MultiNet on ESP32-S3.

Animated Face

Expressive OLED animations: idle, talking, blinking states that respond to conversation.

Face Toggle by Voice

"Show only your face" / "Switch to text mode" — toggled with a voice command.

NTP Time Sync

"What time is it?" returns the accurate current time, synced automatically over the internet.

Wi-Fi Diagnostics

"What's my IP?" / "How strong is my Wi-Fi?" — answered via the network MCP tool.

Remote Reboot

"Reboot yourself" — the AI confirms then triggers a firmware restart via MCP tool.

Volume Control

"Set volume to 50%" or "Louder" — adjusts output level mid-conversation.

Live Weather

"What's the weather in Mumbai?" fetches current conditions via MCP weather service.

Music Playback

"Play Hindi music" or "Play some jazz" — triggers MCP music service.

Conversation Memory

Short-term memory keeps context across turns for natural multi-step conversations.

Multilingual

Hindi, English, Hinglish — adapts to whichever language you speak in.

OTA Updates

Update firmware and wake word settings wirelessly via the XiaoZhi console. No USB needed.
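The Wi-Fi Diagnostics answer reports signal strength as an RSSI value in dBm (closer to 0 is stronger). Below is a minimal Python sketch of how a raw reading could be turned into a spoken phrase; the band thresholds and the 0–100 % mapping are common rules of thumb rather than values taken from the XiaoZhi firmware, and `describe_rssi` is a hypothetical helper.

```python
def rssi_to_quality(rssi_dbm: int) -> int:
    """Map RSSI in dBm to a 0-100 % figure with the common 2 * (rssi + 100) heuristic."""
    return max(0, min(100, 2 * (rssi_dbm + 100)))


def describe_rssi(rssi_dbm: int) -> str:
    """Turn a raw RSSI reading into a short phrase an assistant could speak."""
    # Rule-of-thumb bands (hypothetical; not taken from the XiaoZhi firmware).
    if rssi_dbm >= -50:
        label = "excellent"
    elif rssi_dbm >= -60:
        label = "good"
    elif rssi_dbm >= -70:
        label = "fair"
    else:
        label = "weak"
    return f"{label} ({rssi_dbm} dBm, about {rssi_to_quality(rssi_dbm)}%)"


if __name__ == "__main__":
    print(describe_rssi(-58))
```

Running it prints `good (-58 dBm, about 84%)`.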

Voice Command Quick Reference
Voice command | What it does
"Set volume to 50%" | Adjust audio output level mid-conversation
"Reboot yourself" | Remotely restart the device firmware via MCP tool
"Turn on the light" | Control RGB LED strip or GPIO devices by voice
"Show your face only" | Toggle OLED between face animation and text mode
"What time is it?" | Returns accurate time via NTP sync
"What's my IP?" | Shows Wi-Fi diagnostics: SSID, signal strength, IP address
"Weather in Delhi" | Fetches live conditions via MCP weather service
"Play some music" | Triggers MCP music playback service
"Tell me a joke" | Built-in humor, facts, and general knowledge responses
"Update your theme" | OTA firmware and wake word updates via cloud console
Expandable via MCP (Model Context Protocol): XiaoZhi supports custom MCP tools, so you can add GPIO control, relay switching, sensor reading, and more, all triggerable by voice. The Wi-Fi diagnostics and reboot tools in this version are examples of custom MCP integrations.

Full Build Walkthrough — Watch & Build Along

Complete video guide from breadboard to talking AI assistant

Follow along with the complete video build guide. The video covers every step: component overview, breadboard wiring, firmware flashing via browser, Wi-Fi configuration, account setup, device pairing, AI personality configuration, and custom wake word setup on the ESP32-S3.

Video chapters: 0:00 Intro & Components · 2:30 Breadboard Circuit Assembly · 7:15 Flashing Firmware via Browser · 9:40 Wi-Fi Configuration · 11:20 Account Setup & Pairing · 13:00 AI Personality Configuration · 16:30 Custom Wake Word Setup · 19:15 Testing & Demo

Required Components

Everything needed — all available online or locally

The component list is nearly identical to the previous version. The only change is swapping the standard ESP32 dev board for an ESP32-S3 dev board, which provides the hardware acceleration needed for on-device wake word detection.

Component | Description | Qty | Note
ESP32-S3 Dev Board | 38-pin variant · Main controller | 1 | V2 UPGRADE
INMP441 I2S Mic | Digital I2S microphone · Captures voice | 1 |
MAX98357A Amplifier | I2S DAC + amplifier module | 1 |
2W 4Ω Speaker | Small speaker for audio output | 1 |
0.96" OLED (128×64) | I2C display · Face animations + text | 1 | NEW USE
Breadboard 400-tie | For prototyping the circuit | 2 |
Jumper Wires M-M | Male-to-male · Assorted colors | ~25 |
USB-C Data Cable | Must support data, not charge-only | 1 |
Computer (Chrome/Edge) | Required for Web Serial flashing | 1 |
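Why ESP32-S3 specifically? XiaoZhi's custom wake word uses Espressif's ESP-SR MultiNet speech-command model, which relies on the vector (SIMD) instructions of the ESP32-S3's Xtensa LX7 cores. The standard ESP32 lacks these instructions and cannot run the MultiNet6 models. For the v2 firmware, choose a board with at least 8 MB flash and 8 MB PSRAM, such as an N16R8 module.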

Get the Ready-Made XiaoZhi AI S3 Kit

Don't want to source and assemble parts? We sell a fully assembled, 100% tested Version 2 kit with ESP32-S3. Power it on and start talking immediately.

Order on WhatsApp: +91 8535889926
100% Tested · Ready to Use · Fast Delivery · Support Included · Wake Word Pre-configured

Circuit Diagram

ESP32-S3 + INMP441 + MAX98357A + OLED + Speaker

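The circuit is virtually identical to the previous version; the only hardware difference is the ESP32-S3 in place of the standard ESP32. Three modules connect to the board over the same interfaces as before (the I2S microphone, the I2S amplifier, and the I2C OLED), and the speaker connects to the amplifier's output terminals rather than to the ESP32-S3.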

Circuit Diagram (ESP32-S3 version): full circuit with ESP32-S3, INMP441 microphone, MAX98357A amplifier and 0.96" OLED display

Refer to the diagram above for all pin assignments. The image contains the complete wiring reference for every component.
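Pin numbers copied from original-ESP32 guides are a common source of trouble, because the ESP32-S3 has a different GPIO layout: GPIO22 to GPIO25 do not exist, GPIO26 to GPIO32 are normally wired to the SPI flash and PSRAM, and modules with octal PSRAM (such as N16R8) also use GPIO33 to GPIO37. The Python sketch below checks a signal-to-GPIO map against those rules before you wire anything. It does not know the pins in the circuit diagram; the example map is an illustrative original-ESP32-style layout, and `check_pin_map` is a hypothetical helper.

```python
# ESP32-S3 GPIO facts from Espressif's datasheet.
VALID_GPIOS = set(range(0, 22)) | set(range(26, 49))  # GPIO22-GPIO25 do not exist
FLASH_PSRAM = set(range(26, 33))   # SPI flash / quad PSRAM on most modules
OCTAL_PSRAM = set(range(33, 38))   # also taken by octal PSRAM (e.g. N16R8 modules)
STRAPPING = {0, 3, 45, 46}         # sampled at reset; avoid for signals
NATIVE_USB = {19, 20}              # USB D-/D+ used by the native USB port


def check_pin_map(pins: dict, octal_psram: bool = True) -> list:
    """Return human-readable problems for a {signal: gpio} map."""
    problems = []
    owner = {}
    for signal, gpio in pins.items():
        if gpio not in VALID_GPIOS:
            problems.append(f"{signal}: GPIO{gpio} does not exist on ESP32-S3")
            continue
        if gpio in FLASH_PSRAM or (octal_psram and gpio in OCTAL_PSRAM):
            problems.append(f"{signal}: GPIO{gpio} is used by flash/PSRAM")
        if gpio in STRAPPING:
            problems.append(f"{signal}: GPIO{gpio} is a strapping pin")
        if gpio in NATIVE_USB:
            problems.append(f"{signal}: GPIO{gpio} is a native USB pin")
        if gpio in owner:
            problems.append(f"{signal}: GPIO{gpio} already used by {owner[gpio]}")
        else:
            owner[gpio] = signal
    return problems


if __name__ == "__main__":
    # Illustrative original-ESP32-style map, NOT the pins from this guide's diagram.
    esp32_style = {"mic_sd": 32, "mic_ws": 25, "mic_sck": 26,
                   "amp_lrc": 33, "amp_din": 27, "amp_bclk": 14,
                   "oled_scl": 15, "oled_sda": 4}
    for problem in check_pin_map(esp32_style):
        print(problem)
```

For that example map the checker flags five of the eight signals, which is why an original-ESP32 wiring cannot simply be reused on an ESP32-S3 board.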

Breadboard Assembly Guide

Step-by-step hardware build — takes about 15–20 minutes

Refer to the assembled build photo below. Follow the circuit diagram for all wiring — the image shows the final layout with all modules connected.

Assembled Build Reference (photo): XiaoZhi AI ESP32-S3 breadboard build
Ready to power on: once assembled, plug the ESP32-S3 into your computer using a USB-C data cable and proceed to firmware flashing.
1. Flash the XiaoZhi Firmware

Browser-based · No drivers · No IDE · ~2 minutes

The XiaoZhi firmware is flashed directly from your browser using ESP Web Tools. You need Google Chrome or Microsoft Edge on a desktop or laptop. Mobile browsers will not work.

Read this before clicking Flash: do not plug in your ESP32-S3 yet. Click flash first and note which COM ports are listed, then plug in while holding the BOOT button. The new port that appears is your device.
1
Click "Start Flashing" below — do not plug in USB yet

The port selection dialog opens. Note currently listed ports.

2
Hold BOOT button on ESP32-S3 and plug in USB-C

Keep holding BOOT while plugging in. A new COM port will appear.

3
Select the new COM port that just appeared

That is your ESP32-S3 in bootloader mode. Click Connect.

4
Click "Install" → check "Erase Device" → Next → Install

Erasing is strongly recommended for a clean flash.

5
Wait approximately 2 minutes — do not unplug

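The progress bar shows flashing status. BOOT only needs to be held while plugging in (the chip reads it at reset), so you can let go once the port is selected. When flashing completes, click Done.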
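For readers curious how a one-click flasher knows what to write: ESP Web Tools reads a JSON manifest that lists the chip family and the binary parts with their flash offsets. The Python sketch below generates a minimal manifest for a single merged ESP32-S3 image. The firmware file name and version are placeholders, and this is not the manifest used by this page's flasher.

```python
import json


def build_manifest(name: str, version: str, firmware_path: str) -> str:
    """Return an ESP Web Tools manifest for one merged ESP32-S3 image."""
    manifest = {
        "name": name,
        "version": version,
        # Ask the user whether to erase the flash on a new install.
        "new_install_prompt_erase": True,
        "builds": [
            {
                "chipFamily": "ESP32-S3",
                # A merged image already contains bootloader, partition table
                # and application, so it is written from flash offset 0.
                "parts": [{"path": firmware_path, "offset": 0}],
            }
        ],
    }
    return json.dumps(manifest, indent=2)


if __name__ == "__main__":
    # Placeholder name, version and file; not this page's real flasher files.
    print(build_manifest("XiaoZhi AI", "0.0.0", "xiaozhi-esp32s3-merged.bin"))
```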

One-Click XiaoZhi Firmware Flasher
No software to install — flashes directly from your browser via Web Serial
Chrome / Edge · No Drivers · ~2 Minutes · Erase & Flash
No new COM port appearing? Your USB cable may be charge-only. Try a different cable that explicitly supports data transfer. If it still doesn't appear, install CP2102 or CH340 drivers.
2. Connect to Your Wi-Fi Network

Via captive portal at 192.168.4.1 · 2.4 GHz networks only

After successful flashing, the ESP32-S3 boots and creates a temporary Wi-Fi hotspot named XiaoZhi-XXXX. Connect to this hotspot from your phone or computer, then follow these steps to configure your home Wi-Fi credentials through the device's built-in web portal.

1
Connect to "XiaoZhi-XXXX" hotspot

Open your phone or laptop Wi-Fi settings and connect to the network named XiaoZhi-XXXX. No password is required — this is the ESP32-S3's temporary access point.

2
Open 192.168.4.1 in your browser

Once connected, open a browser and go to 192.168.4.1. The ESP32-S3 configuration portal loads. You will see the main dashboard with device status and available tabs.

Configuration Portal: main dashboard (screenshot)
3
Go to the Advanced tab and select your timezone

Tap the Advanced tab in the top navigation. Find the Timezone dropdown and select your local timezone (e.g. Asia/Kolkata for India). This ensures the device reports accurate time in responses.

Advanced Tab: timezone selection (screenshot)
4
Click Save Configuration, then switch to WiFi Config

After selecting your timezone, click the Save Configuration button. Once saved, switch to the WiFi Config tab to set up your home network connection.

WiFi Config Tab: select network and enter password (screenshot)
5
Select your network, enter password, and click Connect

In the WiFi Config tab, click your home Wi-Fi network from the available list (2.4 GHz only). Type your Wi-Fi password in the password field and click the Connect button. The device will attempt to connect.

6
Wait for the success confirmation message

Once connected successfully, the portal displays a green success message. The ESP32-S3 restarts automatically and joins your home network. The OLED display will show a pairing code — do not disconnect power during this process.

Connection Successful (screenshot)
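2.4 GHz networks only: the ESP32-S3 does not support 5 GHz Wi-Fi. If your router uses a separate name for each band, pick the 2.4 GHz one. If both bands share one name, the device can only see and join the 2.4 GHz band; if it still fails to connect, see the Wi-Fi entry in the FAQ.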
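The timezone chosen in step 3 matters because NTP only delivers UTC: an NTP timestamp counts seconds since 1 January 1900, and the device applies your zone's offset locally. The Python sketch below shows both steps for Asia/Kolkata (UTC+05:30, no daylight saving), including the POSIX-style TZ string that ESP-IDF's C library understands. Whether the XiaoZhi firmware stores the zone in exactly this form is an assumption, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800


def posix_tz(abbrev: str, utc_offset_minutes: int) -> str:
    """Build a POSIX TZ string for a zone without daylight saving.

    POSIX writes the offset with the opposite sign: UTC+05:30 becomes "IST-5:30".
    """
    if utc_offset_minutes > 0:
        sign = "-"
    elif utc_offset_minutes < 0:
        sign = "+"
    else:
        sign = ""
    hours, minutes = divmod(abs(utc_offset_minutes), 60)
    return f"{abbrev}{sign}{hours}" + (f":{minutes:02d}" if minutes else "")


def ntp_to_local(ntp_seconds: int, utc_offset_minutes: int) -> datetime:
    """Convert an NTP timestamp (era 0, before 2036) to local time at a fixed offset."""
    unix_seconds = ntp_seconds - NTP_UNIX_OFFSET
    zone = timezone(timedelta(minutes=utc_offset_minutes))
    return datetime.fromtimestamp(unix_seconds, zone)


if __name__ == "__main__":
    IST = 5 * 60 + 30  # Asia/Kolkata: UTC+05:30, no DST
    print(posix_tz("IST", IST))              # IST-5:30
    print(ntp_to_local(3_913_056_000, IST))  # 2024-01-01 05:30:00+05:30
```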
3. Create Your XiaoZhi Account

Free account at xiaozhi.me · Google login recommended

Before you can pair your device, you need a free account on the XiaoZhi AI platform. This is where you manage your agent's personality, language, voice, and advanced settings.

1
Reconnect to your home Wi-Fi network

Switch your device back from the XiaoZhi hotspot to your regular network.

2
Open xiaozhi.me in your browser

Click "Console" in the navigation.

3
Sign up using Google

The fastest method. Your account is active immediately.

XiaoZhi.me homepage (screenshot)
4. Pair Your ESP32-S3 Device

Enter the 6-digit code scrolling on the OLED display

After creating your account and signing into the console, you'll see the Agents page. Now link your physical ESP32-S3 to your account using the pairing code displayed on the OLED.

1
Click "+ Add Device" in the Agents console

An input dialog appears for the pairing code.

2
Read the 6-digit code scrolling on your OLED

The code refreshes every 30 seconds. Type it quickly.

3
Enter the code and click Confirm

The device links to your account.

4
Accept the agreement and click "Start Using"

Select the Open Source (Free) tier to continue.

Add Device dialog in the console (screenshot)
Device paired successfully: your ESP32-S3 now appears as an agent card in the console, listed as Online with a green indicator.
5. Configure Your AI Agent's Personality

Set name, voice, language, system prompt, MCP tools

Click "Configure Role" on your device card. This opens the full configuration panel where you design your AI's identity — its name, voice, language, and behavioral instructions.

Configure Role page in the XiaoZhi console (screenshot)
What is "Role Introduction"? This is the system prompt — the core instruction set that defines who your AI is, how it behaves, what language it speaks, and what it knows. It's the AI's personality blueprint.

Settings reference — what each option does:

Setting | What It Controls | Recommended
Assistant Name | What the AI calls itself in greetings | Any name, e.g. "Maxon" or "Jarvis"
Dialogue Language | Primary language for voice output | Switch to preferred language
Voice Role | TTS voice and accent selection | Try several and pick the best fit
Role Introduction | Full personality and behavior system prompt | Use the generator tool below
Memory Type | How the AI retains conversation context | Short-term Memory
Language Model | AI engine powering responses | Xiaozhi Lite (free, fast)
Voice Recognition Speed | Speech-to-text processing speed | Normal
Character Speech Speed | How fast the AI talks | Normal or slightly slower
Official Services (MCP) | Built-in tools: Weather, Music, Jokes | Enable Weather, Music

Use the interactive system prompt generator on this page to craft a detailed, personalized instruction set for your AI. It offers quick templates, walks through three steps (Identity, Personality, Voice & Skills), accepts optional details about the user, and produces a generated_prompt.txt of up to 2000 characters.
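The generator essentially fills a template from a few fields and keeps the result within its 2000-character counter. A minimal Python equivalent is below, useful if you prefer to keep your prompt in a file and regenerate it; the field names and wording are hypothetical and should be adapted to your assistant.

```python
MAX_PROMPT_CHARS = 2000  # the generator's character counter on this page


def build_role_prompt(name: str, personality: str, language: str,
                      skills: list, about_user: str = "") -> str:
    """Assemble a Role Introduction from a few fields and enforce the length limit."""
    lines = [
        f"You are {name}, a voice assistant running on a small ESP32-S3 device.",
        f"Personality: {personality}.",
        f"Reply in {language}. Keep answers short, because they are spoken aloud.",
        "Skills: " + ", ".join(skills) + ".",
    ]
    if about_user:  # optional section, like "About the User" in the generator
        lines.append(f"About the user: {about_user}.")
    prompt = "\n".join(lines)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt has {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}")
    return prompt


if __name__ == "__main__":
    print(build_role_prompt("Maxon", "friendly, curious and patient", "Hinglish",
                            ["weather updates", "music", "jokes"],
                            about_user="a student in Delhi who likes electronics"))
```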
6. Set Up Custom Wake Word

Say your own phrase to wake the AI — no button press needed

This is the most powerful upgrade in Version 2. Instead of pressing the BOOT button every time, you can wake your AI by saying a custom phrase — like "Hey Maxon". The ESP32-S3 runs the MultiNet model locally to detect your phrase.

The setup is done entirely through the XiaoZhi web console — no code, no flashing, just a few clicks. Here is the complete flow:

1
Manage Devices

Click this on your agent card in the console

2
Customize

Visible only when device is Online

3
Custom Wake Word

Select this tab in Theme Design

4
Generate assets.bin

Flash to device over Wi-Fi

Detailed step-by-step:

01
Click "Manage Devices" on your agent card

In the Agents section of the console, find your device card and click the "Manage Devices" button. This opens the device management panel.

02
Click "Customize" — only visible when device is Online

On the device entry, you'll see a "Customize" button next to Theme Settings. This button only appears when your ESP32-S3 is powered on and connected to Wi-Fi.

Manage Devices → Customize button (screenshot): click "Manage Devices", then "Customize", to open the Xiaozhi AI Customization tool.
03
Step 1 — Chip & Screen Configuration loads automatically

The Xiaozhi AI Customization tool opens. If your device is connected and active, it auto-detects your chip model (ESP32-S3) and screen resolution (128×64). Click Next.

Step 1: chip configuration auto-detected (screenshot). The device configuration auto-loads: ESP32-S3, screen 128×64 px, RGB565 color format.
Auto-detection not working? If the chip is not detected automatically, expand "Manual Configuration" and set Chip Model to ESP32-S3 and screen dimensions to 128×64 manually.
04
Step 2 — Theme Design: Click "Custom Wake Word"

The Theme Design page opens with four tabs: Wake Word Config, Font Config, Emoji Collection, Chat Background. Click the "Custom Wake Word" button.

Step 2: select Custom Wake Word (screenshot). Theme Design → Wake Word Config → click "Custom Wake Word".
05
Enter your Wake Word Name and Wake Command

The Custom Wake Word Settings section expands. Fill in two fields:
Wake Word Name — a label for this wake word (e.g., Maxon)
Wake Command — the exact phrase you will speak (e.g., Hey Maxon)
You can name it anything you want. Keep the phrase 2–3 syllables for best recognition.

Custom Wake Word Settings (screenshot). Wake Word Name: Maxon · Wake Command: Hey Maxon · Recognition Model: MultiNet6 (English)
06
Select Recognition Model — choose English if not from China

In the "Select Recognition Model" dropdown, choose MultiNet6 (English) for English wake commands, or MultiNet6 (Chinese) for Chinese commands. The sensitivity threshold default of 20 is fine — lower value means more sensitive.

Select Recognition Model (screenshot). Select MultiNet6 (English) for English wake words; it is available only on ESP32-S3.
07
Click Next → Step 3 Preview → Click "Generate assets.bin"

You arrive at the Step 3 Preview page. The device preview shows a 128×64 OLED simulation. The Configuration Summary confirms your wake word setting. Click the green "Generate assets.bin" button.

Step 3: Preview & Generate (screenshot). The preview confirms the wake word is "Maxon"; click Generate assets.bin to proceed.
08
Confirm configuration → Click "Start Generate"

A confirmation dialog shows the full configuration summary: Chip Model ESP32-S3, Resolution 128×64, Wake Word Maxon, and the list of files to be included (index.json ~1KB, srmodels.bin ~1.2MB). Click "Start Generate".

Generate assets.bin confirmation dialog (screenshot). The configuration summary confirms all settings; click "Start Generate" to build the binary.
09
Wait for generation — then click "Flash to Device Online"

The assets.bin file generates in ~2 seconds (3.61 MB). When the success dialog appears with a green checkmark, make sure your ESP32-S3 is powered on and online, then click the blue "Flash to Device Online" button.

assets.bin ready (screenshot). The 3.61 MB assets.bin was generated in 2.2 s; click "Flash to Device Online" to send it via OTA.
10
OTA flashing — device says "Updating the System"

The progress bar shows the OTA upload in real time. Your ESP32-S3 will speak "Updating the System" and the OLED may flash. Do not power off the device during this process.

OTA flashing in progress (screenshot). Do not close the window or power off the device while flashing.
11
Wait 1–2 minutes — device restarts and wake word is active

After flashing completes, the device reboots automatically. Once it reconnects to Wi-Fi, your custom wake word is active. Say your phrase to test it — the AI should respond immediately without pressing any button.

Wake word is active: say "Hey Maxon" (or your custom phrase) and the AI wakes and responds, with no button press needed. The OLED face animation activates on wake.
Tips for best wake word recognition: use a 2–3 syllable phrase and speak clearly at normal conversational volume, within about 1–2 meters of the microphone; if detection is unreliable, move closer (30–100 cm). If false triggers occur, increase the sensitivity threshold slightly (higher = less sensitive).
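The sensitivity threshold from step 06 (default 20) acts as an acceptance cutoff on the model's confidence. As a conceptual Python model only, assuming the console value is a percentage of the confidence score (20 meaning 0.20) and adding a short cooldown so one utterance does not fire twice, the gating looks like this; `make_wake_detector` and the cooldown are hypothetical, not part of ESP-SR's API or the XiaoZhi firmware.

```python
def make_wake_detector(threshold: int = 20, cooldown_frames: int = 50):
    """Return a per-frame function that reports True when the wake phrase should fire.

    Assumption: `threshold` is a percentage, so 20 accepts scores of 0.20 and
    above. Raising it makes detection stricter (fewer false triggers).
    """
    min_score = threshold / 100
    frames_since_fire = cooldown_frames  # allow an immediate first detection

    def on_frame(score: float) -> bool:
        nonlocal frames_since_fire
        frames_since_fire += 1
        # Fire only if confident enough AND the cooldown since the last fire has passed.
        if score >= min_score and frames_since_fire > cooldown_frames:
            frames_since_fire = 0
            return True
        return False

    return on_frame


if __name__ == "__main__":
    detect = make_wake_detector(20, cooldown_frames=3)
    print([detect(s) for s in [0.1, 0.25, 0.3, 0.3, 0.3, 0.3]])
```

With a cooldown of three frames, the sample scores fire on the second frame and again on the sixth, while the strong scores in between are suppressed.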

Frequently Asked Questions & Troubleshooting

Fix common issues with ESP32-S3, wake word, display, audio, and more
ESP32-S3 not detected / no COM port appearing

If your computer does not detect the ESP32-S3 when connected via USB, the most common cause is a charge-only cable. Use a USB cable that explicitly supports data transfer. If using a known-good data cable, install or update the CP2102 / CH340 / CH9102 USB-to-UART drivers. On Windows, open Device Manager and check if the port appears under "Ports (COM & LPT)" — if it shows as an unknown device, the driver is missing. Also try a different USB port, preferably USB 2.0. Hold the BOOT button while plugging in and while clicking "Connect" in the web flasher.

Wake word not responding / not detected

First verify the device is powered on and connected to Wi-Fi (the OLED should show the face animation or status). Check the INMP441 microphone wiring against the circuit diagram: VDD→3.3V, GND→GND, L/R→GND on the mic module itself, and SD, WS and SCK to the GPIOs shown in the diagram. Do not reuse pin maps from original-ESP32 guides (for example SD→GPIO32, WS→GPIO25, SCK→GPIO26): GPIO25 does not exist on the ESP32-S3, and GPIO26–GPIO32 are normally reserved for its flash and PSRAM. Speak clearly at 30–100 cm distance. If using an ESP32 (non-S3), wake word is NOT supported; you must use an ESP32-S3 for MultiNet wake word detection. Try rebooting the device. If it still does not work, regenerate assets.bin with the wake word and flash it again via OTA, and check that the recognition model is set to "MultiNet6 (English)" for English commands in the XiaoZhi Customize tool.

OLED display is blank / no face animation

Check the I2C wiring against the circuit diagram: OLED VCC→3.3V, GND→GND, and SCL and SDA to the GPIOs shown in the diagram. Ensure the OLED address matches the firmware default (0x3C for most SSD1306 displays; some modules are set to 0x3D). If your module has a reset (RST) pin, enable it in the firmware configuration. If the display worked before but stopped, check for loose Dupont wires on the breadboard. For 128×32 OLEDs, make sure you selected the correct screen resolution in the Customize tool; 128×64 is the standard for this tutorial. If the OLED shows data but no face animation, the emoji assets may not have been flashed correctly; regenerate and re-flash assets.bin.

No audio / distorted sound from speaker

Verify the MAX98357A wiring against the circuit diagram: VIN→3.3V or 5V (check module specs), GND→GND, and LRC, DIN and BCLK to the GPIOs shown in the diagram (original-ESP32 pins such as GPIO27 and GPIO33 are usually taken by the flash or octal PSRAM on ESP32-S3 modules). The speaker must be connected to the SPK+ and SPK− terminals, not to GND. For distorted audio, reduce the volume by saying "Set volume to 30%", or lower the amplifier's fixed gain with its GAIN pin (left unconnected it gives 9 dB; tying it to VIN gives 6 dB). Ensure the power supply can deliver at least 1A; a weak power source causes audio crackling. Keep audio signal wires (DIN, BCLK, LRC) away from power wires to reduce interference. If there is static noise, try adding a 100µF capacitor across the amplifier VIN and GND.
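If lowering the volume in software is not enough, the MAX98357A's fixed gain can also be reduced with its GAIN pin. The Python snippet below records the gain options from the MAX98357A datasheet (labelled here with VIN, the supply pin on common breakout boards) and shows what a change means in signal terms. The helper names are illustrative.

```python
# MAX98357A gain, selected by how the GAIN pin is connected (datasheet table).
GAIN_DB = {
    "100k to GND": 15,
    "GND": 12,
    "unconnected (default)": 9,
    "VIN": 6,
    "100k to VIN": 3,
}


def amplitude_ratio(from_db: float, to_db: float) -> float:
    """Output voltage ratio after changing gain from from_db to to_db."""
    return 10 ** ((to_db - from_db) / 20)


def power_ratio(from_db: float, to_db: float) -> float:
    """Output power ratio for the same change (power scales with voltage squared)."""
    return 10 ** ((to_db - from_db) / 10)


if __name__ == "__main__":
    before, after = GAIN_DB["unconnected (default)"], GAIN_DB["VIN"]
    print(f"amplitude x{amplitude_ratio(before, after):.2f}, "
          f"power x{power_ratio(before, after):.2f}")
```

Dropping from the default 9 dB to 6 dB cuts output amplitude to about 71% and output power to about half, which is often enough to stop a small speaker from distorting.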

Device keeps restarting / boot loop

Continuous reboots are usually caused by insufficient power or memory overflow. Use a USB port that can deliver at least 1A — avoid USB hubs and front-panel ports. If using an ESP32 (budget version) without PSRAM, disconnect the OLED display to free memory. For ESP32-S3, ensure PSRAM is properly configured: the board must have at least 8MB PSRAM (N16R8 module). Check that the partition table matches your firmware version — v2 firmware requires v2 partition tables (8MB or 16MB). If you see "Brownout detector was triggered" in the serial monitor, the power supply is insufficient. Try reflashing the firmware with the "Erase Device" option checked.
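Flash and PSRAM sizes are encoded in the module's ordering code, printed on its metal shield: N16R8 means 16 MB flash and 8 MB PSRAM. The Python sketch below decodes that suffix and checks it against the requirements stated in this guide (at least 8 MB flash for v2 firmware and at least 8 MB PSRAM). The function names are hypothetical, and dev-board listings sometimes use their own naming, so confirm against the shield marking.

```python
import re

# Espressif module ordering codes end in N<flash MB>, optionally R<PSRAM MB>,
# sometimes followed by letters such as V (e.g. ...-N16R8, ...-N8, ...-N16R8V).
_SUFFIX = re.compile(r"N(\d+)(?:R(\d+))?[A-Z]*$")


def memory_from_part_number(part_number: str) -> tuple:
    """Return (flash_mb, psram_mb) decoded from a module ordering code."""
    match = _SUFFIX.search(part_number.strip().upper())
    if match is None:
        raise ValueError(f"no N<flash>R<psram> suffix in {part_number!r}")
    flash_mb = int(match.group(1))
    psram_mb = int(match.group(2)) if match.group(2) else 0  # no R part: no PSRAM
    return flash_mb, psram_mb


def meets_v2_requirements(flash_mb: int, psram_mb: int) -> bool:
    """Requirements as stated in this guide: >= 8 MB flash and >= 8 MB PSRAM."""
    return flash_mb >= 8 and psram_mb >= 8


if __name__ == "__main__":
    for code in ("ESP32-S3-WROOM-1-N16R8", "ESP32-S3-WROOM-1-N8"):
        flash, psram = memory_from_part_number(code)
        print(code, flash, "MB flash,", psram, "MB PSRAM, v2 OK:",
              meets_v2_requirements(flash, psram))
```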

Cannot connect to Wi-Fi / "Network Error" message

ESP32-S3 supports only 2.4 GHz Wi-Fi and cannot connect to 5 GHz networks. If your router broadcasts both bands under the same SSID, band steering can sometimes prevent the device from connecting. Temporarily disable the 5 GHz band on your router, or create a separate 2.4 GHz SSID. Ensure the Wi-Fi password is correct (there is no show/hide toggle in the portal). If the captive portal at 192.168.4.1 does not load, disable mobile data on your phone and reconnect to the XiaoZhi-XXXX hotspot. For persistent issues, try rebooting the router and the device. If the error says "Network Error" or "Unable to Connect", the cloud server at xiaozhi.me may be temporarily unreachable; check your internet connection.

Flash fails / "overlap at address" error

This error occurs when the srmodels.bin file is too large for the allocated partition space. It typically happens when using a 4 MB flash chip with v2 firmware — v2 requires 8 MB or 16 MB flash. If you are using the v1 firmware branch, make sure you selected the correct partition table (v1/4m.csv for 4 MB flash). For ESP32-S3 boards with 2 MB PSRAM (like Super Mini S3), disable the English Speech Commands Model in menuconfig and set PSRAM to Quad Mode. The simplest fix is to use the pre-compiled firmware from the GitHub releases page that matches your board type exactly.
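An overlap error is plain arithmetic: every partition starts at an offset and has a size, and every image must fit inside both its partition and the physical flash. The Python sketch below parses an ESP-IDF-style partition table CSV (explicit offsets only) and reports overlaps, partitions that run past the end of flash, and whether a file fits a partition. The example table is illustrative, not XiaoZhi's actual v2 layout.

```python
def parse_size(text: str) -> int:
    """Parse ESP-IDF size/offset notation: 0x10000, 4K, 3M, or a decimal number."""
    text = text.strip().upper()
    if text.endswith("K"):
        return int(text[:-1], 0) * 1024
    if text.endswith("M"):
        return int(text[:-1], 0) * 1024 * 1024
    return int(text, 0)


def parse_partitions(csv_text: str) -> list:
    """Return (name, offset, size) tuples; this sketch requires explicit offsets."""
    parts = []
    for line in csv_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        parts.append((fields[0], parse_size(fields[3]), parse_size(fields[4])))
    return parts


def find_problems(parts: list, flash_size: int) -> list:
    """Report neighbouring partitions that overlap and any that end past the flash."""
    problems = []
    ordered = sorted(parts, key=lambda p: p[1])
    for (name_a, off_a, size_a), (name_b, off_b, _) in zip(ordered, ordered[1:]):
        if off_a + size_a > off_b:
            problems.append(f"{name_a} overlaps {name_b} at address {hex(off_b)}")
    for name, off, size in ordered:
        if off + size > flash_size:
            problems.append(f"{name} ends at {hex(off + size)}, past {hex(flash_size)} of flash")
    return problems


def fits(parts: list, name: str, file_size: int) -> bool:
    """True if a partition called `name` exists and is large enough for the file."""
    return any(n == name and file_size <= size for n, _, size in parts)


EXAMPLE_TABLE = """
# Name,   Type, SubType, Offset,   Size
nvs,      data, nvs,     0x9000,   16K
otadata,  data, ota,     0xd000,   8K
ota_0,    app,  ota_0,   0x10000,  3M
model,    data, spiffs,  0x310000, 2M
"""

if __name__ == "__main__":
    table = parse_partitions(EXAMPLE_TABLE)
    print(find_problems(table, 4 * 1024 * 1024))  # 4 MB flash
    print(find_problems(table, 8 * 1024 * 1024))  # 8 MB flash
```

With this example table the layout is valid on 8 MB flash but the last partition runs past the end of a 4 MB chip, the same class of problem as v2 firmware on a 4 MB board.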

Audio cuts off mid-sentence / TTS stops early

Audio interruptions during TTS playback are often network-related. The device uses streaming audio — if the Wi-Fi signal is weak or unstable, audio packets may arrive out of sequence or timeout. Move the ESP32-S3 closer to the router and ensure it is on a dedicated 2.4 GHz network. Check for MQTT audio packet sequence warnings in the serial monitor — these indicate packet loss. If using a custom MCP tool that returns large responses, the TTS buffer may overflow; try keeping responses concise. On rare occasions, the cloud TTS engine may have a transient issue — try asking the same question again. If the problem persists, test on a different network to rule out ISP issues.
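Sequence warnings exist because each streamed audio packet carries an increasing sequence number, so the receiver can tell when packets were lost or arrived late. The Python sketch below shows that bookkeeping in general form, with 32-bit wraparound; it illustrates the idea and is not XiaoZhi's packet format or logging code, and `count_gaps` is a hypothetical name.

```python
def count_gaps(seq_numbers, modulus=2**32):
    """Count lost and late packets in a stream of wrapping sequence numbers.

    Returns (missing, late). A packet that arrives after a later one was already
    seen is first counted as missing, then again as late when it shows up.
    """
    missing = late = 0
    expected = None
    for seq in seq_numbers:
        if expected is not None:
            delta = (seq - expected) % modulus
            if delta >= modulus // 2:
                late += 1        # older than expected: delayed or duplicated
                continue         # keep waiting for `expected`
            missing += delta     # jumped ahead: `delta` packets skipped
        expected = (seq + 1) % modulus
    return missing, late


if __name__ == "__main__":
    print(count_gaps([1, 2, 5, 6]))     # two packets never arrived
    print(count_gaps([1, 2, 4, 3, 5]))  # packet 3 arrived out of order
```

Occasional late packets are harmless, but a steadily growing missing count points to a weak or congested Wi-Fi link rather than a firmware fault.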

OLED shows "Connecting" forever / stuck on Wi-Fi

If the OLED remains stuck on the connecting screen, the device is unable to establish a Wi-Fi connection. Reset the Wi-Fi configuration by holding the BOOT button for 5 seconds; this clears the stored credentials and restarts the captive portal. Reconnect to the XiaoZhi-XXXX hotspot and re-enter your Wi-Fi details. Make sure your router is broadcasting on the 2.4 GHz band and is within range. If you changed your Wi-Fi password recently, the device still has the old password stored, so use the BOOT long-press reset to clear it. Networks that need more than an SSID and password, such as WPA2-Enterprise networks or hotel and campus Wi-Fi with a web login page, cannot be configured through the XiaoZhi portal; use a personal hotspot instead.
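A 5-second hold is typically recognised by sampling the button periodically and restarting a timer whenever it is released. The Python sketch below models that logic on recorded samples; the 10 ms polling period and the function name are assumptions for illustration, not the XiaoZhi firmware's implementation.

```python
def long_press_time(samples, sample_ms=10, hold_ms=5000):
    """Return the time in ms at which the button has been held for hold_ms, or None.

    `samples` holds one boolean per polling period, True while pressed. The
    BOOT button pulls its GPIO low when pressed, so a real reader would invert
    the raw pin level before feeding it in.
    """
    held_ms = 0
    for i, pressed in enumerate(samples):
        held_ms = held_ms + sample_ms if pressed else 0  # any release restarts the count
        if held_ms >= hold_ms:
            return (i + 1) * sample_ms
    return None


if __name__ == "__main__":
    print(long_press_time([False] * 100 + [True] * 500))  # reset triggers at 6000 ms
```

This is why releasing BOOT even briefly before the 5 seconds are up means you have to start the hold again.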

Still having issues? Check the official XiaoZhi documentation at xiaozhi.dev/docs for detailed troubleshooting guides. You can also open an issue on the GitHub repository (26K+ stars, 130+ contributors) or join the community Discord for live help.

Final Setup — Activate Your AI

Save settings and bring your voice assistant to life

Your device is flashed, paired, and configured. Now bring everything together with this final activation sequence. Once completed, your XiaoZhi AI will respond to voice commands, display animated expressions, and be ready for daily use.

Save all settings in the XiaoZhi console

Click Save after configuring the role, personality, and MCP tools.

Hard reset the ESP32-S3

Press the physical RST or EN button on the board to apply all saved settings.

Wait for Wi-Fi and NTP sync

The OLED shows a connecting indicator. Once connected, the face animation appears.

Say your wake word

Speak your custom wake phrase — "Hey Maxon" or whatever you configured. The AI activates and is ready to listen.

Manual Activation (Backup)

Short press on BOOT also wakes the AI if you prefer not to use the wake word.

Auto-Sleep

After a few seconds of silence, the device sleeps to save power. Wake word or BOOT activates again.

Update Personality Anytime

Change role, voice, or language in the console at any time. Hit Save and hard reset to apply.

Change Wake Word Anytime

Repeat the Customize → Generate assets.bin → Flash process to use a different wake phrase.

Settings not applying after save? Always do a hard reset using the physical RST/EN button on the ESP32-S3 board after saving any configuration changes. A software reboot alone does not always apply new settings.

Your AI Voice Assistant is Ready

You've built, flashed, configured, and set up a custom wake word on a fully functional XiaoZhi AI V2. Say your phrase and start talking.

Save in Console · Hard Reset Board · Wait for Connection · Speak Wake Word

Want It Pre-Built & Ready to Go?

Get a fully assembled, tested XiaoZhi AI S3 kit with wake word pre-configured. Power it on and start talking immediately.

Order on WhatsApp — +91 8535889926
Pre-Tested · Fast Shipping · Free Support · Wake Word Configured