How to Do Speech Recognition with Arduino | Digi-Key Electronics

Views: 51K

Speech recognition is the process of using computers to recognize and understand human speech. Being able to understand full sentences or questions requires a lot of processing power, as it often relies on the complex algorithms found in natural language processing (NLP).
Most microcontrollers (and Arduino boards) cannot run NLP due to their limited resources. However, we can train a neural network to perform basic keyword spotting, which still has many uses (such as enabling a smart speaker by saying “Alexa” or shouting “stop” to halt a machine).
In this video, we will use Edge Impulse to train a neural network to identify and classify a few custom keywords. We will then deploy this trained model to an Arduino Nano 33 BLE Sense to perform keyword spotting in real time.
To begin, we collect samples of the keywords we wish to identify. These can be collected on any number of recording devices and then edited using Audacity to create 1-second snippets. We recommend collecting at least 50 samples to start.
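As a rough illustration of what a "1-second snippet" means in code, the sketch below crops or pads a list of samples to exactly one second. It is not the Audacity workflow described above; the function name `fit_to_one_second` and the 16 kHz sample rate are assumptions made for this example.

```python
SAMPLE_RATE = 16000  # assumed sample rate in Hz; adjust to match your recordings


def fit_to_one_second(samples, sample_rate=SAMPLE_RATE):
    """Return a copy of `samples` that is exactly one second long.

    Longer clips are center-cropped; shorter clips are zero-padded,
    with any odd leftover sample of padding placed at the end.
    """
    target = sample_rate  # one second of audio = sample_rate samples
    n = len(samples)
    if n >= target:
        start = (n - target) // 2
        return list(samples[start:start + target])
    pad_total = target - n
    pad_left = pad_total // 2
    pad_right = pad_total - pad_left
    return [0] * pad_left + list(samples) + [0] * pad_right
```

Center-cropping assumes the spoken keyword sits roughly in the middle of the recording; if it does not, trimming by hand is the safer choice.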
Afterward, we run a custom Python script that mixes the samples with random snippets of background noise and combines the custom keywords with keywords from the Google Speech Commands dataset into a single curated dataset.
You can download the Google Speech Commands dataset here: storage.cloud....
The dataset curation Python script can be found here: github.com/Sha...
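The noise-mixing idea can be sketched in a few lines. This is not the curation script linked above, only a minimal illustration of the technique; the function name `mix_with_noise`, the `noise_gain` parameter, and the float sample format are assumptions.

```python
import random


def mix_with_noise(keyword, noise, noise_gain=0.5, rng=None):
    """Add a random, equally long slice of `noise` to `keyword`.

    Both inputs are sequences of floats in [-1.0, 1.0].
    The mixed result is clipped back into that range.
    """
    rng = rng or random.Random()
    n = len(keyword)
    if len(noise) < n:
        raise ValueError("noise clip must be at least as long as the keyword clip")
    # Pick a random starting offset so each mix uses a different noise slice.
    start = rng.randint(0, len(noise) - n)
    mixed = []
    for i, sample in enumerate(keyword):
        value = sample + noise_gain * noise[start + i]
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed
```

Passing a seeded `random.Random` as `rng` makes the mixing repeatable, which helps when comparing training runs.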
From there, we upload our curated dataset to Edge Impulse. We use Edge Impulse to extract features, namely the Mel-frequency cepstral coefficients (MFCCs), from the audio samples. We then use it to train a neural network to identify our target keywords. Once done, we can test the model and download it as part of an Arduino library.
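Edge Impulse computes the MFCCs for us, so none of the following is needed to follow along. For readers curious about the "Mel" part, this sketch shows one commonly used Hz-to-mel conversion and how filter center frequencies end up evenly spaced on the mel scale. The function names are hypothetical, and the formula is a common convention, not necessarily the one Edge Impulse uses.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (a widely used formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Return filter center frequencies (Hz) evenly spaced on the mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]
```

Because the spacing is uniform in mels, the centers bunch together at low frequencies and spread out at high ones, roughly mirroring how human hearing resolves pitch.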
We load the library into Arduino and use it to perform inference in real time. The Arduino example code continually captures audio data, extracts features (computes MFCCs), and uses those MFCCs as inputs to the trained model. The model returns what are essentially the probabilities that it heard each of our target keywords.
We can compare those output values to thresholds to take action whenever it hears the desired keyword! To start, we’ll blink a simple LED (because who doesn’t love an overly complicated blinky program?).
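The thresholding step is simple enough to sketch. The real logic runs in the Arduino sketch on the board; the Python below only illustrates the decision rule. The function name `detect_keyword` and the label names in the usage note are hypothetical.

```python
def detect_keyword(probabilities, thresholds):
    """Return the most probable label whose score meets its threshold, or None.

    `probabilities` maps each label to the model's output score.
    `thresholds` maps only the labels we want to act on to a minimum score.
    """
    best_label = None
    best_score = 0.0
    for label, score in probabilities.items():
        threshold = thresholds.get(label)
        if threshold is None:
            continue  # labels without a threshold (e.g. noise) never trigger
        if score >= threshold and score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, `detect_keyword({"go": 0.9, "noise": 0.1}, {"go": 0.85})` selects `"go"`, while a score of 0.8 for the same label would return `None`. Using a separate threshold per keyword makes it easy to raise the bar for a word that triggers too often.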
Product Links:
Arduino Nano 33 BLE Sense - www.digikey.co...
Related Videos:
What is Edge AI?
• Intro to Edge AI: Mach...
Intro to TensorFlow Lite Part 1: Wake Word Feature Extraction
• Intro to TensorFlow Li...
Intro to TensorFlow Lite Part 2: Speech Recognition Model Training
• Intro to TensorFlow Li...
Intro to TensorFlow Lite Part 3: Speech Recognition on Raspberry Pi
• Intro to TensorFlow Li...
Getting Started with TensorFlow Lite for Microcontrollers
• TinyML: Getting Starte...
Related Project Links:
How to Use Embedded Machine Learning to Do Speech Recognition on Arduino - www.digikey.co...
Related Articles:
What is Edge AI? - www.digikey.co...
TensorFlow Lite Tutorial Part 1: Wake Word Feature Extraction - www.digikey.co...
TensorFlow Lite Tutorial Part 2: Speech Recognition Model Training - www.digikey.co...
TensorFlow Lite Tutorial Part 3: Speech Recognition on Raspberry Pi - www.digikey.co...
Getting Started with TensorFlow Lite for Microcontrollers - www.digikey.co...

Published: Oct 5, 2024


Comments: 51

@VA7AYG 3 years ago

I have come to greatly appreciate Digi’s dedication to education, not to mention how amazing teacher Shawn is! Keep up the good work and Cheers

@ShawnHymel 3 years ago

Thank you!

@5_inchc594 3 years ago

I just found GOLD

@janjongboom7561 3 years ago

With so many uncertain results in the test set, lower the minimum confidence rating to 0.6 to get much better results.

@honamyim 3 years ago

It's very interesting to see the original founder's comment here.

@userou-ig1ze 3 years ago

thank you! Great and timely content, great speed and information content (at 2x)

@MuhammadDaudkhanTV100 3 years ago

Fantastic idea and cool content bro

@rohanmanchanda5250 2 years ago

It'd be nice if you could show how to run an audio-classifying tflite model on an Arduino Nano / Raspberry Pi Pico *using an analog microphone*. There's no proper video that I could find on the web that does that or even remotely resembles this concept.

@eloquentarduino5988 2 years ago

Very detailed tutorial, good job!

@harrytsai0420 3 years ago

This is really a good content!!! Thanks very much

@adhamelrouby6445 2 years ago

Can this method explained in the video be used to recognize a specific sound rather than specific text, e.g., a clap sound?

@resatyigen3430 3 years ago

Thank you dude. Very cool tutorial. Please make STM32F4 speech recognition example.

@ShawnHymel 3 years ago

Here's the workshop I did where I show the process end-to-end: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-IRa_SH-3MSI.html. Granted it's an STM32L4, but the same principles should apply.

@bertbrecht7540 3 years ago

Hi Shawn, Thanks for the great tutorial. I dusted off my BLE, went through every one of your steps and got it all working in near flawless fashion, six hours later. I am now in a good place to start experimenting. My keyword was 'shut-down' (only 48 samples) which turns the LED on and 'go' turns the LED off. This could be a light bulb controller or a TV power switch. A lot of sounds get confused for 'go' so increasing the threshold for 'go' to 85% worked well. Perhaps it's just my OS but I had to replace all your '\' with '/' in the curation script command line. Oddly, my feature extraction time took less than half of yours (123ms) but I had the same BLE Sense as you. I want to try expanding the # of wake words. What do you think held you back: processor speed, RAM, Edge Impulse, data....? Looking forward to your next videos

@ShawnHymel 3 years ago

That's awesome you were able to get it working! Thank you for the heads up on the '/' vs '\' in the script--I thought I had made it OS agnostic, but I guess I forgot a few spots. I was able to get the BLE Sense to do 4 key words. It has to do with processing speed and RAM. The more classes you look for, the more speed/RAM is required. At some point, the DSP and inferencing will take longer than the allotted 333 ms, and you'll start overflowing the audio buffer :-/

@janjongboom7561 3 years ago

The Edge Impulse SDK got some improvements recently that sped up the DSP code by 200% on this target, which explains it!

@mri3884 3 years ago

Can I use the data_curation for my dataset and profit out of the model I develop?

@sumitmamoria 3 years ago

Nicely done.

@ercost60 3 years ago

Fantastic! TYVM for this vid.

@eonoire 3 years ago

I need some help. While setting things up in Anaconda I'm getting this error: dataset-curation.py: error: the following arguments are required: d. I really don't know what this could be and I would really appreciate any help, thanks

@hamishgrant2802 3 years ago

Hi Thanks for the tutorial. I want to use the method with an ESP32, how can I make the program use the audio data coming via I2C ?

@johnabonilla1266 2 years ago

Did you ever figure out how to do this?

@hamishgrant2802 2 years ago

@@johnabonilla1266 I got speech recognition working with about an 80% success rate. There's a video on my channel and a link to the code in the description.

@ashwinis7513 3 years ago

Thank you so much. I have added the file as per Edge Impulse, but I am still facing an issue: "The filename or extension is too long. Error compiling for board Arduino Nano 33 BLE."

@sureshtiwari2158 2 years ago

I want to use this method with an ESP32, how can I make the program use the audio data coming via I2S

@akkutyagi16 3 years ago

Is this using tflite micro in backend?

@minhphuongnguyen8117 4 months ago

hey bro, this can use to esp32 ?

@brayanaquino4727 1 year ago

Good day, is there a guide like this one, but using only a Raspberry Pi Pico with its analog channels connected to a microphone?

@nikonissinen6772 3 years ago

I could have told you a much faster way to do the audio samples, but you already did it, so

@F4LL__ 3 years ago

Well...?

@MilSimVipers 3 years ago

why do i not have app data under my user :(

@imsteven3044 3 years ago

Hi! I want to convert the speech to text and then work with the text in Python. Would this module work for me for this?

@withIn40 3 years ago

Hi sir, can i use it to Voice Recognition Module? thanks

@withIn40 3 years ago

and please create a video about it that would be so helpful, thank you

@djtomoy 1 year ago

"hand me my patching trowel, boy!"

@topgearIQ 2 years ago

Uno work or not

@dhupee 2 years ago

"I've got 68, which should work for this prototype" Ckckck, I'm disappointed Shawn

@cokeforever 3 years ago

the entire concept of doing speech recognition locally is outdated and non-effective; you can use google api to simply pass the sound sample and get the recognition result back as a string; 2021... cloud computing... sas... hello)

@dannyash3805 3 years ago

It's not non-effective if you don't have access to the internet. If you're not interested in the topic of the video then watch a different video!

@cokeforever 3 years ago

@@dannyash3805 what a stupid thing to say... how do you think people discuss things and eliminate obsolete knowledge and synthesize new knowledge?! p.s. your point is quite funny in the age of IoT (your fridge has internet access) and if you make "sesame, open" switch for your home/cave/volcano - wouldn't it have wifi? will you only power your voice activated device in your tree-house using batteries?!

@dannyash3805 3 years ago

@@cokeforever What enlightenment! Evidently you have already eliminated the obsolete knowledge of DSP and convolutional neural networks so why don't you eliminate yourself from this comment thread and go be too smart somewhere else.

@cokeforever 3 years ago

@@dannyash3805 why? because some inefficient low skill troll tells me to? )) not gonna happen, buddy... but you are free to participate in civilized, argumented discussion ;)