Introduction
Have you seen any of the Iron Man movies? If so, you may have noticed that Tony Stark has a personal assistant named Jarvis. Jarvis is a computer program that can understand human speech. How is that possible? By the end of this tutorial, you should have an answer, so keep reading.
Python has a third-party library called SpeechRecognition that converts speech to text using a number of engines and APIs. In this tutorial, we will look at how to convert speech to text in Python with it.
This is an example of speech recognition, a widely used technique in artificial intelligence, data science, machine learning, and deep learning. You have probably heard of Alexa, Siri, or "Hey Google".
These assistants are very capable in their field. We won't build anything like them here; I mention them only to show why the topic of this tutorial is worth learning.
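1. Supported engines/APIs
- CMU Sphinx
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
- Snowboy Hotword Detection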
All of these engines work online except CMU Sphinx and Snowboy Hotword Detection, which run offline. In this tutorial, we'll use Google Speech Recognition because it's beginner-friendly and free to use without setting up an account.
The other online APIs require a subscription or authentication, or limit how much you can use them.
Note: If you don't provide your own API key, recognize_google() uses a default key built into the library. That key is meant for personal or testing use and may be revoked by Google at any time.
2. Requirements
- Python 2.6, 2.7, or 3.3+
- PyAudio 0.2.11+ (only needed for microphone input)
- FLAC encoder (only needed if your system is not x86-based Windows, Linux, or macOS; the library bundles FLAC binaries for x86 systems)
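If you want to check these requirements before installing anything, here is a small standard-library sketch. The function name check_requirements is just for illustration. It checks the Python version, whether a module named pyaudio can be found, and whether a flac executable is on your PATH (which only matters on non-x86 systems). It does not check PyAudio's version, and because it uses importlib.util it assumes Python 3.4+.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Report which SpeechRecognition prerequisites are present (illustrative helper)."""
    major, minor = sys.version_info[:2]
    return {
        # Python 2.6, 2.7, or 3.3+
        "python_ok": (major == 2 and minor >= 6) or (major == 3 and minor >= 3) or major > 3,
        # PyAudio is imported under the module name "pyaudio"; needed only for the microphone
        "pyaudio_installed": importlib.util.find_spec("pyaudio") is not None,
        # A standalone FLAC encoder on PATH; needed only on non-x86 systems
        "flac_on_path": shutil.which("flac") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```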
3. Installation
- pip install SpeechRecognition
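Note that the package is installed as SpeechRecognition but imported as speech_recognition. A quick way to confirm the install worked is to try the import and read the module's version. This is a minimal sketch; speech_recognition_version is a hypothetical helper, and it assumes the module exposes a __version__ attribute (it falls back to "unknown" if not).

```python
import importlib


def speech_recognition_version():
    """Return the installed SpeechRecognition version, or None if it can't be imported."""
    try:
        # The pip package is "SpeechRecognition", but the module is "speech_recognition"
        sr = importlib.import_module("speech_recognition")
    except ImportError:
        return None
    return getattr(sr, "__version__", "unknown")


if __name__ == "__main__":
    version = speech_recognition_version()
    print(f"SpeechRecognition {version}" if version else "SpeechRecognition is not installed")
```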
4. The first thing to do after installing SpeechRecognition
Before writing any code, run the library's built-in demo to make sure it works and that your system's microphone can pick up your voice. The demo uses the microphone, so it needs PyAudio (see section 7). It keeps listening until you stop it with Ctrl+C.
- python -m speech_recognition
Output
A moment of silence, please...
Set minimum energy threshold to 2798.913063769431
Say something!
Got it! Now to recognize it...
You said hi how are you all
Say something!
Got it! Now to recognize it...
Oops! Didn't catch that
Say something!
The "You said hi how are you all" line is what I spoke into my laptop's microphone; on the second try, the program didn't catch what I said. Try it yourself.
5. Convert speech to text in Python
5.1. Before you start
SpeechRecognition's AudioFile class doesn't support MP3. It reads WAV (PCM/LPCM only), AIFF/AIFF-C, and native FLAC files. I strongly suggest using WAV for the most reliable results.
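Before passing a file to AudioFile, you can do a quick sanity check with Python's standard-library wave module, which only reads PCM WAV. The sketch below uses a hypothetical helper, describe_wav, that reports the file's channels, sample width, sample rate, and length. If the file isn't a PCM WAV (for example, an MP3 that was simply renamed to .wav), wave.open() raises wave.Error. Knowing the length also helps when choosing offset and duration values later.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (illustrative helper).

    Raises wave.Error if the file can't be parsed as a PCM WAV,
    e.g. an MP3 that was renamed to .wav.
    """
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            # Length in seconds = number of frames / frames per second
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    try:
        print(describe_wav("my_voice.wav"))
    except (FileNotFoundError, wave.Error) as e:
        print(f"Can't read my_voice.wav as a PCM WAV: {e}")
```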
5.2. Audio File
Code
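import speech_recognition as sr
r = sr.Recognizer()
audio_file = sr.AudioFile('my_voice.wav')
with audio_file as source:
    # Calibrate for background noise; this reads (consumes) the first 0.5 s of the file
    r.adjust_for_ambient_noise(source, duration=0.5)
    # Read the rest of the file into an AudioData object
    audio = r.record(source)
try:
    # Send the audio to the Google Speech Recognition API
    text = r.recognize_google(audio)
    print(text)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand the audio")
except sr.RequestError as e:
    print(f"Could not request results from Google Speech Recognition: {e}")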
Output
6. Capture a specific segment of speech
You can convert a specific segment instead of the whole recording. Pass the length you want to capture, in seconds, to the duration parameter of record().
You can also pass an offset, in seconds, to skip that much audio before capturing starts.
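To make offset and duration concrete, here is a standard-library sketch that cuts the same kind of segment out of a WAV file by hand: a time in seconds becomes a frame position by multiplying it by the sample rate. extract_segment is a hypothetical helper, not part of SpeechRecognition; record() works on the open audio stream instead of writing a new file, so treat this only as an illustration of the arithmetic.

```python
import wave


def extract_segment(src_path, dst_path, offset_s, duration_s):
    """Copy duration_s seconds of audio, starting offset_s seconds in, to a new WAV.

    Returns the number of frames written (fewer than requested near the end of the file).
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        start = int(offset_s * rate)   # first frame to keep
        count = int(duration_s * rate)  # number of frames to keep
        src.setpos(min(start, src.getnframes()))
        frames = src.readframes(count)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width, and rate; the frame count in the header is fixed on write
        dst.setparams(params)
        dst.writeframes(frames)
    return len(frames) // (params.sampwidth * params.nchannels)


# Usage: keep 2 seconds starting 2 seconds into the recording
# extract_segment("my_voice.wav", "my_voice_clip.wav", offset_s=2, duration_s=2)
```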
Code
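import speech_recognition as sr
r = sr.Recognizer()
audio_file = sr.AudioFile('my_voice.wav')
with audio_file as source:
    # Calibrate for background noise; this consumes the first 0.5 s of the file,
    # so the offset below counts from there (capture starts roughly 2.5 s in)
    r.adjust_for_ambient_noise(source, duration=0.5)
    # Skip 2 seconds, then capture the next 2 seconds
    audio = r.record(source, offset=2, duration=2)
try:
    text = r.recognize_google(audio)
    print(text)
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand the audio")
except sr.RequestError as e:
    print(f"Could not request results from Google Speech Recognition: {e}")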
Output
7. Capturing speech from a microphone
To work with a microphone, you must install PyAudio, as mentioned in the requirements. Run this command to install it:
🔹pip install PyAudio
7.1. List all microphones
You may be using a desktop or an external microphone. Run the code below to list the microphones available on your system.
Code
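import speech_recognition as sr
# list_microphone_names() is a static method, so no Microphone instance is needed.
# Each name's position in the list is its device index.
print(sr.Microphone.list_microphone_names())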
Output
['HDA Intel MID: 92HD81B1X5 Analog (hw:0,0)', 'HDA Intel MID: HDMI 0 (hw:0,3)', 'sysdefault', 'hdmi', 'samplerate', 'speexrate', 'pulse', 'upmix', 'vdownmix', 'default']
Here is the list of all microphones on my system (I'm using a laptop). We'll use the system's default microphone, which I recommend for laptop users. To choose another device, pass its position in this list to the device_index parameter of the Microphone() class.
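If you'd rather pick a microphone by name than count list positions, a small helper can look up the index for you. find_device_index is hypothetical, not part of the library. It returns None when nothing matches, which Microphone() treats as "use the default microphone", so a typo falls back to the default instead of crashing.

```python
def find_device_index(names, keyword):
    """Return the index of the first name containing keyword (case-insensitive), or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if keyword in name.lower():
            return index
    # None tells sr.Microphone() to fall back to the system default microphone
    return None


# Usage (requires SpeechRecognition and PyAudio):
#   import speech_recognition as sr
#   names = sr.Microphone.list_microphone_names()
#   mic = sr.Microphone(device_index=find_device_index(names, "pulse"))
```

With the list shown above, searching for "pulse" would give index 6.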
If you're using a desktop without a built-in microphone, you'll need to connect an external one to run this code.
Now capture your voice from your system's microphone. The Python program will listen to you and convert what you say into text.
The Code
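import speech_recognition as sr
r = sr.Recognizer()
mic = sr.Microphone()  # default microphone; pass device_index=... to choose another
with mic as source:
    print("Listening...")
    # Sample about 1 second of background noise to set the energy threshold
    r.adjust_for_ambient_noise(source)
    # Record a single phrase, stopping when the speaker pauses
    audio = r.listen(source)
try:
    text = r.recognize_google(audio)
    print("You said: " + text)
except sr.UnknownValueError:
    print("Sorry, I couldn't understand the audio")
except sr.RequestError as e:
    print(f"Could not request results from Google Speech Recognition: {e}")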
Output
Conclusion
I hope you now have a basic idea of how Jarvis, or any computer program, can understand human speech. Google Speech Recognition is a good fit for small projects because you can use it for free without setting up an account.
For a real-world project, though, it's better to use an authenticated service, for example one you access with your own API key or with a username and password.
In this tutorial, you learned how to convert speech to text in Python using the SpeechRecognition library. Try using it to build your own virtual assistant. If you found this helpful, please share it and leave a comment below.
Thanks for reading!💙