How to Convert Speech to Text in Python


Introduction

Have you seen any of the Iron Man movies? If so, you may have noticed that Tony Stark has a personal assistant named Jarvis. Jarvis is a computer program that can recognize human speech. How is that possible? By the end of this tutorial, you should have a basic answer.

Python offers a library called SpeechRecognition, which converts speech to text using several engines and APIs. In this tutorial, we will discuss how to convert speech to text in Python.

This tutorial is a basic example of speech recognition in Python. Speech recognition is widely used in artificial intelligence, data science, machine learning, and deep learning. For instance, you may have heard of Alexa, Siri, or "Hey Google".

Alexa, for example, is a voice assistant developed by Amazon. It can recognize human speech and answer questions.

These tools are very capable in their field. We are not going to build such a tool here; I mention them only to show why this topic is worth learning.


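1. Supported engines/APIs

  • CMU Sphinx (works offline)

  • Google Speech Recognition

  • Google Cloud Speech API

  • Wit.ai

  • Microsoft Bing Voice Recognition

  • Houndify API

  • IBM Speech to Text

  • Snowboy Hotword Detection (works offline)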

All these engines work online except CMU Sphinx and Snowboy Hotword Detection. In this tutorial, we will use Google Speech Recognition because it is easy for beginners and can be tried for free with the library's default API key.

Apart from Google Speech Recognition, the other online APIs require a subscription or authentication, or limit how much you can use them.

Note: Google Speech Recognition uses the default API Key if the user doesn't enter a personal key.
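To make the note concrete, here is a minimal sketch of how a personal key could be passed. recognize_google() accepts optional key and language arguments; the helper name transcribe and the environment variable GOOGLE_SPEECH_API_KEY are assumptions made for this illustration, not part of the library.

```python
import os


def transcribe(recognizer, audio, language="en-US"):
    """Send audio to Google Speech Recognition, using a personal key if one is set.

    GOOGLE_SPEECH_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    # key=None tells the library to fall back to its built-in default key.
    api_key = os.environ.get("GOOGLE_SPEECH_API_KEY") or None
    return recognizer.recognize_google(audio, key=api_key, language=language)
```

Here recognizer would be an sr.Recognizer() instance and audio the object returned by record() or listen(), both of which are used later in this tutorial.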

2. Requirements

  • Python 2.6, Python 2.7, or Python 3.3+

  • PyAudio 0.2.11+ (For using the microphone)

  • FLAC encoder (required only if your system is not x86-based Windows, Linux, or macOS; on those systems it is bundled with the library)
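If you are unsure which of these prerequisites are already in place, a small check like the one below may help. It is only a sketch that uses the standard library; the function name check_requirements is made up for this example, and it checks for presence only, not for the exact versions listed above.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Report which of the prerequisites appear to be present on this system."""
    return {
        "python_ok": sys.version_info >= (3, 3),
        # The SpeechRecognition package is imported as speech_recognition.
        "speech_recognition": importlib.util.find_spec("speech_recognition") is not None,
        # PyAudio is imported as pyaudio; it is only needed for microphone input.
        "pyaudio": importlib.util.find_spec("pyaudio") is not None,
        # A system-wide flac binary matters only where the library bundles no encoder.
        "flac_on_path": shutil.which("flac") is not None,
    }


for name, present in check_requirements().items():
    print(f"{name}: {'found' if present else 'missing'}")
```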

3. Installation

Use pip3 if you are using Linux.

  • pip install SpeechRecognition

4. The first thing to do after installing the SpeechRecognition library

Run this command to check whether the library is working. Do this first to make sure your system's microphone can pick up your voice. The check uses the microphone, so PyAudio must be installed.

python -m speech_recognition

Output

A moment of silence, please...
Set minimum energy threshold to 2798.913063769431
Say something!
Got it! Now to recognize it...
You said hi how are you all
Say something!
Got it! Now to recognize it...
Oops! Didn't catch that
Say something!


The text after "You said" (hi how are you all) is what I spoke into my laptop's microphone. Try it yourself.

5. Convert speech to text in Python

5.1. Before you start

SpeechRecognition doesn't support the MP3 format. Its AudioFile class reads WAV (PCM), AIFF, and FLAC files. I strongly suggest using the WAV format for the best results.
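A quick way to catch an unsupported file before handing it to the library is to look at its extension. The sketch below does only that; the helper names are made up for this example, the extension test is a heuristic (a .wav file must still contain PCM audio), and ffmpeg is an external tool that this tutorial does not otherwise use, so treat the conversion command as one possible option.

```python
import os

# Extensions of the formats that sr.AudioFile can read (WAV, AIFF/AIFF-C, FLAC).
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def is_supported_audio(path):
    """Return True if the file extension suggests a format AudioFile can open."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS


def ffmpeg_to_wav_command(src, dst=None):
    """Build (but do not run) an ffmpeg command that converts src to a WAV file."""
    if dst is None:
        dst = os.path.splitext(src)[0] + ".wav"
    return ["ffmpeg", "-i", src, dst]


print(is_supported_audio("my_voice.wav"))
print(is_supported_audio("my_voice.mp3"))
print(ffmpeg_to_wav_command("my_voice.mp3"))
```

If ffmpeg is installed, the returned list can be passed to subprocess.run() to perform the conversion.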

5.2. Audio File

I have used the my_voice.wav file here. You can download it from my GitHub page (subhankar-rakshit). The link is here.

Code


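import speech_recognition as sr

r = sr.Recognizer()

audio_file = sr.AudioFile('my_voice.wav')

with audio_file as source:
    # Calibrate for background noise (this consumes the first 0.5 s of the file)
    r.adjust_for_ambient_noise(source, duration=0.5)
    # Read the rest of the file into an AudioData object
    audio = r.record(source)

# Send the audio to Google Speech Recognition and print the transcript
text = r.recognize_google(audio)
print(text)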

Output

converting an audio file to text using python speechrecognition library.

6. Capture a specific segment of a speech

You can convert a specific segment instead of the entire recording. Pass the length you want to capture, in seconds, to the duration parameter of record().

To start from a point other than the beginning, pass the number of seconds to skip to the offset parameter.

Code


import speech_recognition as sr

r = sr.Recognizer()

audio_file = sr.AudioFile('my_voice.wav')

with audio_file as source:
    # Calibrate for background noise (this consumes the first 0.5 s of the file)
    r.adjust_for_ambient_noise(source, duration=0.5)
    # Skip the next 2 seconds, then capture 2 seconds of audio
    audio = r.record(source, offset=2, duration=2)

text = r.recognize_google(audio)
print(text)

Output

to do make sure it
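The same offset and duration parameters could also be used to transcribe a long recording piece by piece. The following is a sketch under a few assumptions: the file is a WAV file (its length is read with the standard wave module), the helper names wav_duration, segment_windows, and transcribe_in_segments are invented for this example, and the 10-second default window is arbitrary.

```python
import wave


def wav_duration(path):
    """Length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())


def segment_windows(total_seconds, chunk_seconds):
    """Split a recording into consecutive (offset, duration) pairs."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it does not run past the end.
        windows.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return windows


def transcribe_in_segments(path, chunk_seconds=10.0):
    """Transcribe each window separately (needs SpeechRecognition and internet)."""
    import speech_recognition as sr  # imported here so the helpers above work without it

    recognizer = sr.Recognizer()
    results = []
    for offset, duration in segment_windows(wav_duration(path), chunk_seconds):
        # Reopen the file each time so offset is measured from the start.
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, offset=offset, duration=duration)
        try:
            results.append(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            results.append("")  # nothing intelligible in this window
    return results
```

Calling transcribe_in_segments('my_voice.wav', chunk_seconds=5) would return one string per window. Note that a fixed window can cut a word in half, so the pieces may not join into a perfect transcript.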

7. Capturing speech from the microphone

To work with a microphone, you must install PyAudio on your system, as mentioned in the requirements. Run this command to install the library.

  • pip install PyAudio

7.1. List all microphones

You may be using a desktop or an external microphone. Run the code below to list the microphones available on your system.

Code


import speech_recognition as sr

# list_microphone_names() is a static method, so no Microphone instance is needed
print(sr.Microphone.list_microphone_names())

Output

['HDA Intel MID: 92HD81B1X5 Analog (hw:0,0)', 'HDA Intel MID: HDMI 0 (hw:0,3)', 'sysdefault', 'hdmi', 'samplerate', 'speexrate', 'pulse', 'upmix', 'vdownmix', 'default']

This is the list of all microphones on my system (I'm using a laptop). We'll use the system's default microphone. You can choose another device by passing the device_index parameter to the Microphone() class. For laptop users, I suggest using the default microphone.

If you're using a desktop, you need to connect an external microphone to run this code.
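If you do want a device other than the default, you need its index in the list. One possible way to look it up by name is sketched below; find_microphone_index is a hypothetical helper written for this example, and the device names are copied from my sample output, so yours will differ.

```python
def find_microphone_index(names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if keyword in name.lower():
            return index
    return None


# Device names copied from the sample output in this tutorial.
names = ['HDA Intel MID: 92HD81B1X5 Analog (hw:0,0)', 'HDA Intel MID: HDMI 0 (hw:0,3)',
         'sysdefault', 'hdmi', 'samplerate', 'speexrate', 'pulse', 'upmix',
         'vdownmix', 'default']
print(find_microphone_index(names, "pulse"))  # prints 6
```

On a real system you would pass sr.Microphone.list_microphone_names() as names and then create the microphone with sr.Microphone(device_index=index), since each name's position in that list is its device index.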

Now capture your voice from your system's microphone. The Python program will listen to your voice and convert it into text.

Code


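import speech_recognition as sr

r = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    print("Listening...")
    # Calibrate for ambient noise (takes about one second by default)
    r.adjust_for_ambient_noise(source)
    # Record until a pause in speech is detected
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    print("You Said " + text)
except sr.UnknownValueError:
    # Raised when the speech could not be understood
    print("Error: could not understand the audio")
except sr.RequestError as es:
    # Raised when the API is unreachable or rejects the request
    print(f"Error due to {es}")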

Output

Listening...
You Said hi how are you


Conclusion

I hope you now have a basic idea of how Jarvis, or any computer program, can recognize human speech. Google Speech Recognition is good for small projects because it can be used for free with the default API key.

But if you're developing a real-world project using speech recognition, it is best to use an authenticated service. For example, you can use your own API key, or a username and password, depending on the API.

In this tutorial, you learned how to convert speech to text using the Python SpeechRecognition library. You can build your own virtual assistant using this feature. Please share your love and leave comments below.

Thanks for reading!💙

Subhankar Rakshit

Meet Subhankar Rakshit, a Computer Science postgraduate (M.Sc.) and the creator of PySeek. Subhankar is a programmer who specializes in Python. With several years of experience under his belt, he has developed a deep understanding of software development. He enjoys writing blogs on various topics related to Computer Science, Python Programming, and Software Development.
