Python Speech Feature Extraction: Extracting Speech from Video Using Python

In this post, I will show you how to extract speech from a video recording. After recognizing the speech, we will convert it into a text document. This is a simple machine learning project that will help you understand some basics of the SpeechRecognition library and Google's speech recognition API. Speech recognition is a popular machine learning topic, and it is used in a growing number of fields. For example, many of the subtitles we see on Netflix shows or YouTube videos are produced with the help of machine speech recognition. Other great examples of speech recognizers are personal voice assistants such as Google Home Mini, Amazon Alexa, and Apple's Siri.

Table of Contents:

Getting Started
Step 1: Import Libraries
Step 2: Video to Audio Conversion
Step 3: Speech Recognition
Final Step: Exporting the Result

Photo by Alexandre Pellaes on Unsplash

Getting Started

As you can tell from the title, we will need a video recording for this project. It can even be a recording of yourself speaking to the camera. Using a library called MoviePy, we will extract the audio from the video recording. In the next step, we will convert that audio file into text using the SpeechRecognition library with Google's speech recognition API. If you are ready, let's get started by installing the libraries!

Libraries

We are going to use two libraries for this project:

SpeechRecognition
MoviePy

Before importing the libraries into our project file, we have to install them. Installing libraries in Python is easy with pip, and you can install several at once with a single command. Run the following line in your terminal window:

pip install SpeechRecognition "moviepy<2.0"

The version pin matters because the code in this post imports moviepy.editor, a module that was removed in MoviePy 2.0.

That's it! The SpeechRecognition module supports multiple recognition engines and APIs. The Google Web Speech API, which we use in this post, is one of them. You can learn more about the module on its PyPI page.

MoviePy is a library that can read and write the most common audio and video formats, including GIF. It relies on FFmpeg to do this, so if you run into issues when installing or using MoviePy, try installing FFmpeg. FFmpeg is a leading multimedia framework. It can decode, encode, transcode, mux, demux, stream, filter, and play pretty much anything that humans and machines have created.
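If you suspect FFmpeg is the problem, a quick first check is whether an ffmpeg executable is on your PATH. The sketch below uses only the standard library, and the helper name ffmpeg_version is hypothetical. MoviePy 1.x can also use a binary bundled by the imageio-ffmpeg package, so a failed PATH lookup does not prove that MoviePy itself will fail.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(binary: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<binary> -version`, or None if it is not on PATH.

    Hypothetical helper for this post; it is not part of MoviePy.
    """
    path = shutil.which(binary)  # full path of the executable, or None
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg was not found on your PATH")
```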

Now we can start writing code in our code editor. We will begin by importing the libraries.

Step 1: Import Libraries

import speech_recognition as sr  # speech-to-text
import moviepy.editor as mp  # video and audio editing (MoviePy 1.x)

That's all we need to import. Without wasting any time, let's move on to the next step.

Step 2: Video to Audio Conversion

In this step, we will do something really cool: convert our video recording into an audio file. There are many video formats. Here are some of them, along with their file extensions:

MP4 (mp4, m4a, m4v, f4v, f4a, m4b, m4r, f4b, mov)
3GP (3gp, 3gp2, 3g2, 3gpp, 3gpp2)
OGG (ogg, oga, ogv, ogx)
WMV (wmv, wma, asf*)
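The list above groups file extensions by format family, so a few lines of Python can tell you which family a file belongs to. The sketch below uses a hypothetical helper, video_format, which only looks at the file extension and not at the file's contents. FFmpeg, which MoviePy uses, inspects the actual data when it opens the file.

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the list above
VIDEO_FORMATS = {
    "MP4": {"mp4", "m4a", "m4v", "f4v", "f4a", "m4b", "m4r", "f4b", "mov"},
    "3GP": {"3gp", "3gp2", "3g2", "3gpp", "3gpp2"},
    "OGG": {"ogg", "oga", "ogv", "ogx"},
    "WMV": {"wmv", "wma", "asf"},
}


def video_format(path: str) -> Optional[str]:
    """Return the format family for a file name, or None if it is not listed."""
    extension = Path(path).suffix.lower().lstrip(".")  # ".MOV" -> "mov"
    for family, extensions in VIDEO_FORMATS.items():
        if extension in extensions:
            return family
    return None
```

For example, video_format("video_recording.mov") returns "MP4".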

It helps to know your video's format so that the conversion goes smoothly. Besides video formats, it's also good practice to know some audio formats. Here are some of them:

MP3
AAC
WMA
AC3 (Dolby Digital)

Now that we have some idea about both kinds of format, it's time to do the conversion with the MoviePy library. You will not believe how easy it is.

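# Load the video (replace the file name with your own recording)
clip = mp.VideoFileClip(r"video_recording.mov")

# Extract the audio track and save it as a WAV file
clip.audio.write_audiofile(r"converted.wav")
clip.close()  # release the file handles held by MoviePy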
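If MoviePy gives you trouble, you can do the same conversion by calling the FFmpeg command-line tool directly. The sketch below is an alternative to the code above, not what the original post does. The helper names build_ffmpeg_command and extract_audio are hypothetical. The choice of mono audio at 16 kHz is an assumption (a common setting for speech), not something the SpeechRecognition library requires.

```python
import subprocess
from typing import List


def build_ffmpeg_command(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that extracts the audio track as a WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the given rate in Hz
        wav_path,                 # the .wav extension selects the WAV container
    ]


def extract_audio(video_path: str, wav_path: str, sample_rate: int = 16000) -> None:
    """Run ffmpeg. Raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate), check=True)
```

Calling extract_audio("video_recording.mov", "converted.wav") produces a file that the next step can read. If ffmpeg is not installed at all, subprocess.run raises FileNotFoundError instead.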

I recommend converting the audio to WAV format, because it works well with the SpeechRecognition library covered in the next step. sr.AudioFile can read WAV, AIFF, and FLAC files, but not MP3.
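Before moving on, it can be useful to confirm what was actually written. Python's standard-library wave module can read the header of a PCM WAV file. The helper describe_wav below is hypothetical, and it raises an error if the file uses an encoding that wave does not support.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,  # number of frames divided by frames per second
        }
```

For example, describe_wav("converted.wav") shows the channel count, sample rate, and length of the extracted audio.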

Step 3: Speech Recognition

First, let’s define the recognizer.

r = sr.Recognizer()

Now let's load the audio file we created in the previous step (Step 2).

audio = sr.AudioFile("converted.wav")

Perfect! Here comes the best part: recognizing the speech in the audio file. The recognizer sends the audio to the Google Web Speech API, so this step needs an internet connection. The API tries to understand the speech and returns it as text.

with audio as source:
    audio_file = r.record(source)  # read the whole file into memory
result = r.recognize_google(audio_file)  # send it to Google and get text back
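For long recordings, sending the whole file in a single request may fail or take a long time. One common workaround, sketched below, is to read the file in fixed-length chunks using the duration argument of Recognizer.record and to transcribe each chunk separately. The helper names and the 30-second chunk length are assumptions. Chunk boundaries can cut words in half, and network or quota problems still surface as sr.RequestError.

```python
import math


def chunk_count(total_seconds: float, chunk_seconds: float) -> int:
    """Number of chunks needed to cover total_seconds of audio (at least one)."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    return max(1, math.ceil(total_seconds / chunk_seconds))


def transcribe_in_chunks(path: str, chunk_seconds: float = 30.0) -> str:
    """Transcribe a WAV/AIFF/FLAC file piece by piece (hypothetical helper)."""
    # Imported here so chunk_count can be used without the library installed
    import speech_recognition as sr

    r = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        # source.DURATION is the length of the file in seconds
        for _ in range(chunk_count(source.DURATION, chunk_seconds)):
            # Each call continues where the previous one stopped
            chunk = r.record(source, duration=chunk_seconds)
            try:
                pieces.append(r.recognize_google(chunk))
            except sr.UnknownValueError:
                # No recognizable speech in this chunk; skip it
                continue
    return " ".join(pieces)
```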

Final Step: Exporting the Result

Well done! The hard work is done. In this step, we will simply export the recognized speech into a text document so that the result is saved. I've also added print("ready!") at the end of the code so that we know when the file has been written and the work is complete.

# Export the result to a text file
with open("recognized.txt", mode="w", encoding="utf-8") as file:
    file.write("Recognized Speech:")
    file.write("\n")
    file.write(result)

# The with-block has closed the file, so the output is complete
print("ready!")
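recognize_google returns the transcript as a single long line, which can be hard to read in a text editor. If you prefer wrapped lines, the standard-library textwrap module can format the text before it is written. format_transcript is a hypothetical helper, and the 80-character width is an arbitrary choice.

```python
import textwrap


def format_transcript(text: str, width: int = 80) -> str:
    """Return the header line followed by the transcript wrapped to `width` characters."""
    body = textwrap.fill(text, width=width)  # break the text at word boundaries
    return "Recognized Speech:\n" + body + "\n"
```

Inside the with-block, file.write(format_transcript(result)) would then replace the three separate write calls.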

Video Demonstration

I have just started my journey on YouTube, where I will be demonstrating Machine Learning, Data Science, Artificial Intelligence, and other projects for you. Enjoy!

Congrats! You have created a program that converts a video into an audio file, extracts the speech from that audio, and finally exports the recognized speech into a text document. I hope you enjoyed reading this post and working on the project, and I am glad if you learned something new today. Working on hands-on programming projects like this one is a great way to sharpen your coding skills.

Feel free to contact me if you have any questions while implementing the code.

Follow my blog and Towards Data Science to stay inspired.
