Building Your Own Academic Voice Assistant (2)

Background

The previous article covered the overall architecture of the academic voice assistant and the technology choices behind it; this one walks through the implementation details of each module. The code is collected in the open-source GitHub repository https://github.com/liangwq/Chatglm_lora_multi-gpu.
The LLM service in this implementation follows the OpenAI API standard, so any OpenAI-compatible backend can be swapped in wherever an LLM is called: ChatGPT, a locally deployed LLM service, Zhipu, or Qwen. While building the project I tested both the ChatGPT API and a local Qwen service; the rest of this article uses the local Qwen service.
Building Your Own Academic Voice Assistant (1)
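
As a minimal sketch of that swap (the hosted endpoint and key below are placeholders, not values from this project), only the client's base URL and key change between backends:

import openai

# Point the OpenAI client at the local Qwen service deployed below
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "none"

# Or point it at a hosted OpenAI-compatible service instead (placeholder values)
# openai.api_base = "https://api.openai.com/v1"
# openai.api_key = "sk-..."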

Implementation

LLM Service Deployment

# Install dependencies
git clone [email protected]:QwenLM/Qwen.git
cd Qwen
pip install -r requirements.txt
pip install fastapi uvicorn openai "pydantic>=2.3.0" sse_starlette

# Start the model service; use -c to specify the model version
# - --server-name 0.0.0.0 allows other machines to access the service
# - --server-name 127.0.0.1 restricts access to the machine running the model
python openai_api.py --server-name 0.0.0.0 --server-port 8000 -c Qwen/Qwen-14B-Chat

You can adjust the arguments, e.g. -c to change the model name or path, or --cpu-only for a CPU-only deployment. If deployment fails, pulling the latest version of the repository usually resolves most issues.
Once the Qwen service is up, you can use the following code to check that the API works:
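
For instance, a CPU-only deployment bound to the local machine might look like this (the checkpoint path is illustrative):

python openai_api.py --server-name 127.0.0.1 --server-port 8000 -c /path/to/Qwen-7B-Chat --cpu-only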

import openai
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "none"

# Streaming request
for chunk in openai.ChatCompletion.create(
    model="Qwen",
    messages=[
        {"role": "user", "content": "你好"}
    ],
    stream=True
    # Custom stop words are not yet supported in streaming mode; support is under development
):
    if hasattr(chunk.choices[0].delta, "content"):
        print(chunk.choices[0].delta.content, end="", flush=True)

# Non-streaming request
response = openai.ChatCompletion.create(
    model="Qwen",
    messages=[
        {"role": "user", "content": "你好"}
    ],
    stream=False,
    stop=[]  # Add custom stop words here; e.g. ReAct prompting needs stop=["Observation:"]
)
print(response.choices[0].message.content)

Article Translation Script

The idea here is to split a large PDF into chunks and create two folders: one for the original (English) chunks and one for the translated files, with the Qwen model doing the English-to-Chinese translation. Saving both the split files and the translations lays the groundwork for improving QA retrieval later: each chunk is stored under a numbered filename, and the keyword and key-information extraction that follows operates on these same chunks, so at retrieval time the extracted keywords and key information can be used to rewrite the user's question, better capture intent, and improve answer accuracy.

# -*- coding: utf-8 -*-
# PDF Loaders. If unstructured gives you a hard time, try PyPDFLoader
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

# To split our transcript into pieces
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

loader = PyPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")

## Other options for loaders 
#loader = UnstructuredPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")
data = loader.load()
# Note: If you're using PyPDFLoader then it will split by page for you already
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[0].page_content)} characters in your document')


text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=500)
texts = text_splitter.split_documents(data)

print (f'Now you have {len(texts)} documents')

'''
Split the PDF-extracted text into chunks, translate each chunk, and save the results
'''
import openai

import os
import logging


# Folder for the original English chunks
folder_path = "article_en"

# Create the folder if it does not exist
if not os.path.exists(folder_path):
    os.makedirs(folder_path)

# Folder for the Chinese translations
folder_path_ch = "article_ch"

# Create the folder if it does not exist
if not os.path.exists(folder_path_ch):
    os.makedirs(folder_path_ch)

# Set up logging for retries
logging.basicConfig(filename='retry.log', level=logging.ERROR)

def translate_article(folder_path, folder_path_ch, content, i):
    # Write the English chunk to the English folder
    file_path = os.path.join(folder_path, "example_" + str(i) + ".txt")
    # Remove any stale file from a previous run
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"File {file_path} already existed and was deleted.")

    # The with-block closes the file automatically
    with open(file_path, "w") as file:
        file.write(content.page_content)

    # Path for the Chinese translation of this chunk
    file_path_ch = os.path.join(folder_path_ch, "example_" + str(i) + ".txt")

    if os.path.exists(file_path_ch):
        os.remove(file_path_ch)
        print(f"File {file_path_ch} already existed and was deleted.")

    # Ask the local Qwen service (OpenAI-compatible API) to translate English to Chinese;
    # exceptions propagate to the caller so the retry loop below can handle them
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "none"
    response = openai.ChatCompletion.create(
        model="Qwen",
        messages=[
            {"role": "system", "content": "你是一个专业翻译机器人,可以把论文翻译的准确表述流畅,严格执行人类指令"},
            {"role": "user", "content": content.page_content + "\n把上面论文片段翻译成中文"},
        ],
        stream=False,
        stop=[]  # Add custom stop words here, e.g. stop=["Observation:"] for ReAct prompting
    )
    # Write the Chinese translation
    with open(file_path_ch, "w") as file:
        file.write(response["choices"][0]["message"]["content"])

for i in range(len(texts)):
    # Maximum number of retries per chunk
    max_retries = 3
    retry_count = 0

    while retry_count < max_retries:
        try:
            translate_article(folder_path, folder_path_ch, texts[i], i)
            # Success: move on to the next chunk
            break
        except Exception as e:
            # Log the failure and retry
            logging.error(f"Operation failed: {e}")
            retry_count += 1
            if retry_count < max_retries:
                print(f"Operation failed, retrying ({retry_count}/{max_retries})...")
                # Optionally wait before retrying
                # time.sleep(1)
            else:
                # Max retries reached: re-raise
                raise

print("Folders and files created.")

Article Knowledge Extraction Implementation

This step serves three purposes:
1. It provides meta information for rewriting user questions later, improving intent understanding
2. The extracted keywords and key information support retrieval-based generation, avoiding the long full-document traversals otherwise needed for summarization
3. The key information can be surfaced in the product, giving users context to ask more precise questions
In this experiment the extraction runs as a standalone script; in a real product, the summary-extraction and index-building steps could be wrapped as functions and merged into the translation module to avoid traversing the PDF multiple times. A sketch of how the extracted keyword files could be consumed downstream appears after the code below.

# -*- coding: utf-8 -*-
# PDF Loaders. If unstructured gives you a hard time, try PyPDFLoader
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

# To split our transcript into pieces
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

loader = PyPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")

## Other options for loaders 
#loader = UnstructuredPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")
data = loader.load()
# Note: If you're using PyPDFLoader then it will split by page for you already
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[0].page_content)} characters in your document')


text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=500)
texts = text_splitter.split_documents(data)

print (f'Now you have {len(texts)} documents')

'''
Split the PDF-extracted text into chunks, extract key information from each chunk, and save the results
'''
import openai

import os
import logging


# Folder for the original English chunks
folder_path = "article_en"

# Create the folder if it does not exist
if not os.path.exists(folder_path):
    os.makedirs(folder_path)

# Folder for the extracted key information
folder_path_ch = "article_keyword"

# Create the folder if it does not exist
if not os.path.exists(folder_path_ch):
    os.makedirs(folder_path_ch)

# Set up logging for retries
logging.basicConfig(filename='retry.log', level=logging.ERROR)

def extract_article(folder_path, folder_path_ch, content, i):
    # Write the English chunk to the English folder
    file_path = os.path.join(folder_path, "example_" + str(i) + ".txt")
    # Remove any stale file from a previous run
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"File {file_path} already existed and was deleted.")

    # The with-block closes the file automatically
    with open(file_path, "w") as file:
        file.write(content.page_content)

    # Path for the extracted key information of this chunk
    file_path_ch = os.path.join(folder_path_ch, "example_" + str(i) + ".txt")

    if os.path.exists(file_path_ch):
        os.remove(file_path_ch)
        print(f"File {file_path_ch} already existed and was deleted.")

    # Ask the local Qwen service to extract key information, keywords, key points and a summary;
    # exceptions propagate to the caller so the retry loop below can handle them
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "none"
    response = openai.ChatCompletion.create(
        model="Qwen",
        messages=[
            {"role": "system", "content": "你是一个专业学术论点抽取机器人,可以精准抽取论文中关键信息、精准抓取出论文片段中关键词、关键观点、生成精准摘要,并把英文翻译成中文输出,严格执行人类指令;\n"},
            {"role": "user", "content": content.page_content + '\n对上面论文片段抽取关键信息、关键词、关键观点、生成摘要;并以{"关键信息":,"关键词":,"关键观点":,"生成摘要":}json格式输出,英文翻译成中文'},
        ],
        stream=False,
        stop=[]  # Add custom stop words here, e.g. stop=["Observation:"] for ReAct prompting
    )
    # Write the extracted information
    with open(file_path_ch, "w") as file:
        file.write(response["choices"][0]["message"]["content"])

for i in range(len(texts)):
    # Maximum number of retries per chunk
    max_retries = 3
    retry_count = 0

    while retry_count < max_retries:
        try:
            extract_article(folder_path, folder_path_ch, texts[i], i)
            # Success: move on to the next chunk
            break
        except Exception as e:
            # Log the failure and retry
            logging.error(f"Operation failed: {e}")
            retry_count += 1
            if retry_count < max_retries:
                print(f"Operation failed, retrying ({retry_count}/{max_retries})...")
                # Optionally wait before retrying
                # time.sleep(1)
            else:
                # Max retries reached: re-raise
                raise

print("Folders and files created.")

Article Summarization Implementation

This part uses LangChain's map-reduce method for summary extraction. The idea: split the large document into small chunks, summarize each chunk, then merge the summaries and repeat the process until the summary meets the requirements (iteration rounds, word count). The results are decent, but plenty of follow-up work could improve them, for example:
1. In the map stage, making each chunk summary reflect both the chunk's own content and the document-wide context
2. In the reduce stage, preserving the structure of the argument instead of merely surfacing the most frequent information
3. Extracting concepts hierarchically: first form an overall picture of the article, then refine the key concepts within it (a sketch of one such direction appears at the end of this section)

# PDF Loaders. If unstructured gives you a hard time, try PyPDFLoader
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

# To split our transcript into pieces
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

loader = PyPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")

## Other options for loaders 
#loader = UnstructuredPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")
data = loader.load()
# Note: If you're using PyPDFLoader then it will split by page for you already
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[0].page_content)} characters in your document')


text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=8000, chunk_overlap=800)
texts = text_splitter.split_documents(data)

print (f'Now you have {len(texts)} documents')

from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

# Initialize the sentence-transformer embeddings (piccolo-large-zh, downloaded locally)
embeddings = SentenceTransformerEmbeddings(model_name='/root/autodl-tmp/piccolo-large-zh')

# load it into Chroma
docsearch = Chroma.from_documents(texts, embeddings)

query = "这篇文章摘要是什么?"
docs = docsearch.similarity_search(query)

# Here's an example of the first document that was returned
print(docs[0].page_content[:450])

from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
#llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k", openai_api_key='sk-...', openai_api_base='https://api.closeai-asia.com/v1')  # alternative: a hosted OpenAI-compatible service

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k",openai_api_key='none',openai_api_base='http://localhost:8000/v1')



# verbose=True will print the prompts being sent to the LLM
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)

output = chain.run(texts)

print(output)

Below is an example of customizing the prompt templates of the existing summarization chain; further work on summary quality can iterate from this template.

# PDF Loaders. If unstructured gives you a hard time, try PyPDFLoader
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

# To split our transcript into pieces
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

# Prompt templates for dynamic values
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,  # included so you know it exists; we won't use it
    HumanMessagePromptTemplate
)

# To create our chat messages
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

loader = PyPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")

## Other options for loaders 
#loader = UnstructuredPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")
data = loader.load()
# Note: If you're using PyPDFLoader then it will split by page for you already
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[0].page_content)} characters in your document')

text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=3000, chunk_overlap=250)
texts = text_splitter.split_documents(data)

print (f'Now you have {len(texts)} documents')

from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
#llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k", openai_api_key='sk-...', openai_api_base='https://api.closeai-asia.com/v1')  # alternative: a hosted OpenAI-compatible service

summary_output_options = {
    'one_sentence' : """
     - Only one sentence
    """,

    'bullet_points': """
     - Bullet point format
     - Separate each bullet point with a new line
     - Each bullet point should be concise
    """,

    'short' : """
     - A few short sentences
     - Do not go longer than 4-5 sentences
    """,

    'long' : """
     - A verbose summary
     - You may do a few paragraphs to describe the transcript if needed
    """
}

template="""

You are a helpful assistant, assisting {rep_name}, a professional researcher, in extracting important information from this {rep_company} academic paper. 
Your goal is to write a summary from an academic perspective, highlighting key points relevant to this academic paper.
Do not respond with anything outside of the paper content. If you don't know, say, "I don't know"
"""
system_message_prompt_map = SystemMessagePromptTemplate.from_template(template)

human_template="{text}" # Simply just pass the text as a human message
human_message_prompt_map = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt_map = ChatPromptTemplate.from_messages(messages=[system_message_prompt_map, human_message_prompt_map])

template="""

You are a helpful assistant, assisting {rep_name}, a professional researcher, in extracting important information from this {rep_company} academic paper. 
Your goal is to write a summary from an academic perspective, highlighting key points relevant to this academic paper.
Do not respond with anything outside of the paper content. If you don't know, say, "I don't know"

Respond with the following format
{output_format}

"""
system_message_prompt_combine = SystemMessagePromptTemplate.from_template(template)

human_template="{text}" # Simply just pass the text as a human message
human_message_prompt_combine = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt_combine = ChatPromptTemplate.from_messages(messages=[system_message_prompt_combine, human_message_prompt_combine])


llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k",openai_api_key='none',openai_api_base='http://localhost:8000/v1')

# verbose=True will print the prompts being sent to the LLM
#chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
#output = chain.run(texts)

chain = load_summarize_chain(llm,
                             chain_type="map_reduce",
                             map_prompt=chat_prompt_map,
                             combine_prompt=chat_prompt_combine,
                             verbose=True
                             )

user_selection = 'one_sentence'

output = chain.run({
    "input_documents": texts,
    "rep_company": "Quantum computing latest",
    "rep_name": "Quantum computing discipline",
    "output_format": summary_output_options[user_selection]
})


print(output)
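
For direction 3 above, one possible route (a sketch, not part of the original implementation) is LangChain's refine chain: it summarizes chunks sequentially, revising a running summary each time, so every chunk is summarized with a document-level view already in hand:

from langchain.chains.summarize import load_summarize_chain

# Sketch: iterative refinement as an alternative to map_reduce;
# reuses the llm and texts objects defined above
refine_chain = load_summarize_chain(llm, chain_type="refine", verbose=True)
refined_summary = refine_chain.run(texts)
print(refined_summary)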

Vector Knowledge QA Implementation

The approach here: embed the user's input, use embedding similarity to retrieve the passages of the PDF relevant to the question, then pass the retrieved passages as context, together with the question, to the LLM to generate the answer.
To improve accuracy on PDF-based questions, several directions are worth optimizing:
1. Rewriting the user's question to better capture intent, so more accurate context can be found in the PDF (a sketch appears after the code below)
2. Improving the matching accuracy of the vector-recall stage
3. Building key-information indexes for each passage of the PDF to improve precision

# PDF Loaders. If unstructured gives you a hard time, try PyPDFLoader
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader

from langchain.text_splitter import RecursiveCharacterTextSplitter
import os

loader = PyPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")

## Other options for loaders 
#loader = UnstructuredPDFLoader("/root/autodl-tmp/quantum_algorithms.pdf")
data = loader.load()
# Note: If you're using PyPDFLoader then it will split by page for you already
print (f'You have {len(data)} document(s) in your data')
print (f'There are {len(data[0].page_content)} characters in your document')

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

print (f'Now you have {len(texts)} documents')

from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

# Initialize the sentence-transformer embeddings (piccolo-large-zh, downloaded locally)
embeddings = SentenceTransformerEmbeddings(model_name='/root/autodl-tmp/piccolo-large-zh')

# load it into Chroma
docsearch = Chroma.from_documents(texts, embeddings)

query = "这篇文章摘要是什么?"
docs = docsearch.similarity_search(query)

# Here's an example of the first document that was returned
print(docs[0].page_content[:450])

from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
#llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k", openai_api_key='sk-...', openai_api_base='https://api.closeai-asia.com/v1')  # alternative: a hosted OpenAI-compatible service

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k",openai_api_key='none',openai_api_base='http://localhost:8000/v1')


chain = load_qa_chain(llm, chain_type="stuff")

query = "这篇文章主要介绍了什么?"
docs = docsearch.similarity_search(query)
answer = chain.run(input_documents=docs, question=query)
print(answer)
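
For direction 1 above, a minimal sketch of LLM-based query rewriting before retrieval; the prompt wording is illustrative, and it reuses the llm, docsearch, and chain objects defined above:

def rewrite_query(llm, query):
    # Ask the LLM to restate the question as an explicit, keyword-rich retrieval query
    prompt = "把下面的问题改写成一个更明确、包含关键词的检索问题,只输出改写后的问题:\n" + query
    return llm.predict(prompt)

raw_query = "这篇文章主要介绍了什么?"
better_query = rewrite_query(llm, raw_query)
# Retrieve with the rewritten query, but answer the user's original question
docs = docsearch.similarity_search(better_query)
print(chain.run(input_documents=docs, question=raw_query))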

Voice QA Implementation

This part converts the user's speech input into text. A Gradio UI lets the user record audio directly or drag in a WAV file for transcription. The current flow is crude: after the speech is transcribed, the text has to be copied by hand into the QA module as the question. To package this as a product, this code should be merged with the QA part (a sketch follows the code below); I will update the GitHub repo step by step.

# First download whisper-large-v2 into /root/autodl-tmp/whisper-large-v2
# git clone https://huggingface.co/openai/whisper-large-v2
import torch

import gradio as gr
import yt_dlp as youtube_dl
from transformers import pipeline
from transformers.pipelines.audio_utils import ffmpeg_read

import tempfile
import time  # used for the YouTube length check below
import os

MODEL_NAME = "openai/whisper-large-v2"
BATCH_SIZE = 8
FILE_LIMIT_MB = 1000
YT_LENGTH_LIMIT_S = 3600  # limit to 1 hour YouTube files

device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model="/root/autodl-tmp/whisper-large-v2",#MODEL_NAME,
    chunk_length_s=30,
    device=device,
)


def transcribe(inputs, task):
    if inputs is None:
        raise gr.Error("No audio file submitted! Please upload or record an audio file before submitting your request.")

    text = pipe(inputs, batch_size=BATCH_SIZE, generate_kwargs={"task": task}, return_timestamps=True)["text"]
    return  text


def _return_yt_html_embed(yt_url):
    video_id = yt_url.split("?v=")[-1]
    HTML_str = (
        f'<center> <iframe width="500" height="320" src="https://www.youtube.com/embed/{video_id}"> </iframe>'
        " </center>"
    )
    return HTML_str


def download_yt_audio(yt_url, filename):
    info_loader = youtube_dl.YoutubeDL()

    try:
        info = info_loader.extract_info(yt_url, download=False)
    except youtube_dl.utils.DownloadError as err:
        raise gr.Error(str(err))

    file_length = info["duration_string"]
    file_h_m_s = file_length.split(":")
    file_h_m_s = [int(sub_length) for sub_length in file_h_m_s]

    if len(file_h_m_s) == 1:
        file_h_m_s.insert(0, 0)
    if len(file_h_m_s) == 2:
        file_h_m_s.insert(0, 0)
    file_length_s = file_h_m_s[0] * 3600 + file_h_m_s[1] * 60 + file_h_m_s[2]

    if file_length_s > YT_LENGTH_LIMIT_S:
        yt_length_limit_hms = time.strftime("%HH:%MM:%SS", time.gmtime(YT_LENGTH_LIMIT_S))
        file_length_hms = time.strftime("%HH:%MM:%SS", time.gmtime(file_length_s))
        raise gr.Error(f"Maximum YouTube length is {yt_length_limit_hms}, got {file_length_hms} YouTube video.")

    ydl_opts = {"outtmpl": filename, "format": "worstvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best"}

    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        try:
            ydl.download([yt_url])
        except youtube_dl.utils.ExtractorError as err:
            raise gr.Error(str(err))


def yt_transcribe(yt_url, task, max_filesize=75.0):
    html_embed_str = _return_yt_html_embed(yt_url)

    with tempfile.TemporaryDirectory() as tmpdirname:
        filepath = os.path.join(tmpdirname, "video.mp4")
        download_yt_audio(yt_url, filepath)
        with open(filepath, "rb") as f:
            inputs = f.read()

    inputs = ffmpeg_read(inputs, pipe.feature_extractor.sampling_rate)
    inputs = {"array": inputs, "sampling_rate": pipe.feature_extractor.sampling_rate}

    text = pipe(inputs, batch_size=BATCH_SIZE, generate_kwargs={"task": task}, return_timestamps=True)["text"]

    return html_embed_str, text


demo = gr.Blocks()

mf_transcribe = gr.Interface(
    fn=transcribe,
    inputs=[
        gr.inputs.Audio(source="microphone", type="filepath", optional=True),
        gr.inputs.Radio(["transcribe", "translate"], label="Task", default="transcribe"),
    ],
    outputs="text",
    layout="horizontal",
    theme="huggingface",
    title="Whisper Large V2: Transcribe Audio",
    description=(
        "Transcribe long-form microphone or audio inputs with the click of a button! Demo uses the"
        f" checkpoint [{MODEL_NAME}](https://huggingface.co/{MODEL_NAME}) and Transformers to transcribe audio files"
        " of arbitrary length."
    ),
    allow_flagging="never",
)

file_transcribe = gr.Interface(
    fn=transcribe,
    inputs=[
        gr.inputs.Audio(source="upload", type="filepath", optional=True, label="Audio file"),
        gr.inputs.Radio(["transcribe", "translate"], label="Task", default="transcribe"),
    ],
    outputs="text",
    layout="horizontal",
    theme="huggingface",
    title="Whisper Large V2: Transcribe Audio",
    description=(
        "Transcribe long-form microphone or audio inputs with the click of a button! Demo uses the"
        f" checkpoint [{MODEL_NAME}](https://huggingface.co/{MODEL_NAME}) and Transformers to transcribe audio files"
        " of arbitrary length."
    ),
    allow_flagging="never",
)

with demo:
    gr.TabbedInterface([mf_transcribe, file_transcribe], ["Microphone", "Audio file"])

demo.launch(enable_queue=True, server_port=6006)
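
A minimal sketch of the integration mentioned above, assuming the docsearch and chain objects from the vector-QA section live in the same process (this glue code is not yet in the repo):

def voice_qa(audio_path):
    # Transcribe the spoken question, then answer it from the PDF knowledge base
    question = pipe(audio_path, batch_size=BATCH_SIZE,
                    generate_kwargs={"task": "transcribe"})["text"]
    docs = docsearch.similarity_search(question)
    return chain.run(input_documents=docs, question=question)

# Example: answer a question recorded to question.wav
# print(voice_qa("question.wav"))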

Voice Reply Implementation

This part uses Coqui TTS for speech synthesis, which makes it easy to add voice-cloning later. The steps are:
1. Install the TTS library
2. Download the required TTS model
3. Run the synthesis code

git clone https://github.com/coqui-ai/TTS
cd TTS
pip install -r requirements.txt
# or simply: pip install tts
# check that the installation works
tts --list_models

If it prints the model list like below, the installation is OK:

Name format: type/language/dataset/model
 1: tts_models/multilingual/multi-dataset/your_tts
 2: tts_models/en/ek1/tacotron2
 ....

Inspect a model's info:

tts --model_info_by_name tts_models/tr/common-voice/glow-tts
> model type : tts_models
> language supported : tr
> dataset used : common-voice
> model name : glow-tts
> description : Turkish GlowTTS model using an unknown speaker from the Common-Voice dataset.
> default_vocoder : vocoder_models/tr/common-voice/hifigan

Download and test the model (the first synthesis run downloads it automatically):

tts --text "你好中国,我爱你中国" --model_name "tts_models/zh-CN/baker/tacotron2-DDC-GST" --out_path output.wav 

Running it from Python:

from TTS.api import TTS
tts = TTS("tts_models/zh-CN/baker/tacotron2-DDC-GST", gpu=True)

# Synthesize speech, using a reference wav for the speaking style (default settings)
tts.tts_to_file(text="我爱你中国,中国山河壮丽",
                file_path="output.wav",
                speaker_wav="/root/autodl-tmp/TTS/TTS/female.wav",
                )

# Synthesize speech with custom settings (fewer decoder iterations for faster synthesis)
tts.tts_to_file(text="我爱你中国,中国山河壮丽",
                file_path="output1.wav",
                speaker_wav="/root/autodl-tmp/TTS/TTS/female.wav",
                decoder_iterations=30)
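
To close the loop, a minimal sketch of voicing the QA answer with the tts object above (the function name and output path are illustrative):

def speak_answer(answer, out_path="answer.wav"):
    # Convert the QA module's text answer into speech
    tts.tts_to_file(text=answer,
                    file_path=out_path,
                    speaker_wav="/root/autodl-tmp/TTS/TTS/female.wav")
    return out_path

# Example: voice the answer produced by the vector-QA chain
# speak_answer(answer)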

Summary

This article fleshed out the implementation details of each module of the academic voice assistant. The complete project code lives on GitHub at https://github.com/liangwq/Chatglm_lora_multi-gpu; interested readers can download and run it. All of the functionality is implemented, but it has not yet been wired into a one-click project; that will be completed step by step. There is still plenty of room for optimization, for example:
1. Improving the accuracy and speed of long-document summarization
2. Speeding up speech synthesis and supporting longer synthesized audio
3. Improving the accuracy of knowledge-base QA
4. Improving the accuracy of knowledge extraction
5. Fast semantic understanding and rewriting of user questions
