【Xiaomu Learns AI】Speech Recognition in Python (Whisper + HuggingFace)

Posted on 2026-2-4 23:10:59
1. Introduction

1.1 Whisper

https://arxiv.org/pdf/2212.04356
https://github.com/openai/whisper
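Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition, speech translation, and language identification.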

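A Transformer sequence-to-sequence model is trained on a variety of speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens for the decoder to predict, which lets a single model replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.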
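To make the "tasks as a token sequence" idea concrete, here is a minimal sketch that assembles the decoder prompt prefix. The special-token strings (`<|startoftranscript|>`, language tags such as `<|zh|>`, `<|transcribe|>` / `<|translate|>`, `<|notimestamps|>`) follow the Whisper tokenizer; the helper `build_prompt_prefix` is hypothetical and only builds strings, it does not call the model.

```python
# Hypothetical helper: builds the special-token prefix that tells Whisper's
# decoder which task to perform. Token strings follow the Whisper tokenizer.
SOT = "<|startoftranscript|>"
NO_TIMESTAMPS = "<|notimestamps|>"
TASKS = {"transcribe", "translate"}


def build_prompt_prefix(language: str, task: str = "transcribe",
                        timestamps: bool = False) -> list[str]:
    """Return the decoder prefix tokens for a language/task combination."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    tokens = [SOT, f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without this token the decoder predicts timestamp tokens along with the text.
        tokens.append(NO_TIMESTAMPS)
    return tokens


if __name__ == "__main__":
    # Chinese speech transcribed as Chinese text
    print("".join(build_prompt_prefix("zh")))
    # Chinese speech translated into English text
    print("".join(build_prompt_prefix("zh", task="translate")))
```

In practice you rarely build these tokens by hand: with the transformers ASR pipeline the task can be selected through `generate_kwargs`, e.g. `generate_kwargs={"task": "translate"}`, as shown on the Whisper model cards.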
2. HuggingFace

https://www.hugging-face.org/models/
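Hugging Face is a platform and community dedicated to machine learning and data science that helps users build, deploy, and train ML models. It provides the infrastructure needed to demo, run, and deploy AI in real-world applications. The platform lets users explore and use models and datasets uploaded by others. Hugging Face is often called the "GitHub of machine learning" because it encourages developers to share and test their work in the open.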

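The platform is best known for its Transformers Python library, which simplifies accessing and training ML models. The library gives developers an efficient way to integrate Hugging Face models into their projects and build ML pipelines. It provides state-of-the-art machine learning for PyTorch, TensorFlow, and JAX.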
Hugging Face offers two ways to access large models:


  • Inference API (Serverless): run inference through an HTTP API.
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-hf"
# Replace the placeholder with your own Hugging Face access token
headers = {"Authorization": "Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}

def query(payload):
    # POST the JSON payload and return the decoded JSON response
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()  # surface HTTP errors such as an invalid token
    return response.json()

output = query({
    "inputs": "Can you please let us know more details about your ",
})
print(output)
```


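  • Local execution: use Hugging Face's pipeline for high-level operations.
```python
from transformers import pipeline

# Llama 2 is a gated model: accept its license on the Hub and log in first
pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")
```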
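For comparison, the serverless call can also be sketched with only the standard library's `urllib.request`, reading the access token from an environment variable instead of hard-coding it. The variable name `HF_TOKEN` (which huggingface_hub also reads) and the helper names `build_request` and `send` are assumptions for this sketch; `build_request` only constructs the request, and the actual network call is left commented out. Hugging Face has changed its serverless inference endpoints over time, so check the current documentation before relying on `API_URL`.

```python
import json
import os
import urllib.request

# Endpoint copied from the example above; verify it against the current HF docs.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-hf"


def build_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the inference endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN", "")  # assumed environment variable
    req = build_request({"inputs": "Can you please let us know more details about your "}, token)
    print(req.get_method(), req.full_url)
    # print(send(req))  # uncomment to actually call the API
```

Keeping the token out of the source file means the script can be shared or committed without leaking credentials.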
2.1 Installing transformers

```shell
pip install transformers
# pipeline() also needs a deep-learning backend, e.g. PyTorch
pip install torch
```

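2.2 Pipeline Overview

A Pipeline chains three stages into one flow (data preprocessing, the model call, and result postprocessing), so you can pass in raw text and get the final answer directly.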
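The three-stage idea can be illustrated without downloading anything. The sketch below is a toy, not the transformers implementation: `ToyPipeline` and its keyword-counting "model" are hypothetical stand-ins, but the split into preprocess, forward, and postprocess mirrors the `preprocess` / `_forward` / `postprocess` methods that transformers pipelines implement.

```python
import math
import re

# Hypothetical word lists acting as a tiny "sentiment model"
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "hate", "terrible", "awful"}
LABELS = ["NEGATIVE", "POSITIVE"]


class ToyPipeline:
    """Hypothetical stand-in showing the three stages of a pipeline."""

    def preprocess(self, text: str) -> list[str]:
        # Stage 1: turn raw text into model inputs (here: lowercase word tokens).
        return re.findall(r"[a-z']+", text.lower())

    def forward(self, tokens: list[str]) -> list[float]:
        # Stage 2: the "model" produces one raw score (logit) per label.
        neg = sum(t in NEGATIVE_WORDS for t in tokens)
        pos = sum(t in POSITIVE_WORDS for t in tokens)
        return [float(neg), float(pos)]

    def postprocess(self, logits: list[float]) -> dict:
        # Stage 3: softmax the logits and report the best label with its score.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"label": LABELS[best], "score": probs[best]}

    def __call__(self, text: str) -> dict:
        return self.postprocess(self.forward(self.preprocess(text)))


if __name__ == "__main__":
    pipe = ToyPipeline()
    print(pipe("I love this great course!"))
    print(pipe("I hate this so much!"))
```

A real pipeline swaps the word lists for a tokenizer and a trained model, but the caller still only sees `pipe(text)` returning a label and a score.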

Ways to create and use a Pipeline:
```python
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

# 1. Create a Pipeline from the task type alone (a default model is chosen)
pipe = pipeline("text-classification")
# 2. Specify the task type and a model to build a Pipeline on that model
pipe = pipeline("text-classification", model="uer/roberta-base-finetuned-dianping-chinese")
# 3. Load the model first, then create the Pipeline;
#    when passing a model object, the tokenizer must be passed as well
model = AutoModelForSequenceClassification.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
tokenizer = AutoTokenizer.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
# 4. Run inference on a GPU for speed (device=0 is the first CUDA device)
pipe = pipeline("text-classification", model="uer/roberta-base-finetuned-dianping-chinese", device=0)
```
2.3 Tasks Overview

View the task types supported by Pipeline:
```python
from transformers.pipelines import SUPPORTED_TASKS

print(SUPPORTED_TASKS.keys())
```

```python
# Print each task name together with its configuration
for k, v in SUPPORTED_TASKS.items():
    print(k, v)
```

2.3.1 sentiment-analysis

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
text = classifier("I've been waiting for a HuggingFace course my whole life.")
print(text)
text = classifier([
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!"
])
print(text)
```

2.3.2 zero-shot-classification

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
text = classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)
print(text)
```
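Under the hood, the zero-shot pipeline relies on a natural-language-inference (NLI) model: each candidate label is inserted into a hypothesis template (the default is `"This example is {}."`), the model scores whether the input text entails that hypothesis, and with the default `multi_label=False` the entailment scores are normalized across labels with a softmax. The sketch below reproduces only that bookkeeping; the entailment logits are made-up numbers standing in for real model output, and `make_hypotheses` / `rank_labels` are hypothetical helpers.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."


def make_hypotheses(labels: list[str]) -> list[str]:
    """One NLI hypothesis per candidate label."""
    return [HYPOTHESIS_TEMPLATE.format(label) for label in labels]


def rank_labels(labels: list[str], entail_logits: list[float]) -> dict:
    """Softmax the entailment logits across labels and sort, highest first."""
    m = max(entail_logits)
    exps = [math.exp(x - m) for x in entail_logits]
    total = sum(exps)
    scored = sorted(zip(labels, (e / total for e in exps)),
                    key=lambda pair: pair[1], reverse=True)
    return {"labels": [label for label, _ in scored],
            "scores": [score for _, score in scored]}


if __name__ == "__main__":
    labels = ["education", "politics", "business"]
    print(make_hypotheses(labels))
    # Made-up entailment logits standing in for the NLI model's output
    print(rank_labels(labels, [3.0, -1.0, 0.5]))
```

Because the scores are normalized across labels, they always sum to 1; passing `multi_label=True` to the real pipeline instead scores each label independently.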

2.3.3 text-generation

```python
from transformers import pipeline

generator = pipeline("text-generation")
text = generator("In this course, we will teach you how to")
print(text)
```

```python
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
text = generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)
print(text)
```
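The difference between `max_length` and `max_new_tokens` matters here: `max_length` caps the whole sequence, prompt tokens included, while `max_new_tokens` caps only the newly generated tokens (and takes precedence if both are set). The toy greedy decoder below makes that visible; the bigram table and the whitespace "tokenizer" are hypothetical stand-ins for a real language model.

```python
# Hypothetical "language model": the most likely next word after each word.
BIGRAMS = {
    "to": "build", "build": "and", "and": "share", "share": "models",
    "models": "with", "with": "the", "the": "community",
}


def greedy_generate(prompt: str, max_length=None, max_new_tokens=None) -> str:
    """Greedy decoding over BIGRAMS, stopping at the applicable length limit."""
    tokens = prompt.split()
    if max_new_tokens is not None:
        limit = len(tokens) + max_new_tokens  # budget counts only new tokens
    elif max_length is not None:
        limit = max_length                    # budget includes the prompt
    else:
        raise ValueError("set max_length or max_new_tokens")
    # Stop at the limit, or earlier if the table has no continuation.
    while len(tokens) < limit and tokens[-1] in BIGRAMS:
        tokens.append(BIGRAMS[tokens[-1]])
    return " ".join(tokens)


if __name__ == "__main__":
    prompt = "In this course, we will teach you how to"  # 9 "tokens"
    print(greedy_generate(prompt, max_length=10))       # room for 1 new word
    print(greedy_generate(prompt, max_new_tokens=10))   # stops when the table runs out
```

With a long prompt, a small `max_length` can leave no room at all for new tokens, which is why `max_new_tokens` is usually the safer knob.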

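2.3.4 fill-mask

```python
from transformers import pipeline

unmasker = pipeline("fill-mask")
# The mask token depends on the model: the default (RoBERTa-based) model uses <mask>,
# BERT-style models use [MASK]
text = unmasker("This course will teach you all about <mask> models.", top_k=2)
print(text)
```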

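2.3.5 ner

```python
from transformers import pipeline

# grouped_entities=True is deprecated; aggregation_strategy="simple" groups
# the tokens that belong to the same entity
ner = pipeline("ner", aggregation_strategy="simple")
text = ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
print(text)
```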
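With `aggregation_strategy="simple"`, the pipeline merges consecutive tokens tagged with the same entity type (B-/I- prefixes) into one entity and averages their scores. The sketch below shows that grouping on hand-written token predictions; the tokens, tags, and scores are invented for illustration, and `group_entities` is a hypothetical helper rather than the library's code, which also handles sub-word pieces and character offsets.

```python
def split_tag(tag: str) -> tuple[str, str]:
    """'B-PER' -> ('B', 'PER'); 'I-LOC' -> ('I', 'LOC')."""
    prefix, _, entity_type = tag.partition("-")
    return prefix, entity_type


def group_entities(tokens: list[dict]) -> list[dict]:
    """Merge runs of B-/I- tokens of the same type; drop 'O' (outside) tokens."""
    groups: list[list[dict]] = []
    for tok in tokens:
        if tok["entity"] == "O":
            continue
        prefix, etype = split_tag(tok["entity"])
        if groups and prefix != "B" and split_tag(groups[-1][-1]["entity"])[1] == etype:
            groups[-1].append(tok)  # continue the current entity
        else:
            groups.append([tok])    # start a new entity
    return [
        {
            "entity_group": split_tag(g[0]["entity"])[1],
            "score": sum(t["score"] for t in g) / len(g),  # mean token score
            "word": " ".join(t["word"] for t in g),
        }
        for g in groups
    ]


if __name__ == "__main__":
    # Invented token-level predictions for the sentence above
    predictions = [
        {"word": "Sylvain", "entity": "B-PER", "score": 0.99},
        {"word": "Hugging", "entity": "B-ORG", "score": 0.97},
        {"word": "Face", "entity": "I-ORG", "score": 0.95},
        {"word": "in", "entity": "O", "score": 0.99},
        {"word": "Brooklyn", "entity": "B-LOC", "score": 0.98},
    ]
    for entity in group_entities(predictions):
        print(entity)
```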

2.3.6 question-answering

```python
from transformers import pipeline

question_answerer = pipeline("question-answering")
text = question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn"
)
print(text)
```
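The question-answering pipeline is extractive: the model outputs a start logit and an end logit for every context token, and the answer is the span (start ≤ end, at most `max_answer_len` tokens, 15 by default) with the highest combined score. Maximizing `start_logit + end_logit` is equivalent to maximizing the product of the softmaxed start and end probabilities. A sketch of that search on made-up logits follows; the numbers and the `best_span` helper are hypothetical, and the real pipeline additionally works on sub-word tokens, masks out the question, and maps the span back to character offsets.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) maximizing start_logit + end_logit,
    subject to start <= end < start + max_answer_len."""
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


if __name__ == "__main__":
    words = "My name is Sylvain and I work at Hugging Face in Brooklyn".split()
    # Made-up logits: high start scores on "Hugging"/"Brooklyn",
    # high end scores on "Face"/"Brooklyn"
    start = [0.0] * len(words)
    start[8], start[11] = 5.0, 3.0
    end = [0.0] * len(words)
    end[9], end[11] = 4.0, 4.5
    s, e, score = best_span(start, end)
    print(" ".join(words[s:e + 1]), score)
    s, e, _ = best_span(start, end, max_answer_len=2)
    print(" ".join(words[s:e + 1]))
```

The length cap changes the answer here: without it the longer span wins, with a two-token cap only "Hugging Face" fits.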

2.3.7 summarization

```python
from transformers import pipeline

summarizer = pipeline("summarization")
text = summarizer("""
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.
    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
""")
print(text)
```

2.3.8 translation

```shell
# Required by the Marian (opus-mt) tokenizers
pip install sentencepiece
```

```python
from transformers import pipeline

# translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
text = translator("To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator.")
print(text)
```

The following examples use HuggingFace's Chinese-to-English and English-to-Chinese models, loaded from a local directory.


  • (1) Chinese to English
    The Chinese-to-English model is opus-mt-zh-en; download it from: https://huggingface.co/Helsinki-NLP/opus-mt-zh-en/tree/main
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import pipeline

# Local directory containing the downloaded opus-mt-zh-en files
model_path = './zh-en/'
# Create the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Create the model
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
# Create the pipeline (a new name, so the pipeline() function is not shadowed)
translator = pipeline("translation", model=model, tokenizer=tokenizer)
chinese = """
中国男子篮球职业联赛(Chinese Basketball Association),简称中职篮(CBA),是由中国篮球协会所主办的跨年度主客场制篮球联赛,中国最高等级的篮球联赛,其中诞生了如姚明、王治郅、易建联、朱芳雨等球星。"""
result = translator(chinese)
print(result[0]['translation_text'])
```


  • (2) English to Chinese
    The English-to-Chinese model is opus-mt-en-zh; download it from: https://huggingface.co/Helsinki-NLP/opus-mt-en-zh/tree/main
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import pipeline

# Local directory containing the downloaded opus-mt-en-zh files
model_path = './en-zh/'
# Create the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Create the model
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
# Create the pipeline (a new name, so the pipeline() function is not shadowed)
translator = pipeline("translation", model=model, tokenizer=tokenizer)
english = """
The official site of the National Basketball Association. Follow the action on NBA scores, schedules, stats, news, Team and Player news.
"""
result = translator(english)
print(result[0]['translation_text'])
```
3. Testing

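pipeline() offers a simple way to run inference with any model from the Hub on any language, computer vision, audio, or multimodal task. Even if you have no experience with a particular modality or are unfamiliar with the code behind the models, you can still use pipeline() for inference!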
```python
from transformers import pipeline

# First create a pipeline() and specify an inference task:
generator = pipeline(task="automatic-speech-recognition")
# Pass the audio input (here, the URL of a FLAC file) to the pipeline();
# decoding audio files requires ffmpeg to be installed
text = generator("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
print(text)
```
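Besides a URL or file path, the ASR pipeline also accepts a dict of the form `{"raw": samples, "sampling_rate": rate}`, where `samples` is a 1-D NumPy float array. Whisper models expect 16 kHz mono audio, and the pipeline uses the given sampling rate to resample mismatched input. The sketch below decodes a WAV file with only the standard-library `wave` module; `read_wav_mono` is a hypothetical helper that supports only 16-bit PCM and does no resampling.

```python
import array
import sys
import wave


def read_wav_mono(path) -> tuple[list[float], int]:
    """Read a 16-bit PCM WAV, downmix to mono, and scale samples to [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is supported in this sketch")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        frames = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        frames.byteswap()  # WAV sample data is little-endian
    # Average the interleaved channels of each frame, then normalize by 2**15.
    mono = [
        sum(frames[i:i + channels]) / channels / 32768.0
        for i in range(0, len(frames), channels)
    ]
    return mono, rate


if __name__ == "__main__" and len(sys.argv) > 1:
    samples, rate = read_wav_mono(sys.argv[1])
    print(f"{len(samples)} samples at {rate} Hz ({len(samples) / rate:.2f} s)")
```

To feed the result to the pipeline, convert it with NumPy, e.g. `generator({"raw": numpy.asarray(samples, dtype=numpy.float32), "sampling_rate": rate})`.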



  • Pipeline for vision tasks
    Using pipeline() for vision tasks is practically identical. Specify your task and pass the image to the classifier. The image can be a link or a local path to an image. For example, which breed of cat is shown in the image at the URL below?
```python
from transformers import pipeline

vision_classifier = pipeline(model="google/vit-base-patch16-224")
preds = vision_classifier(
    images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
# Keep only the rounded score and the label of each prediction
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
print(preds)
```



  • Pipeline for text tasks
    Using pipeline() for natural language processing (NLP) tasks is practically identical.
```python
from transformers import pipeline

# This model is a `zero-shot-classification` model.
# It classifies text, and you can pass in any labels you can think of
classifier = pipeline(model="facebook/bart-large-mnli")
text = classifier(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)
print(text)
```



  • Speech to text
```python
import os

# Set these before importing transformers: huggingface_hub reads HF_ENDPOINT
# at import time, and CUDA_VISIBLE_DEVICES must be set before CUDA initializes.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # download models through a mirror
os.environ["CUDA_VISIBLE_DEVICES"] = "2"             # expose only the GPU with index 2
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"            # turn off TensorFlow's oneDNN custom ops

import argparse
import json

from transformers import pipeline


def speech2text(speech_file):
    # Whisper medium; decoding mp3 files requires ffmpeg on the PATH
    transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium")
    text_dict = transcriber(speech_file)
    return text_dict


def main():
    parser = argparse.ArgumentParser(description="Speech to text")
    parser.add_argument("--audio", "-a", type=str, default="test.mp3", help="path of the input audio file")
    args = parser.parse_args()
    text_dict = speech2text(args.audio)
    print("Recognized text:\n" + text_dict["text"])
    print("Full result:\n" + json.dumps(text_dict, indent=4, ensure_ascii=False))


if __name__ == "__main__":
    main()
```
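Whisper processes audio in 30-second windows, so for longer recordings the transformers ASR pipeline can split the input itself when you pass `chunk_length_s=30` (optionally with `stride_length_s` for overlap) and stitch the pieces back together. The sketch below shows only the splitting arithmetic as sample-index ranges; `chunk_bounds` is a hypothetical helper, and the 16 kHz rate and 5-second overlap are assumptions chosen for illustration.

```python
def chunk_bounds(num_samples: int, sampling_rate: int = 16_000,
                 chunk_s: float = 30.0, overlap_s: float = 5.0) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of windows of chunk_s seconds;
    consecutive windows share overlap_s seconds so boundary words appear twice."""
    chunk = int(chunk_s * sampling_rate)
    step = chunk - int(overlap_s * sampling_rate)
    if step <= 0:
        raise ValueError("overlap_s must be shorter than chunk_s")
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:  # the last window reaches the end of the audio
            break
        start += step
    return bounds


if __name__ == "__main__":
    rate = 16_000
    # A 70-second recording split into overlapping 30-second windows
    for start, end in chunk_bounds(70 * rate, sampling_rate=rate):
        print(f"{start / rate:5.1f}s -> {end / rate:5.1f}s")
```

In the script above, the pipeline equivalent would be adding `chunk_length_s=30` to the `pipeline(...)` call; the overlap gives the pipeline context to merge text that straddles a window boundary.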


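Conclusion

If you found this method or code even a little useful, please give the author a like, or buy them a coffee; ╮( ̄▽ ̄)╭
If you think the method or code isn't up to much //(ㄒoㄒ)//, leave a comment and the author will keep improving it; o_O???
If you need custom development of related functionality, leave a comment or send the author a private message; (✿◡‿◡)
Thanks to everyone for your support! ( ´ ▽´ )ノ ( ´ ▽´)っ!!!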
