SenseVoice, a Powerful AI Speech-to-Text Model: Local Deployment Tutorial

Posted by 郭卫东 on 2024-9-4 23:11:36

Model Introduction

SenseVoice focuses on high-accuracy multilingual speech recognition, emotion recognition, and audio event detection.

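[*]Multilingual recognition: trained on more than 400,000 hours of data, supports more than 50 languages, and recognizes speech better than the Whisper model.
[*]Rich transcription:

[*]Strong emotion recognition that matches or exceeds the current best emotion recognition models on test data.
[*]Sound event detection for many common human-computer interaction events, such as music, applause, laughter, crying, coughing, and sneezing.

[*]Efficient inference: the SenseVoice-Small model uses a non-autoregressive end-to-end framework with very low inference latency. Inference on 10 s of audio takes only 70 ms, 15 times faster than Whisper-Large.
[*]Fine-tuning: ships with convenient fine-tuning scripts and strategies, so users can fix long-tail sample problems in their own business scenarios.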
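
To put the latency figure above in perspective, here is a minimal sketch that turns it into a real-time factor (processing time divided by audio duration). The 70 ms, 10 s, and 15x numbers come from the list above; the Whisper-Large latency printed at the end is only what that ratio implies, not a measured value, and the function name is hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted above: 70 ms of inference for 10 s of audio.
rtf = real_time_factor(0.070, 10.0)
print(f"RTF = {rtf:.4f}")
print(f"Roughly {1 / rtf:.0f}x faster than real time")

# Implied by the quoted 15x ratio only; this is not a benchmark result.
implied_whisper_ms = 70 * 15
print(f"Implied Whisper-Large latency: {implied_whisper_ms} ms")
```

A real-time factor well below 1 means the model transcribes much faster than the audio plays, which matters for interactive use.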
SenseVoice Online Demo

[*]SenseVoice online demo: https://www.modelscope.cn/studios/iic/SenseVoice
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232838031-1496711559.jpg
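Local Deployment

This tutorial uses the AutoDL machine learning platform. Official site: https://www.autodl.com/market/list
Go straight to the compute marketplace, choose pay-as-you-go billing, and pick any region. A 4090 GPU is used here.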
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232846195-838755007.png
Select the PyTorch version as shown in the screenshot, then click Create.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232853152-2122417129.png
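Once the instance is created you land on the console. Click the AutoPanel panel and set the default package source to the Tsinghua mirror.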
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232858097-2064856395.png
Select the Tsinghua mirror, because dependencies download faster from it.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232901092-1414980252.png
Then go back to the console and click to enter JupyterLab.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232903307-912021027.png
Go into the autodl-tmp directory, then open a terminal.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232908012-296187905.png
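Then clone the project with the following command:
git clone https://github.com/FunAudioLLM/SenseVoice.git
If you get a network timeout or a similar error, run the following command, then clone again.
source /etc/network_turbo
Next, open a notebook to download the models.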
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232914988-1993828126.png
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232917662-644513099.png
Enter the following code and run it:
!pip install modelscope
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232918677-492008480.png
Next, enter the following code to download the models:
# Download both models from ModelScope into the local ai_models directory.
from modelscope.hub.snapshot_download import snapshot_download

# SenseVoiceSmall: the speech recognition model.
model_dir = snapshot_download("iic/SenseVoiceSmall", cache_dir='ai_models')
print(model_dir)
# FSMN VAD: the voice activity detection model.
model_dir = snapshot_download("iic/speech_fsmn_vad_zh-cn-16k-common-pytorch", cache_dir='ai_models')
print(model_dir)
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232920900-1448490890.png
When a progress bar appears, the model download has started.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232929813-175184499.png
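
Once the progress bars finish, a quick sanity check on the download can catch a truncated or empty model directory. The sketch below counts the files under ai_models (the cache_dir used above) and sums their sizes. The helper name summarize_dir is hypothetical, and the exact subdirectory layout ModelScope creates may vary by version, so the code only walks the whole tree.

```python
from pathlib import Path


def summarize_dir(root) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# "ai_models" matches the cache_dir passed to snapshot_download above.
root = Path("ai_models")
if root.is_dir():
    count, size = summarize_dir(root)
    print(f"{count} files, {size / 1e6:.1f} MB under {root.resolve()}")
else:
    print(f"{root} not found; run the download cell first")
```

If the file count is zero or the total size is suspiciously small, rerunning the download cell is the simplest fix.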
Then go back to the terminal and enter the SenseVoice directory.
cd SenseVoice/
Create a virtual environment:
# Create a virtual environment named venv.
python -m venv venv
Next, activate the virtual environment.
source ./venv/bin/activate
Install the dependencies:
pip install -r requirements.txt
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232938487-1290445070.png
After the dependencies are installed, upgrade pip.
pip install --upgrade pip
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232943212-329630724.png
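
If packages seem to be missing later, the usual cause is that the virtual environment was not active when pip ran. Here is a minimal sketch for checking this from Python. It relies only on the standard library; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path should point inside the venv directory created above.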
VS Code Remote Connection

Go back to the console and copy the SSH configuration.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232946688-1856624602.png
Open VS Code and start a remote connection.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232948902-1167029861.png
Paste the login information.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232950619-383148904.png
Select the first, default configuration.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232954590-715050087.png
Select the first link.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232956063-884736007.png
Copy the password.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232958304-1009930205.png
Paste the password.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904232959299-1905861046.png
Next, open a folder and select /root/autodl-tmp/
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233001074-543591739.png
Choose to trust the folder.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233005743-2082636314.png
Click to open a terminal.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233010704-556925615.png
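Next, activate the virtual environment (run this inside the SenseVoice directory, where venv was created).
source ./venv/bin/activate
Then go back to the notebook where the models were downloaded and copy the downloaded model paths.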
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233023823-1384294097.png
Back in VS Code, edit SenseVoice/webui.py and set the model paths as follows:
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233026051-1101761275.png
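
Since the change itself is only visible in the screenshot, here is a rough sketch of what pointing FunASR at local model directories can look like. It assumes webui.py builds its model with funasr.AutoModel, which accepts local paths for model and vad_model. The helper names and the example paths in the comment are hypothetical, so use the paths your own notebook printed.

```python
from pathlib import Path


def resolve_model_dir(path_str: str) -> Path:
    """Return the absolute model directory, or raise if it does not exist."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {path}")
    return path


def load_sensevoice(asr_dir: str, vad_dir: str):
    """Build a FunASR AutoModel from two local model directories."""
    # Imported lazily so the path check above works without funasr installed.
    from funasr import AutoModel

    return AutoModel(
        model=str(resolve_model_dir(asr_dir)),
        vad_model=str(resolve_model_dir(vad_dir)),
        vad_kwargs={"max_single_segment_time": 30000},
        trust_remote_code=True,
    )


# Example call with hypothetical paths; replace them with the printed ones:
# model = load_sensevoice(
#     "/root/autodl-tmp/ai_models/iic/SenseVoiceSmall",
#     "/root/autodl-tmp/ai_models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
# )
```

Checking the directories before constructing the model gives a clear error for a mistyped path, rather than a fallback attempt to fetch the model by name.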
Finally, the moment of truth: run the Python code.
python webui.py
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233028769-2014935915.png
Choose to open it in the browser.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233032025-317713895.png
Now you can start playing with it.
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233043138-650350765.png
When uploading audio, we ran into the following error:
https://img2024.cnblogs.com/other/2153830/202409/2153830-20240904233050337-1191653323.png
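To resolve problems encountered when installing ffmpeg, follow these steps:

[*]First, update the package list:
sudo apt update
[*]If installation still fails after the update, you may need to add the universe repository:
sudo add-apt-repository universe
sudo apt update
[*]Then try installing ffmpeg again:
sudo apt install ffmpeg -y
If that still does not work, the repository that contains ffmpeg may not be enabled. In that case, try the following:

[*]Enable the multiverse repository:
sudo add-apt-repository multiverse
sudo apt update
[*]Install ffmpeg:
sudo apt install ffmpeg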
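
After installing, it is worth confirming that ffmpeg is actually visible to the Python process before restarting the web UI. This is a minimal sketch using only the standard library; the function name tool_version_line is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tool_version_line(name: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<name> -version`, or None if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version = tool_version_line("ffmpeg")
print(version if version else "ffmpeg not found on PATH")
```

If this prints that ffmpeg was not found, the install did not succeed for the environment the web UI runs in.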
