当前位置: 首页 > news >正文

FunASR基础语音识别工具包

一、核心功能

  • FunASR是一个基础语音识别工具包,提供多种功能,包括语音识别(ASR)、语音端点检测(VAD)、标点恢复、语言模型、说话人验证、说话人分离和多人对话语音识别等。FunASR提供了便捷的脚本和教程,支持预训练好的模型的推理与微调。

代码下载地址:modelscope/FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

模型下载地址:funasr/paraformer-zh-streaming · HF Mirror

二、安装教程

pip3 install -U funasr
或者
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./pip3 install -U modelscope

三、模型仓库

四、上手教程

1.可执行命令行

funasr +model=paraformer-zh +vad_model="fsmn-vad" +punc_model="ct-punc" +input=asr_example_zh.wav

2.非实时语音识别

from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(model="paraformer-zh", model_revision="v2.0.4",vad_model="fsmn-vad", vad_model_revision="v2.0.4",punc_model="ct-punc-c", punc_model_revision="v2.0.4",# spk_model="cam++", spk_model_revision="v2.0.2",)
res = model.generate(input=f"{model.model_path}/example/asr_example.wav", batch_size_s=300, hotword='魔搭')
print(res)

3.实时语音识别

from funasr import AutoModelchunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attentionmodel = AutoModel(model="paraformer-zh-streaming", model_revision="v2.0.4")import soundfile
import oswav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960 # 600mscache = {}
total_chunk_num = int(len((speech)-1)/chunk_stride+1)
for i in range(total_chunk_num):speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]is_final = i == total_chunk_num - 1res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)print(res)

4.语音端点检测(非实时)

from funasr import AutoModelmodel = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/asr_example.wav"
res = model.generate(input=wav_file)
print(res)

5.语音端点检测(实时)

from funasr import AutoModelchunk_size = 200 # ms
model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")import soundfilewav_file = f"{model.model_path}/example/vad_example.wav"
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = int(chunk_size * sample_rate / 1000)cache = {}
total_chunk_num = int(len((speech)-1)/chunk_stride+1)
for i in range(total_chunk_num):speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]is_final = i == total_chunk_num - 1res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)if len(res[0]["value"]):print(res)

6.标点恢复

from funasr import AutoModelmodel = AutoModel(model="ct-punc", model_revision="v2.0.4")
res = model.generate(input="那今天的会就到这里吧 happy new year 明年见")
print(res)

7.时间戳预测

from funasr import AutoModelmodel = AutoModel(model="fa-zh", model_revision="v2.0.4")
wav_file = f"{model.model_path}/example/asr_example.wav"
text_file = f"{model.model_path}/example/text.txt"
res = model.generate(input=(wav_file, text_file), data_type=("sound", "text"))
print(res)

8.情感识别

from funasr import AutoModelmodel = AutoModel(model="emotion2vec_plus_large")wav_file = f"{model.model_path}/example/test.wav"res = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(res)

http://www.xdnf.cn/news/1365517.html

相关文章:

  • 【Canvas与标牌】维兰德汤谷公司logo
  • JavaScript 中类(class)的super 关键字
  • 【YOLOv5部署至RK3588】模型训练→转换RKNN→开发板部署
  • UniApp文件上传大小限制问题解决方案
  • kafka 副本集设置和理解
  • kafka常用命令
  • 宋红康 JVM 笔记 Day07|本地方法接口、本地方法栈
  • Linux(四):进程状态
  • python项目中pyproject.toml是做什么用的
  • SDC命令详解:使用set_timing_derate命令进行约束
  • K8s高可用:Master与候选节点核心解析
  • 基于MalConv的恶意软件检测系统设计与实现
  • 力扣(用队列实现栈)
  • SSH 反向隧道:快速解决服务器网络限制
  • 蜗牛播放器 Android TV:解决大屏观影痛点的利器
  • 【科研绘图系列】R语言绘制代谢物与临床表型相关性的森林图
  • 从0死磕全栈第1天:从写一个React的hello world开始
  • leetcode 238 除自身以外数组的乘积
  • PHP学习笔记1
  • 基于MATLAB实现支持向量机(SVM)进行预测备
  • 数据结构青铜到王者第三话---ArrayList与顺序表(1)
  • 【数学·三角函数】两角和差公式 二倍角公式
  • idea官网选择具体版本的下载步骤
  • easy-dataset的安装
  • 【STM32】G030单片机的独立看门狗
  • 不止效率工具:AI 在文化创作中如何重构 “灵感逻辑”?
  • 《拉康精神分析学中的欲望辩证法:能指的拓扑学与主体的解构性重构》
  • 【科研绘图系列】R语言浮游植物生态数据的统计与可视化
  • [系统架构设计师]专业英语(二十二)
  • 系统架构设计师-计算机系统存储管理-页式、段氏、段页式模拟题