Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
This server has been archived and is no longer actively maintained.
Config is the same across clients — only the file and path differ.
{
"mcpServers": {
"funasr": {
"args": [
"torch"
],
"command": "uvx"
}
}
}Are you the author?
Add this badge to your README to show your security score and help users find safe servers.
(简体中文|English|日本語|한국어)
This server supports HTTP transport. Be the first to test it — help the community know if it works.
Five weighted categories — click any category to see the underlying evidence.
PYSEC-2026-139
A vulnerability was identified in PyTorch 2.10.0. The affected element is an unknown function of the component pt2 Loading Handler. The manipulation leads to deserialization. The attack can only be performed from a local environment. The exploit is publicly available and might be used. The project was informed of the problem early through a pull request but has not reacted yet.
>= 0source →PYSEC-2025-210
An issue was discovered in PyTorch v2.5 and v2.7.1. Omission of profiler.stop() can cause torch.profiler.profile (PythonTracer) to crash or hang during finalization, leading to a Denial of Service (DoS).
>= 0source →PYSEC-2025-209
An issue in pytorch v2.7.0 can lead to a Denial of Service (DoS) when a PyTorch model consists of torch.Tensor.to_sparse() and torch.Tensor.to_dense() and is compiled by Inductor.
PYSEC-2025-208
A buffer overflow occurs in pytorch v2.7.0 when a PyTorch model consists of torch.nn.Conv2d, torch.nn.functional.hardshrink, and torch.Tensor.view-torch.mv() and is compiled by Inductor, leading to a Denial of Service (DoS).
PYSEC-2025-207
A Name Error occurs in pytorch v2.7.0 when a PyTorch model consists of torch.cummin and is compiled by Inductor, leading to a Denial of Service (DoS).
Be the first to review
Have you used this server?
Share your experience — it helps other developers decide.
Sign in to write a review.
Others in ai-ml
Dynamic problem-solving through sequential thought chains
Workspace template + MCP server for Claude Code, Codex CLI, Cursor & Windsurf. Multi-agent knowledge engine (ag-refresh / ag-ask) that turns any codebase into a queryable AI assistant.
Persistent memory using a knowledge graph
Privacy-first. MCP is the protocol for tool access. We're the virtualization layer for context.
MCP Security Weekly
Get CVE alerts and security updates for FunASR and similar servers.
Start a conversation
Ask a question, share a tip, or report an issue.
Sign in to join the discussion.
Industrial speech recognition. Up to 340x realtime, 26x faster than Whisper. 50+ languages.
Speaker diarization · Emotion detection · Streaming · One API call
Quick Start · Colab · Benchmark · Model selection · Migration guide · Use cases · Deployment matrix · Models · Agent Integration · Docs · Contribute
No local setup? Open the Colab quickstart to transcribe a public sample or upload your own audio in a browser.
pip install torch torchaudio
pip install funasr
Flagship model — Fun-ASR-Nano (LLM-ASR, 31 languages; the default recommendation, needs a GPU):
from funasr import AutoModel
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", device="cuda")
result = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
print(result[0]["text"])
# 欢迎大家来体验达摩院推出的语音识别模型。
On CPU (or for multilingual + emotion in one pass), use SenseVoice — which also returns speaker diarization and timestamps:
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda") # use device="cpu" if you don't have a GPU
result = model.generate(
input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
batch_size_s=300,
)
# One call returns VAD segments with speaker id + timestamps — render them however you like:
for seg in result[0]["sentence_info"]:
print(f"[{seg['start']/1000:.1f}s] Speaker {seg['spk']}: {rich_transcription_postprocess(seg['sentence'])}")
Output — structured text with speaker labels, timestamps, and punctuation:
[0.6s] Speaker 0: 欢迎大家来体验达摩院推出的语音识别模型
That's it. One model, one call — VAD segmentation, speech recognition, punctuation, speaker diarization all happen automatically.
At scale, accelerate Fun-ASR-Nano with vLLM (batch processing):
from funasr.auto.auto_model_vllm import AutoModelVLLM
model = AutoModelVLLM(model="FunAudioLLM/Fun-ASR-Nano-2512", t
... [View full README on GitHub](https://github.com/modelscope/FunASR#readme)