# Claude Video Vision
Give Claude the ability to watch and understand videos.
A Claude Code plugin that extracts frames via ffmpeg and processes audio via multiple backends (Gemini API, local Whisper, or OpenAI API). Claude receives frames as images and audio transcription with timestamps — the plugin is a perception layer, not an interpretation layer.
`/setup-video-vision` walks you through configuration (see Setup below).

## Installation

Inside Claude Code, run these commands one at a time:
```
/plugin marketplace add https://github.com/jordanrendric/claude-video-vision
```

Then:

```
/plugin install claude-video-vision
```
The MCP server will auto-install via npx from npm on first use — no build step required.
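If you want to register the server with an MCP client manually instead, the config is a standard `mcpServers` entry. A minimal sketch, assuming the server is published on npm under the plugin's name (check the repository for the exact package name):

```json
{
  "mcpServers": {
    "claude-video-vision": {
      "command": "npx",
      "args": ["-y", "claude-video-vision"]
    }
  }
}
```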
### Alternative: local development
```
git clone https://github.com/jordanrendric/claude-video-vision.git
claude --plugin-dir /path/to/claude-video-vision
```
## Setup

Inside Claude Code, run the interactive wizard:
```
/setup-video-vision
```
It walks you through backend selection, Whisper configuration (if you chose the local backend), frame options, and dependency verification.
## Usage

```
/watch-video path/to/video.mp4
/watch-video tutorial.mp4 "what language is used in this tutorial?"
```
Just mention a video file — Claude will detect it:
"analyze this video for me: ~/Downloads/demo.mp4"
"take a look at the first second of ~/videos/bug-report.mov"
Claude adapts parameters automatically; "the first second" becomes a frame range of 00:00:00 to 00:00:01.

## Audio backends

| Backend | Audio processing | Cost | Setup |
|---|---|---|---|
| Gemini API | Native (speech + non-speech events) | Free tier: 1500 req/day | `GEMINI_API_KEY` env var |
| Local (Whisper) | whisper.cpp or Python openai-whisper | Free, fully offline | `brew install whisper-cpp` + auto model download |
| OpenAI API | OpenAI Whisper API | Paid per usage | `OPENAI_API_KEY` env var |
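For example, configuring each backend before launching Claude Code might look like this. This is a sketch based on the table above; the key values are placeholders you supply yourself:

```bash
# Gemini backend (free tier): export the API key
export GEMINI_API_KEY="your-gemini-key"

# Local backend: install whisper.cpp via Homebrew (macOS);
# the model downloads automatically on first use
brew install whisper-cpp

# OpenAI backend (paid per usage): export the API key instead
export OPENAI_API_KEY="sk-your-openai-key"
```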
All backends extract video frames via ffmpeg — Claude always has direct visual access.
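Frame extraction itself is plain ffmpeg. A roughly equivalent manual command for the "first second" example above (a sketch; the server's actual flags and sampling rate are not documented here):

```bash
# Grab one frame per second from the first second of the video
ffmpeg -i bug-report.mov -ss 00:00:00 -t 1 -vf fps=1 frame_%03d.png
```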
## Architecture

```
┌───────────────────────────────────────────────────────┐
│ Claude Code (your session)                            │
│                                                       │
│   /watch-video ──→ Skill: video-perception            │
│                         │                             │
│                         ▼                             │
│               MCP tool: video_watch                   │
│                         │                             │
└─────────────────────────┼─────────────────────────────┘
                          │
                          ▼
         ┌────────────────────────────────────┐
         │ MCP Server (Node.js)               │
         │                                    │
         │  ┌──────────┐    ┌──────────────┐  │
         │  │ ffmpeg   │    │ Audio backend│  │
         │  │ frames   │    │ (parallel)   │  │
         │  └──────────┘    └──────────────┘  │
         │       │                 │          │
         └───────┼─────────────────┼──────────┘
                 ▼                 ▼
          base64 images      transcription
          + timestamps       + audio events
                 │                 │
                 └────────┬────────┘
```
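Conceptually, a single `video_watch` call returns both branches of the diagram together. A hypothetical result payload, with field names that are purely illustrative rather than the server's documented schema:

```json
{
  "frames": [
    { "timestamp": "00:00:00", "image": "<base64-encoded frame>" }
  ],
  "transcription": [
    { "start": "00:00:00", "end": "00:00:03", "text": "Welcome to the demo." }
  ],
  "audioEvents": [
    { "timestamp": "00:00:02", "event": "notification chime" }
  ]
}
```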
... [View full README on GitHub](https://github.com/jordanrendric/claude-video-vision#readme)