The config is the same across clients; only the file name and its location differ.
{
  "mcpServers": {
    "multimodal-mcp": {
      "env": {
        "OPENAI_API_KEY": "sk-..."
      },
      "args": [
        "@r16t/multimodal-mcp@latest"
      ],
      "command": "npx"
    }
  }
}
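Because the entry has the same shape in every client, adding it to an existing config file comes down to merging one key under `mcpServers`. The following is a minimal Python sketch, assuming the client stores its config as a JSON file with a top-level `mcpServers` object. The helper names `add_server` and `update_config_file` are hypothetical and not part of any client's tooling.

```python
import json
from pathlib import Path

# The server entry from the config above; replace the placeholder key with a real one.
MULTIMODAL_ENTRY = {
    "command": "npx",
    "args": ["@r16t/multimodal-mcp@latest"],
    "env": {"OPENAI_API_KEY": "sk-..."},
}


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of `config` with `entry` registered under mcpServers[name].

    Servers already present in the config are left untouched.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, entry: dict) -> None:
    """Read a JSON config file (if it exists), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_server(config, name, entry), indent=2) + "\n")
```

Calling `update_config_file(Path("path/to/client-config.json"), "multimodal-mcp", MULTIMODAL_ENTRY)` would then register the server; the path here is a placeholder, since each client keeps its config in a different location.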
Multi-provider media generation MCP server. Generate images, videos, audio, and transcriptions from text prompts using OpenAI, xAI, Gemini, ElevenLabs, and BFL (FLUX) through a single unified interface.
Run this in your terminal to verify that the server starts. The success message is printed whenever `head` exits cleanly, so also check that the first line of output is not an error.
npx -y '@r16t/multimodal-mcp' 2>&1 | head -1 && echo "✓ Server started successfully"
No known CVEs.
Checked @r16t/multimodal-mcp against OSV.dev.
Set the API key for at least one provider. Most users only need one — add more to access additional providers.
# Using OpenAI
claude mcp add multimodal-mcp -e OPENAI_API_KEY=sk-... -- npx -y @r16t/multimodal-mcp@latest
# Or using xAI
# claude mcp add multimodal-mcp -e XAI_API_KEY=xai-... -- npx -y @r16t/multimodal-mcp@latest
# Or using Gemini
# claude mcp add multimodal-mcp -e GEMINI_API_KEY=AIza... -- npx -y @r16t/multimodal-mcp@latest
# Or using ElevenLabs (audio + transcription)
# claude mcp add multimodal-mcp -e ELEVENLABS_API_KEY=xi-... -- npx -y @r16t/multimodal-mcp@latest
# Or using BFL/FLUX (images)
# claude mcp add multimodal-mcp -e BFL_API_KEY=... -- npx -y @r16t/multimodal-mcp@latest
Using a different editor? See setup instructions for Claude Desktop, Cursor, VS Code, Windsurf, and Cline.
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | At least one provider key | OpenAI API key; enables image, video, audio generation, and transcription via gpt-image-1, sora-2, tts-1, and whisper-1 |
| XAI_API_KEY | At least one provider key | xAI API key; enables image and video generation via grok-imagine-image and grok-imagine-video |
| GEMINI_API_KEY | At least one provider key | Gemini API key; enables image, video, and audio generation via imagen-4, veo-3.1, and gemini-2.5-flash-preview-tts |
| GOOGLE_API_KEY | N/A | Alias for GEMINI_API_KEY; either name is accepted |
| ELEVENLABS_API_KEY | At least one provider key | ElevenLabs API key; enables audio generation (TTS, sound effects) and transcription via Flash v2.5 and Scribe v1 |
| BFL_API_KEY | At least one provider key | BFL API key; enables image generation and editing via FLUX Pro 1.1 and FLUX Kontext |
| MEDIA_OUTPUT_DIR | No | Directory for saved media files. Defaults to the current working directory |
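
The "at least one provider key" rule can be checked before launching the server. Below is a small Python sketch that relies only on the variable names listed in the table. The function name `configured_providers` and the `elevenlabs` label are illustrative assumptions, and this is not the server's own startup logic.

```python
import os

# Environment variables from the table, grouped by provider.
# GOOGLE_API_KEY is listed as an alias for GEMINI_API_KEY, so either one counts.
PROVIDER_KEYS = {
    "openai": ("OPENAI_API_KEY",),
    "xai": ("XAI_API_KEY",),
    "google": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
    "elevenlabs": ("ELEVENLABS_API_KEY",),  # label assumed; not taken from the source
    "bfl": ("BFL_API_KEY",),
}


def configured_providers(env=None):
    """Return the providers that have a non-empty API key set in `env`."""
    env = os.environ if env is None else env
    return [
        provider
        for provider, names in PROVIDER_KEYS.items()
        if any(env.get(name, "").strip() for name in names)
    ]
```

Calling `configured_providers()` with no argument inspects the current environment; an empty list means no provider key is set and at least one of the variables above still needs a value.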
generate_image

Generate an image from a text prompt.

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Text description of the image to generate |
| provider | string | No | Provider to use: openai, xai, google, bfl. Auto-selects if omitted |
| aspectRatio | string | No | Aspect ratio: 1:1, 16:9, 9:16, 4:3, 3:4 |
| quality | string | No | Quality level: low, standard, high |
| outputDirectory | string | No | Directory to save the generated file. Absolute or relative path. Defaults to MEDIA_OUTPUT_DIR or cwd |
| providerOptions | object | No | Provider-specific parameters passed through directly |
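
An MCP client invokes this tool with a `tools/call` request whose `arguments` follow the table above. The Python sketch below builds such a request and rejects values outside the documented sets. The helper `build_generate_image_call` is hypothetical, and the server may accept or validate inputs differently.

```python
import json

# Allowed values copied from the parameter table.
PROVIDERS = {"openai", "xai", "google", "bfl"}
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4"}
QUALITIES = {"low", "standard", "high"}


def build_generate_image_call(prompt, request_id=1, **options):
    """Build a JSON-RPC 2.0 `tools/call` request for generate_image."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")

    allowed = {
        "provider": PROVIDERS,
        "aspectRatio": ASPECT_RATIOS,
        "quality": QUALITIES,
        "outputDirectory": None,  # free-form path, not checked here
        "providerOptions": None,  # passed through to the provider as-is
    }
    arguments = {"prompt": prompt}
    for key, value in options.items():
        if key not in allowed:
            raise ValueError(f"unknown parameter: {key}")
        choices = allowed[key]
        if choices is not None and value not in choices:
            raise ValueError(f"{key} must be one of {sorted(choices)}")
        arguments[key] = value

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_image", "arguments": arguments},
    }


request = build_generate_image_call(
    "A lighthouse at dusk, watercolor",
    provider="openai",
    aspectRatio="16:9",
    quality="high",
)
print(json.dumps(request, indent=2))
```

In practice an MCP client constructs this request itself; the sketch only makes the expected argument shape explicit.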
generate_video

Generate a video from a text prompt. Video generation is asynchronous and may take several minutes.

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Text description of the video |