youtube-mcp-server

MCP server for YouTube video transcription and metadata.

Overview

What it is

A Model Context Protocol (MCP) server that transcribes YouTube videos and extracts their metadata.

Intent

I need it when

Extract YouTube video metadata and transcripts programmatically for AI agents

youtube-mcp-server exposes two MCP tools, get_video_info and transcribe_video, that let AI agents retrieve video titles, descriptions, view counts, and full transcriptions without downloading videos. This enables automated content analysis and processing workflows.
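
As a rough illustration of how an agent might invoke these tools: MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method. The sketch below only builds the request payloads and does not contact a server. The `url` argument name and the placeholder video URL are assumptions, not taken from the server's tool schema.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument name ("url"); check the server's tool schema.
video = {"url": "https://www.youtube.com/watch?v=VIDEO_ID"}
info_request = build_tool_call("get_video_info", video)
transcript_request = build_tool_call("transcribe_video", video)

print(json.dumps(info_request, indent=2))
```

In practice an MCP client library would construct and send these messages; the sketch only shows the shape of the payload.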

Process YouTube videos efficiently without storing large files locally

The server extracts audio with yt-dlp, processes it in memory, uses file-based caching, and transcribes segments in parallel. Keeping audio in memory avoids the disk I/O of storing large media files, and hardware acceleration (CUDA/MPS) speeds up inference.
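
The combination of caching and parallel segment transcription can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual code: `transcribe_cached`, `fake_transcribe`, the cache layout (one JSON file per video ID), and the worker count are all hypothetical.

```python
import hashlib
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fake_transcribe(segment: bytes) -> str:
    """Stand-in for a speech model call; returns a placeholder string."""
    return f"<{len(segment)} bytes>"


def transcribe_cached(video_id, segments, cache_dir, transcribe=fake_transcribe):
    """Return per-segment text, reusing a JSON cache file when one exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(video_id.encode()).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        # Cache hit: skip transcription entirely.
        return json.loads(cache_file.read_text())
    # Audio segments stay in memory; pool.map preserves segment order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        texts = list(pool.map(transcribe, segments))
    cache_file.write_text(json.dumps(texts))
    return texts


cache = tempfile.mkdtemp()  # throwaway cache directory for the demo
print(transcribe_cached("VIDEO_ID", [b"abc", b"de"], cache))
```

Only the small text result is written to disk here; the audio bytes are never stored, which is the property the description above emphasizes.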

Transcribe YouTube videos in multiple languages with high accuracy

The server uses OpenAI Whisper, which supports 99 languages, together with VAD-based (voice activity detection) segmentation to isolate speech and optional translation. Model size is configurable, so users can trade accuracy against speed.
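
To make the idea of VAD-based segmentation concrete, here is a deliberately simplified sketch. It uses a plain energy threshold, which is a stand-in technique: the source does not say which VAD method the server uses, and production VADs are typically model-based. The function name, frame length, and threshold are illustrative assumptions.

```python
def speech_segments(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample-index pairs for runs of high-energy frames."""
    segments = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy >= threshold:
            if start is None:
                start = i  # a speech run begins at this frame
        elif start is not None:
            segments.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


# Toy signal: silence, a burst, silence, a louder burst.
audio = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4 + [0.9] * 4
print(speech_segments(audio))  # → [(8, 16), (20, 24)]
```

Each detected span could then be passed to the transcription model on its own, so silent stretches are never transcribed.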

Integrate YouTube video analysis into existing development workflows

youtube-mcp-server runs as a local SSE (Server-Sent Events) server on port 8000, connects to MCP clients, and returns video metadata and transcriptions as structured JSON, which makes it usable from existing development tools and CI/CD pipelines.
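
For readers unfamiliar with the transport, the sketch below parses a `text/event-stream` payload into events, following the standard SSE framing (fields separated by newlines, events separated by blank lines). It is an illustration only: the sample payload is made up, and a real MCP client library would handle this parsing for you.

```python
import json


def parse_sse(stream_text: str) -> list:
    """Split a text/event-stream payload into {"event", "data"} dicts."""
    events = []
    event_type, data_lines = "message", []
    # The trailing "" flushes a final event that lacks a closing blank line.
    for line in stream_text.splitlines() + [""]:
        if line == "":
            if data_lines:  # a blank line dispatches the buffered event
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
    return events


# Hypothetical payload; the real server's responses may look different.
sample = ': keep-alive\nevent: message\ndata: {"jsonrpc": "2.0", "id": 1, "result": {}}\n\n'
for event in parse_sse(sample):
    print(event["event"], json.loads(event["data"]))
```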

Build AI-powered applications that understand YouTube video content

youtube-mcp-server implements the Model Context Protocol, so developers can embed video understanding into AI agents and LLM applications through standardized MCP tools for metadata and transcription retrieval.

Drop

Not a fit when

  • User needs commercial support or SLA guarantees for production systems
  • User requires a managed cloud service without self-hosting infrastructure
  • User needs real-time video streaming or live transcription capabilities
  • User lacks a Python 3.10+ environment or cannot install the ffmpeg dependency
  • User needs transcription of videos with heavy background noise or poor audio quality without manual tuning

Commercials

Pricing

Free, open-source software