MCP Server leveraging crawl4ai for web scraping and LLM-based content extraction (Markdown, text snippets, smart extraction). Designed for AI agent integration.
Config is the same across clients — only the file and path differ.
{
  "mcpServers": {
    "web-scraping-mcp": {
      "command": "<see-readme>",
      "args": []
    }
  }
}
This project provides an MCP (Model Context Protocol) server that uses the crawl4ai library to perform web scraping and intelligent content extraction tasks. It allows AI agents (like Claude, or agents built with LangChain/LangGraph) to interact with web pages, retrieve content, search for specific text, and perform LLM-based extraction based on natural language instructions.
This server uses:
- A .env file for API keys.

It exposes three tools:
- scrape_url: Get the full content of a webpage in Markdown format.
- extract_text_by_query: Find specific text snippets on a page based on a query.
- smart_extract: Use an LLM (currently Google Gemini) to extract structured information based on instructions.

It also includes a Docker configuration (Dockerfile) for easy, self-contained deployment.

scrape_url
Scrape a webpage and return its content in Markdown format.
Arguments:
- url (str, required): The URL of the webpage to scrape.
Returns:
- The content of the webpage in Markdown format.
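As a rough illustration of how an agent reaches this tool: MCP clients invoke tools with a JSON-RPC `tools/call` request. The sketch below builds such a request for scrape_url. The helper name `build_tool_call` and the example URL are assumptions made for illustration and are not part of this project; in practice an MCP client library usually constructs and sends this message for you.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires that each id be unique
# per session.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The URL is a placeholder; any reachable page would do.
request = build_tool_call("scrape_url", {"url": "https://example.com"})
print(json.dumps(request, indent=2))
```

The same shape applies to the other tools; only the tool name and the arguments object change.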

extract_text_by_query
Extract relevant text snippets from a webpage that contain a specific search query. Returns up to the first 5 matches found.
Arguments:
- url (str, required): The URL of the webpage to search within.
- query (str, required): The text query to search for (case-insensitive).
- context_size (int, optional): The number of characters to include before and after the matched query text in each snippet. Defaults to 300.
Returns:
- Up to the first 5 matching text snippets, each including the surrounding context.
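The matching behaviour described above (case-insensitive search, `context_size` characters on each side, at most five snippets) can be sketched in plain Python. This is an illustration of the documented behaviour only, not the server's actual implementation; the function name `find_snippets` and the `max_matches` parameter are assumptions.

```python
import re
from itertools import islice


def find_snippets(text: str, query: str, context_size: int = 300,
                  max_matches: int = 5) -> list:
    """Return up to `max_matches` snippets around case-insensitive matches."""
    if not query:
        return []
    # re.escape treats the query as literal text rather than a regex.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    snippets = []
    for match in islice(pattern.finditer(text), max_matches):
        # Clamp the window so it never runs past either end of the text.
        begin = max(0, match.start() - context_size)
        end = min(len(text), match.end() + context_size)
        snippets.append(text[begin:end])
    return snippets


page = "Crawl4AI is fast. The crawl4ai library powers this server."
snippets = find_snippets(page, "crawl4ai", context_size=5)
print(snippets)
```

Snippets near the start or end of the page are simply shorter, because the window is clamped to the text boundaries.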

smart_extract
Intelligently extract specific information from a webpage using the configured LLM (currently requires a Google Gemini API key), based on a natural language instruction.
Arguments:
- url (str, required): The URL of the webpage to analyze and extract from.
- instruction (str, required): Natural language instruction specifying what information to extract (e.g., "List all the speakers mentioned on this page", "Extract the main contact email address", "Summarize the key findings").
Returns:
- The information extracted from the page according to the instruction.
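For agent code that assembles tool arguments dynamically, it can help to check them against the documented signatures before sending a request. The sketch below transcribes the argument lists of the three tools into a small table and validates a call against it. The names `TOOL_ARGUMENTS` and `prepare_arguments` are hypothetical and this check runs on the client side; it is not how the server itself validates input.

```python
# Argument schemas transcribed from the tool descriptions:
# argument name -> (type, required, default)
TOOL_ARGUMENTS = {
    "scrape_url": {"url": (str, True, None)},
    "extract_text_by_query": {
        "url": (str, True, None),
        "query": (str, True, None),
        "context_size": (int, False, 300),
    },
    "smart_extract": {
        "url": (str, True, None),
        "instruction": (str, True, None),
    },
}


def prepare_arguments(tool: str, supplied: dict) -> dict:
    """Validate `supplied` against the documented schema and fill defaults."""
    schema = TOOL_ARGUMENTS[tool]
    unknown = set(supplied) - set(schema)
    if unknown:
        raise ValueError(f"unknown arguments for {tool}: {sorted(unknown)}")
    prepared = {}
    for name, (expected_type, required, default) in schema.items():
        if name in supplied:
            if not isinstance(supplied[name], expected_type):
                raise TypeError(f"{name} must be {expected_type.__name__}")
            prepared[name] = supplied[name]
        elif required:
            raise ValueError(f"missing required argument: {name}")
        else:
            prepared[name] = default
    return prepared


# Placeholder URL; the instruction is one of the documented examples.
args = prepare_arguments(
    "smart_extract",
    {"url": "https://example.com",
     "instruction": "Extract the main contact email address"},
)
print(args)
```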
You can run this server either locally or using the provided Docker configuration.
This method bundles Python and all necessary libraries. You only need Docker installed on the host machine.
git clone https://github.com/your-username/your-repo-name.git # Replace with your repo URL
cd your-repo-name
.env File: Create a file named .env in the project root directory and add your API keys:
# Required fo