Universal web scraper with LLM-ready markdown, RAG chunking, PDF/DOCX support.
Config is the same across clients — only the file and path differ.
```json
{
  "mcpServers": {
    "io-github-manchittlab-thecrawler": {
      "command": "<see-readme>",
      "args": []
    }
  }
}
```
Scrape any webpage and extract every data point: text content, links, images, meta tags, headings (h1-h6), HTML tables, JSON-LD structured data, email addresses, and phone numbers. CSS selector targeting for specific content. Recursive crawling to follow internal links. $0.003/page.
| Data | Description |
|---|---|
| Text | All visible text (scripts/styles stripped), up to 50K chars |
| Links | Every `<a>` tag — href, anchor text, internal/external flag |
| Images | Every `<img>` — src, alt text, width, height |
| Meta tags | All `<meta>` — description, og:title, keywords, robots, etc. |
| Headings | All h1–h6 with level and text |
| Tables | HTML tables as structured arrays (headers + rows) |
| JSON-LD | Schema.org structured data from `<script type="application/ld+json">` |
| Emails | Email addresses found anywhere in the HTML |
| Phones | Phone numbers (7+ digits) found in the HTML |
| Selected | Content matching your CSS selector |
Every extraction type can be toggled on/off.
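For example, a text-and-links-only scrape can disable the other extractors (field names as listed in the parameter table below) — a sketch:

```json
{
  "urls": ["https://example.com"],
  "extractImages": false,
  "extractMeta": false,
  "extractHeadings": false,
  "extractTables": false,
  "extractStructuredData": false,
  "extractEmails": false,
  "extractPhones": false
}
```

Turning off unused extractors keeps the response small, which matters when feeding results directly into an LLM context window.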
Scrape a single page:
```json
{
  "urls": ["https://example.com"]
}
```
Crawl a site (follow links):
```json
{
  "urls": ["https://example.com"],
  "maxDepth": 2,
  "maxPages": 50
}
```
Target specific content:
```json
{
  "urls": ["https://example.com"],
  "cssSelector": ".main-content"
}
```
| Field | Type | Default | Description |
|---|---|---|---|
| urls | array | (required) | URLs to scrape |
| extractText | boolean | true | Visible text content |
| extractLinks | boolean | true | All links with anchor text |
| extractImages | boolean | true | All images with alt/dimensions |
| extractMeta | boolean | true | Meta tags |
| extractHeadings | boolean | true | h1–h6 headings |
| extractTables | boolean | true | HTML tables as arrays |
| extractStructuredData | boolean | true | JSON-LD schema.org data |
| extractEmails | boolean | true | Email addresses |
| extractPhones | boolean | true | Phone numbers |
| cssSelector | string | (optional) | Target a specific element |
| maxDepth | integer | 0 | 0 = listed URLs only; 1+ = follow internal links to that depth |
| maxPages | integer | 100 | Max pages to scrape in total |
| dryRun | boolean | false | Scrape without charges |
$0.003 per page scraped (pay-per-event pricing).