Config is the same across clients — only the file and path differ.
{
  "mcpServers": {
    "io-github-manchittlab-thecrawler": {
      "args": [
        "-y",
        "npm"
      ],
      "command": "npx"
    }
  }
}
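Because the entry is identical for every client, adding it amounts to merging one key into that client's JSON config. A minimal Python sketch of that merge, assuming the client keeps its servers under a top-level `mcpServers` object as in the snippet above; `add_server` is a hypothetical helper, and where the config file lives depends on the client.

```python
import json


def add_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of config with one MCP server entry added or replaced."""
    merged = dict(config)
    # Copy the existing server map so the caller's dict is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Illustrative existing config with one unrelated (made-up) server entry.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_server(existing, "io-github-manchittlab-thecrawler", "npx", ["-y", "npm"])
print(json.dumps(updated, indent=2))
```

The helper returns a new dict instead of mutating its input, so the original config can be kept as a backup before the file is rewritten.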
Scrape any URL and get rich structured data, or extract typed JSON via your own LLM in one call. Open source (AGPL-3.0). $0.005 per page.
Run this in your terminal to verify the server starts.
npx -y 'npm' 2>&1 | head -1 && echo "✓ Server started successfully"
Packing does not respect root-level ignore files in workspaces
### Impact
`npm pack` ignores root-level `.gitignore` & `.npmignore` file exclusion directives when run in a workspace or with a workspace flag (i.e. `--workspaces`, `--workspace=<name>`). Anyone who has run `npm pack` or `npm publish` with workspaces, as of [v7.9.0](https://github.com/npm/cli/releases/tag/v7.9.0) & [v7.13.0](https://github.com/npm/cli/releases/tag/v7.13.0) respectively, may be affected and have published files into the npm registry they did not intend to include.
### Patch
- Up
Incorrect Permission Assignment for Critical Resource in NPM
An issue was discovered in an npm 5.7.0 2018-02-21 pre-release (marked as "next: 5.7.0" and therefore automatically installed by an "npm upgrade -g npm" command, and also announced in the vendor's blog without mention of pre-release status). It might allow local users to bypass intended filesystem access restrictions because ownerships of /etc and /usr directories are being changed unexpectedly, related to a "correctMkdir" issue.
Local Privilege Escalation in npm
Affected versions of `npm` use predictable temporary file names during archive unpacking. If an attacker can create a symbolic link at the location of one of these temporary file names, the attacker can write to any file that the user who owns the `npm` process has permission to write to, potentially resulting in local privilege escalation.
## Recommendation
Update to version 1.3.3 or later.
npm CLI exposing sensitive information through logs
Versions of the npm CLI prior to 6.14.6 are vulnerable to an information exposure vulnerability through log files. The CLI supports URLs like `<protocol>://[<user>[:<password>]@]<hostname>[:<port>][:][/]<path>`. The password value is not redacted and is printed to stdout and also to any generated log files.
npm Vulnerable to Global node_modules Binary Overwrite
Versions of the npm CLI prior to 6.13.4 are vulnerable to a Global node_modules Binary Overwrite. It fails to prevent existing globally installed binaries from being overwritten by other package installations. For example, if a package was installed globally and created a `serve` binary, any subsequent installs of packages that also create a `serve` binary would overwrite the first binary. This will not overwrite system binaries but only binaries put into the global node_modules directory. This b
Others in browser
Browser automation with Puppeteer for web scraping and testing
Self-hosted URL- and file-to-Markdown service for humans and AI agents - web pages, documents, images, audio, YouTube. PWA + REST + MCP + Claude Code skill, Reddit-aware, refreshable share links.
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
The Apify MCP server enables your AI agents to extract data from social media, search engines, maps, e-commerce sites, or any other website using thousands of ready-made scrapers, crawlers, and automation tools available on the Apify Store.
Scrape web pages, run LLM-powered structured extraction, or diagnose whether URLs are ready for a built-in extraction contract before spending LLM tokens. Open source engine (AGPL-3.0). $0.005 per successfully scraped page on Apify.
Start with a safe test: run one public URL with dryRun: true on Apify, or clone the current GitHub source and run the local CLI/MCP build from engine/. A small proof pack is in examples/diagnostic-challenge, including a sample readiness report at examples/diagnostic-challenge/sample-report.md.
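A dry run is just an input flag. The sketch below builds such an input in Python; only `dryRun` comes from the text above, while the `url` key and the `build_dry_run_input` helper are hypothetical, so check the Actor's input schema for the real field names.

```python
import json
from urllib.parse import urlparse


def build_dry_run_input(url: str) -> dict:
    """Build a single-URL input with dryRun enabled (field name `url` is assumed)."""
    parsed = urlparse(url)
    # Reject anything that is not a plain http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a public http(s) URL: {url!r}")
    return {"url": url, "dryRun": True}


print(json.dumps(build_dry_run_input("https://example.com/")))
```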
Use this when you need to know whether one real public-web workflow is worth automating before you spend engineering time on extraction.
The public offer thread is GitHub issue #1. The proof pack includes a sample readiness report showing the report shape before a buyer sends URLs.
Public fit checks should use this shape:
Workflow type:
Public URLs (up to 25):
Target output shape / required fields:
Known blockers or constraints:
Timing:
Do not include login credentials, private URLs, personal data, or raw customer data in GitHub issues.
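For anyone preparing several fit checks, the template can be filled in programmatically. A small Python sketch, assuming plain-text output is acceptable for the issue body; `format_fit_check` is a hypothetical helper, and the only rule it enforces is the 25-URL limit stated in the template.

```python
MAX_URLS = 25  # limit stated in the fit-check template


def format_fit_check(workflow_type: str, urls: list[str], output_shape: str,
                     blockers: str, timing: str) -> str:
    """Render the fit-check template as plain text, enforcing the URL limit."""
    if not urls:
        raise ValueError("at least one public URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"too many URLs: {len(urls)} > {MAX_URLS}")
    lines = [
        f"Workflow type: {workflow_type}",
        f"Public URLs (up to {MAX_URLS}):",
        *[f"- {u}" for u in urls],
        f"Target output shape / required fields: {output_shape}",
        f"Known blockers or constraints: {blockers}",
        f"Timing: {timing}",
    ]
    return "\n".join(lines)


text = format_fit_check(
    "product pages",
    ["https://example.com/p/1"],
    "name, price",
    "none known",
    "this week",
)
print(text)
```

The helper cannot tell whether a URL is private or contains personal data, so that check stays manual.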
validation.valid, required fields, and missing-field evidence. Current contracts: real-estate-listing, product-page, docs-page.
(extractBrand: true): one call returns the site's ranked color palette, themeColor, and best-guess logo candidates (JSON-LD / header SVG / favicons / og:image). In Playwright mode it reads rendered colors via getComputedStyle, which works on SPAs where static CSS can't. Deterministic, no LLM.
onlyMainContent plus includeTags / excludeTags (CSS allow/deny) strip nav, footer, sidebars, and ads from text, markdown, links, and HTML output. Firecrawl-compatible. waitFor alias supported.
extractHtml (cleaned, main-content HTML) and extractRawHtml (full serialized DOM) alongside markdown.
diagnoseMode to score source readiness, identify blockers, and save a buyer-readable Markdown report before extraction.
errorType enum (dns | timeout | rate-limit | blocked-bot | js-required | http-4xx | http-5xx | parse | network | unknown) + errorRetryable boolean. Agents branch programmatically, with no regex on error strings.
errorType: 'blocked-bot' instead of returning challenge HTML as useful content.
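Branching on the typed error fields might look like the following Python sketch. The `errorType` values and the `errorRetryable` boolean come from the list above; the result being a plain dict, a missing `errorType` meaning success, and the action names returned by the hypothetical `next_action` helper are all assumptions made for illustration.

```python
ERROR_TYPES = {
    "dns", "timeout", "rate-limit", "blocked-bot", "js-required",
    "http-4xx", "http-5xx", "parse", "network", "unknown",
}


def next_action(result: dict) -> str:
    """Pick a follow-up action from the typed error fields of one result."""
    error_type = result.get("errorType")
    if error_type is None:
        # Assumption: a result without errorType is a successful scrape.
        return "use-result"
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unexpected errorType: {error_type!r}")
    if result.get("errorRetryable"):
        return "retry"
    if error_type == "js-required":
        # Assumed policy: rerun the page in a browser-rendering mode.
        return "rerun-with-browser"
    return "skip"


print(next_action({"errorType": "timeout", "errorRetryable": True}))
print(next_action({"errorType": "blocked-bot", "errorRetryable": False}))
```

Checking `errorRetryable` before the specific type keeps the retry decision with the scraper, so the caller only adds policy for the non-retryable cases it cares about.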