Config is the same across clients — only the file and path differ.
{
  "mcpServers": {
    "octen": {
      "env": {
        "OCTEN_API_KEY": "your-key-here"
      },
      "args": [
        "-y",
        "octen-mcp"
      ],
      "command": "npx"
    }
  }
}
MCP server for Octen Extract — turn any URL into clean, LLM-ready markdown. Plug into Claude / Cursor / VS Code / Windsurf and let the model pull the live web.
Run this in your terminal to verify the server starts.
npx -y 'octen-mcp' 2>&1 | head -1 && echo "✓ Server started successfully"
No known CVEs.
Checked octen-mcp against OSV.dev.
Most extract tools (Firecrawl, Jina Reader, Exa, Tavily) hand you the page body. Octen returns the body plus structured page labels in the same call:
- category: topical labels with subcategories (e.g., Computers, Electronics & Technology / Artificial Intelligence, Health, Finance, Travel). Use it to skip out-of-vertical pages in RAG pipelines; a finance pipeline can filter out random forum or entertainment pages before embedding.
- page_structure: what kind of page this actually is (e.g., Content Page / Article, Homepage, Index Page, No Main Content). Use it to skip listing/navigation pages, dead links, and login-wall shells before paying for LLM calls. In real RAG pipelines, a meaningful share of fetched URLs (often 20–30%) are index pages or content-less shells.
- highlights: pass a query and get the most relevant snippets for each page, ranked, instead of the full body (cheaper context, better signal).
The two labels move filtering upstream — instead of fetching everything, embedding it, then realizing a chunk of pages are useless, you skip them at fetch time. None of category / page_structure / highlights exist in Firecrawl, Jina, Exa, or Tavily today.
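As a rough illustration of filtering at fetch time, the sketch below drops pages by their labels before anything is embedded. The response shape is an assumption made for this example (a dict per page, with `category` as a list of label strings and `page_structure` as a single string); only the field names and the structure values come from the description above, and `keep_for_embedding` is a hypothetical helper, not part of the server.

```python
# Assumed shape (hypothetical): each extracted page is a dict with
# "url", "category" (list of label strings) and "page_structure" (string).
SKIP_STRUCTURES = {"Homepage", "Index Page", "No Main Content"}


def keep_for_embedding(pages, wanted_prefixes):
    """Return only pages worth embedding for the given topical prefixes."""
    kept = []
    for page in pages:
        # Drop navigation pages and content-less shells first.
        if page.get("page_structure") in SKIP_STRUCTURES:
            continue
        labels = page.get("category") or []
        # Keep the page if any label starts with a wanted top-level category.
        if any(label.startswith(prefix)
               for label in labels
               for prefix in wanted_prefixes):
            kept.append(page)
    return kept


# Illustrative records, not real API output.
pages = [
    {"url": "https://example.com/markets/rates",
     "category": ["Finance"],
     "page_structure": "Content Page / Article"},
    {"url": "https://example.com/",
     "category": ["Finance"],
     "page_structure": "Homepage"},
    {"url": "https://example.com/forum/movies",
     "category": ["Entertainment"],
     "page_structure": "Content Page / Article"},
]

kept = keep_for_embedding(pages, ["Finance"])
print([p["url"] for p in kept])
```

Matching on a prefix is one way to handle subcategories such as "Computers, Electronics & Technology / Artificial Intelligence"; an exact-match set would work just as well if the label strings are stable.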
success isn't enough

A common failure mode for extract pipelines: the request returns success, the response body is non-empty, but the page is actually a login wall, paywall, JS shell, or "we'll be right back" stub. The agent has no signal until it pays for an LLM call to discover the page has nothing to summarize. Octen flags these at fetch time.
Take https://github.com/login — visually it looks like a normal page:

But there's no main content to extract — it's a sign-in form. Same URL on both APIs returns very different signals:
| Firecrawl /v1/scrape | Octen /extract (this server) |
|---|---|
| (image: Firecrawl response) | (image: Octen response) |
That single page_structure: "No Main Content" lets the agent skip the page without an LLM call. With other tools, the agent only finds out by spending tokens to summarize an empty page — at scale, a real chunk of the token bill.
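To make the difference concrete, here is a minimal sketch of two gates an agent could run before summarizing a page. The `success` and `markdown` keys and the sample record are assumptions made for illustration, not the documented response format; only `page_structure` and the value "No Main Content" come from the text above.

```python
def naive_gate(resp):
    """Pass when the request succeeded and the body is non-empty."""
    return bool(resp.get("success")) and bool(resp.get("markdown", "").strip())


def label_gate(resp):
    """Additionally require that the page is labeled as having main content."""
    return naive_gate(resp) and resp.get("page_structure") != "No Main Content"


# Illustrative record for a sign-in page (hypothetical values).
login_page = {
    "success": True,
    "markdown": "Sign in\nUsername or email address\nPassword",
    "page_structure": "No Main Content",
}

print(naive_gate(login_page))  # True: the page would be sent to the LLM
print(label_gate(login_page))  # False: the page is skipped before any LLM call
```

The naive check passes because the sign-in form still produces text, which is exactly the case the label is meant to catch.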