Add NameetP/pdfmux to Search & Data Extraction

pdfmux is an open-source (MIT) PDF extraction router with a built-in MCP server. Its differentiator from existing PDF MCPs (MinerU, etc.): rather than relying on a single extractor, it classifies each page and routes it to the best backend (PyMuPDF / Docling / OCR / optional LLM), then verifies the output with per-page confidence scoring to prevent silent extraction failures in RAG pipelines.
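
For reviewers unfamiliar with the pattern, here is a minimal sketch of the classify, route, and verify flow described above. It is not pdfmux's actual API or code: every name (`Page`, `classify`, `score`, `ROUTES`, `extract`), the stub backends, the toy confidence heuristic, and the 0.8 threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Page:
    number: int
    text_layer: str          # embedded text; empty for a scanned page
    has_table: bool = False


@dataclass
class Result:
    page: int
    backend: str
    text: str
    confidence: float


def classify(page: Page) -> str:
    """Hypothetical page classifier: tables, digital, or scanned."""
    if page.has_table:
        return "tables"
    if page.text_layer.strip():
        return "digital"
    return "scanned"


def score(text: str) -> float:
    """Toy confidence: share of alphanumeric or whitespace characters."""
    if not text:
        return 0.0
    good = sum(ch.isalnum() or ch.isspace() for ch in text)
    return good / len(text)


# Stub backends standing in for real extractors (assumptions, not real calls).
def fast_text(page: Page) -> str:
    return page.text_layer


def table_aware(page: Page) -> str:
    return page.text_layer


def ocr(page: Page) -> str:
    return f"ocr text for page {page.number}"


Backend = Tuple[str, Callable[[Page], str]]
ROUTES: Dict[str, Backend] = {
    "digital": ("fast_text", fast_text),
    "tables": ("table_aware", table_aware),
    "scanned": ("ocr", ocr),
}
FALLBACK: Backend = ("ocr", ocr)


def extract(pages: List[Page], threshold: float = 0.8) -> List[Result]:
    results = []
    for page in pages:
        name, backend = ROUTES[classify(page)]
        text = backend(page)
        conf = score(text)
        # Re-extract low-confidence pages with the fallback backend and
        # keep the retry only if it actually scores higher.
        if conf < threshold and name != FALLBACK[0]:
            retry_text = FALLBACK[1](page)
            retry_conf = score(retry_text)
            if retry_conf > conf:
                name, text, conf = FALLBACK[0], retry_text, retry_conf
        results.append(Result(page.number, name, text, conf))
    return results


pages = [Page(1, "Hello world"), Page(2, ""), Page(3, "@@@###$$$%%%")]
results = extract(pages)
for r in results:
    print(r.page, r.backend, round(r.confidence, 2))
```

In this sketch, page 3 has a text layer that scores poorly, so it is re-extracted with the fallback instead of being passed through silently. That is the failure mode the per-page scoring is meant to catch.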
2026-06-18 23:31:28 +00:00 · 2026-04-14 19:33:27 +05:30
parent ea9dadd7d6
commit 8d8c1d888b
1 changed file with 1 addition and 0 deletions
@@ -1839,6 +1839,7 @@ Tools for conducting research, surveys, interviews, and data collection.
 - [OctagonAI/octagon-deep-research-mcp](https://github.com/OctagonAI/octagon-deep-research-mcp) 🎖️ 📇 ☁️ 🏠 - Lightning-Fast, High-Accuracy Deep Research Agent
 - [olostep/olostep-mcp-server](https://github.com/olostep/olostep-mcp-server) 📇 ☁️ - API to search, extract and structure web data. Web scraping, AI-powered answers with citations, batch processing (10k URLs), and autonomous site crawling.
 - [opendatalab/MinerU-Ecosystem](https://github.com/opendatalab/MinerU-Ecosystem/tree/main/mcp) [![opendatalab/MinerU-Ecosystem MCP server](https://glama.ai/mcp/servers/opendatalab/MinerU-Ecosystem/badges/score.svg)](https://glama.ai/mcp/servers/opendatalab/MinerU-Ecosystem) 🎖️ 🐍 🏠 ☁️ - Official MinerU document parsing MCP ([mineru-open-mcp](https://pypi.org/project/mineru-open-mcp/) on PyPI). Converts PDFs, doc/docx/ppt/pptx, images, and spreadsheets to Markdown via the [MinerU](https://mineru.net) API; free Flash mode without an API key (about 20 pages per file); optional `MINERU_API_TOKEN` for higher limits. 
+- [NameetP/pdfmux](https://github.com/NameetP/pdfmux) 🐍 🏠 - PDF extraction router with built-in MCP server. Classifies each page (digital, scanned, tables) and routes it to the best backend (PyMuPDF, Docling, OCR, or optional LLM fallback). Per-page confidence scoring flags low-quality pages and automatically re-extracts them, preventing silent RAG failures. Zero config: `pip install pdfmux`. MIT licensed.
 - [parallel-web/search-mcp](https://github.com/parallel-web/search-mcp) ☁️ 🔎 - Highest Accuracy Web Search for AI
 - [FayAndXan/spectrawl](https://github.com/FayAndXan/spectrawl) [![spectrawl MCP server](https://glama.ai/mcp/servers/FayAndXan/spectrawl/badges/score.svg)](https://glama.ai/mcp/servers/FayAndXan/spectrawl) 📇 🏠 - Unified web layer for AI agents. Search (8 engines), stealth browse, cookie auth, and act on 24 platforms. 5,000 free searches/month via Gemini Grounded Search.
 - [parallel-web/task-mcp](https://github.com/parallel-web/task-mcp) ☁️ 🔎 - Highest Accuracy Deep Research and Batch Tasks MCP