mirror of
https://github.com/punkpeye/awesome-mcp-servers.git
synced 2026-05-01 08:15:59 +00:00
Add NameetP/pdfmux to Search & Data Extraction
pdfmux is an open-source (MIT) PDF extraction router with a built-in MCP server. Differentiator from existing PDF MCP servers (MinerU, etc.): rather than relying on a single extractor, it classifies each page and routes it to the best backend (PyMuPDF / Docling / OCR / optional LLM), then verifies the output with per-page confidence scoring to prevent silent extraction failures in RAG pipelines.
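
For reviewers unfamiliar with the classify-then-route idea, here is a minimal sketch of the general pattern. It is not pdfmux's actual code or API: every name (`Page`, `Result`, `classify`, `extract_page`, the stub backends), the fixed confidence values, and the 0.8 threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Page:
    text_chars: int   # characters found in the embedded text layer (hypothetical signal)
    has_table: bool   # whether a table-like layout was detected (hypothetical signal)


@dataclass
class Result:
    backend: str
    text: str
    confidence: float  # 0.0 to 1.0, higher means more trustworthy


def classify(page: Page) -> str:
    """Assign a page to one of three illustrative classes."""
    if page.text_chars == 0:
        return "scanned"   # no text layer, so OCR is needed
    if page.has_table:
        return "tables"
    return "digital"


# Stub backends standing in for real extractors; confidences are made up.
def fast_text(page: Page) -> Result:
    return Result("fast_text", "<text>", 0.95)


def layout_parser(page: Page) -> Result:
    return Result("layout_parser", "<table markdown>", 0.90)


def ocr(page: Page) -> Result:
    return Result("ocr", "<ocr text>", 0.60)


def llm_fallback(page: Page) -> Result:
    return Result("llm_fallback", "<llm text>", 0.85)


ROUTES: Dict[str, Callable[[Page], Result]] = {
    "digital": fast_text,
    "tables": layout_parser,
    "scanned": ocr,
}


def extract_page(page: Page, threshold: float = 0.8) -> Result:
    """Route the page by class, then re-extract if confidence is too low."""
    result = ROUTES[classify(page)](page)
    if result.confidence < threshold:
        retry = llm_fallback(page)
        # Keep the retry only if it actually scores higher.
        if retry.confidence > result.confidence:
            result = retry
    return result


if __name__ == "__main__":
    for p in (Page(1200, False), Page(300, True), Page(0, False)):
        r = extract_page(p)
        print(classify(p), "->", r.backend, r.confidence)
```

The point of the sketch is the control flow: a cheap classifier picks the backend, and a confidence check decides whether a page is re-extracted instead of being passed through silently.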
@@ -1839,6 +1839,7 @@ Tools for conducting research, surveys, interviews, and data collection.
- [OctagonAI/octagon-deep-research-mcp](https://github.com/OctagonAI/octagon-deep-research-mcp) 🎖️ 📇 ☁️ 🏠 - Lightning-Fast, High-Accuracy Deep Research Agent
- [olostep/olostep-mcp-server](https://github.com/olostep/olostep-mcp-server) 📇 ☁️ - API to search, extract and structure web data. Web scraping, AI-powered answers with citations, batch processing (10k URLs), and autonomous site crawling.
- [opendatalab/MinerU-Ecosystem](https://github.com/opendatalab/MinerU-Ecosystem/tree/main/mcp) [](https://glama.ai/mcp/servers/opendatalab/MinerU-Ecosystem) 🎖️ 🐍 🏠 ☁️ - Official MinerU document parsing MCP ([mineru-open-mcp](https://pypi.org/project/mineru-open-mcp/) on PyPI). Converts PDFs, doc/docx/ppt/pptx, images, and spreadsheets to Markdown via the [MinerU](https://mineru.net) API; free Flash mode without an API key (about 20 pages per file); optional `MINERU_API_TOKEN` for higher limits.
- [NameetP/pdfmux](https://github.com/NameetP/pdfmux) 🐍 🏠 - PDF extraction router with built-in MCP server. Classifies each page (digital, scanned, tables) and routes it to the best backend (PyMuPDF, Docling, OCR, or optional LLM fallback). Per-page confidence scoring flags low-quality pages and automatically re-extracts them, which helps prevent silent failures in RAG pipelines. Zero config: `pip install pdfmux`. MIT licensed.
- [parallel-web/search-mcp](https://github.com/parallel-web/search-mcp) ☁️ 🔎 - Highest Accuracy Web Search for AI
- [FayAndXan/spectrawl](https://github.com/FayAndXan/spectrawl) [](https://glama.ai/mcp/servers/FayAndXan/spectrawl) 📇 🏠 - Unified web layer for AI agents. Search (8 engines), stealth browse, cookie auth, and act on 24 platforms. 5,000 free searches/month via Gemini Grounded Search.
- [parallel-web/task-mcp](https://github.com/parallel-web/task-mcp) ☁️ 🔎 - Highest Accuracy Deep Research and Batch Tasks MCP