An MCP-enabled, multi-format document reader supporting DOCX, PDF, TXT, and Excel files.
User Guide · API Reference · Contributing · Changelog · License
```mermaid
graph TB
    A[AI Assistant / User] -->|Call read_document| B[MCP Document Reader]
    B -->|Detect file type| C{File Type?}
    C -->|.docx| D[DOCX Reader]
    C -->|.pdf| E[PDF Reader]
    C -->|.xlsx/.xls| F[Excel Reader]
    C -->|.txt| G[Text Reader]
    D -->|Extract text| H[Return Content]
    E -->|Extract text| H
    F -->|Extract text| H
    G -->|Extract text| H
    H -->|Text content| A
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#f0f0f0
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#e8f5e9
    style G fill:#e8f5e9
    style H fill:#fff9c4
```

| Format | Extensions | MIME Type | Features |
|---|---|---|---|
| Excel | .xlsx, .xls | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Sheet and cell data extraction |
| DOCX | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Text and structure extraction |
| PDF | .pdf | application/pdf | Text extraction |
| Text | .txt | text/plain | Plain text reading |
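The "Detect file type" step in the diagram amounts to a lookup from file extension to the matching reader. The sketch below shows one way to express that in plain Python. `EXTENSION_TO_FORMAT`, `detect_format`, and `is_supported` are hypothetical names used for illustration only, not the internals of the project's `DocumentReaderFactory`, and the case-insensitive matching is an assumption.

```python
from pathlib import Path

# Hypothetical extension-to-format table mirroring the supported-formats table;
# the real DocumentReaderFactory may organise its dispatch differently.
EXTENSION_TO_FORMAT = {
    ".docx": "docx",
    ".pdf": "pdf",
    ".xlsx": "excel",
    ".xls": "excel",
    ".txt": "text",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension (compared case-insensitively) is known."""
    return Path(filename).suffix.lower() in EXTENSION_TO_FORMAT


def detect_format(filename: str) -> str:
    """Return the format label for a file, or raise ValueError if it is unsupported."""
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}") from None


if __name__ == "__main__":
    for name in ["report.PDF", "data.xlsx", "notes.txt", "image.png"]:
        label = detect_format(name) if is_supported(name) else "unsupported"
        print(name, "->", label)
```

Checking support first and only then resolving a reader keeps the unsupported case explicit, which is the same pattern the library example further down follows with `is_supported` and `get_reader`.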
Install from PyPI:

```bash
pip install mcp-documents-reader
```

Or install from source:

```bash
git clone https://github.com/xt765/mcp_documents_reader.git
cd mcp_documents_reader
pip install -e .
```

This server provides the following tool:
`read_document`: Read any supported document type through a unified interface.

Arguments:

- `filename` (string, required): Path to the document file; both absolute and relative paths are supported.

Add the following to your MCP configuration file:
Option 1: Using PyPI (recommended)

```json
{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "mcp-documents-reader"
      ]
    }
  }
}
```

Option 2: Using the GitHub repository
```json
{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}
```

Option 3: Using the Gitee repository (faster access from China)
```json
{
  "mcpServers": {
    "mcp-document-reader": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://gitee.com/xt765/mcp_documents_reader",
        "mcp_documents_reader"
      ]
    }
  }
}
```

After configuration, AI assistants can call the tool directly:
```python
# Read a DOCX file
read_document(filename="example.docx")

# Read a PDF file
read_document(filename="example.pdf")

# Read an Excel file
read_document(filename="example.xlsx")

# Read a text file
read_document(filename="example.txt")
```

The readers can also be used directly as a Python library:

```python
from mcp_documents_reader import DocumentReaderFactory

# Using the factory (recommended)
reader = DocumentReaderFactory.get_reader("document.pdf")
content = reader.read("/path/to/document.pdf")

# Check whether a format is supported before reading it
if DocumentReaderFactory.is_supported("file.xlsx"):
    reader = DocumentReaderFactory.get_reader("file.xlsx")
    content = reader.read("/path/to/file.xlsx")
```

`read_document`: Read any supported document type.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | ✓ | Document file path; supports absolute or relative paths |
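On the wire, an MCP client invokes this tool with a JSON-RPC `tools/call` request whose `arguments` object carries `filename`. The sketch below builds such a request and pulls the text out of a response. It assumes the server returns the document as `text` content items; the helper names `build_read_document_request` and `extract_text` and the sample response are hypothetical, for illustration only. In practice an MCP client (for example `ClientSession.call_tool` in the `mcp` Python SDK) handles this framing for you.

```python
import json


def build_read_document_request(request_id: int, filename: str) -> str:
    """Serialize a JSON-RPC tools/call request for the read_document tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_document",
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(message)


def extract_text(response_json: str) -> str:
    """Join the text items of a tools/call result.

    Assumption: read_document answers with "text" content items; other item
    types (images, resources) are ignored here.
    """
    response = json.loads(response_json)
    result = response["result"]
    if result.get("isError"):
        raise RuntimeError("read_document reported an error")
    return "\n".join(
        item["text"] for item in result["content"] if item.get("type") == "text"
    )


if __name__ == "__main__":
    print(build_read_document_request(1, "example.pdf"))
    # Made-up response shaped like a tools/call result, for illustration only.
    sample_response = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "result": {
            "content": [{"type": "text", "text": "Hello from example.pdf"}],
            "isError": False,
        },
    })
    print(extract_text(sample_response))
```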
Runtime dependencies:

- `mcp >= 1.26.0`: MCP protocol implementation
- `python-docx >= 1.2.0`: DOCX file reading
- `pypdf >= 6.8.0`: PDF file reading (replaces PyPDF2)
- `openpyxl >= 3.1.5`: Excel file reading

Development dependencies:

- `pytest >= 8.0.0`: testing framework
- `pytest-asyncio >= 0.24.0`: async testing support
- `pytest-cov >= 6.0.0`: coverage reporting
- `basedpyright >= 0.28.0`: type checking
- `ruff >= 0.8.0`: linting and formatting

License: MIT License
Issues and pull requests are welcome!