<!-- mcp-name: io.github.ChrBoebel/optical-context-mcp -->
<p align="center">
<img src="./assets/optical-context-logo.png" alt="Optical Context MCP logo" width="680">
</p>
<h1 align="center">Optical Context MCP</h1>
<p align="center">
Compress OCR-heavy PDFs into densely packed images so agents can work with long visual documents.
</p>
<p align="center">
<a href="https://pypi.org/project/optical-context-mcp/"><img src="https://img.shields.io/pypi/v/optical-context-mcp.svg" alt="PyPI version"></a>
<a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.11%2B-blue.svg" alt="Python 3.11+"></a>
<a href="https://gofastmcp.com/"><img src="https://img.shields.io/badge/MCP-FastMCP-111111.svg" alt="FastMCP"></a>
<a href="https://github.com/ChrBoebel/optical-context-mcp/actions/workflows/ci.yml"><img src="https://github.com/ChrBoebel/optical-context-mcp/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
<a href="./LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="MIT License"></a>
</p>
Optical Context MCP is built for one specific job: turning **large, visually structured PDFs** into a smaller set of retrievable packed images for agent workflows.
It reads a local PDF, runs OCR with Mistral, recomposes the extracted text and figures into dense PNGs, and exposes those artifacts over MCP for batch retrieval.
## What It Does
- reads a local PDF from the MCP host machine
- extracts page markdown and embedded images with Mistral OCR
- packs that content into densely packed PNGs that preserve visual grouping
- stores a manifest and temp job artifacts for follow-up retrieval
- lets an agent pull only the packed images it needs
## Where It Fits
Use it for:
- operating manuals
- scanned handbooks
- product catalogs
- PDF slide decks
- visually structured OCR-heavy documents
Skip it for:
- tiny PDFs
- clean text-native PDFs where normal extraction is enough
- workflows that require exact page-faithful rendering
- cases where OCR cost is not justified
## Example Result
The image below shows a real local validation run on a public research paper with dense text, figures, charts, and page-level visual structure. The packed image on the right consolidates the seven source pages shown on the left.
<p align="center">
<img src="./assets/original-vs-packed-comparison-straight-arrow.png" alt="Side-by-side comparison of original pages and the generated packed output" width="980">
</p>
Example local run facts from the generated manifest:
- source paper pages: 22
- previewed source page range: 15 to 21
- extracted images: 30
- packed output images: 6
- example packed image size: `986x1084`
- example packed image file size: `536,697 bytes`
This example shows the intended workflow: take a long, visually structured PDF and compress it into a smaller set of retrievable packed images that still preserve the visual structure of the source.
## Install
```bash
python -m pip install optical-context-mcp
```
Run without installing:
```bash
uvx optical-context-mcp
```
- `MISTRAL_API_KEY` is required for `compress_pdf`
- packed images are always stored locally under the system temp directory
- `compress_pdf` returns up to `30` packed images inline by default
For shared setups, pin a specific version:
```bash
uvx --from optical-context-mcp==0.1.4 optical-context-mcp
```
## Run
Default transport is `stdio`:
```bash
optical-context-mcp
```
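Most MCP clients accept a standard `mcpServers` entry for stdio servers. A sketch of such a config; the server key name is arbitrary, and the exact file location and schema depend on your client:

```json
{
  "mcpServers": {
    "optical-context": {
      "command": "uvx",
      "args": ["optical-context-mcp"],
      "env": {
        "MISTRAL_API_KEY": "<your-key>"
      }
    }
  }
}
```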
## Claude Code
Register the server in a project:
```bash
claude mcp add -s project optical-context -- uvx optical-context-mcp
```
Typical use:
1. call `compress_pdf`
2. inspect the returned manifest
3. fetch packed images with `get_packed_images`
## MCP Tools
- `compress_pdf`: run OCR plus recomposition and create a stored job
- `get_job_manifest`: load metadata for an existing job
- `get_packed_images`: fetch one or more packed PNGs from an existing job
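On the agent side, the manifest acts as an index over the packed images, so a client can fetch them in batches. A sketch of that pattern; every field name below is an illustrative assumption, not the server's actual schema (the numbers mirror the example run above):

```python
# Hypothetical shape of a job manifest as an agent might consume it.
# Field names are illustrative assumptions, NOT the server's real schema.
example_manifest = {
    "job_id": "job-0001",       # identifier passed to follow-up tools
    "source_pages": 22,         # pages in the input PDF
    "extracted_images": 30,     # figures pulled out by OCR
    "packed_images": 6,         # dense PNGs produced
    "packed_image_names": [f"packed_{i}.png" for i in range(6)],
}

def pick_batch(manifest: dict, start: int, count: int) -> list[str]:
    """Select a slice of packed image names for a batched retrieval call."""
    return manifest["packed_image_names"][start:start + count]

print(pick_batch(example_manifest, 0, 2))  # ['packed_0.png', 'packed_1.png']
```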
## How It Works
```mermaid
flowchart LR
A["Local PDF"] --> B["Mistral OCR"]
B --> C["Page markdown + embedded images"]
C --> D["Recomposition engine"]
D --> E["Densely packed PNG images"]
E --> F["Stored job artifacts"]
F --> G["Agent fetches manifest or image batches over MCP"]
```
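The recomposition step can be pictured as a grid-packing problem. A minimal stdlib sketch of the geometry only, assuming uniform tile sizes and a fixed column count; the real engine groups text and figures by visual structure rather than laying out a plain grid:

```python
import math

def grid_layout(n_tiles: int, tile_w: int, tile_h: int,
                columns: int = 2, padding: int = 8):
    """Compute canvas size and tile positions for a simple packed grid."""
    rows = math.ceil(n_tiles / columns)
    width = columns * tile_w + (columns + 1) * padding
    height = rows * tile_h + (rows + 1) * padding
    positions = [
        (padding + (i % columns) * (tile_w + padding),   # column offset
         padding + (i // columns) * (tile_h + padding))  # row offset
        for i in range(n_tiles)
    ]
    return width, height, positions

# Seven 480x640 page tiles in two columns -> one dense canvas
w, h, pos = grid_layout(7, 480, 640, columns=2, padding=8)
print(w, h)  # 984 2600
```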
## Why Packed Images Instead Of Just OCR Text
A plain OCR text dump flattens the document; packed images preserve layout signals such as:
- section grouping
- table-like layout
- captions near figures
- visual adjacency between text and embedded graphics
For many vision-capable agents, that is a better intermediate format than a plain OCR dump.
## Current Scope
- depends on Mistral OCR
- currently handles local file paths, not remote uploads
- stores artifacts in the local system temp directory by default
- optimized for compression and retrieval, not final polished markdown generation
- quality depends on OCR quality and the visual density of the source document
## Roadmap
- make the OCR layer provider-agnostic so different OCR backends can be swapped behind the same MCP workflow
## Development
```bash
uv venv --python /opt/homebrew/bin/python3.11 .venv
uv pip install --pyth