io.github.TickTockBent/charlotte
Renders web pages into structured, agent-readable representations using headless Chromium.
★ 109MITdevtools
Install
Config snippet generator goes here (5 client tabs)
README
# Charlotte
**The Web, Readable.**
Your AI agent spends 60,000 tokens just to look at a web page. Charlotte does it in 336.
Charlotte is an MCP server that gives AI agents structured, token-efficient access to the web.
Instead of dumping the full accessibility tree on every call, Charlotte returns only what
the agent needs: a compact page summary on arrival, targeted queries for specific elements,
and full detail only when explicitly requested. The result is 25-182x less data per page
compared to [Playwright MCP](https://github.com/anthropics/playwright-mcp), saving thousands of dollars across production workloads.
## Why Charlotte?
Most browser MCP servers dump the entire accessibility tree on every call — a flat text blob that can exceed a million characters on content-heavy pages. Agents pay for all of it whether they need it or not.
Charlotte decomposes each page into a typed, structured representation — landmarks, headings, interactive elements, forms, content summaries — and lets agents control how much they receive with three detail levels. When an agent navigates to a new page, it gets a compact orientation (336 characters for Hacker News) instead of the full element dump (61,000+ characters). When it needs specifics, it asks for them.
### Benchmarks
Charlotte v0.5.1 vs Playwright MCP, measured by characters returned per tool call on real websites:
**Navigation** (first contact with a page):
| Site | Charlotte `navigate` | Playwright `browser_navigate` |
|:---|---:|---:|
| example.com | 612 | 817 |
| Wikipedia (AI article) | 7,667 | 1,040,636 |
| Hacker News | 336 | 61,230 |
| GitHub repo | 3,185 | 80,297 |
Charlotte's `navigate` returns minimal detail by default — landmarks, headings, and interactive element counts grouped by page region. Enough to orient, not enough to overwhelm. On Wikipedia, that's **135x smaller** than Playwright's response.
**Tool definition overhead** (invisible cost per API call):
| Profile | Tools | Def. tokens/call | Savings vs full |
|:---|---:|---:|---:|
| full | 42 | ~7,400 | — |
| browse (default) | 23 | ~3,900 | **~47%** |
| core | 7 | 1,677 | **~77%** |
Tool definitions are sent on every API round-trip. With the default `browse` profile, Charlotte carries ~47% less definition overhead than loading all tools. Over a 20-call browsing session, that's **~38% fewer total tokens**. See the [profile benchmark report](docs/charlotte-profile-benchmark-report.md) for full results.
**The workflow difference:** Playwright agents receive 61K+ characters every time they look at Hacker News, whether they're reading headlines or looking for a login button. Charlotte agents get 336 characters on arrival, call `find({ type: "link", text: "login" })` to get exactly what they need, and never pay for the rest.
## How It Works
Charlotte maintains a persistent headless Chromium session and acts as a translation layer between the visual web and the agent's text-native reasoning. Every page is decomposed into a structured representation:
```
┌─────────────┐ MCP Protocol ┌──────────────────┐
│ AI Agent │<────────────────────>│ Charlotte │
└─────────────┘ │ │
│ ┌────────────┐ │
│ │ Renderer │ │
│ │ Pipeline │ │
│ └─────┬──────┘ │
│ │ │
│ ┌─────▼──────┐ │
│ │ Headless │ │
│ │ Chromium │ │
│ └────────────┘ │
└──────────────────┘
```
Agents receive landmarks, headings, interactive elements with typed metadata, bounding boxes, form structures, and content summaries — all derived from what the browser already knows about every page.
## Features
**Navigation** — `navigate`, `back`, `forward`, `reload`
**Observation** — `observe` (3 detail levels, structural tree view), `find` (spatial + semantic search, CSS selector mode), `screenshot` (with persistent artifact management), `screenshots`, `screenshot_get`, `screenshot_delete`, `diff` (structural comparison against snapshots)
**Interaction** — `click`, `click_at` (coordinate-based), `type`, `select`, `toggle`, `submit`, `scroll`, `hover`, `drag`, `key` (single/sequence with element targeting), `wait_for` (async condition polling), `upload` (file input), `dialog` (accept/dismiss JS dialogs)
**Monitoring** — `console` (all severity levels, filtering, timestamps), `requests` (full HTTP history, method/status/resource type filtering)
**Session Management** — `tabs`, `tab_open`, `tab_switch`, `tab_close`, `viewport` (device presets), `network` (throttling, URL blocking), `set_cookies`, `get_cookies`, `clear_cookies`, `set_headers`, `configure`
**Development Mode** — `dev_se