# mcp-webgate
[Python](https://www.python.org/downloads/)
[License](LICENSE)
[MCP](https://spec.modelcontextprotocol.io/)
[v0.1.31](https://github.com/annibale-x/mcp-webgate/releases/tag/v0.1.31)
[Issues](https://github.com/annibale-x/mcp-webgate/issues)
Web search that doesn't wreck your AI's memory.
mcp-webgate is an MCP server that gives your AI clean, bounded web content — across all major AI clients:
- **Desktop & IDEs**: Claude Desktop, Claude Code, Zed, Cursor, Windsurf, VS Code
- **CLI Agents**: Gemini CLI, Claude CLI, custom agents
## 🌱 A Gentle Introduction
**What is mcp-webgate?**
When your AI uses a standard "fetch URL" tool, it gets the raw HTML of the page — ads, menus, scripts, cookie banners and all. A single news article can dump **200,000 tokens** of garbage into the AI's memory, wiping out your entire conversation.
**mcp-webgate** is a protective filter that sits between your AI and the web:
1. **Strips the junk** — menus, scripts, ads, footers are removed with surgical HTML parsing; only readable text passes through
2. **Hard-caps every response** — no page can ever blow up your context window, no matter how big the original was
3. **Optionally summarizes** — route results through a secondary local LLM that produces a compact Markdown report with citations; your primary AI gets a polished briefing instead of a wall of text
The result: clean, bounded, useful web content — always.
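The strip-and-cap behavior of steps 1 and 2 can be sketched with the standard library. This is a minimal sketch only: the real server uses lxml, and the tag list and cap below are illustrative, not webgate's actual configuration.

```python
from html.parser import HTMLParser

# Containers treated as noise in this sketch (the server's list may differ).
SKIP_TAGS = {"script", "style", "nav", "footer", "header", "aside"}
HARD_CAP = 32_000  # characters; mirrors the default budget mentioned below

class TextExtractor(HTMLParser):
    """Collect visible text, skipping boilerplate containers."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # how many skip-tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def clean_page(html: str, cap: int = HARD_CAP) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)[:cap]  # hard cap, whatever the page size

html = "<html><nav>Menu</nav><script>ads()</script><p>Real article text.</p></html>"
print(clean_page(html))  # → Real article text.
```

The hard cap is the key property: even a pathological multi-megabyte page can never emit more than `cap` characters into the model's context.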
### 🔬 Real example: what happens under the hood
Searching for *"mcp model context protocol"* with LLM features on:
```
Query → LLM expands to 5 search variants → 20 pages found, 13 fetched in parallel
Raw HTML downloaded   5.16 MB  (~1,290,000 tokens)
After cleaning        52.1 KB  (~13,000 tokens)    — 99% noise stripped
After LLM summary      5.8 KB  (~1,450 tokens)     — structured report with citations
```
**13 sources distilled into ~1,450 tokens.** A single naive fetch of just *one* of those pages (e.g. a security blog at 563 KB) would dump **~140,000 tokens** of raw HTML into your AI's context. webgate processes all 13 and delivers a clean briefing that fits in a footnote.
This is an intensive case (5 queries × 5 results). A typical search with 3–5 results still saves 95%+ of context compared to raw fetching — and your AI gets structured, ranked content instead of a wall of HTML soup.
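The token estimates above are consistent with the common rough heuristic of about 4 characters per token for English text:

```python
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary

def tokens(size_bytes: int) -> int:
    """Estimate token count from payload size in bytes."""
    return size_bytes // CHARS_PER_TOKEN

print(tokens(5_160_000))  # raw HTML, 5.16 MB  → 1290000
print(tokens(52_100))     # after cleaning     → 13025 (~13,000)
print(tokens(5_800))      # after LLM summary  → 1450
```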
## 🚀 Quick Start
### 1. Make sure you have `uvx`
```bash
pip install uv
```
`uvx` ships with `uv` and runs Python tools in throwaway environments, without installing them permanently. You only need to do this once.
### 2. Set up a search backend
The easiest option is **SearXNG** — free, no account, runs locally:
```bash
docker run -d -p 8080:8080 --name searxng searxng/searxng
```
No Docker? Use a cloud backend instead (Brave, Tavily, Exa, SerpAPI) — see [Backends](#backends).
### 3. Add webgate to your AI client
See the [Integrations](#integrations) table for your specific client. As a quick example, for **Claude Desktop**:
Open the config file:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
Add this:
```json
{
"mcpServers": {
"webgate": {
"command": "uvx",
"args": ["mcp-webgate"],
"env": {
"WEBGATE_DEFAULT_BACKEND": "searxng",
"WEBGATE_SEARXNG_URL": "http://localhost:8080"
}
}
}
}
```
Restart the client after editing.
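If the client silently ignores the server after a restart, malformed JSON in the config file is the usual cause. A quick sanity check (this validates an inline copy of the snippet above; point `json.load` at your real config file instead):

```python
import json

# Inline copy of the snippet above; on macOS the real file lives at
# ~/Library/Application Support/Claude/claude_desktop_config.json
config_text = """
{
  "mcpServers": {
    "webgate": {
      "command": "uvx",
      "args": ["mcp-webgate"],
      "env": {
        "WEBGATE_DEFAULT_BACKEND": "searxng",
        "WEBGATE_SEARXNG_URL": "http://localhost:8080"
      }
    }
  }
}
"""

config = json.loads(config_text)          # raises json.JSONDecodeError if malformed
server = config["mcpServers"]["webgate"]
print(server["command"], server["args"])  # uvx ['mcp-webgate']
```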
### 4. Ask your AI to search!
```
Search the web for: latest news on AI regulation
```
The AI will use `webgate_query` automatically. You're done.
## 🔍 How it works
```
Your question
↓
Search backend (SearXNG / Brave / Tavily / Exa / SerpAPI)
↓ [deduplicate URLs, block binary files, filter domains]
Fetch pages in parallel (streaming — hard size cap per page)
↓ [optional: retry failed pages from reserve pool]
Strip HTML junk (menus, ads, scripts, footers — lxml)
↓
Clean up text (invisible chars, unicode junk, BiDi tricks)
↓
BM25 reranking (best-matching results first — always active)
↓ [optional: LLM reranking]
Cap total output to budget
↓ [optional: LLM summarization → compact Markdown report]
Clean result lands in your AI's context
```
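The "clean up text" stage above can be sketched as follows. The character list is illustrative, assuming typical zero-width and BiDi control characters; the server's actual filtering may differ.

```python
import re
import unicodedata

# Zero-width and BiDi control characters, often used to hide or reorder text.
INVISIBLE = re.compile(
    "[\u200b\u200c\u200d\u2060\ufeff"   # zero-width space/joiners, word joiner, BOM
    "\u202a-\u202e\u2066-\u2069]"       # BiDi embedding/override/isolate controls
)

def clean_text(s: str) -> str:
    s = unicodedata.normalize("NFKC", s)   # fold compatibility forms
    s = INVISIBLE.sub("", s)               # drop invisible/BiDi characters
    return re.sub(r"\s+", " ", s).strip()  # collapse whitespace runs

print(clean_text("pay\u202epal\u202c now"))  # → paypal now
```

Stripping BiDi controls matters for safety as well as tidiness: they can make text render in a different order than the model actually reads it.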
## 🛠️ Tools
webgate gives your AI three tools:
### `webgate_fetch` — read a single page
Use this when you already know the URL you want. The AI passes the URL and gets back the cleaned text — up to `max_query_budget` characters (default 32,000).
**Request:**
```json
{ "url": "https://example.com/article", "max_chars": 32000 }
```
**Response:**
```json
{
"url": "https://example.com/article",
"title": "Article Title",
"text": "cleaned text...",
"truncated": true,
"char_count": 12450
}
```
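A client consuming that response might branch on the `truncated` flag; the field names below are taken from the example above, while the handling logic is just a sketch:

```python
import json

# Inline copy of the example response above.
response_text = """{
  "url": "https://example.com/article",
  "title": "Article Title",
  "text": "cleaned text...",
  "truncated": true,
  "char_count": 12450
}"""

result = json.loads(response_text)
if result["truncated"]:
    # The page exceeded the character budget: `text` holds only the
    # first `char_count` characters of the cleaned content.
    print(f"{result['title']}: got {result['char_count']} chars (truncated)")
else:
    print(f"{result['title']}: complete ({result['char_count']} chars)")
```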
### `webgate_query` — search + f