
# mcp-webgate

[![Python Version](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![MCP Protocol](https://img.shields.io/badge/MCP-Protocol-blueviolet)](https://spec.modelcontextprotocol.io/)
[![Latest Release](https://img.shields.io/badge/release-v0.1.31-purple.svg)](https://github.com/annibale-x/mcp-webgate/releases/tag/v0.1.31)
[![Beta](https://img.shields.io/badge/status-beta-orange.svg)](https://github.com/annibale-x/mcp-webgate/issues)

Web search that doesn't wreck your AI's memory.

mcp-webgate is an MCP server that gives your AI clean, bounded web content — across all major AI clients:
- **Desktop apps & IDEs**: Claude Desktop, Zed, Cursor, Windsurf, VS Code
- **CLI agents**: Claude Code, Gemini CLI, custom agents

## 🌱 A Gentle Introduction

**What is mcp-webgate?**
When your AI uses a standard "fetch URL" tool, it gets the raw HTML of the page — ads, menus, scripts, cookie banners and all. A single news article can dump **200,000 tokens** of garbage into the AI's memory, wiping out your entire conversation.

**mcp-webgate** is a protective filter that sits between your AI and the web:

1. **Strips the junk** — menus, scripts, ads, footers are removed with surgical HTML parsing; only readable text passes through
2. **Hard-caps every response** — no page can ever blow up your context window, no matter how big the original was
3. **Optionally summarizes** — route results through a secondary local LLM that produces a compact Markdown report with citations; your primary AI gets a polished briefing instead of a wall of text

The result: clean, bounded, useful web content — always.
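The junk-stripping idea is simple to sketch. webgate itself parses HTML with lxml; the illustration below uses Python's stdlib parser and a hypothetical junk-tag list, so treat it as a minimal sketch of the concept rather than webgate's actual cleaning rules:

```python
from html.parser import HTMLParser

# Hypothetical junk-tag list for illustration; the real filter is more thorough.
JUNK_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}

class TextExtractor(HTMLParser):
    """Collects text, skipping anything nested inside a junk tag."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a junk element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in JUNK_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in JUNK_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def clean_page(raw_html: str) -> str:
    parser = TextExtractor()
    parser.feed(raw_html)
    return " ".join(parser.chunks)

page = "<html><body><nav>Menu</nav><p>Real article text.</p><script>ads()</script></body></html>"
print(clean_page(page))  # → Real article text.
```

Menus, scripts, and other chrome never reach the output; only the readable body text survives.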

### 🔬 Real example: what happens under the hood

Searching for *"mcp model context protocol"* with LLM features on:

```
Query → LLM expands to 5 search variants → 20 pages found, 13 fetched in parallel

Raw HTML downloaded     5.16 MB   (~1,290,000 tokens)
After cleaning          52.1 KB   (   ~13,000 tokens)  — 99% noise stripped
After LLM summary        5.8 KB   (    ~1,450 tokens)  — structured report with citations
```

**13 sources distilled into ~1,450 tokens.** A single naive fetch of just *one* of those pages (e.g. a security blog at 563 KB) would dump **~140,000 tokens** of raw HTML into your AI's context. webgate processes all 13 and delivers a clean briefing that fits in a footnote.

This is an intensive case (5 queries × 5 results). A typical search with 3–5 results still saves 95%+ of context compared to raw fetching — and your AI gets structured, ranked content instead of a wall of HTML soup.
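The token figures above come from the common rough heuristic of ~4 characters per token (an approximation; real tokenizers vary by model):

```python
# Context-cost comparison using the ~4 chars/token rule of thumb.
def approx_tokens(num_bytes: int) -> int:
    return num_bytes // 4

raw_html   = approx_tokens(5_160_000)  # 5.16 MB of raw HTML
cleaned    = approx_tokens(52_100)     # after boilerplate stripping
summarized = approx_tokens(5_800)      # after LLM summarization

print(raw_html, cleaned, summarized)   # → 1290000 13025 1450
print(f"{1 - summarized / raw_html:.1%} of context saved")  # → 99.9% of context saved
```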

## 🚀 Quick Start

### 1. Make sure you have `uvx`

```bash
pip install uv
```

`uvx` runs Python tools without installing them permanently. You only need to do this once.

### 2. Set up a search backend

The easiest option is **SearXNG** — free, no account, runs locally:

```bash
docker run -d -p 8080:8080 --name searxng searxng/searxng
```

No Docker? Use a cloud backend instead (Brave, Tavily, Exa, SerpAPI) — see [Backends](#backends).

### 3. Add webgate to your AI client

See the [Integrations](#integrations) table for your specific client. As a quick example, for **Claude Desktop**:

Open the config file:
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`

Add this:

```json
{
  "mcpServers": {
    "webgate": {
      "command": "uvx",
      "args": ["mcp-webgate"],
      "env": {
        "WEBGATE_DEFAULT_BACKEND": "searxng",
        "WEBGATE_SEARXNG_URL": "http://localhost:8080"
      }
    }
  }
}
```

Restart the client after editing.

### 4. Ask your AI to search!

```
Search the web for: latest news on AI regulation
```

The AI will use `webgate_query` automatically. You're done.

## 🔍 How it works

```
Your question
    ↓
Search backend  (SearXNG / Brave / Tavily / Exa / SerpAPI)
    ↓  [deduplicate URLs, block binary files, filter domains]
Fetch pages in parallel  (streaming — hard size cap per page)
    ↓  [optional: retry failed pages from reserve pool]
Strip HTML junk  (menus, ads, scripts, footers — lxml)
    ↓
Clean up text  (invisible chars, unicode junk, BiDi tricks)
    ↓
BM25 reranking  (best-matching results first — always active)
    ↓  [optional: LLM reranking]
Cap total output to budget
    ↓  [optional: LLM summarization → compact Markdown report]
Clean result lands in your AI's context
```
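The always-on BM25 reranking step can be sketched in pure Python. This is a minimal, self-contained illustration of the scoring formula, not webgate's implementation; the tokenizer and the `k1`/`b` parameters here are standard defaults, assumed for the example:

```python
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices sorted best-match-first by BM25 score."""
    corpus = [tokenize(d) for d in docs]
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    q_terms = tokenize(query)
    # document frequency of each query term
    df = {t: sum(1 for d in corpus if t in d) for t in q_terms}

    def score(doc: list[str]) -> float:
        s = 0.0
        for t in q_terms:
            tf = doc.count(t)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        return s

    return sorted(range(n), key=lambda i: -score(corpus[i]))

docs = [
    "Cookie banner and newsletter signup.",
    "The Model Context Protocol (MCP) standardizes tool use for LLMs.",
    "MCP servers expose MCP tools to the model.",
]
order = bm25_rank("mcp model context protocol", docs)
print(order)  # → [1, 2, 0]
```

The page that matches all query terms ranks first, and the cookie-banner noise drops to last, which is the point of reranking before the output budget is applied.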

## 🛠️ Tools

webgate gives your AI three tools:

### `webgate_fetch` — read a single page

Use this when you already know the URL you want. The AI passes the URL and gets back the cleaned text — up to `max_query_budget` characters (default 32,000).

```json
{ "url": "https://example.com/article", "max_chars": 32000 }
```

```json
{
  "url": "https://example.com/article",
  "title": "Article Title",
  "text": "cleaned text...",
  "truncated": true,
  "char_count": 12450
}
```
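A minimal sketch of how the character cap could produce the response fields shown above. The helper name is hypothetical; only the field names (`url`, `title`, `text`, `truncated`, `char_count`) mirror the actual response:

```python
# Hypothetical helper illustrating the hard character cap on a fetch result.
def build_fetch_result(url: str, title: str, text: str, max_chars: int = 32_000) -> dict:
    truncated = len(text) > max_chars
    capped = text[:max_chars]          # never exceed the budget
    return {
        "url": url,
        "title": title,
        "text": capped,
        "truncated": truncated,
        "char_count": len(capped),
    }

result = build_fetch_result("https://example.com/article", "Article Title", "x" * 40_000)
print(result["truncated"], result["char_count"])  # → True 32000
```

However large the source page, the result handed to the AI is bounded by the budget, and the `truncated` flag tells it that more text exists.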

### `webgate_query` — search + f