
WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.


Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List, download, and analyze images, videos, audio files
  • Anti-bot -- Stealth mode bypasses bot protection (e.g. Cloudflare challenges) on hardened sites such as Medium, LinkedIn, and Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
  • Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)

Quick Start

Claude Code Plugin (Recommended)

Via marketplace (includes skills: /fact-check, /compare):

```bash
/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@claude-plugins
```

Or install this plugin only:

```bash
/plugin marketplace add n24q02m/wet-mcp
/plugin install wet-mcp
```

Configure env vars in ~/.claude/settings.local.json or shell profile. See Environment Variables.
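For example, a minimal `~/.claude/settings.local.json` could pass variables through its `env` block. This is a sketch, not the only layout; the key value is a placeholder:

```jsonc
// ~/.claude/settings.local.json -- sketch; replace the placeholder key
{
  "env": {
    "API_KEYS": "JINA_AI_API_KEY:<your-key>",
    "SYNC_ENABLED": "true"
  }
}
```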

MCP Server

Python 3.13 required -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify --python 3.13 when using uvx.

On first run, the server automatically installs SearXNG, Playwright chromium, and starts the embedded search engine.

Option 1: uvx

```jsonc
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}
```
<details>
<summary>Other MCP clients (Cursor, Codex, Gemini CLI)</summary>

```jsonc
// Cursor (~/.cursor/mcp.json), Windsurf, Cline, Amp, OpenCode
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}
```

```toml
# Codex (~/.codex/config.toml)
[mcp_servers.wet]
command = "uvx"
args = ["--python", "3.13", "wet-mcp@latest"]
```

</details>

Option 2: Docker

```jsonc
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "n24q02m/wet-mcp:latest"
      ]
    }
  }
}
```

Configure env vars in ~/.claude/settings.local.json or your shell profile. See Environment Variables below.

Pre-install (optional)

Use the setup MCP tool to warm up models and install dependencies:

```text
# Via MCP tool call (recommended):
setup(action="warmup")

# With cloud embedding configured, warmup validates API keys
# and skips local model download if cloud models are available.
```

The warmup action pre-downloads SearXNG, Playwright, and the embedding/reranker models (~1.1 GB total) so the first real connection does not time out.

Sync setup

Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:

  1. First sync: rclone is auto-downloaded, a browser opens for OAuth authentication
  2. Token saved: OAuth token is stored locally at ~/.wet-mcp/tokens/ (600 permissions)
  3. Subsequent runs: Token is loaded automatically -- no manual steps needed

For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:

```jsonc
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",
  "SYNC_REMOTE": "dropbox"
}
```
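An S3-backed setup follows the same pattern. The remote name `s3` below is an assumption; use whatever rclone remote name you have configured:

```jsonc
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "s3",
  "SYNC_REMOTE": "s3"
}
```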

Tools

| Tool | Actions | Description |
|---|---|---|
| `search` | search, research, docs, similar | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| `extract` | extract, batch, crawl, map, convert, extract_structured | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| `media` | list, download, analyze | Media discovery, download, and analysis |
| `config` | status, set, cache_clear, docs_reindex | Server configuration and cache management |
| `setup` | warmup, setup_sync | Pre-download models, configure cloud sync |
| `help` | -- | Full documentation for any tool |
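As with the `setup(action="warmup")` call shown earlier, each tool is invoked with an `action` argument. The calls below are illustrative sketches only; argument names other than `action` are assumptions, not the documented schema (use the `help` tool to see the real one):

```text
# Hypothetical examples -- argument names besides "action" are guesses:
search(action="docs", query="dataframe merge", library="pandas")
extract(action="batch", urls=["https://example.com/a", "https://example.com/b"])
```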

MCP Prompts

| Prompt | Parameters | Description |
|---|---|---|
| `research_topic` | topic | Research a topic using academic search |
| `library_docs` | library, question | Find library documentation |

Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| `API_KEYS` | No | -- | LLM API keys for SDK mode (format: `ENV_VAR:key,...`). Enables cloud embedding + reranking |
| `LITELLM_PROXY_URL` | No | -- | LiteLLM Proxy URL. Enables proxy mode |
| `LITELLM_PROXY_KEY` | No | -- | LiteLLM Proxy virtual key |
| `GITHUB_TOKEN` | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from `gh auth token` |
| `EMBEDDING_BACKEND` | No | auto-detect | `litellm` (cloud) or `local` (Qwen3). Auto: `API_KEYS` -> litellm, else local |
| `EMBEDDING_MODEL` | No | auto-detect | LiteLLM embedding model name |
| `EMBEDDING_DIMS` | No | 0 (auto=768) | Embedding dimensions |
| `RERANK_ENABLED` | No | true | Enable reranking after search |
| `RERANK_BACKEND` | No | auto-detect | `litellm` or `local`. Auto: Cohere/Jina key -> litellm, else local |
| `RERANK_MODEL` | No | auto-detect | LiteLLM rerank model name |
| `RERANK_TOP_N` | No | 10 | Return top N results after reranking |
| `LLM_MODELS` | No | gemini/gemini-3-flash-preview | LiteLLM model for media analysis |
| `WET_AUTO_SEARXNG` | No | true | Auto-start embedded SearXNG subprocess |
| `WET_SEARXNG_PORT` | No | 41592 | SearXNG port |
| `SEARXNG_URL` | No | http://localhost:41592 | External SearXNG URL (when auto disabled) |
| `SEARXNG_TIMEOUT` | No | 30 | SearXNG request timeout in seconds |
| `CONVERT_MAX_FILE_SIZE` | No | 104857600 | Max file size for local conversion in bytes (100 MB) |
| `CONVERT_ALLOWED_DIRS` | No | -- | Comma-separated paths to restrict local file conversion |
| `CACHE_DIR` | No | ~/.wet-mcp | Data directory for cache, docs, downloads |
| `DOCS_DB_PATH` | No | ~/.wet-mcp/docs.db | Docs database location |
| `DOWNLOAD_DIR` | No | ~/.wet-mcp/downloads | Media download directory |
| `TOOL_TIMEOUT` | No | 120 | Tool execution timeout in seconds (0 = no timeout) |
| `WET_CACHE` | No | true | Enable/disable web cache |
| `SYNC_ENABLED` | No | false | Enable rclone sync |
| `SYNC_PROVIDER` | No | drive | rclone provider type (drive, dropbox, s3, etc.) |
| `SYNC_REMOTE` | No | gdrive | rclone remote name |
| `SYNC_FOLDER` | No | wet-mcp | Remote folder name |
| `SYNC_INTERVAL` | No | 300 | Auto-sync interval in seconds (0 = manual) |
| `LOG_LEVEL` | No | INFO | Logging level |

Embedding & Reranking

Both embedding and reranking are always available -- local models are built-in and require no configuration.

  • Jina AI (recommended): A single JINA_AI_API_KEY enables both embedding and reranking
  • Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
  • Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
  • GPU auto-detection: CUDA/DirectML auto-detected, uses GGUF models for better performance
  • All embeddings stored at 768 dims. Switching providers never breaks the vector table
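Cloud providers are enabled through `API_KEYS`, which uses the `ENV_VAR:key,...` format shown in the Configuration table. A sketch with placeholder key values:

```jsonc
// Placeholders only; a single Jina key enables both embedding and reranking
{
  "API_KEYS": "JINA_AI_API_KEY:<jina-key>,GEMINI_API_KEY:<gemini-key>"
}
```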

LLM Configuration (3-Mode Architecture)

| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` | Production (self-hosted gateway) |
| 2 | SDK | `API_KEYS` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline, embedding/rerank only (no LLM) |

SearXNG Configuration (2-Mode)

| Mode | Config | Description |
|---|---|---|
| Embedded (default) | `WET_AUTO_SEARXNG=true` | Auto-installs and manages SearXNG as a subprocess |
| External | `WET_AUTO_SEARXNG=false` + `SEARXNG_URL=http://host:port` | Connects to a pre-existing SearXNG instance |
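For the external mode, the corresponding environment fragment might look like the sketch below; the URL and port are examples, not defaults:

```jsonc
{
  "WET_AUTO_SEARXNG": "false",
  "SEARXNG_URL": "http://localhost:8080"
}
```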

Security

  • SSRF prevention -- URL validation on crawl targets
  • Graceful fallbacks -- Cloud → Local embedding, multi-tier crawling
  • Error sanitization -- No credentials in error messages
  • File conversion sandboxing -- Optional CONVERT_ALLOWED_DIRS restriction

Build from Source

```bash
git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp
```

Compatible With

Claude Code Claude Desktop Cursor VS Code Copilot Antigravity Gemini CLI OpenAI Codex OpenCode

Also by n24q02m

| Server | Description |
|---|---|
| mnemo-mcp | Persistent AI memory with hybrid search and cross-machine sync |
| better-notion-mcp | Markdown-first Notion API with 9 composite tools |
| better-email-mcp | Email (IMAP/SMTP) with multi-account and auto-discovery |
| better-godot-mcp | Godot Engine 4.x with 18 tools for scenes, scripts, and shaders |
| better-telegram-mcp | Telegram dual-mode (Bot API + MTProto) with 6 composite tools |
| better-code-review-graph | Knowledge graph for token-efficient code reviews |

Contributing

See CONTRIBUTING.md.

License

MIT -- See LICENSE.

Learn More