
WET - Web Extended Toolkit MCP Server

mcp-name: io.github.n24q02m/wet-mcp

Open-source MCP Server for web search, content extraction, library docs & multimodal analysis.


Features

  • Web Search -- Embedded SearXNG metasearch (Google, Bing, DuckDuckGo, Brave) with filters, semantic reranking, query expansion, and snippet enrichment
  • Academic Research -- Search Google Scholar, Semantic Scholar, arXiv, PubMed, CrossRef, BASE
  • Library Docs -- Auto-discover and index documentation with FTS5 hybrid search, HyDE-enhanced retrieval, and version-specific docs
  • Content Extract -- Clean content extraction (Markdown/Text), structured data extraction (LLM + JSON Schema), batch processing (up to 50 URLs), deep crawling, site mapping
  • Local File Conversion -- Convert PDF, DOCX, XLSX, CSV, HTML, EPUB, PPTX to Markdown
  • Media -- List, download, and analyze images, videos, audio files
  • Anti-bot -- Stealth mode bypasses bot protection (e.g. Cloudflare challenges) on hardened sites such as Medium, LinkedIn, and Twitter
  • Zero Config -- Built-in local Qwen3 embedding + reranking, no API keys needed. Optional cloud providers (Jina AI, Gemini, OpenAI, Cohere)
  • Sync -- Cross-machine sync of indexed docs via rclone (Google Drive, S3, Dropbox)

Quick Start

Claude Code Plugin (Recommended)

Via marketplace (includes skills: /fact-check, /compare):

```bash
/plugin marketplace add n24q02m/claude-plugins
/plugin install wet-mcp@claude-plugins
```

Or install this plugin only:

```bash
/plugin marketplace add n24q02m/wet-mcp
/plugin install wet-mcp
```

Configure env vars in ~/.claude/settings.local.json or shell profile. See Environment Variables.
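For example, a minimal `~/.claude/settings.local.json` could pass variables through its `env` block. This is a sketch, not the only layout; the key value is a placeholder:

```jsonc
// ~/.claude/settings.local.json -- sketch; replace the placeholder key
{
  "env": {
    "API_KEYS": "JINA_AI_API_KEY:<your-key>",
    "SYNC_ENABLED": "true"
  }
}
```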

MCP Server

Python 3.13 required -- Python 3.14+ is not supported due to SearXNG incompatibility. You must specify --python 3.13 when using uvx.

On first run, the server automatically installs SearXNG, Playwright chromium, and starts the embedded search engine.

Option 1: uvx

```jsonc
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}
```
<details>
<summary>Other MCP clients (Cursor, Codex, Gemini CLI)</summary>

```jsonc
// Cursor (~/.cursor/mcp.json), Windsurf, Cline, Amp, OpenCode
{
  "mcpServers": {
    "wet": {
      "command": "uvx",
      "args": ["--python", "3.13", "wet-mcp@latest"]
    }
  }
}
```

```toml
# Codex (~/.codex/config.toml)
[mcp_servers.wet]
command = "uvx"
args = ["--python", "3.13", "wet-mcp@latest"]
```

</details>

Option 2: Docker

```jsonc
{
  "mcpServers": {
    "wet": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-wet",
        "-v", "wet-data:/data",
        "-e", "API_KEYS",
        "-e", "GITHUB_TOKEN",
        "-e", "SYNC_ENABLED",
        "n24q02m/wet-mcp:latest"
      ]
    }
  }
}
```

Configure env vars in ~/.claude/settings.local.json or your shell profile. See Environment Variables below.

Pre-install (optional)

Use the setup MCP tool to warm up models and install dependencies:

```text
# Via MCP tool call (recommended):
setup(action="warmup")

# With cloud embedding configured, warmup validates API keys
# and skips local model download if cloud models are available.
```

The warmup action pre-downloads SearXNG, Playwright, and the embedding/reranker models (~1.1 GB total) so the first real connection does not time out.

Sync setup

Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:

  1. First sync: rclone is auto-downloaded, a browser opens for OAuth authentication
  2. Token saved: OAuth token is stored locally at ~/.wet-mcp/tokens/ (600 permissions)
  3. Subsequent runs: Token is loaded automatically -- no manual steps needed

For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:

```jsonc
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",
  "SYNC_REMOTE": "dropbox"
}
```
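An S3-backed setup follows the same pattern. The remote name `s3` below is an assumption; use whatever rclone remote name you have configured:

```jsonc
{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "s3",
  "SYNC_REMOTE": "s3"
}
```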

Tools

| Tool | Actions | Description |
|---|---|---|
| `search` | search, research, docs, similar | Web search (with filters, reranking, expand/enrich), academic research, library docs (HyDE), find similar |
| `extract` | extract, batch, crawl, map, convert, extract_structured | Content extraction, batch processing (up to 50 URLs), deep crawling, site mapping, local file conversion, structured data extraction (JSON Schema) |
| `media` | list, download, analyze | Media discovery, download, and analysis |
| `config` | status, set, cache_clear, docs_reindex | Server configuration and cache management |
| `setup` | warmup, setup_sync | Pre-download models, configure cloud sync |
| `help` | -- | Full documentation for any tool |
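As with the `setup(action="warmup")` call shown earlier, each tool is invoked with an `action` argument. The calls below are illustrative sketches only; argument names other than `action` are assumptions, not the documented schema (use the `help` tool to see the real one):

```text
# Hypothetical examples -- argument names besides "action" are guesses:
search(action="docs", query="dataframe merge", library="pandas")
extract(action="batch", urls=["https://example.com/a", "https://example.com/b"])
```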

MCP Prompts

| Prompt | Parameters | Description |
|---|---|---|
| `research_topic` | topic | Research a topic using academic search |
| `library_docs` | library, question | Find library documentation |

Configuration

| Variable | Required | Default | Description |
|---|---|---|---|
| `API_KEYS` | No | -- | LLM API keys for SDK mode (format: `ENV_VAR:key,...`). Enables cloud embedding + reranking |
| `LITELLM_PROXY_URL` | No | -- | LiteLLM Proxy URL. Enables proxy mode |
| `LITELLM_PROXY_KEY` | No | -- | LiteLLM Proxy virtual key |
| `GITHUB_TOKEN` | No | auto-detect | GitHub token for docs discovery (60 -> 5000 req/hr). Auto-detected from `gh auth token` |
| `EMBEDDING_BACKEND` | No | auto-detect | `litellm` (cloud) or `local` (Qwen3). Auto: `API_KEYS` -> litellm, else local |
| `EMBEDDING_MODEL` | No | auto-detect | LiteLLM embedding model name |
| `EMBEDDING_DIMS` | No | 0 (auto=768) | Embedding dimensions |
| `RERANK_ENABLED` | No | true | Enable reranking after search |
| `RERANK_BACKEND` | No | auto-detect | `litellm` or `local`. Auto: Cohere/Jina key -> litellm, else local |
| `RERANK_MODEL` | No | auto-detect | LiteLLM rerank model name |
| `RERANK_TOP_N` | No | 10 | Return top N results after reranking |
| `LLM_MODELS` | No | gemini/gemini-3-flash-preview | LiteLLM model for media analysis |
| `WET_AUTO_SEARXNG` | No | true | Auto-start embedded SearXNG subprocess |
| `WET_SEARXNG_PORT` | No | 41592 | SearXNG port |
| `SEARXNG_URL` | No | http://localhost:41592 | External SearXNG URL (when auto disabled) |
| `SEARXNG_TIMEOUT` | No | 30 | SearXNG request timeout in seconds |
| `CONVERT_MAX_FILE_SIZE` | No | 104857600 | Max file size for local conversion in bytes (100 MB) |
| `CONVERT_ALLOWED_DIRS` | No | -- | Comma-separated paths to restrict local file conversion |
| `CACHE_DIR` | No | ~/.wet-mcp | Data directory for cache, docs, downloads |
| `DOCS_DB_PATH` | No | ~/.wet-mcp/docs.db | Docs database location |
| `DOWNLOAD_DIR` | No | ~/.wet-mcp/downloads | Media download directory |
| `TOOL_TIMEOUT` | No | 120 | Tool execution timeout in seconds (0 = no timeout) |
| `WET_CACHE` | No | true | Enable/disable web cache |
| `SYNC_ENABLED` | No | false | Enable rclone sync |
| `SYNC_PROVIDER` | No | drive | rclone provider type (drive, dropbox, s3, etc.) |
| `SYNC_REMOTE` | No | gdrive | rclone remote name |
| `SYNC_FOLDER` | No | wet-mcp | Remote folder name |
| `SYNC_INTERVAL` | No | 300 | Auto-sync interval in seconds (0 = manual) |
| `LOG_LEVEL` | No | INFO | Logging level |

Embedding & Reranking

Both embedding and reranking are always available -- local models are built-in and require no configuration.

  • Jina AI (recommended): A single JINA_AI_API_KEY enables both embedding and reranking
  • Embedding priority: Jina AI > Gemini > OpenAI > Cohere. Local Qwen3 fallback always available
  • Reranking priority: Jina AI > Cohere. Local Qwen3 fallback always available
  • GPU auto-detection: CUDA/DirectML auto-detected, uses GGUF models for better performance
  • All embeddings stored at 768 dims. Switching providers never breaks the vector table
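Cloud providers are enabled through `API_KEYS`, which uses the `ENV_VAR:key,...` format shown in the Configuration table. A sketch with placeholder key values:

```jsonc
// Placeholders only; a single Jina key enables both embedding and reranking
{
  "API_KEYS": "JINA_AI_API_KEY:<jina-key>,GEMINI_API_KEY:<gemini-key>"
}
```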

LLM Configuration (3-Mode Architecture)

| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` | Production (self-hosted gateway) |
| 2 | SDK | `API_KEYS` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline, embedding/rerank only (no LLM) |

SearXNG Configuration (2-Mode)

| Mode | Config | Description |
|---|---|---|
| Embedded (default) | `WET_AUTO_SEARXNG=true` | Auto-installs and manages SearXNG as a subprocess |
| External | `WET_AUTO_SEARXNG=false` + `SEARXNG_URL=http://host:port` | Connects to a pre-existing SearXNG instance |
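For the external mode, the corresponding environment fragment might look like the sketch below; the URL and port are examples, not defaults:

```jsonc
{
  "WET_AUTO_SEARXNG": "false",
  "SEARXNG_URL": "http://localhost:8080"
}
```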

Security

  • SSRF prevention -- URL validation on crawl targets
  • Graceful fallbacks -- Cloud → Local embedding, multi-tier crawling
  • Error sanitization -- No credentials in error messages
  • File conversion sandboxing -- Optional CONVERT_ALLOWED_DIRS restriction

Build from Source

```bash
git clone https://github.com/n24q02m/wet-mcp.git
cd wet-mcp
uv sync
uv run wet-mcp
```

Compatible With

Claude Code Claude Desktop Cursor VS Code Copilot Antigravity Gemini CLI OpenAI Codex OpenCode

Also by n24q02m

| Server | Description |
|---|---|
| mnemo-mcp | Persistent AI memory with hybrid search and cross-machine sync |
| better-notion-mcp | Markdown-first Notion API with 9 composite tools |
| better-email-mcp | Email (IMAP/SMTP) with multi-account and auto-discovery |
| better-godot-mcp | Godot Engine 4.x with 18 tools for scenes, scripts, and shaders |
| better-telegram-mcp | Telegram dual-mode (Bot API + MTProto) with 6 composite tools |
| better-code-review-graph | Knowledge graph for token-efficient code reviews |

Contributing

See CONTRIBUTING.md.

License

MIT -- See LICENSE.

Learn More