Back to Directory/Developer Tools

io.github.j0hanz/superfetch

Intelligent web content fetcher MCP server that converts HTML to clean, AI-readable JSONL format

Developer ToolsTypeScriptv1.0.2

Fetch URL MCP Server

npm version License

Install in VS Code Install in VS Code Insiders Install in Visual Studio

Add to LM Studio Install in Cursor Install in Goose

A web content fetcher MCP server that converts HTML to clean, AI and human readable markdown.

Overview

The Fetch URL MCP Server provides a standardized interface for fetching public web content and transforming it into Markdown enriched with structured metadata. It validates URLs, applies noise removal heuristics, and caches results for reuse. The server supports both inline and task-based execution modes, making it suitable for a wide range of client applications and LLM interactions.

Key Features

  • fetch-url validates public HTTP(S) URLs, fetches the page, and returns cleaned Markdown plus structured metadata.
  • The tool advertises optional task support and emits progress updates while fetching and transforming larger pages.
  • GitHub, GitLab, Bitbucket, and Gist page URLs are rewritten to raw-content endpoints when possible before fetch.
  • internal://instructions and internal://cache/{namespace}/{hash} expose built-in guidance and cached Markdown as MCP resources.
  • HTTP mode adds host/origin validation, auth, rate limiting, health checks, OAuth protected-resource metadata, and cached-download URLs.

Requirements

  • Node.js >=24 (from package.json)
  • Docker is optional if you want to run the published container image.

Quick Start

Use this standard MCP client configuration:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

Client Configuration

<details> <summary><b>Install in VS Code</b></summary>

Install in VS Code

Add to .vscode/mcp.json:

json
{
  "servers": {
    "fetch-url-mcp": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

Or install via CLI:

sh
code --add-mcp '{"name":"fetch-url-mcp","command":"npx","args":["-y","@j0hanz/fetch-url-mcp@latest"]}'

For more info, see VS Code MCP docs.

</details> <details> <summary><b>Install in VS Code Insiders</b></summary>

Install in VS Code Insiders

Add to .vscode/mcp.json:

json
{
  "servers": {
    "fetch-url-mcp": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

Or install via CLI:

sh
code-insiders --add-mcp '{"name":"fetch-url-mcp","command":"npx","args":["-y","@j0hanz/fetch-url-mcp@latest"]}'

For more info, see VS Code Insiders MCP docs.

</details> <details> <summary><b>Install in Cursor</b></summary>

Install in Cursor

Add to ~/.cursor/mcp.json:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Cursor MCP docs.

</details> <details> <summary><b>Install in Visual Studio</b></summary>

Install in Visual Studio

For solution-scoped setup, add this to .mcp.json at the solution root:

json
{
  "servers": {
    "fetch-url-mcp": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Visual Studio MCP docs.

</details> <details> <summary><b>Install in Goose</b></summary>

Install in Goose

Add to ~/.config/goose/config.yaml on macOS/Linux or %APPDATA%\Block\goose\config\config.yaml on Windows:

yaml
extensions:
  fetch-url-mcp:
    name: fetch-url-mcp
    cmd: npx
    args: ['-y', '@j0hanz/fetch-url-mcp@latest']
    enabled: true
    type: stdio
    timeout: 300

For more info, see Goose extension docs.

</details> <details> <summary><b>Install in LM Studio</b></summary>

Add to LM Studio

Add to ~/.lmstudio/mcp.json on macOS/Linux or %USERPROFILE%/.lmstudio/mcp.json on Windows:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see LM Studio MCP docs.

</details> <details> <summary><b>Install in Claude Desktop</b></summary>

Add to claude_desktop_config.json:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Claude Desktop MCP docs.

</details> <details> <summary><b>Install in Claude Code</b></summary>

Use the CLI:

sh
claude mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest

For project-scoped config, Claude Code writes .mcp.json with:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"],
      "env": {}
    }
  }
}

For more info, see Claude Code MCP docs.

</details> <details> <summary><b>Install in Windsurf</b></summary>

Add to ~/.codeium/windsurf/mcp_config.json:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Windsurf MCP docs.

</details> <details> <summary><b>Install in Amp</b></summary>

Add to ~/.config/amp/settings.json on macOS/Linux, %USERPROFILE%\.config\amp\settings.json on Windows, or .amp/settings.json for workspace-scoped config:

json
{
  "amp.mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

Or install via CLI:

sh
amp mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest

For more info, see Amp docs.

</details> <details> <summary><b>Install in Cline</b></summary>

Open the MCP Servers panel, choose Configure MCP Servers, and add this to cline_mcp_settings.json:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Cline MCP docs.

</details> <details> <summary><b>Install in Codex CLI</b></summary>

Use the CLI:

sh
codex mcp add fetch-url-mcp -- npx -y @j0hanz/fetch-url-mcp@latest

Or add this to ~/.codex/config.toml or project-scoped .codex/config.toml:

toml
[mcp_servers.fetch-url-mcp]
command = "npx"
args = ["-y", "@j0hanz/fetch-url-mcp@latest"]

For more info, see Codex MCP docs.

</details> <details> <summary><b>Install in GitHub Copilot</b></summary>

Add to .vscode/mcp.json:

json
{
  "servers": {
    "fetch-url-mcp": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see GitHub Copilot MCP docs.

</details> <details> <summary><b>Install in Warp</b></summary>

Open Personal > MCP Servers in Warp, choose + Add, and either add a CLI server with:

  • command: npx
  • args: ["-y", "@j0hanz/fetch-url-mcp@latest"]

Or paste this JSON snippet when using Warp's multi-server import flow:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Warp MCP docs.

</details> <details> <summary><b>Install in Kiro</b></summary>

Use Kiro's MCP Servers panel or the Add to Kiro install flow. Kiro stores workspace-scoped MCP config in .kiro/settings/mcp.json and user-scoped config in ~/.kiro/settings/mcp.json.

For this server, use:

  • command: npx
  • args: ["-y", "@j0hanz/fetch-url-mcp@latest"]

For more info, see Kiro MCP docs.

</details> <details> <summary><b>Install in Gemini CLI</b></summary>

Add to ~/.gemini/settings.json:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Gemini CLI MCP docs.

</details> <details> <summary><b>Install in Zed</b></summary>

Add to ~/.config/zed/settings.json:

json
{
  "context_servers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"],
      "env": {}
    }
  }
}

For more info, see Zed MCP docs.

</details> <details> <summary><b>Install in Augment</b></summary>

Use the Augment Settings panel and either add the server manually or choose Import from JSON:

json
{
  "mcpServers": {
    "fetch-url-mcp": {
      "command": "npx",
      "args": ["-y", "@j0hanz/fetch-url-mcp@latest"]
    }
  }
}

For more info, see Augment MCP docs.

</details> <details> <summary><b>Install in Roo Code</b></summary>

Use Roo Code's MCP Servers UI or marketplace flow.

For this server, use:

  • command: npx
  • args: ["-y", "@j0hanz/fetch-url-mcp@latest"]

For more info, see Roo Code docs.

</details> <details> <summary><b>Install in Kilo Code</b></summary>

Use Kilo Code's MCP Servers UI or marketplace flow.

For this server, use:

  • command: npx
  • args: ["-y", "@j0hanz/fetch-url-mcp@latest"]

For more info, see Kilo Code docs.

</details>

Use Cases

  • Fetch documentation pages, blog posts, or reference material into Markdown before sending them to an LLM.
  • Retrieve repository-hosted content from GitHub, GitLab, Bitbucket, or Gists and let the server rewrite page URLs to raw endpoints when possible.
  • Reuse cached Markdown through internal://cache/{namespace}/{hash} or bypass the cache with forceRefresh for time-sensitive pages.
  • Use task mode for large pages or slower sites when the fetch might otherwise be delayed, cancelled, or better handled asynchronously.

Architecture

text
[MCP Client]
  ├─ stdio -> `src/index.ts` -> `startStdioServer()` -> `createMcpServer()`
  └─ HTTP (`--http`) -> `src/index.ts` -> `startHttpServer()` -> HTTP dispatcher
       ├─ `GET /health`
       ├─ `GET /.well-known/oauth-protected-resource`
       ├─ `GET /.well-known/oauth-protected-resource/mcp`
       ├─ `GET /mcp/downloads/{namespace}/{hash}`
       └─ `POST|GET|DELETE /mcp`

`createMcpServer()`
  ├─ registers tool: `fetch-url`
  ├─ registers prompt: `get-help`
  ├─ registers resources:
  │    - `internal://instructions`
  │    - `internal://cache/{namespace}/{hash}`
  ├─ enables capabilities: completions, logging, resources, prompts, tasks
  └─ installs task handlers, log-level handling, and shutdown cleanup

`fetch-url` execution
  ├─ validate input with `fetchUrlInputSchema`
  ├─ normalize URL and block local/private targets unless allowed
  ├─ rewrite supported code-host URLs to raw endpoints when possible
  ├─ fetch and cache content via the shared pipeline
  ├─ transform HTML into Markdown in the transform worker path
  └─ validate `structuredContent` with `fetchUrlOutputSchema`

Request Lifecycle

text
[Client] -- initialize {protocolVersion, capabilities} --> [Server]
[Server] -- {protocolVersion, capabilities, serverInfo} --> [Client]
[Client] -- notifications/initialized --> [Server]
[Client] -- tools/call {name, arguments} --> [Server]
[Server] -- {content: [{type, text}], structuredContent?, isError?} --> [Client]

MCP Surface

Tools

fetch-url

Fetch public webpages and convert HTML into AI-readable Markdown. The tool is read-only, does not execute page JavaScript, can bypass the cache with forceRefresh, and supports optional task mode for larger or slower fetches. forceRefresh refreshes cached content only; it does not bypass fetch or inline truncation limits.

ParameterTypeRequiredDescription
urlstringyesTarget URL. Max 2048 chars.
forceRefreshbooleannoBypass cache and fetch fresh content.

The response is returned as MCP text content and, when validation succeeds, as structuredContent containing url, resolvedUrl, finalUrl, title, metadata, markdown, fromCache, fetchedAt, contentSize, and truncated. A truncated: true result means the content hit server-enforced fetch or inline limits; forceRefresh does not remove that limit.

text
1. [Client] -- tools/call {name: "fetch-url", arguments} --> [Server]
2. [Server] -- dispatch("fetch-url") --> [src/tools/fetch-url.ts]
3. [Handler] -- validate(fetchUrlInputSchema) --> normalize / fetch / transform
4. [Handler] -- validate(fetchUrlOutputSchema) --> assemble content + structuredContent
5. [Server] -- result or tool error --> [Client]

Resources

ResourceURIMIME TypeDescription
fetch-url-mcp-instructionsinternal://instructionstext/markdownGuidance for using the Fetch URL MCP server.
fetch-url-mcp-cache-entryinternal://cache/{namespace}/{hash}text/markdownRead cached markdown generated by previous fetch-url calls.

Prompts

PromptArgumentsDescription
get-helpnoneReturn Fetch URL server instructions: workflows, cache usage, task mode, and error handling.

MCP Capabilities

CapabilityStatusNotes
completionsconfirmedAdvertised in createServerCapabilities() and used by the cache resource template for namespace and hash completion.
loggingconfirmedAdvertised in createServerCapabilities() and handled through SetLevelRequestSchema.
resources subscribe/listChangedconfirmedAdvertised in createServerCapabilities() and implemented for cache resource subscriptions and list changes.
promptsconfirmedget-help is registered during server startup.
tasksconfirmedAdvertised in createServerCapabilities() and backed by registered task handlers plus optional tool task support.
progress notificationsconfirmedTool execution reports notifications/progress updates during fetch and transform stages.

Tool Annotations

AnnotationValue
readOnlyHinttrue
destructiveHintfalse
idempotentHinttrue
openWorldHinttrue

Structured Output

  • fetch-url publishes an explicit outputSchema and returns structuredContent when the assembled response passes validation.

Configuration

VariableDefaultApplies ToNotes
HOST127.0.0.1HTTP modeBind address. Non-loopback bindings also require ALLOW_REMOTE=true.
PORT3000HTTP modeListening port for --http.
ALLOW_REMOTEfalseHTTP modeMust be enabled to bind to a non-loopback interface.
ACCESS_TOKENSunsetHTTP modeComma- or space-separated static bearer tokens.
API_KEYunsetHTTP modeAlternate static token source for header auth.
OAUTH_ISSUER_URLunsetHTTP modeEnables OAuth mode when combined with the other OAuth URLs.
OAUTH_AUTHORIZATION_URLunsetHTTP modeOptional explicit authorization endpoint.
OAUTH_TOKEN_URLunsetHTTP modeOptional explicit token endpoint.
OAUTH_REVOCATION_URLunsetHTTP modeOptional OAuth revocation endpoint.
OAUTH_REGISTRATION_URLunsetHTTP modeOptional OAuth dynamic client registration endpoint.
OAUTH_INTROSPECTION_URLunsetHTTP modeRequired for OAuth token introspection.
OAUTH_REQUIRED_SCOPESemptyHTTP modeRequired scopes enforced after auth.
OAUTH_CLIENT_IDunsetHTTP modeOptional introspection client ID.
OAUTH_CLIENT_SECRETunsetHTTP modeOptional introspection client secret.
SERVER_TLS_KEY_FILEunsetHTTP modeEnable HTTPS when set together with SERVER_TLS_CERT_FILE.
SERVER_TLS_CERT_FILEunsetHTTP modeTLS certificate path.
SERVER_TLS_CA_FILEunsetHTTP modeOptional custom CA bundle.
SERVER_MAX_CONNECTIONS0HTTP modeOptional connection cap.
SERVER_HEADERS_TIMEOUT_MSunsetHTTP modeOptional Node server tuning.
SERVER_REQUEST_TIMEOUT_MSunsetHTTP modeOptional Node server tuning.
SERVER_KEEP_ALIVE_TIMEOUT_MSunsetHTTP modeOptional keep-alive tuning.
SERVER_KEEP_ALIVE_TIMEOUT_BUFFER_MSunsetHTTP modeOptional keep-alive tuning buffer.
SERVER_MAX_HEADERS_COUNTunsetHTTP modeOptional header count limit.
SERVER_BLOCK_PRIVATE_CONNECTIONSfalseHTTP modeEnables inbound private-network protections.
MCP_STRICT_PROTOCOL_VERSION_HEADERtrueHTTP modeRequires MCP-Protocol-Version on session init.
ALLOWED_HOSTSemptyHTTP modeAdditional allowed Host and Origin values.
ALLOW_LOCAL_FETCHfalseFetchingAllows loopback and private-network fetch targets.
FETCH_TIMEOUT_MS15000FetchingNetwork fetch timeout in milliseconds.
USER_AGENTfetch-url-mcp/<version>FetchingOverride the outbound user agent string.
MAX_INLINE_CONTENT_CHARS0Tool output0 means no explicit inline truncation limit.
CACHE_ENABLEDtrueCachingEnables in-memory fetch result caching.
TASKS_MAX_TOTAL5000TasksTotal task capacity.
TASKS_MAX_PER_OWNER1000TasksPer-owner task cap, clamped to the total cap.
TASKS_STATUS_NOTIFICATIONSfalseTasksEnables status notifications for tasks.
TASKS_REQUIRE_INTERCEPTIONtrueTasksRequires task interception for task-capable tool execution.
TRANSFORM_CANCEL_ACK_TIMEOUT_MS200Transform workersCancellation acknowledgement timeout.
TRANSFORM_WORKER_MODEthreadsTransform workersWorker execution mode.
TRANSFORM_WORKER_MAX_OLD_GENERATION_MBunsetTransform workersOptional worker memory limit.
TRANSFORM_WORKER_MAX_YOUNG_GENERATION_MBunsetTransform workersOptional worker memory limit.
TRANSFORM_WORKER_CODE_RANGE_MBunsetTransform workersOptional worker memory limit.
TRANSFORM_WORKER_STACK_MBunsetTransform workersOptional worker stack size.
FETCH_URL_MCP_EXTRA_NOISE_TOKENSemptyContent cleanupExtra noise-removal tokens.
FETCH_URL_MCP_EXTRA_NOISE_SELECTORSemptyContent cleanupExtra DOM selectors for noise removal.
FETCH_URL_MCP_LOCALEsystem defaultContent cleanupLocale override for extraction heuristics.
MARKDOWN_HEADING_KEYWORDSbuilt-in listMarkdown cleanupOverride heading keywords used by cleanup.
LOG_LEVELinfoLoggingdebug, info, warn, or error.
LOG_FORMATtextLoggingSet to json for structured logs.

HTTP Endpoints

MethodPathAuthPurpose
GET/healthno, unless ?verbose=1 on a remote serverBasic health response, with optional diagnostics.
GET/.well-known/oauth-protected-resourcenoOAuth protected-resource metadata.
GET/.well-known/oauth-protected-resource/mcpnoOAuth protected-resource metadata for the MCP endpoint.
POST/mcpyesSession initialization and JSON-RPC requests.
GET/mcpyesSession-bound server-to-client stream handling.
DELETE/mcpyesSession shutdown.
GET/mcp/downloads/{namespace}/{hash}yesDownload route used by HTTP-mode cached fetch results.

Security

ControlStatusNotes
Host and origin validationimplementedHTTP requests are rejected unless Host and Origin match the allowlist built from loopback, the configured host, and ALLOWED_HOSTS.
AuthenticationimplementedHTTP mode supports static bearer tokens locally or OAuth token introspection; remote bindings require OAuth.
Protocol version checksimplementedHTTP sessions validate MCP-Protocol-Version and pin it to the negotiated session version.
Rate limitingimplementedRequests pass through the HTTP rate limiter before route dispatch.
Outbound SSRF protectionsimplementedLocal/private IPs, metadata endpoints, and .local/.internal hosts are blocked unless ALLOW_LOCAL_FETCH=true.
TLSoptionalHTTPS is enabled when both TLS key and certificate files are configured.
Stdio logging safetyimplementedServer logs are written to stderr, not stdout, so stdio MCP traffic stays clean.

Development

ScriptCommand
cleannode scripts/tasks.mjs clean
buildnode scripts/tasks.mjs build
copy:assetsnode scripts/tasks.mjs copy:assets
preparenpm run build
devtsc --watch --preserveWatchOutput
dev:runnode --env-file=.env --watch dist/index.js
startnode dist/index.js
formatprettier --write .
type-checknode scripts/tasks.mjs type-check
type-check:srcnode node_modules/typescript/bin/tsc -p tsconfig.json --noEmit
type-check:testsnode node_modules/typescript/bin/tsc -p tsconfig.test.json --noEmit
type-check:diagnosticstsc --noEmit --extendedDiagnostics
type-check:tracenode -e "require('fs').rmSync('.ts-trace',{recursive:true,force:true})" && tsc --noEmit --generateTrace .ts-trace
linteslint .
lint:testseslint src/__tests__
lint:fixeslint . --fix
testnode scripts/tasks.mjs test
test:fastnode --test --import tsx/esm src/__tests__/**/*.test.ts node-tests/**/*.test.ts
test:coveragenode scripts/tasks.mjs test --coverage
knipknip
knip:fixknip --fix
inspectornpm run build && npx -y @modelcontextprotocol/inspector node dist/index.js --stdio
prepublishOnlynpm run lint && npm run type-check && npm run build

Build and Release

  • The repository includes release automation under .github/workflows/.
  • Dockerfile and docker-compose.yml are available for container-based packaging and local runs.
  • npm run prepublishOnly runs the release gate: lint, type-check, and build.

Troubleshooting

  • For stdio mode, avoid writing logs to stdout; keep logs on stderr.
  • For HTTP mode, verify MCP protocol headers and endpoint routing.
  • Update client snippets when client MCP configuration formats change.

Credits

DependencyRegistry
@modelcontextprotocol/sdknpm
@mozilla/readabilitynpm
linkedomnpm
node-html-markdownnpm
undicinpm
zodnpm

Contributing and License

  • License: MIT
  • Contributions are welcome via pull requests.