io.github.jghiringhelli/codeseeker
Graph-powered code intelligence with semantic search and knowledge graph for AI assistants
# CodeSeeker
**Four-layer hybrid search and knowledge graph for AI coding assistants.**
BM25 + vector embeddings + RAPTOR directory summaries + graph expansion — fused into a single MCP tool that gives Claude, Copilot, and Cursor a real understanding of your codebase.
[npm package](https://www.npmjs.com/package/codeseeker) · [License](LICENSE) · [TypeScript](https://www.typescriptlang.org/)
Works with **Claude Code**, **GitHub Copilot** (VS Code 1.99+), **Cursor**, **Windsurf**, and **Claude Desktop**.
Zero configuration — indexes on first use, stays in sync automatically.
## The Problem
AI assistants are powerful editors, but they navigate code like a tourist:
- **Grep finds text, not meaning.** Looking for authentication logic by grepping for "auth" returns every file that contains the string, relevant or not.
- **File reads are isolated.** The assistant sees one file but not its dependencies, its callers, or the patterns your team has established.
- **No memory of your project.** Every session starts from scratch.
CodeSeeker fixes this. It indexes your codebase once and gives AI assistants a queryable knowledge graph they can use on every turn.
## How It Works
A four-stage pipeline runs on every query:
```
Query: "find JWT refresh token logic"
│
▼ Stage 1: Hybrid retrieval
┌─────────────────────────────────────────────────────┐
│ BM25 (exact symbols, camelCase tokenized)           │
│ +                                                   │
│ Vector search (384-dim Xenova embeddings)           │
│ ↓                                                   │
│ Reciprocal Rank Fusion: score = Σ 1/(60 + rank_i)   │
│ Top-30 results, including RAPTOR directory nodes    │
└─────────────────────────────────────────────────────┘
│
▼ Stage 2: RAPTOR cascade (conditional)
┌─────────────────────────────────────────────────────┐
│ IF best directory-summary score ≥ 0.5:              │
│   → narrow results to that directory automatically  │
│ ELSE: all 30 results pass through unchanged         │
│ Effect: "what does auth/ do?" scopes to auth/       │
│         "jwt.ts decode function" bypasses this      │
└─────────────────────────────────────────────────────┘
│
▼ Stage 3: Scoring and deduplication
┌─────────────────────────────────────────────────────┐
│ Dedup: keep highest-score chunk per file            │
│ Source files: +0.10 (definition sites matter)       │
│ Test files:   −0.15 (prevent test dominance)        │
│ Symbol boost: +0.20 (query token in filename)       │
│ Multi-chunk:  up to +0.30 (file has many hits)      │
└─────────────────────────────────────────────────────┘
│
▼ Stage 4: Graph expansion
┌─────────────────────────────────────────────────────┐
│ Top-10 results → follow IMPORTS/CALLS/EXTENDS edges │
│ Structural neighbors scored at source × 0.7         │
│ Avg graph connectivity: 20.8 edges/node             │
└─────────────────────────────────────────────────────┘
│
▼
auth/jwt.ts (0.94), auth/refresh.ts (0.89), ...
```
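The fusion step in Stage 1 can be sketched in a few lines of TypeScript. This is an illustration, not CodeSeeker's actual implementation: only the formula `score = Σ 1/(60 + rank_i)` and the top-30 cutoff come from the diagram. The function name `reciprocalRankFusion`, the input shape (arrays of chunk IDs, best match first), and the 1-based rank convention are assumptions, and the raw scores are left unnormalized.

```typescript
// Hypothetical sketch of Reciprocal Rank Fusion (RRF).
// Each ranking is a list of chunk IDs ordered best-first.
interface FusedResult {
  id: string;
  score: number;
}

function reciprocalRankFusion(
  rankings: string[][],
  k = 60,    // constant from the diagram: 1 / (60 + rank)
  topN = 30, // Stage 1 keeps the top 30 results
): FusedResult[] {
  const scores = new Map<string, number>();

  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // assumption: ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Example with made-up chunk IDs.
const bm25Ranking = ["jwt.ts#1", "refresh.ts#2", "login.ts#0"];
const vectorRanking = ["refresh.ts#2", "session.ts#4", "jwt.ts#1"];
const fused = reciprocalRankFusion([bm25Ranking, vectorRanking]);
// fused[0].id is "refresh.ts#2": ranks 2 and 1 give 1/62 + 1/61,
// which beats "jwt.ts#1" at ranks 1 and 3 (1/61 + 1/63).
```

Because RRF uses only rank positions, BM25 scores and cosine similarities never have to be put on a common scale before they are combined.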
The knowledge graph is built from AST-parsed imports at index time. It's what powers `analyze dependencies`, dead-code detection, and graph expansion in every search.
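As a rough illustration of the graph expansion stage, the sketch below takes scored results, follows edges out of the top 10, and scores each neighbor at 0.7 times the score of the result that led to it. The edge types and the 0.7 factor come from the diagram. The `Edge` and `Scored` shapes, the name `expandWithGraph`, following only outgoing edges, and keeping the higher score when a file is reached more than once are all assumptions made for the example.

```typescript
// Hypothetical data shapes; CodeSeeker's real graph schema may differ.
type EdgeType = "IMPORTS" | "CALLS" | "EXTENDS";

interface Edge {
  from: string;
  to: string;
  type: EdgeType;
}

interface Scored {
  file: string;
  score: number;
}

function expandWithGraph(
  results: Scored[],
  edges: Edge[],
  topK = 10,   // only the top 10 results are expanded
  decay = 0.7, // neighbors are scored at source score × 0.7
): Scored[] {
  // Start with the best known score for every file already in the results.
  const best = new Map<string, number>();
  for (const r of results) {
    best.set(r.file, Math.max(best.get(r.file) ?? -Infinity, r.score));
  }

  const seeds = results.slice().sort((a, b) => b.score - a.score).slice(0, topK);
  for (const seed of seeds) {
    for (const edge of edges) {
      if (edge.from !== seed.file) continue; // assumption: outgoing edges only
      const neighborScore = seed.score * decay;
      // Assumption: a file keeps the highest score it has been given.
      if (neighborScore > (best.get(edge.to) ?? -Infinity)) {
        best.set(edge.to, neighborScore);
      }
    }
  }

  return Array.from(best, ([file, score]) => ({ file, score }))
    .sort((a, b) => b.score - a.score);
}

// Example with made-up files: keys.ts is not a search hit, but jwt.ts imports it.
const expanded = expandWithGraph(
  [
    { file: "auth/jwt.ts", score: 0.9 },
    { file: "auth/refresh.ts", score: 0.8 },
  ],
  [
    { from: "auth/jwt.ts", to: "auth/keys.ts", type: "IMPORTS" },
    { from: "auth/jwt.ts", to: "auth/refresh.ts", type: "CALLS" },
  ],
);
// auth/keys.ts joins the results at about 0.63 (0.9 × 0.7);
// auth/refresh.ts keeps its own, higher score of 0.8.
```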
## What Makes It Different
| Approach | Strengths | Limitations |
|----------|-----------|-------------|
| **Grep / ripgrep** | Fast, universal | No semantic understanding |
| **Vector search only** | Finds similar code | Misses structural relationships |
| **Serena** | Precise LSP symbol navigation, 30+ languages | No semantic search, no cross-file reasoning |
| **Codanna** | Fast symbol lookup, good call graphs | Semantic search needs JSDoc — undocumented code gets no embeddings; no BM25, no RAPTOR, Windows experimental |
| **CodeSeeker** | BM25 + embedding fusion + RAPTOR + graph + coding standards + multi-language AST | Requires initial indexing (30s–5min) |
**What LSP tools can't do:**
- *"Find code that handles errors like this"* → semantic pattern search
- *"What validation approach does this project use?"* → auto-detected coding standards
- *"Show me everything related to authentication"* → graph traversal across indirect dependencies
**What vector-only search misses:**
- Direct import/export chains
- Class inheritance hierarchies
- Which files actually depend on which
## Installation
### Recommended: npx (no install needed)
Let your MCP client launch the server through `npx`, so no global install is required:
```json
{
  "mcpServers": {
    "codeseeker": {
      "command": "npx",
      "args": ["-y", "codeseeker", "serve", "--mcp"]
    }
  }
}
```
Add this to your MCP config file ([see below](#advanced-installation-options) for per-client locations) and restart your editor.
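If the config file already lists other servers, the entry has to be merged in rather than pasted over them. Here is a minimal TypeScript sketch of that merge, assuming the file has already been parsed into an object and that the client uses the `mcpServers` key shown above. `withCodeSeeker` and the `McpConfig` type are hypothetical helpers, not part of CodeSeeker, and reading or writing the per-client file is left out.

```typescript
// Hypothetical helper types for an MCP client config file.
interface McpServer {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServer>;
  [key: string]: unknown; // any other client settings are passed through
}

// Returns a new config with the codeseeker entry added; the input is not mutated.
function withCodeSeeker(config: McpConfig): McpConfig {
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      codeseeker: {
        command: "npx",
        args: ["-y", "codeseeker", "serve", "--mcp"],
      },
    },
  };
}

// Example: an existing config with one other (made-up) server.
const existingConfig: McpConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
const mergedConfig = withCodeSeeker(existingConfig);
// mergedConfig.mcpServers now contains both "other" and "codeseeker".
```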
### npm global install
```bash
npm install -g codeseeker
codesee