com.vaadin/docs-mcp

Provides Vaadin Documentation and help with development tasks

★ 13 · MIT · other

# Vaadin Documentation RAG Service


A hierarchy-aware Retrieval-Augmented Generation (RAG) system for Vaadin documentation. It understands document structure, filters content by framework, and supports parent-child navigation through documentation sections.

## 🎯 Project Overview

This project provides an advanced RAG system with enhanced hybrid search that:

- **Understands Hierarchical Structure**: Navigates parent-child relationships within and across documentation files
- **Enhanced Hybrid Search**: Combines semantic and intelligent keyword search with native Pinecone reranking for superior relevance
- **Framework Filtering**: Intelligently filters content for Vaadin Flow (Java) vs Hilla (React) frameworks
- **Agent-Friendly**: Provides MCP (Model Context Protocol) server for seamless IDE assistant integration
- **Production Ready**: Clean architecture with dependency injection, comprehensive testing, and error handling
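
The framework filtering mentioned above can be pictured with a small in-memory sketch. It assumes each indexed document carries a framework tag and that content shared by both frameworks is tagged `common`. The tag values and the `TaggedDoc` shape are illustrative assumptions, not the project's actual metadata schema.

```typescript
// Hypothetical tag values; the real metadata schema is not shown in this README.
type Framework = "flow" | "hilla" | "common";

interface TaggedDoc {
  id: string;
  framework: Framework;
}

// Keep documents for the requested framework plus content shared by both.
// When no framework is requested, every document passes through unchanged.
function filterByFramework(docs: TaggedDoc[], wanted?: "flow" | "hilla"): TaggedDoc[] {
  if (!wanted) return docs;
  return docs.filter((d) => d.framework === wanted || d.framework === "common");
}
```

In a vector database, the same rule would more likely be expressed as a metadata filter on the query than as a pass over results in memory.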

## 🏗️ Architecture

```
vaadin-documentation-services/
├── packages/
│   ├── core-types/              # Shared TypeScript interfaces
│   ├── 1-asciidoc-converter/    # AsciiDoc → Markdown + metadata extraction
│   ├── 2-embedding-generator/   # Markdown → Vector database with hierarchical chunking
│   └── mcp-server/              # MCP server with hierarchical navigation
├── package.json                 # Bun workspace configuration
└── PROJECT_PLAN.md              # Complete project documentation
```

### Data Flow

```mermaid
flowchart TD
    subgraph "Step 1: Documentation Processing"
        VaadinDocs["📚 Vaadin Docs<br/>(AsciiDoc)"]
        Converter["🔄 AsciiDoc Converter<br/>• Framework detection<br/>• URL generation<br/>• Markdown output"]
        Processor["⚔ Embedding Generator<br/>• Hierarchical chunking<br/>• Parent-child relationships<br/>• OpenAI embeddings"]
    end

    subgraph "Step 2: Agent Integration"
        Pinecone["🗄️ Pinecone Vector DB<br/>• Rich metadata<br/>• Hierarchical relationships<br/>• Framework tags"]
        MCP["🤖 MCP Server<br/>• search_vaadin_docs<br/>• get_full_document<br/>• Full document retrieval"]
        IDEs["💻 IDE Assistants<br/>• Context-aware search<br/>• Hierarchical exploration<br/>• Framework-specific help"]
    end

    VaadinDocs --> Converter
    Converter --> Processor
    Processor --> Pinecone
    Pinecone <--> MCP
    MCP <--> IDEs

    classDef processing fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef storage fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef api fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    classDef agent fill:#fff3e0,stroke:#e65100,stroke-width:2px

    class VaadinDocs,Converter,Processor processing
    class Pinecone storage
    class MCP api
    class IDEs agent
```
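
To make the "hierarchical chunking" step in the diagram more concrete, here is a minimal sketch of one way to split Markdown into heading-based chunks with parent links. The `DocChunk` shape, the `chunkByHeadings` function, and the ID scheme are assumptions for illustration; the embedding generator's actual implementation is not shown in this README and may differ.

```typescript
// Hypothetical chunk shape; field names are illustrative only.
interface DocChunk {
  id: string;
  parentId: string | null;
  level: number; // heading depth: 1 for "#", 2 for "##", and so on
  title: string;
  body: string;
}

// Split Markdown into one chunk per heading and link each chunk to the
// nearest preceding heading of a shallower level.
function chunkByHeadings(markdown: string, fileId: string): DocChunk[] {
  const chunks: DocChunk[] = [];
  const stack: DocChunk[] = []; // currently open ancestors, shallowest first
  let current: DocChunk | null = null;

  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (!match) {
      // Body text belongs to the most recent heading; text before the
      // first heading is ignored in this sketch.
      if (current) current.body += line + "\n";
      continue;
    }
    const level = match[1].length;
    // Pop headings at the same or a deeper level; the remaining top is the parent.
    while (stack.length > 0 && stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    const parent = stack.length > 0 ? stack[stack.length - 1] : null;
    current = {
      id: `${fileId}#${chunks.length}`,
      parentId: parent ? parent.id : null,
      level,
      title: match[2].trim(),
      body: "",
    };
    chunks.push(current);
    stack.push(current);
  }
  return chunks;
}
```

This sketch does not skip fenced code blocks, so a `# comment` line inside a code fence would be misread as a heading. A real chunker would need to track fence state.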

## ✨ Key Features

### 🔍 Intelligent Search
- **Enhanced Hybrid Search**: Combines semantic similarity with intelligent keyword extraction and scoring
- **Native Pinecone Reranking**: Uses the `bge-reranker-v2-m3` model hosted by Pinecone to rerank search results
- **Framework Awareness**: Filters Flow vs Hilla content with common content inclusion
- **Query Preprocessing**: Smart keyword extraction with stopword filtering for better search quality
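
The query preprocessing and hybrid scoring described above might look roughly like the following sketch. It assumes the semantic score comes from a vector search and blends it with a simple keyword-overlap score. The stopword list, the `Candidate` shape, the function names, and the `alpha` weight are all illustrative assumptions, and the reranking step is left out.

```typescript
// Small illustrative stopword list; a production list would be much longer.
const STOPWORDS = new Set(["a", "an", "the", "how", "do", "i", "to", "in", "of", "is", "for", "with"]);

// Lowercase the query, split on non-alphanumerics, drop stopwords and duplicates.
function extractKeywords(query: string): string[] {
  const tokens = query.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
  return Array.from(new Set(tokens.filter((t) => !STOPWORDS.has(t))));
}

// Fraction of query keywords that occur in the text (0 when there are no keywords).
function keywordScore(keywords: string[], text: string): number {
  if (keywords.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = keywords.filter((k) => haystack.includes(k)).length;
  return hits / keywords.length;
}

// Hypothetical search candidate; semanticScore is assumed to lie in [0, 1].
interface Candidate {
  id: string;
  text: string;
  semanticScore: number;
}

// Weighted blend of semantic and keyword scores, highest first.
// alpha = 0.7 is an arbitrary choice for this sketch.
function rankHybrid(query: string, candidates: Candidate[], alpha = 0.7): Candidate[] {
  const keywords = extractKeywords(query);
  const score = (c: Candidate) =>
    alpha * c.semanticScore + (1 - alpha) * keywordScore(keywords, c.text);
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

The blend lets a candidate with strong keyword overlap overtake one with a slightly higher semantic score, which is the usual motivation for combining the two signals.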

### 🌳 Hierarchical Navigation
- **Parent-Child Relationships**: Navigate from specific details to broader context
- **Cross-File Links**: Understand relationships between different documentation files
- **Context Breadcrumbs**: Maintain navigation context for better user experience
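
As an illustration of parent-child navigation, the sketch below builds a breadcrumb trail by following parent links from a section up to its root. The `SectionNode` shape and the `breadcrumb` function are hypothetical; they only assume that each stored section records the ID of its parent.

```typescript
// Hypothetical record for an indexed section; field names are illustrative.
interface SectionNode {
  id: string;
  parentId: string | null;
  title: string;
}

// Walk parent links from a section up to its root and return titles root-first.
function breadcrumb(sections: Map<string, SectionNode>, id: string): string[] {
  const trail: string[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  let cursor = sections.get(id);
  while (cursor && !seen.has(cursor.id)) {
    seen.add(cursor.id);
    trail.unshift(cursor.title);
    cursor = cursor.parentId ? sections.get(cursor.parentId) : undefined;
  }
  return trail;
}
```

Because the walk only needs IDs, it works the same way whether the parent lives in the same file or in a different one.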

### 🎛️ Developer Experience
- **MCP Integration**: Standardized protocol for IDE assistant integration
- **TypeScript**: Full type safety across all packages
- **Comprehensive Testing**: Unit tests, integration tests, and hierarchical workflow validation
- **Clean Architecture**: Dependency injection and interface-based design

## 🚀 Quick Start

### Prerequisites
- [Bun](https://bun.sh/) runtime v1.3.6
- OpenAI API key (for embeddings)
- Pinecone API key and index

### Installation
```bash
# Clone and install dependencies
git clone https://github.com/vaadin/vaadin-documentation-services
cd vaadin-documentation-services
bun install
```

### Environment Setup
```bash
# Create .env file with your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
echo "PINECONE_API_KEY=your_pinecone_api_key" >> .env
echo "PINECONE_INDEX=your_pinecone_index" >> .env
```
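
A startup check for these variables can catch a missing key before any API call is made. The sketch below is a hypothetical helper, not part of the project; only the three variable names are taken from the `.env` example above.

```typescript
// Variable names come from the .env example above.
const REQUIRED_ENV = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

// Return the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => (env[name] ?? "").trim() === "");
}

// Possible usage at startup (Bun loads .env files automatically):
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) throw new Error(`Missing: ${missing.join(", ")}`);
```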

### Running the System

#### 1. Process Documentation (One-time setup)
```bash
# Convert AsciiDoc to Markdown with metadata
cd packages/1-asciidoc-converter
bun run convert

# Generate embeddings and populate vector database
cd ../2-embedding-generator
bun run generate
```

#### 2. Use MCP Server with IDE Assistant
The MCP server is deployed and available remotely via HTTP transport at:
**`https://mcp.vaadin.com/`**

Configure your IDE assistant to use the Streamable HTTP transport:
```javascript
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const transport =