Local RAG API
Local RAG API with semantic search and reranking
Overview
A lightweight, fully customizable API server for contextual Retrieval-Augmented Generation (RAG). It supports document chunking with context generation, multi-embedding semantic search, and reranking.
Modes
- Stateless RAG — Provide documents and chunks in the request
- Database RAG — Complete contextual RAG pipeline using PostgreSQL (PgVector)
Features
- 🔍 Text chunking with configurable size and overlap
- 🧠 Optional context generation using OpenAI or local models
- 📈 Flexible embedding model selection
- 🎯 Hybrid semantic search with configurable weights (60/40 content/context)
- 🔄 Cross-encoder reranking for better relevance
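To make "configurable size and overlap" concrete, here is a minimal sketch of sliding-window chunking. It is not the server's actual implementation: the function name `chunkText`, the default values, and the choice to measure size in characters (rather than tokens or sentences) are all assumptions made for illustration.

```javascript
// Hypothetical helper: split text into fixed-size chunks that share
// `overlap` characters with the previous chunk.
// Assumption: sizes are counted in characters, not tokens.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }

  const step = chunkSize - overlap; // how far the window advances each time
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text,
    // so no trailing chunk is made up only of overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Example: 4-character chunks that overlap by 2 characters.
console.log(chunkText("abcdefghij", 4, 2));
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of more chunks to embed and store.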
Search Pipeline
- Initial Retrieval — Generates query embedding, calculates cosine similarity, applies threshold
- Reranking — Cross-encoder model for more accurate relevance scoring
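The initial retrieval step can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it assumes each chunk carries separate content and context embeddings, that the 60/40 weights are applied to the two cosine similarities, and that a chunk without a context embedding falls back to its content score. The names `retrieve`, `contentEmbedding`, `contextEmbedding`, and the threshold value are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical retrieval step: weighted content/context score,
// threshold filter, then sort by descending score.
function retrieve(queryEmbedding, chunks, options = {}) {
  const { contentWeight = 0.6, contextWeight = 0.4, threshold = 0.5 } = options;
  return chunks
    .map((chunk) => {
      const contentScore = cosineSimilarity(queryEmbedding, chunk.contentEmbedding);
      // Assumption: with no context embedding, reuse the content score.
      const contextScore = chunk.contextEmbedding
        ? cosineSimilarity(queryEmbedding, chunk.contextEmbedding)
        : contentScore;
      return {
        id: chunk.id,
        score: contentWeight * contentScore + contextWeight * contextScore,
      };
    })
    .filter((result) => result.score >= threshold)
    .sort((x, y) => y.score - x.score);
}
```

The candidates that survive the threshold would then be passed to the cross-encoder for reranking, which scores each query and chunk pair jointly instead of comparing precomputed embeddings.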
API Endpoints
- POST /v1/chunk — Process document chunks
- POST /v1/query — Search across chunks
- POST /v1/store — Store document in database
- POST /v1/retrieve — Hybrid semantic search
- POST /v1/delete — Delete chunks by file_id
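As a rough illustration of calling one of these endpoints, the sketch below builds a request for `/v1/delete`. Only the path, the POST method, and the `file_id` field name come from the list above. The JSON body shape, the base URL, the port, and the helper name `buildDeleteRequest` are assumptions, so check them against the server's actual request schema.

```javascript
// Hypothetical helper: build the URL and fetch options for a delete call.
// Assumption: the server accepts a JSON body of the form { "file_id": "..." }.
function buildDeleteRequest(baseUrl, fileId) {
  return {
    url: new URL("/v1/delete", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ file_id: fileId }),
    },
  };
}

// Usage (needs a running server; Node.js 18+ provides a global fetch).
// The host and port here are placeholders, not documented defaults:
// const { url, options } = buildDeleteRequest("http://localhost:3000", "doc-123");
// const response = await fetch(url, options);
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without a live server.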
Tech Stack
- Node.js 18+, PostgreSQL 15+ with pgvector
- Docker support included