# Local LLM API

OpenAI-compatible API server for local GGUF models.

## Overview
An API server that provides OpenAI-compatible endpoints for running GGUF models locally. Designed for easy integration with any system that supports OpenAI's API format.
## Supported Operations
- Chat Completions (like GPT-3.5/4)
- Embeddings Generation
- Document Reranking
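Of the three operations, chat completions and embeddings follow OpenAI's request format; reranking is not part of OpenAI's official API, so its body shape is sketched below as an assumption following common rerank conventions (a `query` plus a list of `documents`). Field names like `top_n` are hypothetical; check the server's actual schema.

```typescript
// Request-body builders for the three supported operations.
// The chat and embeddings shapes mirror OpenAI's API; the rerank
// shape is an assumption (Cohere/Jina-style) and may differ.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function chatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages };
}

function embeddingsRequest(model: string, input: string | string[]) {
  return { model, input };
}

// Hypothetical rerank body: field names are assumptions, not documented.
function rerankRequest(model: string, query: string, documents: string[], topN?: number) {
  return { model, query, documents, top_n: topN };
}

const body = rerankRequest("my-reranker", "what is a GGUF file?", ["doc a", "doc b"], 1);
console.log(JSON.stringify(body));
```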
## Features
- OpenAI-compatible API endpoints
- Drop-in replacement for OpenAI's client libraries
- Run models locally for privacy and cost savings
- Auto-loading and unloading of models for memory efficiency
- Organized model management by type (chat/embedding/reranking)
## API Endpoints
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | `/v1/chat/completions` | Chat completions |
| POST | `/v1/embeddings` | Generate embeddings |
| POST | `/v1/rerank` | Rerank documents |
| POST | `/v1/models/load` | Pre-load a model |
| POST | `/v1/models/unload` | Unload a model |
| GET | `/v1/models` | List available models |
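Because the endpoints mirror OpenAI's format, any OpenAI-style client can target the server by pointing its base URL at the local port. A minimal sketch using Node's built-in `fetch` (Node 18+), assuming port 23673 from the Tech Stack section; the model id `llama-3-8b-instruct` is a placeholder, not a name this server ships with:

```typescript
const BASE_URL = "http://localhost:23673/v1"; // port from the Tech Stack section

// Pure helper: builds an OpenAI-style chat completion body.
function buildChatBody(model: string, prompt: string) {
  return { model, messages: [{ role: "user" as const, content: prompt }] };
}

async function chat(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // "llama-3-8b-instruct" is a placeholder; list real ids via GET /v1/models
    body: JSON.stringify(buildChatBody("llama-3-8b-instruct", prompt)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

Official OpenAI client libraries should also work by setting their `baseURL` option to the same address.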
## Tech Stack
- TypeScript + Node.js
- pnpm for package management
- Server listens on port 23673
- Models auto-unload after 30 minutes of inactivity
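The 30-minute auto-unload can be thought of as a per-model idle timer: each request stamps the model's last-use time, and a periodic sweep unloads anything idle past the TTL. A minimal sketch of that policy, not the server's actual implementation:

```typescript
// Per-model idle tracking: touch() stamps a model on each use,
// sweep() returns (and forgets) models idle for at least ttlMs.
// Illustration only; the real server's code may differ.
class IdleTracker {
  private lastUsed = new Map<string, number>();

  constructor(private ttlMs: number) {}

  touch(model: string, now: number = Date.now()): void {
    this.lastUsed.set(model, now);
  }

  // Returns the models that should be unloaded at time `now`.
  sweep(now: number = Date.now()): string[] {
    const expired: string[] = [];
    for (const [model, t] of this.lastUsed) {
      if (now - t >= this.ttlMs) {
        expired.push(model);
        this.lastUsed.delete(model);
      }
    }
    return expired;
  }
}

// 30 minutes, matching the documented default.
const tracker = new IdleTracker(30 * 60 * 1000);
tracker.touch("chat-model", 0);
console.log(tracker.sweep(31 * 60 * 1000)); // → [ 'chat-model' ]
```

Taking `now` as a parameter (defaulting to `Date.now()`) keeps the timer logic deterministic and easy to test without real waiting.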