An intelligent document assistant with RAG, built for private deployment
Developed by Sebastjan Rijavec at Agentic.gi
Current version: v0.3.4 | Stack: FastAPI + React + ChromaDB + LM Studio
What is Amente?
Amente is a locally deployed AI chatbot that lets users upload documents and have natural conversations grounded in their content. It uses Retrieval-Augmented Generation (RAG) to search uploaded documents and feed relevant context to an LLM, producing answers that cite their sources.
The system runs entirely on private infrastructure — no data leaves the network. Users connect their own LLM provider, whether a local model via LM Studio or a cloud API like OpenAI.
Architecture
Browser (React)
|
| HTTPS
v
Nginx (reverse proxy, /amente/)
|
v
FastAPI (backend, uvicorn)
|
+---> ChromaDB (vector store, per-user + global)
+---> LM Studio / OpenAI-compatible API (LLM)
+---> SQLite (planned: metrics)
+---> JSON file storage (users, conversations, admin defaults)
- Backend: Python / FastAPI with JWT authentication, SSE streaming, and per-user data isolation
- Frontend: React with a custom design system (Amente DS), no component library dependencies
- Embeddings: HuggingFace
all-MiniLM-L6-v2for semantic search - Vector Store: ChromaDB with per-user and global collections
- LLM: Any OpenAI-compatible endpoint (LM Studio, OpenAI, Anthropic via proxy, etc.)
Core Features
RAG Pipeline
Documents are uploaded, chunked (512 tokens, 50-token overlap), and embedded into a ChromaDB vector store. When a user asks a question, Amente performs a hybrid search:
- Semantic search — cosine similarity on embeddings, filtered by L2 distance threshold (max 1.3) to avoid irrelevant matches
- BM25 keyword search — traditional term-frequency retrieval with stop word filtering, catching exact names and terms that embedding models might miss
Results from both global and user vectorstores are merged (3 global + 4 user chunks) and injected into the LLM’s system prompt as context. The LLM response streams to the frontend via Server-Sent Events.
Supported file types: PDF, TXT, DOCX, Markdown, CSV
Memory Management System
Amente organizes documents into three distinct memory spaces:
| Space | Scope | Persistence | Managed by |
|---|---|---|---|
| Global | Shared across all users | Permanent | Admin only |
| Permanent | Per-user | Permanent | User |
| Temporary | Per-user | Ephemeral (bulk clear) | User |
This gives administrators control over shared knowledge (company policies, product docs) while letting each user maintain their own document library. Temporary space is ideal for one-off analysis — upload, query, then clear.
Message Pinning
Users can pin any message in a conversation. Pinned messages are permanently prepended to the LLM context on every subsequent query in that conversation, regardless of the sliding window. This acts as a persistent instruction layer — pin a correction, a preference, or a key fact, and the LLM will always see it.
Pinned messages are visually highlighted with an accent border, tinted background, “PINNED” chip, and a flash animation on pin.
Multi-Conversation System
- Create, switch, and delete independent chat conversations
- Conversation memory: the last 10 messages are included in LLM context for natural multi-turn dialogue
- Auto-titling: conversations are automatically named from the first user message
- Persistence: all messages stored as JSON files, surviving page refresh and server restarts
- Recent Chats sidebar: quick access to conversation history
Per-User LLM Configuration
Each user can connect to their own LLM provider directly from the settings dashboard:
- Enter a Base URL, API Key, and Model name
- Test Connection button verifies the endpoint before saving
- Reset to Defaults reverts to the admin-configured default
- Admins set the default LLM config for new users
This enables a multi-model setup: one user might use a local Gemma model for speed, another might connect to GPT-4o for capability, all within the same Amente instance.
Resolution chain: User config > Admin defaults > System .env
User Dashboard
A settings modal accessible by clicking the user name or avatar in the navigation rail:
- Profile — change display name and password
- LLM — configure personal LLM provider with connection testing
- Appearance — select theme (light, dark, contrast) and chat wallpaper (5 SVG patterns)
- Admin Defaults (admin only) — set default LLM config for new users
Theming and Personalization
- Three themes: Light (warm cream), Dark, and Contrast (high accessibility)
- Five chat wallpapers: Topo, Graph, Weave, Halftone, and Marginalia — tileable SVG patterns rendered via CSS mask for automatic theme adaptation
- Custom brand: Amente mark with light and dark variants, auto-switching based on theme
- Per-user preferences persisted server-side
Authentication and Administration
- JWT-based auth with access tokens (30 min) and refresh tokens (7 days)
- Rate limiting on login (5 attempts per 15-minute window)
- Admin panel: create users, enable/disable accounts, reset passwords, delete users
- Per-user data isolation: each user gets their own vectorstore, uploads directory, and conversation history
Design System
Amente ships with a custom design system built on CSS custom properties:
- Typography: Geist (sans) + Geist Mono
- Color model: OKLCH for perceptually uniform colors across themes
- Component primitives: Cards, buttons, chips, inputs, nav items, avatar — all theme-aware through CSS variables
- No external UI library: zero runtime dependency on component frameworks
Deployment
Amente runs on a single server with minimal infrastructure:
- Server: Ubuntu with Nginx reverse proxy
- Backend: systemd service running uvicorn
- Frontend: static build served by Nginx at
/amente/ - Data: all state in the
data/directory (users, vectorstores, uploads, conversations)
Deploy workflow:
git pull origin main
cd frontend && npm run build
sudo systemctl restart househelp-backend.service
Roadmap
| Version | Focus | Status |
|---|---|---|
| v0.1.0 | Core RAG chatbot | Done |
| v0.2.0 | UI redesign, memory spaces, auth | Done |
| v0.3.0 | Conversations, pinning, user dashboard, themes | Done |
| v0.4.0 | Metrics and consumption dashboard | Planned |
| v0.5.0 | Security hardening for public internet | Planned |
Technology Summary
| Layer | Technology |
|---|---|
| Frontend | React, TypeScript, Vite, Custom CSS (Amente DS) |
| Backend | Python, FastAPI, Pydantic, uvicorn |
| Embeddings | HuggingFace all-MiniLM-L6-v2, sentence-transformers |
| Vector Store | ChromaDB (per-user + global collections) |
| Search | Hybrid: ChromaDB semantic + BM25 keyword |
| LLM | OpenAI-compatible API (LM Studio, OpenAI, etc.) |
| Auth | JWT (access + refresh tokens), bcrypt |
| Storage | JSON files (users, conversations), ChromaDB (vectors) |
| Deployment | Nginx, systemd, Ubuntu |
Built at Agentic.gi by Sebastjan Rijavec