An intelligent document assistant with RAG, built for private deployment

Developed by Sebastjan Rijavec at Agentic.gi

Current version: v0.3.4 | Stack: FastAPI + React + ChromaDB + LM Studio


What is Amente?

Amente is a locally deployed AI chatbot that lets users upload documents and have natural conversations grounded in their content. It uses Retrieval-Augmented Generation (RAG) to search uploaded documents and feed relevant context to an LLM, producing answers that cite their sources.

The system runs entirely on private infrastructure — no data leaves the network. Users connect their own LLM provider, whether a local model via LM Studio or a cloud API like OpenAI.


Architecture

Browser (React)
    |
    | HTTPS
    v
Nginx (reverse proxy, /amente/)
    |
    v
FastAPI (backend, uvicorn)
    |
    +---> ChromaDB (vector store, per-user + global)
    +---> LM Studio / OpenAI-compatible API (LLM)
    +---> SQLite (planned: metrics)
    +---> JSON file storage (users, conversations, admin defaults)

  • Backend: Python / FastAPI with JWT authentication, SSE streaming, and per-user data isolation
  • Frontend: React with a custom design system (Amente DS), no component library dependencies
  • Embeddings: HuggingFace all-MiniLM-L6-v2 for semantic search
  • Vector Store: ChromaDB with per-user and global collections
  • LLM: Any OpenAI-compatible endpoint (LM Studio, OpenAI, Anthropic via proxy, etc.)
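SSE streaming, listed for the backend above, comes down to writing `data:` lines terminated by a blank line. The sketch below shows only that framing. The JSON payload shape (`{"token": ...}`) and the `[DONE]` end marker are assumptions for illustration, not Amente's documented protocol.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame.

    Every payload line gets its own "data:" prefix and a blank line ends the event,
    as the SSE wire format requires.
    """
    lines = [] if event is None else [f"event: {event}"]
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap LLM tokens as SSE frames, then send a hypothetical [DONE] marker."""
    for token in tokens:
        # JSON-encode so tokens containing newlines or quotes survive intact.
        yield sse_frame(json.dumps({"token": token}))
    yield sse_frame("[DONE]")
```

In FastAPI, a generator like `stream_tokens(...)` can be returned through `fastapi.responses.StreamingResponse` with `media_type="text/event-stream"`.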

Core Features

RAG Pipeline

Documents are uploaded, chunked (512 tokens, 50-token overlap), and embedded into a ChromaDB vector store. When a user asks a question, Amente performs a hybrid search:

  1. Semantic search: nearest-neighbor search over the embeddings, discarding results whose L2 distance exceeds 1.3 to avoid irrelevant matches
  2. BM25 keyword search: term-frequency and inverse-document-frequency ranking with stop-word filtering, which catches exact names and terms that embedding models might miss
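The two retrieval legs can be sketched with the standard library alone. This is a minimal illustration, not Amente's code: the stop-word list is hypothetical, `k1=1.5` and `b=0.75` are common BM25 defaults rather than documented settings, and the distance filter assumes hits arrive as `(chunk, distance)` pairs. If the embeddings are unit-length and the 1.3 cut-off applies to ChromaDB's default squared-L2 distance (both assumptions), then d = 2 - 2·cos, so ranking by distance matches ranking by cosine similarity and the threshold keeps chunks with a cosine similarity of at least 0.35.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the one Amente actually uses is not documented here.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or", "what", "who"}
MAX_DISTANCE = 1.3  # distance cut-off from the description above


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def filter_semantic(hits: list[tuple[str, float]]) -> list[str]:
    """Keep (chunk, distance) hits within the threshold, closest first."""
    return [chunk for chunk, dist in sorted(hits, key=lambda h: h[1]) if dist <= MAX_DISTANCE]


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (the form Lucene uses).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

How the semantic and keyword result lists are combined is not specified in this overview, so the sketch stops at producing each leg's results.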

Results from both global and user vectorstores are merged (3 global + 4 user chunks) and injected into the LLM’s system prompt as context. The LLM response streams to the frontend via Server-Sent Events.
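The merge step might look like the following sketch. The chunk dictionary shape (`text`, `source`), the numbered labels used for citations, and the prompt wording are assumptions; only the 3 + 4 split comes from the description above.

```python
GLOBAL_K = 3  # chunks taken from the shared (global) store
USER_K = 4    # chunks taken from the user's own store


def build_system_prompt(global_chunks: list[dict], user_chunks: list[dict]) -> str:
    """Merge the top global and user chunks into a system prompt with source labels.

    Each chunk is assumed to be a dict with "text" and "source" keys (hypothetical shape);
    both input lists are assumed to be sorted best match first.
    """
    selected = global_chunks[:GLOBAL_K] + user_chunks[:USER_K]
    context = "\n\n".join(
        f"[{i}] ({c['source']})\n{c['text']}" for i, c in enumerate(selected, start=1)
    )
    return (
        "Answer using only the context below and cite sources by their [number].\n\n"
        f"Context:\n{context}"
    )
```

Numbering the chunks gives the LLM a compact handle to cite, which the frontend can map back to file names.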

Supported file types: PDF, TXT, DOCX, Markdown, CSV
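The chunking step at the start of the pipeline (512 tokens, 50-token overlap) is a sliding window whose stride is 512 - 50 = 462 tokens. The sketch below approximates tokens with whitespace-separated words, which is an assumption; a tokenizer-based splitter will produce different boundaries.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def chunk_text(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Whitespace-token approximation of the chunker."""
    return [" ".join(c) for c in chunk_tokens(text.split(), size, overlap)]
```

For a 1,000-token document this yields three chunks starting at tokens 0, 462 and 924, the last one holding the remaining 76 tokens.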

Memory Management System

Amente organizes documents into three distinct memory spaces:

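Space     | Scope                   | Persistence            | Managed by
----------|-------------------------|------------------------|-----------
Global    | Shared across all users | Permanent              | Admin only
Permanent | Per-user                | Permanent              | User
Temporary | Per-user                | Ephemeral (bulk clear) | User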

This gives administrators control over shared knowledge (company policies, product docs) while letting each user maintain their own document library. Temporary space is ideal for one-off analysis — upload, query, then clear.
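One way to map the three spaces onto ChromaDB collections and enforce the "Managed by" column is sketched below. The collection names and helper functions are hypothetical; the rules they encode (admin-only writes to Global, bulk clear only for Temporary) come from the table.

```python
from enum import Enum


class Space(Enum):
    GLOBAL = "global"
    PERMANENT = "permanent"
    TEMPORARY = "temporary"


def collection_name(space: Space, user_id: str) -> str:
    """Hypothetical naming: one shared collection for Global, two per user otherwise."""
    if space is Space.GLOBAL:
        return "global"
    # Deriving the name from the caller's own ID keeps users out of each other's spaces.
    return f"user_{user_id}_{space.value}"


def can_write(space: Space, is_admin: bool) -> bool:
    """Global is admin-only; Permanent and Temporary are managed by the user."""
    return is_admin if space is Space.GLOBAL else True


def can_bulk_clear(space: Space) -> bool:
    """Only the Temporary space offers a bulk clear."""
    return space is Space.TEMPORARY
```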

Message Pinning

Users can pin any message in a conversation. Pinned messages are permanently prepended to the LLM context on every subsequent query in that conversation, regardless of the sliding window. This acts as a persistent instruction layer — pin a correction, a preference, or a key fact, and the LLM will always see it.

Pinned messages are visually highlighted with an accent border, tinted background, “PINNED” chip, and a flash animation on pin.
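Pinning and the 10-message sliding window described under Multi-Conversation System combine when the conversation history is assembled for the LLM. In this sketch the message fields and the de-duplication of a pinned message that also falls inside the window are assumptions, not documented behavior.

```python
WINDOW = 10  # recent messages kept in context, per the conversation-memory description


def build_history(messages: list[dict], window: int = WINDOW) -> list[dict]:
    """Return pinned messages first, then the last `window` messages, for the LLM.

    Messages are assumed to be dicts with "id", "role", "content" and an optional
    "pinned" flag. A pinned message that is also inside the window is not repeated.
    """
    pinned = [m for m in messages if m.get("pinned")]
    pinned_ids = {m["id"] for m in pinned}
    # Guard window == 0: messages[-0:] would otherwise return the whole list.
    tail = messages[-window:] if window > 0 else []
    recent = [m for m in tail if m["id"] not in pinned_ids]
    return [{"role": m["role"], "content": m["content"]} for m in pinned + recent]
```

Because pinned messages are selected before the window is applied, a pin made early in a long conversation stays visible to the LLM after it has scrolled out of the last ten messages.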

Multi-Conversation System

  • Create, switch, and delete independent chat conversations
  • Conversation memory: the last 10 messages are included in LLM context for natural multi-turn dialogue
  • Auto-titling: conversations are automatically named from the first user message
  • Persistence: all messages stored as JSON files, surviving page refresh and server restarts
  • Recent Chats sidebar: quick access to conversation history
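Auto-titling and JSON persistence could be implemented as below. The 40-character title limit, the `New chat` fallback and the file layout are hypothetical. Writing to a temporary file and then calling `os.replace` is one standard way to keep a process crash from leaving a truncated conversation file.

```python
import json
import os
import tempfile
from pathlib import Path

TITLE_MAX = 40  # hypothetical title length limit


def auto_title(first_user_message: str, max_len: int = TITLE_MAX) -> str:
    """Derive a conversation title from the first user message."""
    text = " ".join(first_user_message.split())  # collapse runs of whitespace and newlines
    if not text:
        return "New chat"
    return text if len(text) <= max_len else text[: max_len - 1].rstrip() + "…"


def save_conversation(path: Path, conversation: dict) -> None:
    """Write JSON to a temp file in the same directory, then atomically swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(conversation, f, ensure_ascii=False, indent=2)
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp)  # never leave the partial temp file behind
        raise


def load_conversation(path: Path) -> dict:
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```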

Per-User LLM Configuration

Each user can connect to their own LLM provider directly from the settings dashboard:

  • Enter a Base URL, API Key, and Model name
  • Test Connection button verifies the endpoint before saving
  • Reset to Defaults reverts to the admin-configured default
  • Admins set the default LLM config for new users

This enables a multi-model setup: one user might use a local Gemma model for speed, another might connect to GPT-4o for capability, all within the same Amente instance.

Resolution chain: User config > Admin defaults > System .env
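The resolution chain can be read as "take the first layer that is configured". This sketch treats a layer as configured when it has a base URL and uses it whole rather than merging fields, so an admin API key is never paired with a user-supplied endpoint. That rule and the `LLM_BASE_URL` / `LLM_API_KEY` / `LLM_MODEL` variable names are assumptions.

```python
import os
from typing import Mapping, Optional

FIELDS = ("base_url", "api_key", "model")


def resolve_llm_config(
    user: Optional[Mapping[str, str]],
    admin: Optional[Mapping[str, str]],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Pick the first configured layer: user config, then admin defaults, then .env.

    A layer counts as configured when it sets a base URL, and it is used whole, so
    fields from different layers (such as an admin API key) are never mixed.
    """
    # Hypothetical variable names, assumed to be loaded from .env into the environment.
    env_layer = {f: env.get(f"LLM_{f.upper()}", "") for f in FIELDS}
    for layer in (user, admin, env_layer):
        if layer and layer.get("base_url"):
            return {f: layer.get(f, "") for f in FIELDS}
    return {f: "" for f in FIELDS}
```

"Reset to Defaults" then amounts to clearing the user layer so the lookup falls through to the admin defaults.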

User Dashboard

A settings modal accessible by clicking the user name or avatar in the navigation rail:

  • Profile — change display name and password
  • LLM — configure personal LLM provider with connection testing
  • Appearance — select theme (light, dark, contrast) and chat wallpaper (5 SVG patterns)
  • Admin Defaults (admin only) — set default LLM config for new users

Theming and Personalization

  • Three themes: Light (warm cream), Dark, and Contrast (high contrast for accessibility)
  • Five chat wallpapers: Topo, Graph, Weave, Halftone, and Marginalia — tileable SVG patterns rendered via CSS mask for automatic theme adaptation
  • Custom brand: Amente mark with light and dark variants, auto-switching based on theme
  • Per-user preferences persisted server-side

Authentication and Administration

  • JWT-based auth with access tokens (30 min) and refresh tokens (7 days)
  • Rate limiting on login (5 attempts per 15-minute window)
  • Admin panel: create users, enable/disable accounts, reset passwords, delete users
  • Per-user data isolation: each user gets their own vectorstore, uploads directory, and conversation history
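The login limit (5 attempts per 15-minute window) is a sliding-window rate limiter. The sketch below keeps timestamps per key in memory. Whether Amente keys on username or client IP, and whether it counts all attempts or only failures, is not stated here, so treat those choices as assumptions. The injectable `clock` makes the behavior easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

MAX_ATTEMPTS = 5
WINDOW_SECONDS = 15 * 60


class LoginRateLimiter:
    """Allow at most `max_attempts` attempts per key within a sliding time window."""

    def __init__(
        self,
        max_attempts: int = MAX_ATTEMPTS,
        window: float = WINDOW_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_attempts = max_attempts
        self.window = window
        self.clock = clock
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Record an attempt for `key`; return False once the limit is reached."""
        now = self.clock()
        attempts = self._attempts[key]
        # Drop timestamps that have slid out of the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # rejected attempts are not recorded
        attempts.append(now)
        return True
```

Because the state lives in one process's memory, it resets on restart and is not shared between multiple uvicorn workers.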

Design System

Amente ships with a custom design system built on CSS custom properties:

  • Typography: Geist (sans) + Geist Mono
  • Color model: OKLCH for perceptually uniform colors across themes
  • Component primitives: Cards, buttons, chips, inputs, nav items, avatar — all theme-aware through CSS variables
  • No external UI library: no runtime dependency on third-party component libraries

Deployment

Amente runs on a single server with minimal infrastructure:

  • Server: Ubuntu with Nginx reverse proxy
  • Backend: systemd service running uvicorn
  • Frontend: static build served by Nginx at /amente/
  • Data: all state in the data/ directory (users, vectorstores, uploads, conversations)
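Per-user isolation under `data/` can be expressed as a small path helper. The directory names below are hypothetical (loosely following the list above); the check on the user ID illustrates why IDs should be validated before they are used in paths, for example by rejecting `../admin`.

```python
from pathlib import Path
from typing import Dict

DATA_DIR = Path("data")


def user_paths(user_id: str, data_dir: Path = DATA_DIR) -> Dict[str, Path]:
    """Hypothetical per-user layout: each user gets separate directories under data/."""
    # Accept only letters, digits, "-" and "_" so an ID like "../admin" cannot escape.
    if not user_id or not user_id.replace("-", "").replace("_", "").isalnum():
        raise ValueError(f"unsafe user id: {user_id!r}")
    return {
        "vectorstore": data_dir / "vectorstores" / user_id,
        "uploads": data_dir / "uploads" / user_id,
        "conversations": data_dir / "conversations" / user_id,
    }
```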

Deploy workflow:

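```shell
# Fetch the latest code
git pull origin main
# Rebuild the static frontend that Nginx serves at /amente/
cd frontend && npm run build
# Restart the uvicorn backend through its systemd unit
sudo systemctl restart househelp-backend.service
```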

Roadmap

Version | Focus                                          | Status
--------|------------------------------------------------|--------
v0.1.0  | Core RAG chatbot                               | Done
v0.2.0  | UI redesign, memory spaces, auth               | Done
v0.3.0  | Conversations, pinning, user dashboard, themes | Done
v0.4.0  | Metrics and consumption dashboard              | Planned
v0.5.0  | Security hardening for public internet         | Planned

Technology Summary

Layer        | Technology
-------------|------------------------------------------------------
Frontend     | React, TypeScript, Vite, Custom CSS (Amente DS)
Backend      | Python, FastAPI, Pydantic, uvicorn
Embeddings   | HuggingFace all-MiniLM-L6-v2, sentence-transformers
Vector Store | ChromaDB (per-user + global collections)
Search       | Hybrid: ChromaDB semantic + BM25 keyword
LLM          | OpenAI-compatible API (LM Studio, OpenAI, etc.)
Auth         | JWT (access + refresh tokens), bcrypt
Storage      | JSON files (users, conversations), ChromaDB (vectors)
Deployment   | Nginx, systemd, Ubuntu

Built at Agentic.gi by Sebastjan Rijavec