Document-centric databases are no longer niche. Here are three practical takes: production-ready SQLite-backed search with embeddings, a lean Python doc store, and real-world doc-server indexing decisions.
Production-ready SQLite-backed search
Flamehaven FileSearch is built for quick deployment: a 5-minute setup, fully self-hosted, with a REST API served through FastAPI and browsable in Swagger UI. It uses SQLite as the store and Gemini embeddings for Q&A [1].
- 5-minute setup — pip install flamehaven-filesearch[api] [1]
- Self-hosted — data stays in-house [1]
- SQLite store for portability [1]
- Gemini embeddings for natural-language Q&A [1]
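To make the SQLite-plus-embeddings idea concrete, here is a minimal sketch of the general pattern. It is not Flamehaven FileSearch's code or API. The table layout and the functions `toy_embed`, `open_store`, `add_doc`, and `search` are hypothetical, and `toy_embed` is a hashed bag-of-words stand-in for a real embedding model such as Gemini.

```python
import hashlib
import json
import math
import sqlite3

DIM = 64  # size of the toy embedding vector (an arbitrary assumption)


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, so dot product = cosine


def open_store(path=":memory:"):
    """Open (or create) a SQLite file holding documents and their embeddings."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    return conn


def add_doc(conn, body):
    """Store a document together with its embedding, serialized as JSON."""
    cur = conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (?, ?)",
        (body, json.dumps(toy_embed(body))),
    )
    conn.commit()
    return cur.lastrowid


def search(conn, query, k=3):
    """Return the k best (score, id, body) tuples by cosine similarity."""
    q = toy_embed(query)
    scored = []
    for doc_id, body, emb in conn.execute("SELECT id, body, embedding FROM docs"):
        score = sum(a * b for a, b in zip(q, json.loads(emb)))
        scored.append((score, doc_id, body))
    scored.sort(reverse=True)
    return scored[:k]
```

The search scans every row in full, which is fine for a small personal corpus but would need a vector index at larger scale. Swapping `toy_embed` for calls to a hosted embedding model leaves the storage side unchanged.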
A lean Python doc store
YaraDB is a lightweight open-source document database built with FastAPI and Pydantic. It offers a core engine, a write-ahead log (WAL), in-memory lookups, JSON storage, optimistic concurrency control (OCC), data integrity hashing, soft deletes, and batch operations [2].
- Core Engine [2]
- WAL — crash safety [2]
- In-Memory First [2]
- JSON Storage — yaradb_storage.json [2]
- OCC [2]
- Python Client — yaradb-client on PyPI [2]
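The way a WAL and OCC fit together in an in-memory-first store can be sketched in a few lines. This illustrates the two techniques only and is not YaraDB's implementation. The class `TinyDocStore`, the exception `VersionConflict`, and the JSON-lines log format are all hypothetical.

```python
import json
import os


class VersionConflict(Exception):
    """Raised when a writer's expected version is stale (the OCC check failed)."""


class TinyDocStore:
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.docs = {}  # doc_id -> {"version": int, "body": dict}
        self._replay()

    def _replay(self):
        """Rebuild in-memory state from the log after a restart or crash."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                try:
                    self._apply(json.loads(line))
                except json.JSONDecodeError:
                    break  # torn final record from a crash: stop replaying

    def _apply(self, rec):
        self.docs[rec["id"]] = {"version": rec["version"], "body": rec["body"]}

    def _log(self, rec):
        """Append the record and force it to disk before acknowledging."""
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(rec) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def put(self, doc_id, body, expected_version=0):
        """Write only if the caller saw the latest version (0 means new doc)."""
        current = self.docs.get(doc_id, {"version": 0})["version"]
        if current != expected_version:
            raise VersionConflict(
                f"{doc_id}: expected v{expected_version}, found v{current}"
            )
        rec = {"id": doc_id, "version": current + 1, "body": body}
        self._log(rec)    # durable first
        self._apply(rec)  # then visible in memory
        return rec["version"]

    def get(self, doc_id):
        return self.docs[doc_id]
```

Writing to the log before touching memory means a crash can lose at most an unacknowledged write, and the version check lets concurrent clients detect lost updates without taking locks. A real store would also compact the log periodically.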
Real-world indexing: tsvector vs Tantivy
The thread on designing a personal document server weighs two indexing options for search: PostgreSQL's built-in tsvector and Tantivy run as a standalone engine [3].
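For a sense of what the tsvector route involves, the sketch below builds the PostgreSQL statements for a generated tsvector column, a GIN index, and a ranked query. The table `documents`, its columns `id` and `body`, the names `search_vec` and `documents_search_idx`, and the helper `schema_sql` are hypothetical. The `%s` placeholders assume a psycopg-style driver. The Tantivy side is not shown.

```python
# Built-in PostgreSQL text search configurations this sketch allows.
ALLOWED_CONFIGS = {"simple", "english", "german", "french", "spanish"}


def schema_sql(config="english"):
    """Return DDL adding a stored tsvector column and a GIN index on it."""
    if config not in ALLOWED_CONFIGS:
        # The config name is interpolated into SQL, so accept only known values.
        raise ValueError(f"unsupported text search config: {config!r}")
    return [
        "ALTER TABLE documents ADD COLUMN search_vec tsvector "
        f"GENERATED ALWAYS AS (to_tsvector('{config}', body)) STORED",
        "CREATE INDEX documents_search_idx ON documents USING GIN (search_vec)",
    ]


# Parameters, in order: config name, user query text, result limit.
SEARCH_SQL = (
    "SELECT id, ts_rank(search_vec, q) AS rank "
    "FROM documents, plainto_tsquery(%s::regconfig, %s) AS q "
    "WHERE search_vec @@ q ORDER BY rank DESC LIMIT %s"
)
```

A generated column bakes in one language configuration per column, so a multi-language corpus would need either one column per language or a per-row configuration. That constraint is one reason language needs bear on the tsvector-versus-Tantivy decision.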
Closing thought: for tiny, embedded setups, Flamehaven shines; for rapid prototyping, YaraDB helps; for production-grade indexing, weigh tsvector vs Tantivy by scale and language needs.
References
[1] Production-ready, self-hosted document search with SQLite, Python SDK, REST API; Gemini embeddings; 5-minute setup; Docker-ready; vendor lock-in-free.
[2] Show HN: YaraDB – Lightweight open-source document database built with FastAPI. Open-source lightweight document database in Python using FastAPI; features WAL, OCC, JSON storage, REST API, with indexing, replication planned.
[3] Ask HN: Seeking advice on designing a personal document server. Explores local DB stack choices (PostgreSQL tsvector vs Tantivy), indexing, language detection, and grouping for a multi-language doc server.