Semantic search over locally indexed files. Parent-child chunking gives precise retrieval (small child chunks are matched) while still returning full context (the larger parent chunks they belong to).
| Component | Model | Size |
|---|---|---|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 | ~80MB |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 | ~80MB |
| Vector DB | ChromaDB (persistent, cosine similarity, HNSW) | varies |
| Chunking | Parent-child | — |
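The parent-child scheme in the table can be sketched as below. This is a minimal illustration, not the skill's actual code: the chunk sizes (`parent_size`, `child_size`, in characters), the hypothetical `split_fixed` helper, and the ID format are all assumptions. Children are the units that get embedded and searched; each child keeps a `parent_id` so a hit can be expanded to its parent for context.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None  # set only on child chunks


def split_fixed(text: str, size: int) -> List[str]:
    """Hypothetical helper: split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def parent_child_chunks(
    doc_id: str, text: str, parent_size: int = 2000, child_size: int = 400
) -> Tuple[List[Chunk], List[Chunk]]:
    """Split a document into large parents, then split each parent into small children.

    Children are embedded and searched; each records its parent_id so that a
    child hit can be expanded to the full parent text.
    """
    parents: List[Chunk] = []
    children: List[Chunk] = []
    for p_idx, p_text in enumerate(split_fixed(text, parent_size)):
        parent_id = f"{doc_id}:p{p_idx}"
        parents.append(Chunk(parent_id, p_text))
        for c_idx, c_text in enumerate(split_fixed(p_text, child_size)):
            children.append(Chunk(f"{parent_id}:c{c_idx}", c_text, parent_id))
    return parents, children
```

For a 4,500-character document with these defaults, the sketch yields 3 parents (2000, 2000 and 500 characters) and 12 children. A real splitter would more likely cut on paragraph or sentence boundaries; fixed-size character windows only keep the example short.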
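Memory strategy: the embedding model is loaded first and freed with gc.collect() before the reranker is loaded; the reranker is in turn freed after scoring. Because the two models are never resident at the same time, peak RAM stays around 400MB on ARM.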
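The load, use, free sequence can be captured in a small helper. This is a sketch of the pattern, not the skill's code; `run_stage` is a hypothetical name. It holds the only strong reference to the model, drops it in a `finally` block, and forces a collection before the next stage loads its model.

```python
import gc
from typing import Callable, TypeVar

M = TypeVar("M")
R = TypeVar("R")


def run_stage(load_model: Callable[[], M], use_model: Callable[[M], R]) -> R:
    """Load a model, use it once, then release it before the next stage runs."""
    model = load_model()
    try:
        return use_model(model)
    finally:
        # Drop the only strong reference and force a collection so the
        # model's memory can be reclaimed before the next model is loaded.
        del model
        gc.collect()
```

With sentence-transformers, the embedding stage would pass a loader that builds `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")` and a callback that calls `.encode(query)`; the reranking stage would build `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")` and call `.predict(pairs)` on (query, text) pairs. The callback must not return the model itself, or it stays alive. Also note that `gc.collect()` only releases Python objects; whether memory is handed back to the OS immediately depends on the underlying allocators (PyTorch and the C library may keep it cached).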
All scripts must use the venv Python:
VENV=~/.local/share/local-rag/venv/bin/python
# Incremental index (default — skips unchanged files via SHA-256 hash)
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index.py
# Re-index from scratch
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index.py --reindex
# Custom paths
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index.py --paths ~/Documenti ~/Progetti
# Batch indexing (per-subfolder with git checkpoints, for low-RAM systems)
bash ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/index-batch.sh
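The incremental mode skips files whose SHA-256 hash is unchanged. A minimal sketch of that check follows; where the skill actually stores the previous hashes is not stated here, so the `seen` mapping (absolute path to hex digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large PDFs are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_index(paths: Iterable[Path], seen: Dict[str, str]) -> List[Tuple[Path, str]]:
    """Return (path, new_hash) for files that are new or whose content changed.

    `seen` maps an absolute path to the hash recorded at the last successful index.
    It is not modified here: the caller should record the new hash only after the
    file has actually been indexed, so a crash mid-run does not mark files as done.
    """
    changed = []
    for path in paths:
        new_hash = sha256_of(path)
        if seen.get(str(path.resolve())) != new_hash:
            changed.append((path, new_hash))
    return changed
```

With an empty `seen`, every file counts as changed, which is a from-scratch pass in the spirit of `--reindex`. Hashing content rather than comparing modification times means a file that is touched but not edited is still skipped.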
# Basic query
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "what are the termination clauses?"
# More results
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "Falcon LLM" --top-k 30 --top-n 5
# JSON output for programmatic use
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "transformer architecture" --json
# With timeout
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/query.py "deep learning" --timeout 60
Options:
- --top-k N: child candidates returned by vector search (default: 20)
- --top-n N: final parent results returned after reranking (default: 3)
- --json: JSON output
- --timeout N: maximum seconds per query (default: 120)

Monitoring:
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py # Status
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --watch # Auto-refresh
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --log # Logs
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --errors # Errors only
$VENV ~/.openclaw/workspace/skills/lookupmark-local-rag/scripts/monitor.py --git # Git checkpoints
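The `--top-k` and `--top-n` options of `query.py` describe a two-stage retrieval. One plausible way the stages fit together is sketched below, under stated assumptions: vector search (for example ChromaDB's `collection.query(query_embeddings=[vec], n_results=top_k)`) returns `top_k` child chunks, each child is scored against the query, and children are mapped back to distinct parents ranked by their best child score. The function name, the tuple layout, and scoring children rather than parents are assumptions; `score` stands in for a cross-encoder so the sketch runs without models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Child = Tuple[str, str, str]  # (child_id, parent_id, child_text)


def select_parents(
    query: str,
    candidates: Sequence[Child],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rerank top-k child candidates and return up to top_n distinct parents.

    Each parent is ranked by the best score among its children, so several
    matching children from one parent do not crowd out other parents.
    """
    best: Dict[str, float] = {}
    for _child_id, parent_id, text in candidates:
        s = score(query, text)
        if parent_id not in best or s > best[parent_id]:
            best[parent_id] = s
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

With a real reranker, scoring all candidates in one batch (for example `CrossEncoder.predict` on a list of (query, text) pairs) is cheaper than calling the model once per child. A larger `--top-k` gives the reranker more to choose from at the cost of more scoring work.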
Documents only (no code files):
- .txt, .md, .csv, .json, .yaml, .yml, .toml, .tex, .bib
- .pdf (pdfminer.six), .docx (python-docx), .pptx

Excluded: .py, .js, .sh, .ipynb, .html, .css and all other code files.
Excluded directories: .git, .venv, node_modules, __pycache__, labs, exercises, src, scripts, ablation, test*, fixtures

| Path | Purpose |
|---|---|
| ~/.local/share/local-rag/chromadb/ | ChromaDB data (git repo for rollback) |
| ~/.local/share/local-rag/venv/ | Python venv with dependencies |
| ~/.local/share/local-rag/index.lock | Prevents concurrent indexing |
| ~/.local/share/local-rag/index-batch.log | Batch indexing log |
| ~/.local/share/local-rag/queries.log | Query history log |
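The `index.lock` file prevents two indexing runs from writing to ChromaDB at once. A minimal lock-file sketch, assuming exclusive file creation (the skill's real mechanism is not documented here; `index_lock` and `IndexLocked` are hypothetical names):

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


class IndexLocked(RuntimeError):
    """Raised when another indexing run already holds the lock (hypothetical name)."""


@contextmanager
def index_lock(lock_path: Path) -> Iterator[None]:
    try:
        # O_CREAT | O_EXCL makes creation fail atomically if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise IndexLocked(f"another indexing run holds {lock_path}") from None
    try:
        # Record the owner PID to help diagnose stale locks.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield
    finally:
        # Only the run that created the lock removes it.
        lock_path.unlink(missing_ok=True)
```

The weakness of this design is a stale lock: if the process is killed with SIGKILL, the file survives and must be removed by hand. `fcntl.flock` on an open file avoids that on Linux, because the kernel releases the lock when the process exits.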
Default indexed paths: ~/Documenti/github/thesis, ~/Documenti/github/polito, ~/Documenti, ~/Scaricati

Never indexed (sensitive): .ssh, .gnupg, .env, credentials, tokens, .config/openclaw

Workflow:
1. Run index.py to build or rebuild the index (incremental via SHA-256 hash check).
2. Run query.py to search in natural language.
3. Use monitor.py for stats and queries.log for query history.
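The file-type, excluded-directory and sensitive-path rules in this document can be expressed as one path filter. This is a sketch, not the skill's implementation: the constant names are hypothetical, and matching sensitive names as whole path components (rather than substrings) is an assumption.

```python
from fnmatch import fnmatch
from pathlib import Path, PurePath
from typing import Iterable, Iterator

DOC_EXTENSIONS = {".txt", ".md", ".csv", ".json", ".yaml", ".yml", ".toml",
                  ".tex", ".bib", ".pdf", ".docx", ".pptx"}
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__", "labs", "exercises",
             "src", "scripts", "ablation", "fixtures"}
SKIP_DIR_PATTERNS = ["test*"]
SENSITIVE = {".ssh", ".gnupg", ".env", "credentials", "tokens"}


def should_index(path: PurePath) -> bool:
    """Decide whether a path (relative to an indexed root) should be indexed."""
    parts = path.parts
    # Allowlist of document extensions; code files (.py, .js, ...) fall out here.
    if path.suffix.lower() not in DOC_EXTENSIONS:
        return False
    for d in parts[:-1]:
        if d in SKIP_DIRS or any(fnmatch(d, pat) for pat in SKIP_DIR_PATTERNS):
            return False
    if any(p in SENSITIVE for p in parts):
        return False
    # .config/openclaw spans two components, so check adjacent pairs.
    if any(a == ".config" and b == "openclaw" for a, b in zip(parts, parts[1:])):
        return False
    return True


def iter_documents(roots: Iterable[Path]) -> Iterator[Path]:
    """Yield indexable files, judging only the part of each path below its root."""
    for root in roots:
        for path in root.rglob("*"):
            if path.is_file() and should_index(path.relative_to(root)):
                yield path
```

Checking the path relative to each root keeps a root such as ~/Documenti from being rejected because of a component above it. `rglob` still descends into skipped directories; pruning `dirnames` in place inside an `os.walk` loop would avoid that on large trees.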