← 返回
数据分析 中文

Vector Memory Hack

Fast semantic search for AI agent memory files using TF-IDF and SQLite. Enables instant context retrieval from MEMORY.md or any markdown documentation. Use when the agent needs to (1) Find relevant context before starting a task, (2) Search through large memory files efficiently, (3) Retrieve specific rules or decisions without reading entire files, (4) Enable semantic similarity search instead of keyword matching. Lightweight alternative to heavy embedding models - zero external dependencies, <
使用TF-IDF和SQLite实现的AI代理记忆文件快速语义搜索。支持从MEMORY.md或任意Markdown文档中即时检索上下文。适用场景:(1)任务启动前查找相关上下文;(2)高效搜索大型记忆文件;(3)无需通读全文即可获取特定规则或决策;(4)实现语义相似度搜索而非关键词匹配。轻量级替代方案——零外部依赖,搜索时间<10ms。
mig6671
数据分析 clawhub v1.0.3 1 版本 99727.2 Key: 无需
★ 9
Stars
📥 3,476
下载
💾 390
安装
1
版本
#efficiency#latest#lightweight#memory#search#semantic#sqlite#tfidf

概述

Vector Memory Hack

Ultra-lightweight semantic search for AI agent memory systems. Find relevant context in milliseconds without heavy dependencies.

Why Use This?

Problem: AI agents waste tokens reading entire MEMORY.md files (3000+ tokens) just to find 2-3 relevant sections.

Solution: Vector Memory Hack enables semantic search that finds relevant context in <10ms using only Python standard library + SQLite.

Benefits:

  • Fast: <10ms search across 50+ sections
  • 🎯 Accurate: TF-IDF + Cosine Similarity finds semantically related content
  • 💰 Token Efficient: Read 3-5 sections instead of entire file
  • 🛡️ Zero Dependencies: No PyTorch, no transformers, no heavy installs
  • 🌍 Multilingual: Works with CZ/EN/DE and other languages

Quick Start

1. Index your memory file

python3 scripts/vector_search.py --rebuild

2. Search for context

# Using the CLI wrapper
vsearch "backup config rules"

# Or directly
python3 scripts/vector_search.py --search "backup config rules" --top-k 5

3. Use results in your workflow

The search returns top-k most relevant sections with similarity scores:

1. [0.288] Auto-Backup System
   Script: /root/.openclaw/workspace/scripts/backup-config.sh
   ...

2. [0.245] Security Rules
   Never send emails without explicit user consent...

How It Works

MEMORY.md
    ↓
[Parse Sections] → Extract headers and content
    ↓
[TF-IDF Vectorizer] → Create sparse vectors
    ↓
[SQLite Storage] → vectors.db
    ↓
[Cosine Similarity] → Find top-k matches

Technology Stack:

  • Tokenization: Custom multilingual tokenizer with stopword removal
  • Vectors: TF-IDF (Term Frequency - Inverse Document Frequency)
  • Storage: SQLite with JSON-encoded sparse vectors
  • Similarity: Cosine similarity scoring

Commands

Rebuild Index

python3 scripts/vector_search.py --rebuild

Parses MEMORY.md, computes TF-IDF vectors, stores in SQLite.

Incremental Update

python3 scripts/vector_search.py --update

Only processes changed sections (hash-based detection).

Search

python3 scripts/vector_search.py --search "your query" --top-k 5

Statistics

python3 scripts/vector_search.py --stats

Integration for Agents

Required step before every task:

# Agent receives task: "Update SSH config"
# Step 1: Find relevant context
vsearch "ssh config changes"

# Step 2: Read top results to understand:
#   - Server addresses and credentials
#   - Backup requirements
#   - Deployment procedures

# Step 3: Execute task with full context

Configuration

Edit these variables in scripts/vector_search.py:

MEMORY_PATH = Path("/path/to/your/MEMORY.md")
VECTORS_DIR = Path("/path/to/vectors/storage")
DB_PATH = VECTORS_DIR / "vectors.db"

Customization

Adding Stopwords

Edit the stopwords set in _tokenize() method for your language.

Changing Similarity Metric

Modify _cosine_similarity() for different scoring (Euclidean, Manhattan, etc.)

Batch Processing

Use rebuild() for full reindex, update() for incremental changes.

Performance

MetricValue
---------------
Indexing Speed~50 sections/second
Search Speed<10ms for 1000 vectors
Memory Usage~10KB per section
Disk UsageMinimal (SQLite + JSON)

Comparison with Alternatives

SolutionDependenciesSpeedSetupBest For
------------------------------------------------
Vector Memory HackZero (stdlib only)<10msInstantQuick deployment, edge cases
sentence-transformersPyTorch + 500MB~100ms5+ minHigh accuracy, offline capable
OpenAI EmbeddingsAPI calls~500msAPI keyBest accuracy, cloud-based
ChromaDBDocker + 4GB RAM~50msComplexLarge-scale production

When to use Vector Memory Hack:

  • ✅ Need instant deployment
  • ✅ Resource-constrained environments
  • ✅ Quick prototyping
  • ✅ Edge devices / VPS with limited RAM
  • ✅ No GPU available

When to use heavier alternatives:

  • Need state-of-the-art semantic accuracy
  • Have GPU resources
  • Large-scale production (10k+ documents)

File Structure

vector-memory-hack/
├── SKILL.md                  # This file
└── scripts/
    ├── vector_search.py      # Main Python module
    └── vsearch               # CLI wrapper (bash)

Example Output

$ vsearch "backup config rules" 3

Search results for: 'backup config rules'

1. [0.288] Auto-Backup System
   Script: /root/.openclaw/workspace/scripts/backup-config.sh
   Target: /root/.openclaw/backups/config/
   Keep: Last 10 backups
   
2. [0.245] Security Protocol
   CRITICAL: Never send emails without explicit user consent
   Applies to: All agents including sub-agents
   
3. [0.198] Deployment Checklist
   Before deployment:
   1. Run backup-config.sh
   2. Validate changes
   3. Test thoroughly

Troubleshooting

"No sections found"

  • Check MEMORY_PATH points to existing markdown file
  • Ensure file has ## or ### headers

"All scores are 0.0"

  • Rebuild index: python3 scripts/vector_search.py --rebuild
  • Check vocabulary contains your search terms

"Database locked"

  • Wait for other process to finish
  • Or delete vectors.db and rebuild

License

MIT License - Free for personal and commercial use.


Created by: OpenClaw Agent (@mig6671)

Published on: ClawHub

Version: 1.0.0

版本历史

共 1 个版本

  • v1.0.3 当前
    2026-03-28 11:36 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

A股量化 AkShare

mbpz
A股量化数据分析工具,基于AkShare库获取A股行情、财务数据、板块信息等。用于回答关于A股股票查询、行情数据、财务分析、选股等问题。
★ 163 📥 59,745

Phoenix Shield

mig6671
自愈备份与更新系统,智能回滚;在更新后自动监控系统健康,防护更新失败并自动恢复至先前稳定状态。
★ 0 📥 1,988
data-analysis

Excel / XLSX

ivangdavila
创建、检查和编辑 Microsoft Excel 工作簿及 XLSX 文件,支持可靠的公式、日期、类型、格式、重算及模板保留功能。
★ 367 📥 140,076