
Semantic Cache

Semantic cache for LLM API calls using Redis. Caches responses by meaning, not exact match. Activate when the user wants to cache AI responses, reduce API co...
Author: rylinjames
Source: clawhub · Category: uncategorized · Version: v1.0.0 (1 version) · Key: required

Overview

Cache LLM responses by meaning using Redis vector search. Similar questions return cached answers instantly instead of making expensive API calls.

How It Works

  1. User asks a question or makes an LLM request
  2. The question is embedded into a vector using OpenAI text-embedding-3-small
  3. Redis vector search finds semantically similar cached queries (cosine similarity > 0.80)
  4. Cache hit: Return the cached response without calling the LLM (~100ms, most of it spent generating the embedding)
  5. Cache miss: Pass through to the LLM, cache the response for future similar queries
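
Steps 3 to 5 can be sketched in memory, without Redis, to show what the similarity threshold does. This is an illustrative sketch, not the skill's actual code: `cosineSimilarity`, `findCachedResponse`, and `demoEntries` are hypothetical names, and the three-dimensional vectors stand in for real 1536-dimension embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the most similar cached entry above the threshold, or null on a miss.
function findCachedResponse(queryVector, entries, threshold = 0.8) {
  let best = null;
  for (const entry of entries) {
    const similarity = cosineSimilarity(queryVector, entry.vector);
    if (similarity > threshold && (best === null || similarity > best.similarity)) {
      best = { response: entry.response, similarity };
    }
  }
  return best;
}

// Toy data: one cached entry with a made-up 3-dimensional "embedding".
const demoEntries = [
  { vector: [1, 0, 0], response: "The capital of France is Paris." },
];

console.log(findCachedResponse([0.9, 0.1, 0], demoEntries)); // close vector: hit
console.log(findCachedResponse([0, 1, 0], demoEntries)); // orthogonal vector: null (miss)
```

A linear scan like this only works for a handful of entries; the skill delegates the search to Redis so that lookups stay fast as the cache grows.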

Commands

Cache a query and response

node scripts/cache.js store "What is the capital of France?" "The capital of France is Paris."

Check cache for a similar query

node scripts/cache.js lookup "What's France's capital city?"

Cache stats

node scripts/cache.js stats

Clear all cached entries

node scripts/cache.js clear

Interactive mode (wraps an OpenAI call with caching)

node scripts/cache.js query "Your question here"

This checks cache first. On miss, calls OpenAI, caches the result, and returns it.
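
The cache-first behaviour of `query` follows the usual cache-aside pattern. The sketch below shows that pattern with injected dependencies; `cachedQuery`, `makeStubDeps`, and the `embed`/`search`/`callLLM`/`store` functions are hypothetical names chosen for illustration, and the stubs make no network calls. How `scripts/cache.js` is actually structured is not shown in the source.

```javascript
// Cache-first wrapper. The four dependencies are injected, so this sketch
// makes no assumption about which Redis or OpenAI client library is used.
async function cachedQuery(question, deps) {
  const vector = await deps.embed(question);
  const hit = await deps.search(vector); // { response, similarity } or null
  if (hit) {
    return { response: hit.response, cached: true, similarity: hit.similarity };
  }
  const response = await deps.callLLM(question);
  await deps.store(question, vector, response);
  return { response, cached: false };
}

// In-memory stand-ins for demonstration only (hypothetical, no network calls).
// The fake "embedding" is just the text length, so only queries of equal
// length match; a real embedding model matches by meaning.
function makeStubDeps() {
  const memory = [];
  const stats = { llmCalls: 0 };
  return {
    stats,
    embed: async (text) => [text.length],
    search: async (vector) => {
      const entry = memory.find((e) => e.vector[0] === vector[0]);
      return entry ? { response: entry.response, similarity: 1 } : null;
    },
    callLLM: async (q) => {
      stats.llmCalls += 1;
      return `answer to: ${q}`;
    },
    store: async (question, vector, response) => {
      memory.push({ question, vector, response });
    },
  };
}

// Ask the same question twice: the first call misses, the second hits.
async function demoCachedQuery() {
  const deps = makeStubDeps();
  const first = await cachedQuery("How do I reset my password?", deps);
  const second = await cachedQuery("How do I reset my password?", deps);
  return { first, second, llmCalls: deps.stats.llmCalls };
}

demoCachedQuery().then((result) => console.log(result));
```

Injecting the dependencies keeps the control flow testable without a Redis instance or an API key.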

When to Use This Skill

  • Before making any LLM API call, check if a semantically similar query was already answered
  • When building agents that answer repetitive questions (support bots, FAQ systems)
  • When you want to reduce OpenAI/Anthropic API costs (a claimed 40-80%; actual savings depend on the cache hit rate)
  • When you need faster response times for common queries

Configuration

Set these environment variables:

  • REDIS_URL — Redis connection string with vector search support (Redis Cloud or Redis Stack)
  • OPENAI_API_KEY — For generating embeddings
  • SEMANTIC_CACHE_THRESHOLD — Similarity threshold 0-1 (default: 0.80, higher = stricter matching)
  • SEMANTIC_CACHE_TTL — Cache TTL in seconds (default: 86400 = 24 hours)
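
A minimal sketch of how these variables might be read and validated, using the defaults listed above. `loadCacheConfig` and `readNumber` are hypothetical helpers, and treating a missing `REDIS_URL` or `OPENAI_API_KEY` as a hard error is an assumption rather than documented behaviour of the skill.

```javascript
// Parse a numeric setting, falling back when the variable is unset or empty.
function readNumber(value, fallback) {
  if (value === undefined || value === "") return fallback;
  return Number(value);
}

// Build a config object from an environment-like object (defaults to process.env).
function loadCacheConfig(env = process.env) {
  if (!env.REDIS_URL) throw new Error("REDIS_URL is required");
  if (!env.OPENAI_API_KEY) throw new Error("OPENAI_API_KEY is required");

  const threshold = readNumber(env.SEMANTIC_CACHE_THRESHOLD, 0.8);
  const ttlSeconds = readNumber(env.SEMANTIC_CACHE_TTL, 86400);

  // NaN fails both range checks, so malformed values are rejected too.
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new Error("SEMANTIC_CACHE_THRESHOLD must be between 0 and 1");
  }
  if (!(Number.isInteger(ttlSeconds) && ttlSeconds > 0)) {
    throw new Error("SEMANTIC_CACHE_TTL must be a positive whole number of seconds");
  }

  return {
    redisUrl: env.REDIS_URL,
    openaiApiKey: env.OPENAI_API_KEY,
    threshold,
    ttlSeconds,
  };
}

// Example with placeholder values; only the two required variables are set.
const demoConfig = loadCacheConfig({
  REDIS_URL: "redis://localhost:6379",
  OPENAI_API_KEY: "sk-placeholder",
});
console.log(demoConfig.threshold, demoConfig.ttlSeconds); // 0.8 86400
```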

Example Workflow

User: "How do I reset my password?"
  -> Embed query -> Search Redis -> MISS
  -> Call LLM -> Get response -> Cache it -> Return response

User: "I forgot my password, how do I change it?"
  -> Embed query -> Search Redis -> HIT (92.7% similar)
  -> Return cached response (Redis lookup in 8ms after embedding; saved ~2 seconds + API cost)
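
The "Search Redis" step in this workflow could be issued as a KNN query against a Redis vector index. The sketch below only builds the command arguments and does not connect to Redis. The index name `idx:semantic_cache` and the field names `embedding` and `response` are assumptions, not taken from the skill. Note that with the COSINE metric Redis reports a distance (1 minus similarity), so a 92.7% similar hit corresponds to a distance of about 0.073.

```javascript
// A matching index could be created with a command along these lines
// (index and field names are hypothetical):
//   FT.CREATE idx:semantic_cache ON HASH PREFIX 1 cache: SCHEMA
//     response TEXT
//     embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE

// Pack an embedding as raw FLOAT32 bytes (platform byte order, which is
// little-endian on common hardware) for use as a query parameter.
function toFloat32Blob(embedding) {
  return Buffer.from(new Float32Array(embedding).buffer);
}

// Build the argument list for a nearest-neighbour search returning one result.
function buildKnnSearchArgs(indexName, embedding) {
  return [
    "FT.SEARCH", indexName,
    "*=>[KNN 1 @embedding $vec AS distance]",
    "PARAMS", "2", "vec", toFloat32Blob(embedding),
    "RETURN", "2", "response", "distance",
    "DIALECT", "2",
  ];
}

// Convert the cosine distance Redis returns into a similarity score.
function distanceToSimilarity(distance) {
  return 1 - Number(distance);
}

// Decide hit or miss against the configured similarity threshold.
function isCacheHit(distance, threshold = 0.8) {
  return distanceToSimilarity(distance) > threshold;
}

const demoArgs = buildKnnSearchArgs("idx:semantic_cache", [0.1, 0.2, 0.3]);
console.log(demoArgs.slice(0, 3));
console.log(isCacheHit("0.073")); // true: similarity is about 0.927
console.log(isCacheHit("0.35")); // false: similarity is about 0.65
```

Most Redis clients accept an argument array like this through a generic send-command call, or expose an equivalent search helper.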

Performance

  • Cache lookup: ~5-15ms (vs 1-5 seconds for LLM call)
  • Embedding generation: ~50-100ms
  • Storage per entry: ~6KB (1536-dim vector + metadata)
  • Supports millions of cached entries
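
The figures above can be checked with some quick arithmetic. The sketch assumes the vector is stored as 32-bit floats (the source does not state the storage type), and the 50% hit rate and the midpoint latencies in the example are illustrative inputs, not measurements. `vectorBytes` and `expectedLatencyMs` are hypothetical helper names.

```javascript
// Bytes needed for one embedding vector, assuming 4-byte (float32) components.
function vectorBytes(dimensions, bytesPerComponent = 4) {
  return dimensions * bytesPerComponent;
}

// Average latency per request: every request pays for embedding and lookup,
// and only cache misses additionally pay for the LLM call.
function expectedLatencyMs({ embedMs, lookupMs, llmMs, hitRate }) {
  return embedMs + lookupMs + (1 - hitRate) * llmMs;
}

// 1536 dimensions x 4 bytes = 6144 bytes, so the vector alone is about 6 KB
// and metadata comes on top of that.
console.log(vectorBytes(1536)); // 6144

// Vector storage for one million entries, in gigabytes.
console.log((vectorBytes(1536) * 1e6) / 1e9); // 6.144

// Illustrative inputs: 75 ms embedding, 10 ms lookup, 2000 ms LLM call, 50% hits.
console.log(expectedLatencyMs({ embedMs: 75, lookupMs: 10, llmMs: 2000, hitRate: 0.5 })); // 1085
```

Because every request still pays for the embedding, a cache hit costs roughly the embedding time plus the lookup time, which is consistent with the ~100ms hit latency quoted under How It Works.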

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-31 07:10, security status: safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
