Overview

Semantic Cache

Cache LLM responses by meaning, using Redis vector search. A question that is semantically similar to an earlier one is answered from the cache instead of triggering another paid LLM API call.

How It Works

User asks a question or makes an LLM request
The question is embedded into a vector using OpenAI text-embedding-3-small
Redis vector search finds semantically similar cached queries (cosine similarity above SEMANTIC_CACHE_THRESHOLD, default 0.80)
Cache hit: return the cached response (~100 ms, most of it spent generating the embedding)
Cache miss: pass the request through to the LLM and cache its response for future similar queries
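The hit/miss decision in these steps comes down to comparing cosine similarity against the threshold. Below is a minimal sketch in plain JavaScript. It assumes an in-memory array in place of Redis and short hand-written vectors in place of real 1536-dimensional embeddings. The function names are hypothetical and not taken from scripts/cache.js.

```javascript
// Sketch only: an in-memory array stands in for the Redis index, and toy
// 3-dimensional vectors stand in for text-embedding-3-small embeddings.

/** Cosine similarity between two equal-length numeric vectors. */
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/**
 * Hypothetical helper: return the most similar entry if its similarity is
 * above the threshold (a hit), otherwise null (a miss).
 * Entries look like { vector, response }.
 */
function findCachedResponse(entries, queryVector, threshold = 0.8) {
  let best = null;
  let bestScore = -Infinity;
  for (const entry of entries) {
    const score = cosineSimilarity(entry.vector, queryVector);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  if (best === null || bestScore <= threshold) return null;
  return { response: best.response, similarity: bestScore };
}

const cache = [
  { vector: [0.9, 0.1, 0.0], response: "Use the 'Forgot password' link." },
  { vector: [0.0, 0.2, 0.9], response: "Paris." },
];
console.log(findCachedResponse(cache, [0.85, 0.15, 0.05])); // hit on the first entry
console.log(findCachedResponse(cache, [0.0, 1.0, 0.0])); // miss: null
```

A linear scan is only for illustration. Redis performs the same nearest-neighbour search with an index, so lookups stay fast as the cache grows.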

Commands

Cache a query and response

node scripts/cache.js store "What is the capital of France?" "The capital of France is Paris."
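One way a store step like this could map onto Redis is sketched below. It assumes the RediSearch query engine that ships with Redis Stack and Redis Cloud. The index name `idx:semantic_cache`, the key prefix `cache:`, and the field names `query`, `response` and `embedding` are hypothetical and may differ from what scripts/cache.js actually uses. The functions only build command argument arrays. With the node-redis v4 client, each array could be sent with `client.sendCommand(args)`.

```javascript
// Hypothetical names; scripts/cache.js may use different ones.
const INDEX = "idx:semantic_cache";
const PREFIX = "cache:";
const DIM = 1536; // default output size of text-embedding-3-small

/** FT.CREATE arguments: a HASH index with an HNSW vector field using cosine distance. */
function createIndexArgs() {
  return [
    "FT.CREATE", INDEX, "ON", "HASH", "PREFIX", "1", PREFIX,
    "SCHEMA",
    "query", "TEXT",
    "response", "TEXT",
    "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32", "DIM", String(DIM), "DISTANCE_METRIC", "COSINE",
  ];
}

/** Pack a numeric vector as raw float32 bytes (host byte order, little-endian on x86/ARM). */
function toFloat32Buffer(vector) {
  return Buffer.from(new Float32Array(vector).buffer);
}

/** Commands that store one entry and let it expire after ttlSeconds. */
function storeEntryCommands(id, query, response, vector, ttlSeconds = 86400) {
  const key = PREFIX + id;
  return [
    ["HSET", key, "query", query, "response", response,
      "embedding", toFloat32Buffer(vector)],
    ["EXPIRE", key, String(ttlSeconds)],
  ];
}
```

The index is created once. After that, every hash written under the `cache:` prefix is indexed automatically, and the EXPIRE call applies the SEMANTIC_CACHE_TTL default of 24 hours.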

Check cache for a similar query

node scripts/cache.js lookup "What's France's capital city?"
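In Redis, a lookup like this becomes a KNN query over the vector field. Below is a minimal sketch of the FT.SEARCH arguments. It assumes a hypothetical `idx:semantic_cache` index with an `embedding` vector field using the COSINE metric. That metric reports a distance equal to 1 minus cosine similarity, so a "92.7% similar" match comes back as a distance of about 0.073. The distance must be converted before it is compared with the threshold.

```javascript
// Hypothetical index name; it must match the index the store step created.
const INDEX = "idx:semantic_cache";

/** Pack a numeric vector as raw float32 bytes for the $vec query parameter. */
function toFloat32Buffer(vector) {
  return Buffer.from(new Float32Array(vector).buffer);
}

/** FT.SEARCH arguments for the single nearest cached query (KNN 1). */
function knnSearchArgs(queryVector) {
  return [
    "FT.SEARCH", INDEX,
    "*=>[KNN 1 @embedding $vec AS distance]",
    "PARAMS", "2", "vec", toFloat32Buffer(queryVector),
    "SORTBY", "distance",
    "RETURN", "2", "response", "distance",
    "DIALECT", "2", // vector queries need query dialect 2 or later
  ];
}

/**
 * The COSINE metric returns a distance (1 - similarity), usually as a string
 * in the reply, so convert it before applying the similarity threshold.
 */
function isHit(distance, threshold = 0.8) {
  const similarity = 1 - Number(distance);
  return similarity > threshold;
}
```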

Cache stats

node scripts/cache.js stats

Clear all cached entries

node scripts/cache.js clear

Query mode (wraps an LLM call with caching)

node scripts/cache.js query "Your question here"

This checks the cache first. On a miss, it calls OpenAI, caches the result, and returns it.
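The `query` flow is a read-through cache: embed, look up, and call the LLM only on a miss. Below is a minimal sketch with every external call injected. The `deps` functions are hypothetical stand-ins for the OpenAI embedding call, the Redis lookup and store, and the LLM call, and none of these names come from scripts/cache.js.

```javascript
/**
 * Read-through semantic cache. `deps` holds hypothetical async stand-ins:
 * embed(question), lookup(vector), callLlm(question), store(question, response, vector).
 */
async function cachedQuery(question, deps, threshold = 0.8) {
  const vector = await deps.embed(question);
  const match = await deps.lookup(vector); // { response, similarity } or null
  if (match && match.similarity > threshold) {
    return { response: match.response, cached: true };
  }
  const response = await deps.callLlm(question);
  await deps.store(question, response, vector); // cache for future similar queries
  return { response, cached: false };
}

// In-memory fakes for the demo: a lowercased string stands in for the vector,
// so here only case differences count as "similar".
function makeFakeDeps() {
  const entries = new Map();
  let llmCalls = 0;
  return {
    embed: async (q) => q.toLowerCase(),
    lookup: async (v) =>
      entries.has(v) ? { response: entries.get(v), similarity: 1 } : null,
    store: async (_q, response, v) => {
      entries.set(v, response);
    },
    callLlm: async (q) => {
      llmCalls += 1;
      return `answer to: ${q}`;
    },
    get llmCalls() {
      return llmCalls;
    },
  };
}

(async () => {
  const deps = makeFakeDeps();
  console.log(await cachedQuery("How do I reset my password?", deps)); // cached: false
  console.log(await cachedQuery("how do I reset my password?", deps)); // cached: true
  console.log(deps.llmCalls); // 1
})();
```

Injecting the dependencies keeps the caching logic testable without Redis or network access. A real version would plug in the embedding request and Redis commands in their place.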

When to Use This Skill

Before making any LLM API call, check if a semantically similar query was already answered
When building agents that answer repetitive questions (support bots, FAQ systems)
When you want to reduce OpenAI/Anthropic API costs (40-80% is the quoted range; actual savings depend on how often similar queries recur)
When you need faster response times for common queries
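How much a cache saves follows from simple arithmetic. Below is a sketch under stated assumptions: every request pays for one embedding, only misses pay for an LLM call, and Redis hosting cost is ignored. With hit rate $h$, LLM cost $L$ and embedding cost $e$ per request, the saved fraction is $1 - (e + (1-h)L)/L = h - e/L$. The function name and the example costs are hypothetical.

```javascript
/**
 * Fraction of per-request LLM spend saved by a semantic cache:
 *   savings = hitRate - embeddingCost / llmCost
 * Costs may be in any consistent unit; Redis hosting cost is ignored.
 */
function estimateSavings(hitRate, llmCostPerCall, embeddingCostPerCall) {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be in [0, 1]");
  }
  const withoutCache = llmCostPerCall;
  const withCache = embeddingCostPerCall + (1 - hitRate) * llmCostPerCall;
  return 1 - withCache / withoutCache;
}

// Hypothetical costs: an embedding at 1/1000 the price of an LLM call.
console.log(estimateSavings(0.6, 1, 0.001)); // about 0.599
```

Because embeddings are cheap relative to LLM calls, the savings fraction is close to the hit rate itself. A 40-80% reduction therefore corresponds to roughly a 40-80% hit rate.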

Configuration

Set these environment variables:

REDIS_URL — Redis connection string with vector search support (Redis Cloud or Redis Stack)
OPENAI_API_KEY — For generating embeddings
SEMANTIC_CACHE_THRESHOLD — Similarity threshold 0-1 (default: 0.80, higher = stricter matching)
SEMANTIC_CACHE_TTL — Cache TTL in seconds (default: 86400 = 24 hours)
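Below is a sketch of reading these variables with the documented defaults (threshold 0.80, TTL 86400 seconds). The function name `loadConfig` and its validation rules are assumptions, not necessarily what scripts/cache.js enforces. Empty strings are treated as unset.

```javascript
/**
 * Read the documented environment variables, applying the documented defaults.
 * Validation rules here are illustrative assumptions.
 */
function loadConfig(env = process.env) {
  const redisUrl = env.REDIS_URL;
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!redisUrl) throw new Error("REDIS_URL is required");
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is required");

  // Higher threshold = stricter matching; must lie in [0, 1].
  const rawThreshold = env.SEMANTIC_CACHE_THRESHOLD;
  const threshold = rawThreshold ? Number(rawThreshold) : 0.8;
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("SEMANTIC_CACHE_THRESHOLD must be between 0 and 1");
  }

  // TTL in whole seconds; default is 24 hours.
  const rawTtl = env.SEMANTIC_CACHE_TTL;
  const ttlSeconds = rawTtl ? Number.parseInt(rawTtl, 10) : 86400;
  if (!(Number.isInteger(ttlSeconds) && ttlSeconds > 0)) {
    throw new RangeError("SEMANTIC_CACHE_TTL must be a positive integer");
  }

  return { redisUrl, openaiApiKey, threshold, ttlSeconds };
}
```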

Example Workflow

User: "How do I reset my password?"
  -> Embed query -> Search Redis -> MISS
  -> Call LLM -> Get response -> Cache it -> Return response

User: "I forgot my password, how do I change it?"
  -> Embed query -> Search Redis -> HIT (92.7% similar)
  -> Return cached response (8 ms Redis lookup; saves ~2 seconds plus the API cost)

Performance

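Cache lookup: ~5-15 ms (vs 1-5 seconds for an LLM call)
Embedding generation: ~50-100 ms (paid on every request, so a cache hit costs roughly 55-115 ms end to end)
Storage per entry: ~6 KB for the 1536-dim float32 vector (1536 × 4 bytes = 6,144 bytes), plus query/response text and index overhead
Scales to millions of cached entries, memory permitting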
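The storage figure is plain arithmetic: each vector component is a 4-byte float32. Below is a sketch for sizing a cache. The helper name is hypothetical, and it counts raw vector bytes only, excluding the query/response text and HNSW index overhead.

```javascript
/**
 * Raw vector storage for `entries` cached items: dim components of
 * bytesPerComponent bytes each (4 for float32). Excludes text and index overhead.
 */
function vectorBytes(entries, dim = 1536, bytesPerComponent = 4) {
  return entries * dim * bytesPerComponent;
}

console.log(vectorBytes(1)); // 6144 bytes, about 6 KB
console.log(vectorBytes(1_000_000)); // 6144000000 bytes, about 6.1 GB
```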

Version History

1 version in total

v1.0.0 (current)

2026-03-31 07:10 (Safe)

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
