Opensearch Vector Search

Amazon OpenSearch vector search expert knowledge base. Comprehensive guidance on vector search configuration, cluster tuning, quantization, cost optimization...
norrishuang
Tags: aws, knn, latest, opensearch, vector-search

Overview

OpenSearch Vector Search Expert

> GitHub: norrishuang/opensearch-vector-search-skill

> Issues, PRs, and new reference contributions are welcome!

Safety Notes

  • Pricing script (scripts/get_opensearch_pricing.py): Makes outbound HTTPS requests to the AWS Pricing API (pricing.us-east-1.amazonaws.com). Requires boto3 and valid AWS credentials. The script is read-only (fetches public pricing data) and does not modify any AWS resources. Only run it when the user explicitly requests cost estimation.
  • Reference examples: Code snippets in references/ contain example API calls to localhost:9200 (standard OpenSearch endpoint). These are documentation examples only — do NOT execute them automatically. Present them to the user as configuration references.
  • Cluster analyzer (scripts/analyze_cluster.py): Connects to a user-provided OpenSearch cluster and performs read-only analysis. It NEVER creates, modifies, or deletes any indices or data. Only run it when the user explicitly provides cluster credentials (URL + username/password).

Knowledge Base Structure

Read the corresponding reference file based on the question type:

| Question Type | Reference File | Keywords |
|---|---|---|
| Vector search, k-NN, HNSW, disk mode | references/vector-search.md | vector, knn, hnsw, warmup, disk mode, on_disk |
| Quantization techniques | references/quantization-techniques.md | quantization, compression, binary, byte, fp16, product quantization |
| Cost optimization, instance sizing, memory calc | references/cost-optimization.md | cost, pricing, instance, memory calculation, cluster sizing, budget |
| Cluster tuning, JVM, thread pools | references/cluster-tuning.md | JVM, heap, thread pool, node role, shard allocation |
| Performance benchmarks, dataset sizing | references/performance-benchmarks.md | benchmark, QPS, latency, recall, dataset size |
| Indexing strategies, mapping | references/indexing-strategies.md | index, mapping, shard, replica, lifecycle |
| Query optimization | references/query-optimization.md | query, filter, aggregation, cache, pagination |
| Optimized instances (OR1/OR2/OM2/OI2) | references/optimized-instances.md | optimized, OR1, OR2, OM2, OI2, S3 durability, indexing throughput |
| Live cluster analysis | scripts/analyze_cluster.py | analyze cluster, connect, diagnose, review config, health check |

Core Workflows

1. Answering Vector Search Configuration Questions

  1. Read references/vector-search.md
  2. Recommend in-memory mode or disk mode based on user scenario (latency requirements, data scale, QPS)
  3. Provide specific mapping JSON configuration
  4. Recommend the FAISS engine, cosine similarity, and generation 7/8 instance families (for example r7g or r8g)

2. Capacity Planning & Instance Sizing (Most Common Scenario)

After user provides vector count and dimensions:

  1. Read references/cost-optimization.md for memory calculation formulas and examples
  2. Calculate using the standard HNSW memory formula (source: AWS official blog):

```
Unquantized (float32):
Memory = 1.1 × (4 × d + 8 × m) × num_vectors × (replicas + 1) bytes

Quantized (FAISS engine, compressed vectors in memory):
FP16 (2x): Memory = 1.1 × (2 × d + 8 × m) × num_vectors × (replicas + 1)
Byte (4x): Memory = 1.1 × (1 × d + 8 × m) × num_vectors × (replicas + 1)
Binary 4-bit: Memory = 1.1 × (d/2 + 8 × m) × num_vectors × (replicas + 1)
Binary 2-bit: Memory = 1.1 × (d/4 + 8 × m) × num_vectors × (replicas + 1)
Binary 1-bit: Memory = 1.1 × (d/8 + 8 × m) × num_vectors × (replicas + 1)

Where: d=vector dimensions, m=HNSW connections (default 16), num_vectors=total vector count
```

  3. Apply OpenSearch node memory allocation rules:

```
JVM Heap = min(node_memory × 50%, 32GB)
Remaining memory = node_memory - JVM Heap
KNN available memory = remaining × knn.memory.circuit_breaker.limit
  (e.g. limit=70% gives ~35% of node memory while the heap is 50% of the node)
```

  4. Select instance type, ensuring total cluster KNN available memory > vector index memory requirement
  5. Run pricing script for real-time pricing (see below)
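
The memory formula and node allocation rules above can be sketched in Python as follows. This is a minimal illustration, not part of the skill's scripts: the function names are hypothetical, node memory is assumed to be in GiB, and the circuit-breaker limit is a parameter whose 0.70 default mirrors the `knn.memory.circuit_breaker.limit=70%` example in the text.

```python
import math

# Bytes held in memory per dimension for each storage option, read off the
# formulas above (float32 = 4, FP16 = 2, byte = 1, binary = bits / 8).
BYTES_PER_DIM = {
    "float32": 4.0,
    "fp16": 2.0,
    "byte": 1.0,
    "binary_4bit": 0.5,
    "binary_2bit": 0.25,
    "binary_1bit": 0.125,
}

GIB = 1024 ** 3


def hnsw_memory_bytes(num_vectors, d, m=16, replicas=1, quantization="float32"):
    """1.1 x (bytes_per_dim x d + 8 x m) x num_vectors x (replicas + 1)."""
    per_vector = BYTES_PER_DIM[quantization] * d + 8 * m
    return 1.1 * per_vector * num_vectors * (replicas + 1)


def knn_memory_per_node_gib(node_memory_gib, circuit_breaker_limit=0.70):
    """(node memory - JVM heap) x circuit-breaker limit, heap capped at 32."""
    jvm_heap = min(node_memory_gib * 0.5, 32.0)
    return (node_memory_gib - jvm_heap) * circuit_breaker_limit


def nodes_needed(required_bytes, node_memory_gib, circuit_breaker_limit=0.70):
    """Smallest node count whose combined k-NN memory covers the requirement."""
    per_node = knn_memory_per_node_gib(node_memory_gib, circuit_breaker_limit) * GIB
    return math.ceil(required_bytes / per_node)


if __name__ == "__main__":
    # Hypothetical workload: 10M vectors, 768 dimensions, one replica.
    for q in ("float32", "fp16", "byte", "binary_1bit"):
        need = hnsw_memory_bytes(10_000_000, 768, quantization=q)
        print(f"{q:12s} {need / GIB:8.1f} GiB -> {nodes_needed(need, 64)} x 64 GiB nodes")
```

The node count here only covers k-NN memory; shard layout, storage, and headroom for growth still need to be checked separately.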

3. Cost Estimation (with Real-Time Pricing)

When user needs cost estimation:

  1. Complete capacity planning above
  2. Run pricing script for real-time prices:

```bash
python3 scripts/get_opensearch_pricing.py --region <region> --instance-type <instance-type>
```

  3. Calculate monthly cost:

```
Instance cost = unit_price × node_count × (1 + replica_count)
EBS cost = capacity(GB) × $0.08 per GB-month + additional IOPS charges
Total cost = Instance cost + EBS cost
```

  4. Compare cost differences across quantization options
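
The monthly cost formula above can be written as a small Python helper. This is a sketch under stated assumptions: `unit_price_per_month` is taken to be a monthly per-node price (for example the `price_per_month_usd` field from the pricing script), and `node_count` is read as the nodes needed for one copy of the data, since the formula multiplies by `(1 + replica_count)`. If the node count already includes replica capacity, pass `replica_count=0`. The function name and the numbers in the usage example are hypothetical.

```python
def monthly_cost_usd(unit_price_per_month, node_count, replica_count,
                     ebs_gb, ebs_price_per_gb=0.08, extra_iops_cost=0.0):
    """Apply the instance + EBS cost formula and return the breakdown."""
    instance_cost = unit_price_per_month * node_count * (1 + replica_count)
    ebs_cost = ebs_gb * ebs_price_per_gb + extra_iops_cost
    return {
        "instance": instance_cost,
        "ebs": ebs_cost,
        "total": instance_cost + ebs_cost,
    }


if __name__ == "__main__":
    # Placeholder inputs only; substitute real prices from the pricing script.
    breakdown = monthly_cost_usd(unit_price_per_month=100.0, node_count=3,
                                 replica_count=1, ebs_gb=500)
    for name, value in breakdown.items():
        print(f"{name:9s} ${value:,.2f}")
```

Running the helper once per quantization option, each with its own node count, gives the comparison called for in the last step.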

4. Live Cluster Analysis (When User Provides Cluster Credentials)

When the user provides an OpenSearch cluster URL and credentials, use the cluster analyzer to connect and review their vector search configuration. This is read-only: never modify the cluster.

Prerequisites: User must explicitly provide:

  • Cluster URL (e.g., https://my-cluster.us-east-1.es.amazonaws.com)
  • Username and password (basic auth), OR --no-auth for clusters without authentication

Workflow:

  1. Ask for credentials if not provided: URL, username, password
  2. Run cluster overview to get health, nodes, and k-NN index list:

```bash
python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action cluster-overview -f pretty
```

  3. Analyze a specific index if the user names one, otherwise pick the most important k-NN index:

```bash
python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action index-detail --index <name> -f pretty
```

  4. Analyze shard distribution for the target index:

```bash
python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action shard-analysis --index <name> -f pretty
```

  5. Run all analyses at once (for a comprehensive report):

```bash
python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> --action all --index <name> -f pretty
```

  6. Interpret the JSON output and present findings to the user:
    • Cluster health status and node resource utilization
    • Vector field configurations (engine, dimensions, HNSW params, quantization)
    • Memory estimates vs actual cluster capacity
    • Auto-generated recommendations (from the script)
  7. Provide actionable advice based on findings:
    • Suggest better engine/quantization if needed (provide example mapping JSON)
    • Suggest instance resizing if memory is over/under-provisioned
    • Suggest shard rebalancing if distribution is uneven
    • NEVER execute write operations — only provide example configurations for the user to apply

Cluster Analyzer Script Reference:

Usage:
  python3 scripts/analyze_cluster.py --url <url> -u <user> -p <pass> [options]

Actions:
  --action cluster-overview   Cluster health, nodes, k-NN stats, and all k-NN index summary (default)
  --action index-detail       Deep dive into a specific index's vector config + memory estimates
  --action shard-analysis     Shard distribution and sizing for a specific index
  --action all                Run all analyses

Options:
  --index <name>     Target a specific index (required for index-detail and shard-analysis)
  --no-auth          Connect without authentication
  --verify-ssl       Verify SSL certificates (default: skip)
  --format pretty    Human-readable JSON output

Output: JSON with these top-level keys:
  - cluster_overview: health, version, nodes (memory/CPU/JVM), knn_stats
  - knn_indices: list of all k-NN enabled indices with vector field summaries
  - index_detail/index_details: vector field configs, memory estimates, search stats
  - shard_analysis/shard_analyses: shard distribution across nodes
  - recommendations: auto-generated optimization suggestions with severity levels
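
As an illustration of the options listed above, the following Python sketch assembles an analyzer command line and enforces the rule that `--index` is required for `index-detail` and `shard-analysis`. The helper `build_analyzer_argv` is hypothetical and not part of the skill. It uses only flags documented in this section, and it only builds and prints the command, it does not run it.

```python
import shlex

ACTIONS = {"cluster-overview", "index-detail", "shard-analysis", "all"}
NEEDS_INDEX = {"index-detail", "shard-analysis"}


def build_analyzer_argv(url, user=None, password=None,
                        action="cluster-overview", index=None, verify_ssl=False):
    """Return the argv list for scripts/analyze_cluster.py."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action in NEEDS_INDEX and not index:
        raise ValueError(f"--index is required for {action}")

    argv = ["python3", "scripts/analyze_cluster.py", "--url", url]
    if user is None:
        # No credentials supplied: use the documented --no-auth flag.
        argv.append("--no-auth")
    else:
        argv += ["-u", user, "-p", password or ""]
    argv += ["--action", action]
    if index:
        argv += ["--index", index]
    if verify_ssl:
        argv.append("--verify-ssl")
    argv += ["-f", "pretty"]
    return argv


if __name__ == "__main__":
    # Placeholder URL, credentials, and index name for illustration only.
    argv = build_analyzer_argv("https://my-cluster.us-east-1.es.amazonaws.com",
                               "admin", "secret",
                               action="index-detail", index="my-index")
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.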

Safety constraints for live cluster analysis:

  • The script is strictly read-only (uses only GET/CAT APIs)
  • NEVER create, update, or delete indices on the user's cluster
  • NEVER change cluster settings or mappings
  • Only provide example JSON configurations for the user to review and apply themselves
  • If the user asks to apply changes, provide the exact API calls/JSON but let the user execute them

Pricing Script Usage

# Query all instance prices for a region
python3 scripts/get_opensearch_pricing.py --region us-east-1

# Query specific instance type (no .search suffix needed)
python3 scripts/get_opensearch_pricing.py --region us-east-1 --instance-type r7g.xlarge

# JSON format output (for calculations)
python3 scripts/get_opensearch_pricing.py --region us-east-1 --instance-type r7g.xlarge --format json

Output fields: instance_type, vcpu, memory_gib, price_per_hour_usd, price_per_month_usd, network

Recommended Defaults

Always recommend these defaults unless user has specific requirements:

  • Engine: FAISS
  • Similarity: cosine
  • Instance family (Gen 7+ only, never recommend older generations):
    • Vector search (k-NN): r7g/r8g/r8gd (memory-optimized, lowest search latency; r8g Graviton4 ~30% faster than r7g)
    • Indexing-heavy + vector: OR2 (optimized, S3 durability, good memory-to-price ratio)
    • Indexing-heavy (no vector): OM2 (highest indexing throughput, 15% faster than OR1)
    • Large dataset with NVMe: OI2 (storage-optimized, no EBS needed)
    • Do NOT recommend: r6g, r5, m5, c5, i3, or any older instance families
  • HNSW parameters: ef_construction=512, m=16
  • Quantization preference: Byte (4x) for production, Binary (32x) for aggressive cost optimization
  • Disk mode threshold: Consider disk mode when the dataset exceeds 50M vectors and 100-200 ms latency is acceptable
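
A mapping that follows these defaults might look like the Python sketch below, which builds the index body and prints it as JSON for the user to review. The field name `embedding` and the dimension 768 are hypothetical placeholders. The `cosinesimil` space type with the `faiss` engine is assumed to be available on the target OpenSearch version; older versions may not accept that combination. Nothing is sent to a cluster.

```python
import json


def default_knn_index_body(dimension, field_name="embedding"):
    """Index body using the recommended defaults: FAISS, cosine, HNSW 512/16."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field_name: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        # Assumes the cluster version supports cosine with faiss.
                        "space_type": "cosinesimil",
                        "parameters": {"ef_construction": 512, "m": 16},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Print only; the user applies the configuration themselves.
    print(json.dumps(default_knn_index_body(768), indent=2))
```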

Instance Selection Decision Tree

Is this primarily a vector search (k-NN) workload?
├─ YES → r7g/r8g/r8gd (best search latency, standard EBS; prefer r8g for Graviton4)
│        └─ Need S3 durability? → OR2 (accept 10s refresh interval tradeoff)
├─ Mixed (logs + vectors) → OR2 for log nodes, r7g/r8g for vector nodes
└─ NO (logs/observability/analytics)
   ├─ Write-heavy → OM2 (highest ingest throughput)
   ├─ Balanced → OR2 (good all-around with S3 durability)
   └─ Need NVMe IOPS → OI2
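
The decision tree above can be encoded as a small Python function. This is a sketch: the function name, the workload labels (`"vector"`, `"mixed"`, `"logs"`), and the flag names are hypothetical. Checking write-heavy before NVMe when both are set is also an assumption, since the tree does not rank those branches.

```python
def recommend_instance(workload, need_s3_durability=False,
                       write_heavy=False, need_nvme=False):
    """Map a workload description to an instance family per the decision tree."""
    if workload == "vector":
        # S3 durability implies accepting the 10s refresh interval tradeoff.
        return "OR2" if need_s3_durability else "r7g/r8g/r8gd"
    if workload == "mixed":
        return "OR2 for log nodes, r7g/r8g for vector nodes"
    if workload == "logs":
        if write_heavy:
            return "OM2"
        if need_nvme:
            return "OI2"
        return "OR2"  # balanced default
    raise ValueError(f"unknown workload: {workload}")


if __name__ == "__main__":
    print(recommend_instance("vector"))
    print(recommend_instance("logs", write_heavy=True))
```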

Response Template

Organize cost/sizing answers in this structure:

  1. Requirements confirmation: Vector count, dimensions, QPS, latency requirements
  2. Memory calculation: Raw size → quantized size → required KNN memory
  3. Cluster configuration: Instance type × count, shards, replicas
  4. Cost estimation: Instance cost + EBS cost = monthly total
  5. Optimization suggestions: Quantization comparison, Reserved Instance discounts

Version History

1 version in total

  • v1.3.2 (current)
    2026-03-30 12:24, security scans: safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
