Elasticsearch
Query and index Elasticsearch with proper mappings, analyzers, and search patterns.
ivangdavila
Data Analysis
clawhub
v1.0.0 · 1 version · 99859 · Key: not required
#latest
Overview
Mapping Mistakes
- Always define explicit mappings: dynamic mapping guesses from the first value it sees (a first JSON value of 123 maps the field as long, so a later "abc" fails to index)
- Use text for full-text search and keyword for exact match and aggregations: mapping IDs as text breaks exact-match filters
- You can't change a field's type after indexing: reindex into a new index with the correct mapping
- Set dynamic: "strict" to reject documents with unmapped fields, which catches typos in field names
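A minimal sketch of an explicit mapping, written as a Python dict for the body of PUT /<index-name>. The field names (sku, title, price_cents, created_at) and the unmapped_fields helper are hypothetical; the helper only mimics locally what dynamic: "strict" enforces on the server, and it checks top-level fields only.

```python
import json

# Hypothetical explicit mapping; field names are illustrative only.
mapping = {
    "mappings": {
        "dynamic": "strict",  # reject documents that contain unmapped fields
        "properties": {
            "sku": {"type": "keyword"},        # exact match, filters, aggregations
            "title": {"type": "text"},         # full-text search
            "price_cents": {"type": "long"},   # integer amounts, declared up front
            "created_at": {"type": "date"},
        },
    }
}


def unmapped_fields(doc: dict, index_mapping: dict) -> list:
    """Return top-level fields that a strict mapping would reject.

    Local pre-check only: Elasticsearch itself enforces dynamic: "strict"
    (at every object level) by rejecting the whole document.
    """
    known = index_mapping["mappings"]["properties"]
    return sorted(field for field in doc if field not in known)


# JSON body for PUT /<index-name>
body = json.dumps(mapping)
```

With this mapping, a document containing a misspelled "titel" field is rejected instead of silently creating a new field.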
Text vs Keyword
- text is analyzed (tokenized, lowercased): "Quick Brown" matches a search for "quick"
- keyword is indexed as the exact string: "Quick Brown" only matches exactly "Quick Brown"
- Need both? Use a multi-field: "title": { "type": "text", "fields": { "raw": { "type": "keyword" }}}
- Sort and aggregate on title.raw, search on title
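A sketch of a search body (for POST /<index>/_search) that uses the multi-field as described: the match query runs against the analyzed title, while sorting and the terms aggregation use the keyword subfield title.raw. The helper name title_search and the aggregation name by_title are illustrative assumptions.

```python
import json

# The multi-field from the bullet above: analyzed "title" plus exact "title.raw".
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}}
        }
    }
}


def title_search(text: str, size: int = 10) -> dict:
    """Full-text match on the analyzed field; sort and aggregate on the keyword subfield."""
    return {
        "size": size,
        "query": {"match": {"title": text}},   # analyzed: "quick" finds "Quick Brown"
        "sort": [{"title.raw": "asc"}],        # keyword subfield: sortable
        "aggs": {"by_title": {"terms": {"field": "title.raw"}}},
    }


print(json.dumps(title_search("quick"), indent=2))
```

Note that an explicit sort replaces relevance ordering; drop the sort clause when ranking by score matters.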
Query vs Filter Context
- Query context calculates relevance score—expensive, use for search ranking
- Filter context is yes/no—cacheable, use for exact conditions (status, date ranges)
- Combine them: bool.must for scoring, bool.filter for filtering without scoring
- Range queries on dates and numbers almost always belong in filter, not query
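A sketch of a bool query that splits scoring and filtering as described above. The field names title, status (assumed keyword) and created_at (assumed date) are hypothetical; "now-30d/d" is standard Elasticsearch date math.

```python
def product_search(text: str, status: str, since: str) -> dict:
    """Score on the free-text clause only; exact conditions go in filter context."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],          # query context: scored
                "filter": [                                     # filter context: yes/no, cacheable
                    {"term": {"status": status}},
                    {"range": {"created_at": {"gte": since}}},  # ranges belong here
                ],
            }
        }
    }


body = product_search("red shoes", "active", "now-30d/d")
```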
Analyzers
- The standard analyzer splits text on word boundaries, lowercases it, and drops most punctuation: fine for most text
- The keyword analyzer emits the whole string as a single token: use it for codes, SKUs, emails
- Language analyzers (e.g. english) stem words: "running" matches "run"
- Test an analyzer with the _analyze endpoint before indexing: analysis surprises found in production usually mean a reindex
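To build intuition for the difference between the two analyzers, here is a rough local approximation, plus the request body for the real _analyze endpoint. The toy_standard function is only an assumption-level approximation: the real standard tokenizer follows Unicode word-boundary rules (UAX #29), so always confirm with _analyze.

```python
import re


def toy_standard(text: str) -> list:
    """Rough approximation of the standard analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def toy_keyword(text: str) -> list:
    """The keyword analyzer emits the whole input as one unchanged token."""
    return [text]


def analyze_request(analyzer: str, text: str) -> dict:
    """Body for POST /_analyze (use /<index>/_analyze to test an index's custom analyzers)."""
    return {"analyzer": analyzer, "text": text}


print(toy_standard("Quick Brown-Fox!"))
print(toy_keyword("SKU-123/A"))
```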
Nested vs Object
- Object type flattens arrays: {"tags": [{"key":"a","val":1}, {"key":"b","val":2}]} is indexed as tags.key: [a, b] and tags.val: [1, 2]
- Flattening loses the association between fields: a query for key=a AND val=2 incorrectly matches the document above
- Use the nested type to preserve object boundaries; querying it requires a nested query wrapper
- Nested is expensive (each nested object is indexed as a separate hidden document): avoid it for high-cardinality arrays
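A small local simulation (not Elasticsearch code) of why the object type produces the false match above, followed by the nested query body that avoids it. The helpers flatten_objects, object_match and nested_match are hypothetical; the query assumes tags is mapped with "type": "nested".

```python
def flatten_objects(field: str, objects: list) -> dict:
    """Mimic how an object-type array is indexed: one value list per subfield."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def object_match(flat: dict, field: str, conditions: dict) -> bool:
    """Object type: each condition is checked independently against the flattened lists."""
    return all(v in flat.get(f"{field}.{k}", []) for k, v in conditions.items())


def nested_match(objects: list, conditions: dict) -> bool:
    """Nested type: all conditions must hold within the same object."""
    return any(all(obj.get(k) == v for k, v in conditions.items()) for obj in objects)


tags = [{"key": "a", "val": 1}, {"key": "b", "val": 2}]
flat = flatten_objects("tags", tags)

# Body for POST /<index>/_search when tags is mapped as nested.
nested_query = {
    "query": {
        "nested": {
            "path": "tags",
            "query": {"bool": {"filter": [
                {"term": {"tags.key": "a"}},
                {"term": {"tags.val": 2}},
            ]}},
        }
    }
}
```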
Pagination Traps
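- from + size is capped at 10,000 hits by default (index.max_result_window): deep pagination fails
- Use search_after for deep pagination: it requires a consistent sort with a unique tiebreaker, ideally a point in time (PIT) with _shard_doc; sorting on _id is discouraged
- Scroll API for bulk export: keeps a point-in-time view, but ties up resources until the scroll is cleared or expires
- Don't use scroll for user pagination: search_after is the correct choice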
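A minimal sketch of the search_after loop. The sort fields created_at and order_id (assumed to be a unique keyword field acting as the tiebreaker) are hypothetical, and fake_search is an in-memory stand-in for the server so the loop can run locally. Against a real cluster you would send each body to POST /<index>/_search, optionally with a PIT.

```python
from typing import Optional


def next_page_body(page_size: int, last_sort: Optional[list]) -> dict:
    body = {
        "size": page_size,
        "sort": [{"created_at": "asc"}, {"order_id": "asc"}],  # unique tiebreaker last
    }
    if last_sort is not None:
        body["search_after"] = last_sort  # sort values of the previous page's last hit
    return body


def fake_search(docs: list, body: dict) -> list:
    """Hypothetical stand-in for the server: sort, skip past search_after, take size."""
    def key(d):
        return (d["created_at"], d["order_id"])

    ordered = sorted(docs, key=key)
    if "search_after" in body:
        after = tuple(body["search_after"])
        ordered = [d for d in ordered if key(d) > after]
    return [{"_source": d, "sort": list(key(d))} for d in ordered[: body["size"]]]


def fetch_all(docs: list, page_size: int = 2) -> list:
    results, last_sort = [], None
    while True:
        hits = fake_search(docs, next_page_body(page_size, last_sort))
        if not hits:
            return results
        results.extend(h["_source"]["order_id"] for h in hits)
        last_sort = hits[-1]["sort"]
```

Because the tiebreaker makes every sort tuple unique, documents sharing a created_at value are neither skipped nor repeated across pages.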
Bulk Operations
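- Never index documents one by one: use the _bulk API, in 5-15MB batches
- Bulk format: newline-delimited JSON, an action line followed by the document line (delete actions have none), with a trailing newline at the end of the body
- Check the response for partial failures: a bulk request can succeed overall with "errors": true and individual per-item errors
- Don't force a refresh per bulk request (refresh=true or wait_for); for large loads, set index.refresh_interval to -1, then restore it and refresh once the load completes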
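A sketch of building an NDJSON _bulk payload and collecting per-item failures from the response. The helper names and the assumption that each document carries its own id field are illustrative; the payload is sent to POST /_bulk with Content-Type: application/x-ndjson.

```python
import json


def bulk_body(index: str, docs: list, id_field: str = "id") -> str:
    """Build an NDJSON _bulk payload: one action line, then one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the payload must end with a newline


def failed_items(response: dict) -> list:
    """Collect (id, status, error type) for items that failed.

    The top-level "errors" flag is true if any item failed, even when the
    request as a whole returned HTTP 200.
    """
    if not response.get("errors"):
        return []
    failures = []
    for item in response["items"]:
        result = next(iter(item.values()))  # each item has one key: the action name
        if "error" in result:
            failures.append((result["_id"], result["status"], result["error"]["type"]))
    return failures
```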
Performance
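- If you don't need the full document, set _source: false and fetch only the fields you need (stored_fields for fields mapped with store: true, or docvalue_fields): this reduces I/O
- Use filter for cacheable conditions: Elasticsearch caches filter results
- Avoid leading wildcards (*term): they force a scan of every term in the field; index a reversed subfield for suffix search
- profile: true shows a per-clause query execution breakdown: use it to find slow clauses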
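One way to implement the reversed-subfield idea, sketched under assumptions: the field filename and the analyzer name reverse_exact are hypothetical, while the keyword tokenizer and the lowercase and reverse token filters are built into Elasticsearch. Because prefix queries are not analyzed, the client lowercases and reverses the suffix itself.

```python
# Hypothetical index body: a "reversed" subfield indexed through a custom analyzer.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "reverse_exact": {                      # illustrative name
                    "type": "custom",
                    "tokenizer": "keyword",             # whole value as one token
                    "filter": ["lowercase", "reverse"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "filename": {
                "type": "keyword",
                "fields": {"reversed": {"type": "text", "analyzer": "reverse_exact"}},
            }
        }
    },
}


def suffix_query(suffix: str) -> dict:
    """Turn a leading-wildcard search (*suffix) into a prefix query on the reversed subfield."""
    return {"query": {"prefix": {"filename.reversed": suffix.lower()[::-1]}}}
```

A prefix query walks only the terms that start with the given prefix, so it avoids the full terms-dictionary scan that *suffix triggers.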
Sharding
- Shard size 10-50GB optimal—too small = overhead, too large = slow recovery
- The number of primary shards is fixed at creation: changing it means reindexing or using the _split/_shrink APIs, which create a new index
- Replicas for read throughput and availability—set based on query load
- Start with 1 shard for small indices—over-sharding kills performance
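The sizing guidance above reduces to simple arithmetic. This is a rule-of-thumb sketch: the 30GB default target is an arbitrary assumption inside the 10-50GB band, and expected_size_gb should be the primary data size at its largest (for example, just before rollover), excluding replicas.

```python
import math


def primary_shards(expected_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Rule-of-thumb primary shard count that keeps each shard near the target size."""
    if not 10 <= target_shard_gb <= 50:
        raise ValueError("target shard size should stay within the 10-50GB band")
    return max(1, math.ceil(expected_size_gb / target_shard_gb))


print(primary_shards(5))    # small index: 1 shard
print(primary_shards(200))  # 200 / 30 rounds up to 7
```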
Index Management
- Use index templates—new indices get consistent mappings and settings
- Use aliases for zero-downtime reindexing—point alias to new index after reindex
- ILM (Index Lifecycle Management) for time-series—auto-rollover, delete old indices
- Close unused indices to free memory: a closed index keeps only its metadata in the cluster state, uses almost no heap, and can't be searched until reopened
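A sketch of the zero-downtime reindex flow with aliases, expressed as request bodies. The names products, products_v1 and products_v2 are hypothetical. First create products_v2 with the new mapping, then copy the data with POST /_reindex, then move the alias with POST /_aliases; the actions in a single _aliases request are applied atomically, so clients querying the alias never see a gap.

```python
def reindex_body(old_index: str, new_index: str) -> dict:
    """Body for POST /_reindex: copy every document into the new index."""
    return {"source": {"index": old_index}, "dest": {"index": new_index}}


def swap_alias(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases; the remove and add actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


steps = [
    reindex_body("products_v1", "products_v2"),
    swap_alias("products", "products_v1", "products_v2"),
]
```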
Aggregations
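- terms aggregations need a keyword field: on text fields they fail (fielddata is disabled by default) or, with fielddata enabled, return buckets of individual analyzed tokens
- terms aggregations return 10 buckets by default (size: 10): increase size, or page through all buckets with a composite aggregation
- Cardinality is approximate (HyperLogLog++): an exact distinct count requires scanning all docs
- Aggregating on nested fields requires a nested aggregation wrapper, matching the nested query pattern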
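A sketch of paging through every bucket with a composite aggregation instead of raising terms size. The aggregation name all_values, the source name value and the page size are illustrative assumptions; the field must be a keyword (or other aggregatable) field. Each response's after_key is passed back as after until a page comes back with no buckets.

```python
from typing import Optional


def composite_page(field: str, size: int = 1000, after: Optional[dict] = None) -> dict:
    """One page of a composite aggregation over every distinct value of a field."""
    composite = {"size": size, "sources": [{"value": {"terms": {"field": field}}}]}
    if after is not None:
        composite["after"] = after  # after_key from the previous response
    return {"size": 0, "aggs": {"all_values": {"composite": composite}}}


def next_after(response: dict) -> Optional[dict]:
    """Return the after_key for the next page, or None once a page has no buckets."""
    agg = response["aggregations"]["all_values"]
    return agg.get("after_key") if agg["buckets"] else None
```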
Common Errors
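- "cluster_block_exception" (index read-only / allow delete): a node exceeded the flood-stage disk watermark (95% by default; at the 85% low watermark it only stops receiving new shards) and indices with shards on it were made read-only; free disk space, then clear index.blocks.read_only_allow_delete through the index _settings API (recent versions release the block automatically once usage drops below the high watermark)
- "version conflict" (version_conflict_engine_exception): concurrent update; retry with retry_on_conflict on the update API, or use optimistic locking with if_seq_no and if_primary_term
- "circuit_breaker_exception": the request would use too much memory; reduce aggregation scope (fewer buckets, narrower filters)
- Mapping explosion from dynamic fields: set index.mapping.total_fields.limit (1,000 by default) and use strict mapping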
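A local simulation of optimistic locking, to show what a read-modify-write retry loop does with if_seq_no and if_primary_term. FakeIndex, VersionConflict and increment are hypothetical stand-ins, not client-library APIs; against a real cluster the same check happens server-side (HTTP 409), and the _update API's retry_on_conflict runs an equivalent loop for you.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's version_conflict_engine_exception (HTTP 409)."""


class FakeIndex:
    """Hypothetical in-memory stand-in for one index; tracks a seq_no per document."""

    def __init__(self):
        self.docs = {}          # doc_id -> (source, seq_no)
        self.seq_no = 0
        self.primary_term = 1

    def get(self, doc_id):
        source, seq_no = self.docs[doc_id]
        return dict(source), seq_no, self.primary_term

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None:
            _, current = self.docs[doc_id]
            # Reject the write if someone else changed the document since our read.
            if current != if_seq_no or if_primary_term != self.primary_term:
                raise VersionConflict(doc_id)
        self.seq_no += 1
        self.docs[doc_id] = (dict(source), self.seq_no)


def increment(index, doc_id, field, retries=3):
    """Read-modify-write with optimistic locking; re-read and retry on conflict."""
    for _ in range(retries + 1):
        source, seq_no, term = index.get(doc_id)
        source[field] += 1
        try:
            index.index(doc_id, source, if_seq_no=seq_no, if_primary_term=term)
            return source[field]
        except VersionConflict:
            continue  # another writer won; fetch the latest version and try again
    raise VersionConflict(doc_id)
```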
Version history
1 version in total
- v1.0.0
Current
2026-03-28 22:51 Safe
Security scan
Tencent Cloud Security (Sanbu)
Safe, no risk
🔗 Related
data-analysis
udiedrichsen
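Analyzes stocks and cryptocurrencies using Yahoo Finance data. Supports portfolio management, watchlist alerts, dividend analysis, 8-dimension scoring, trending scans, and rumor/early-signal detection. Suited to stock analysis, position tracking, earnings anomalies, crypto monitoring, trending-stock tracking, or surfacing non-mainstream rumors early.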
★ 270
📥 56,990
data-analysis
mbpz
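A-share quantitative data analysis tool that uses the AkShare library to fetch A-share quotes, financial data, sector information, and more. Used to answer questions about A-share stock lookups, market data, financial analysis, and stock screening.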
★ 165
📥 60,064
ai-intelligence
ivangdavila
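Self-reflection + self-critique + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and keeps improving.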
★ 1,358
📥 318,500