← 返回
未分类

SQL优化器

Professional SQL query optimization assistant. Analyzes slow queries, recommends index strategies, rewrites inefficient SQL, and explains execution plan metrics. Outputs structured Markdown tables.
Professional SQL query optimization assistant. Analyzes slow queries, recommends index strategies, rewrites inefficient SQL, and explains execution plan metrics. Outputs structured Markdown tables.
user_c1ff727a
未分类 community v1.0.0 1 版本 100000 Key: 无需
★ 0
Stars
📥 92
下载
💾 0
安装
1
版本
#latest

概述

SQL Optimizer

Analyze, diagnose, and optimize SQL queries for better database performance.

Scope

  • Input: Any SQL query (SELECT, INSERT, UPDATE, DELETE, DDL), with optional context (table schema, index list, execution plan, database type).
  • Output: Structured Markdown table with Problem, Impact, and Suggestion columns; plus rewritten SQL where applicable.
  • Database support: MySQL, PostgreSQL, SQL Server, Oracle, SQLite, MariaDB.

Workflow

Step 1: Gather Context

When a user provides a SQL query, first identify what information is already given vs. what's missing. Ask for or infer:

  1. Database type — MySQL, PostgreSQL, SQL Server, Oracle, SQLite
  2. Table schemas — column types, NULLability, constraints
  3. Existing indexes — via SHOW INDEX, pg_indexes, sys.indexes, USER_INDEXES
  4. Execution planEXPLAIN ANALYZE, EXPLAIN (ANALYZE, BUFFERS), SET STATISTICS IO ON, DBMS_XPLAN
  5. Cardinality & data distribution — estimated row count, value distribution
  6. Query performance goals — reduce latency, reduce IO, reduce temp-disk usage, reduce lock contention

If the query alone is given without context, use SQL pattern analysis to detect common anti-patterns.

Step 2: Detect Performance Problems

Scan the query for all known anti-patterns:

CategoryAnti-patternSeverityDetection Rule
-------------------------------------------------
FilteringWHERE FUNC(col) = value (function on column)HighRegex: WHERE\s+\w+\( on column ref
FilteringWHERE col LIKE '%prefix' (leading wildcard)HighRegex: LIKE\s+'%
FilteringWHERE col IS NULL / IS NOT NULL on indexed columnMediumMost B-tree indexes don't store NULLs
FilteringImplicit type conversionHighWHERE varchar_col = 123 — check operand types
FilteringWHERE OR on different indexed columnsMediumOnly one index can be used per table ref
JoinMissing join index (FK column not indexed)HighCheck if join column has matching index
JoinJoin on expression (not raw column)HighPrevents index usage
JoinCross join without WHERE conditionCriticalAccidental Cartesian product
SubqueryCorrelated subquery executing row-by-rowHighSubquery references outer query alias
SubqueryIN (SELECT ...) returning large result setMediumMaterializes full subquery result
SubqueryNOT IN on nullable columnCriticalAlways returns empty result set
SortORDER BY on non-indexed column (no LIMIT)MediumFull sort on all matching rows
SortORDER BY RAND()HighTable scan + sort all rows
SortORDER BY with mixed ASC/DESC not matching indexMediumIndex can't satisfy sort order
AggGROUP BY on high-cardinality unindexed columnHighHash aggregate on all rows
AggHAVING col = val without aggregate functionMediumShould be WHERE filter
ScanSELECT * in OLTP queryLow-MediumUnnecessary column IO + network
ScanSELECT DISTINCT on PK or unique columnLowRedundant dedup
PageOFFSET with large valueHighRe-scans and discards all prior rows
PageLIMIT without ORDER BYLowNon-deterministic result set
CTECTE referenced multiple times causing re-evaluationMediumNot materialized in some databases
SetUNION instead of UNION ALL when no duplicatesMediumExtra sort for dedup
SetINTERSECT / EXCEPT on large setsMediumRequires full sort on both sides
LockFOR UPDATE on read-replica / high-contention tableMediumBlocks concurrent readers
LockSERIALIZABLE isolation where READ COMMITTED sufficesLowExtra locking overhead
DMLLarge batch UPDATE/DELETE without chunkingHighLong transaction, log blowup, lock escalation
DMLINSERT INTO t SELECT * without column listMediumFragile to schema changes

Step 3: Analyze Execution Plan (if provided)

Parse and explain these key plan metrics:

MetricWhat It MeansWarning Threshold
------------------------------------------
seq scanFull table scan — reads every row sequentiallyAny row > 10,000
index scanWalks B-tree leaf nodes, then fetches rows from heapFetched rows > 10% of table
index-only scanAll needed columns present in index, no heap accessAlways ideal
bitmap scan (PG)Builds bitmap of matching page locations from indexOK for moderate selectivity
rows vs. actual rowsPlanner estimate vs. actual rows processedEstimate off by > 10x
filterRows read but discarded by WHERE clauseFilter discards > 50%
sort (memory)Sort performed entirely in memory (work_mem / sort_buffer)OK under configured memory
sort (disk)Sort spilled to temp disk file (runs out of memory)Always problematic
nested loopFor each outer row, probe inner tableOuter > 1000 rows is risky
hash joinBuild hash table on one side, probe with otherOK for large unindexed joins
merge joinSort both sides, then merge-consumeGood for pre-sorted data
materializeCTE / subquery result written to temp storageRepeated access degrades perf
loopsNumber of times a plan node is executed> 1 on inner index scan = nested loop
buffers (shared hit)Pages found in buffer cacheLow = memory efficient
buffers (shared read)Pages read from disk (cache miss)High = need more shared_buffers / buffer pool
temp writtenData spilled to temp files> 0 = insufficient memory for sort/hash
planning timeTime spent in query optimization> 100ms for simple query
execution timeActual runtimeTarget depends on SLA

Execution Plan Keywords by Database

DatabaseKeywords to Watch
----------------------------
MySQLUsing temporary, Using filesort, Using where (after index), Using index condition
PostgreSQLParallel Seq Scan, Materialize, Sort Method: external merge, Rows Removed by Filter
SQL ServerTable Scan, Index Scan (not seek), Key Lookup (clustered), RID Lookup, Spool, Sort
OracleTABLE ACCESS FULL (large table), SORT (AGGREGATE), NESTED LOOPS (high cardinality)

Step 4: Output — Markdown Table

Format every diagnosis as a Markdown table with three columns:

| Problem | Impact | Suggestion |
|---------|--------|------------|
| … | … | … |

Sort rows by severity (Critical → High → Medium → Low).

Step 5: Rewrite SQL (when applicable)

After the diagnosis table, provide a rewritten query with:

  1. The original query in a SQL code block
  2. The optimized query with annotations
  3. A brief explanation of each change and its expected impact

Output Template

Always produce analysis in this structure:

1. Diagnosis

ProblemImpactSuggestion
-----------------------------

2. Rewritten Query

-- Original
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Optimized
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at <  '2025-01-01';

Explanation: Replaced function-on-column YEAR(created_at) with range predicate, enabling index seek on created_at. Expected: seq scan → index range scan.

3. Index Recommendations

IndexTypeReason
---------------------
CREATE INDEX idx_orders_created ON orders(created_at)B-treeSupports the range filter in WHERE
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at)Composite (covering)Covers filter + sort in one scan

4. Execution Plan Summary (if plan provided)

OperatorEst. RowsActual RowsCostIssue
-----------------------------------------------
Seq Scan on orders100k500k12450Full table scan — missing index on WHERE column

Index Strategy Reference

Index Type Selection

Access PatternIndex TypeExample DDL
----------------------------------------
Equality on one columnB-tree single-columnCREATE INDEX ON t(col)
Equality + rangeB-tree composite, equality col firstCREATE INDEX ON t(a, b) for WHERE a=1 AND b>5
Multiple equality filtersB-tree composite, most selective firstCREATE INDEX ON t(a, b) for WHERE a=1 AND b=2
Sort order matchB-tree with matching ASC/DESCCREATE INDEX ON t(col DESC) for ORDER BY col DESC
Prefix / case-insensitive searchExpression index or trigram GiSTCREATE INDEX ON t(LOWER(col))
Full-text searchGIN (PG) / FULLTEXT (MySQL)CREATE FULLTEXT INDEX ON t(col)
JSON / JSONB field accessGIN, or B-tree on expressionCREATE INDEX ON t((json_col->>'field'))
Geospatial queriesGiST (PG) / SPATIAL (MySQL)CREATE INDEX ON t USING GIST(geo)
Covering queryB-tree with INCLUDECREATE INDEX ON t(a) INCLUDE (b, c) (PG)
Partial / filteredB-tree with WHERE clauseCREATE INDEX ON t(col) WHERE active=true

Cardinality & Selectivity

SelectivityRows MatchedRecommended Action
---------------------------------------------
High> 10%Index unlikely to help; consider scan + partition pruning
Moderate1–10%Composite index may help; verify with EXPLAIN
Selective< 1%Index very effective; ensure leading column matches filter

Index Maintenance

IssueImpactFix
--------------------
Missing index on FK columnNested loop join → seq scan on inner tableAdd index on FK column
Redundant indexesExtra write overhead, cache pressureDrop unused indexes (check pg_stat_user_indexes / sys.dm_db_index_usage_stats)
Index fragmentation (B-tree page splits)More IO per scan, larger index sizeRebuild / REINDEX during low load
Over-indexed table (many indexes on one table)Slow DML (each index updated per row)Consolidate: composite index can replace multiple single-col indexes
Stale index statisticsOptimizer chooses suboptimal planANALYZE (PG) / UPDATE STATISTICS (SQL Server) / OPTIMIZE (MySQL)

Database-Specific Optimization

MySQL

EXPLAIN ANALYZE SELECT ... \G          -- MySQL 8.0.18+, actual runtime
  • Using index = covering index, no table access — good
  • Using where = rows filtered after storage engine read — check selectivity
  • Using temporary = temp table created — avoid for large data
  • Using filesort = sort outside index — add composite index covering ORDER BY
  • Using index condition = index condition pushdown (ICP) — good, MySQL 5.6+
  • Impossible WHERE = WHERE always false — dead code
  • Covering index: If all queried columns are in the index, MySQL avoids heap lookup
  • MRR (Multi-Range Read): Sorts by PK before lookup — reduces random IO
  • BKA (Batched Key Access): Batches join lookups — reduces round trips
  • Buffer Pool: INNODB_BUFFER_POOL_SIZE should fit working set

PostgreSQL

EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT ...;
  • rows vs actual rows mismatch (> 10x) → stale stats, run ANALYZE
  • Parallel Seq Scan = normal for tables > 1 GB with sufficient max_parallel_workers_per_gather
  • Sort Method: external merge Disk → increase work_mem
  • CTEs are optimization fences (PG < 12): not inlined, always materialized. PG 12+ inlines CTEs, PG 16+ allows MATERIALIZED / NOT MATERIALIZED hint
  • NOT IN on nullable column → always returns 0 rows. Use NOT EXISTS
  • Autovacuum health critical: bloat → more page reads per scan
  • pg_stat_user_tables.n_dead_tup > 20% of n_live_tup → vacuum behind
  • pg_stat_user_indexes → check idx_scan to find unused indexes
  • pg_stat_statements → identify most time-consuming queries

SQL Server

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
SELECT ...;
  • PAGEIOLATCH_SH wait type = disk IO bottleneck (insufficient buffer pool)
  • LCK_M_* wait types = blocking / lock contention
  • CXPACKET wait = parallelism skew (one thread slower than rest)
  • Table Scan vs Clustered Index Scan — both full scans, but CI scan may be narrower if clustered PK is narrow
  • Key Lookup (clustered) / RID Lookup (heap) = non-covering index → expensive per-row lookups. Add INCLUDE columns
  • Missing index suggestions → green text in graphical plan, or sys.dm_db_missing_index_details
  • Parameter sniffing → plan cached for first parameter's distribution. Fix: OPTION (RECOMPILE) or OPTIMIZE FOR UNKNOWN
  • NOLOCK hint = dirty reads. Prefer READ COMMITTED SNAPSHOT database setting

Oracle

-- After query runs, get actual plan:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST +OUTLINE'));
  • TABLE ACCESS FULL — not always bad; Oracle multiblock reads are efficient for large tables
  • INDEX UNIQUE SCAN → ideally 1 row returned
  • INDEX RANGE SCAN → check number of rows scanned
  • NESTED LOOPS effective when driving set is small (< 10k rows) and inner table has unique index access
  • HASH JOIN preferred for large unindexed joins
  • SORT (AGGREGATE) without GROUP BY → aggregates all rows, verify cardinality first
  • Buffer gets vs consistent gets in AUTOTRACE TRACEONLY STATISTICS
  • SQL Profile / SQL Plan Baseline — lock in good plan without code changes
  • Star transformation — automatic rewrite for star-schema joins

Common Anti-Pattern Quick-Fix

Anti-PatternQuick FixExpected Gain
---------------------------------------
WHERE YEAR(col) = 2024WHERE col >= '2024-01-01' AND col < '2025-01-01'Index seek replaces seq scan
WHERE LIKE '%keyword'Add full-text index + MATCH ... AGAINST / @@Eliminate full scan
OFFSET 10000 LIMIT 20Keyset: WHERE id > 10000 LIMIT 20Skip 10000 discarded row reads
SELECT * in OLTPExplicit column list only needed columnsLess IO, less network, less memory
IN (SELECT ...) (large set)EXISTS (SELECT 1 ...)Short-circuits on first match
ORDER BY RAND()TABLESAMPLE / WHERE id IN (random selection)Full scan replaced by direct access
Large UPDATE/DELETE in one shotBatch in chunks of 1000 rowsReduce lock duration, log growth, blocking
OR across different indexed colsUNION ALL (each branch uses its own index)Index seek per branch instead of full scan
HAVING col = val (no aggregate)Move condition to WHEREFilter before GROUP BY, fewer rows to aggregate
Correlated subqueryRewrite as JOIN or window functionEliminate row-by-row execution
NOT IN (subquery)NOT EXISTSSafe with NULLs, often faster
Multiple CTE referencesMaterialize explicitly or use temp tableAvoid repeated evaluation

Examples

Example 1: Function on Column

Query:

SELECT * FROM orders
WHERE YEAR(created_at) = 2024
ORDER BY total_amount DESC;

Diagnosis:

ProblemImpactSuggestion
-----------------------------
YEAR(created_at) prevents index usageFull table scan forced, index on created_at unusableReplace with range predicate: created_at >= '2024-01-01' AND created_at < '2025-01-01'
ORDER BY total_amount not indexedFilesort on all matching rows (~500k)Add composite index: (created_at, total_amount)
SELECT * returns all columnsExtra IO on large text/blob columnsList only id, user_id, total_amount, status

Rewritten:

SELECT id, user_id, total_amount, status
FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at <  '2025-01-01'
ORDER BY total_amount DESC;

Index:

CREATE INDEX idx_orders_created_amount ON orders(created_at, total_amount DESC);

Example 2: Missing Join Index

Query:

SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-06-01'
GROUP BY u.id, u.name;

Diagnosis:

ProblemImpactSuggestion
-----------------------------
orders.user_id not indexedNested loop join seq-scans orders for each matching userAdd index: CREATE INDEX idx_orders_user_id ON orders(user_id)
GROUP BY u.id, u.nameExtra sort/hash on all columns in SELECTIn MySQL, GROUP BY on PK suffices if name is functionally dependent

Rewritten:

SELECT u.name, o.order_count
FROM users u
LEFT JOIN (SELECT user_id, COUNT(*) AS order_count
           FROM orders
           GROUP BY user_id) o ON u.id = o.user_id
WHERE u.created_at > '2024-06-01';

Index:

CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_users_created ON users(created_at);

版本历史

共 1 个版本

  • v1.0.0 Initial release 当前
    2026-05-17 17:39 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

developer-tools

Github

steipete
使用 `gh` CLI 与 GitHub 交互,通过 `gh issue`、`gh pr`、`gh run` 和 `gh api` 管理议题、PR、CI 运行及高级查询。
★ 672 📥 324,437
ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
自我反思+自我批评+自我学习+自组织记忆。智能体评估自身工作、发现错误并持续改进。
★ 1,362 📥 318,920
ai-intelligence

self-improving agent

pskoett
捕获经验教训、错误和纠正,以实现持续改进。使用时机:(1)命令或操作意外失败;(2)用户纠正……
★ 4,061 📥 799,115