MySQL SQL优化
Optimizing SELECT Statements(优化 SELECT 语句)
本文翻译自 MySQL 官方文档 8.2.1 Optimizing SELECT Statements
Queries, in the form of SELECT
statements, perform all the lookup operations in the database. Tuning these statements is a top priority, whether to achieve sub-second response times for dynamic web pages, or to chop hours off the time to generate huge overnight reports.
SELECT 语句用于执行数据库中的查找操作,优化这些语句是最重要的,无论是实现动态网页的亚秒级时间响应,还是缩短生成隔夜报告的时间。
Besides SELECT
statements, the tuning techniques for queries also apply to constructs such as CREATE TABLE...AS SELECT
, INSERT INTO...SELECT
, and WHERE
clauses in DELETE
statements. Those statements have additional performance considerations because they combine write operations with the read-oriented query operations.
除了 SELECT
语句之外,查询调优技术还适用于构造诸如 CREATE TABLE...AS SELECT
、INSERT INTO...SELECT
和 DELETE
语句中的 WHERE
条件等语句。这些语句有额外的影响性能的因素,因为它们将写入操作与读取操作结合在了一起。
NDB Cluster supports a join pushdown optimization whereby a qualifying join is sent in its entirety to NDB Cluster data nodes, where it can be distributed among them and executed in parallel. For more information about this optimization, see Conditions for NDB pushdown joins.
NDB Cluster 支持连接下推优化,其中合格的连接被完整发送到 NDB Cluster 数据节点,可以在它们之间分发并并行执行。有关此优化的更多信息,请参阅 NDB 下推联接的条件。
The main considerations for optimizing queries are:
优化查询的主要因素有:
-
To make a slow SELECT … WHERE query faster, the first thing to check is whether you can add an index. Set up indexes on columns used in the WHERE clause, to speed up evaluation, filtering, and the final retrieval of results. To avoid wasted disk space, construct a small set of indexes that speed up many related queries used in your application.
为了使慢 SELECT … WHERE 查询更快,首先要检查是否可以添加索引。在 WHERE 子句中使用的列上设置索引,以加快结果的评估、过滤和最终检索。为了避免浪费磁盘空间,请构建少量的索引来加速应用程序中多数的相关查询。
Indexes are especially important for queries that reference different tables, using features such as joins and foreign keys. You can use the
EXPLAIN
statement to determine which indexes are used for aSELECT
. See Section 8.3.1, “How MySQL Uses Indexes” and Section 8.8.1, “Optimizing Queries with EXPLAIN”.索引对于多表间的联接和外键查询尤其重要。您可以使用
EXPLAIN
语句来确定SELECT
语句使用了哪些索引。请参阅第 8.3.1 节“MySQL 如何使用索引”和第 8.8.1 节“使用 EXPLAIN 优化查询”。 -
Isolate and tune any part of the query, such as a function call, that takes excessive time. Depending on how the query is structured, a function could be called once for every row in the result set, or even once for every row in the table, greatly magnifying any inefficiency.
隔离并调整查询语句的任一部分,例如需要过多时间的函数调用。根据查询的结构,可以是结果集中的每一行都要调用一次函数,也可以是整张表只调用一次函数,从而大大提高效率。
-
Minimize the number of full table scans in your queries, particularly for big tables.
最大限度地减少查询中全表扫描的次数,尤其是对于大表。
-
Keep table statistics up to date by using the
ANALYZE TABLE
statement periodically, so the optimizer has the information needed to construct an efficient execution plan.通过定期使用
ANALYZE TABLE
语句使表统计信息保持最新,以便优化器拥有构建高效执行计划所需的信息。 -
Learn the tuning techniques, indexing techniques, and configuration parameters that are specific to the storage engine for each table. Both
InnoDB
andMyISAM
have sets of guidelines for enabling and sustaining high performance in queries. For details, see Section 8.5.6, “Optimizing InnoDB Queries” and Section 8.6.1, “Optimizing MyISAM Queries”.了解每个表的存储引擎的调优技术、索引技术和参数配置。
InnoDB
和MyISAM
都有一套用于高性能查询的准则。有关详细信息,请参阅第 8.5.6 节“优化 InnoDB 查询”和第 8.6.1 节“优化 MyISAM 查询”。 -
Adjust the size and properties of the memory areas that MySQL uses for caching. With efficient use of the
InnoDB
buffer pool,MyISAM
key cache, and the MySQL query cache, repeated queries run faster because the results are retrieved from memory the second and subsequent times.调整 MySQL 用于缓存的内存区域的大小和属性。通过高效使用
InnoDB
缓冲池、MyISAM
键缓存和 MySQL 查询缓存,重复查询的运行速度会更快,因为第二次及后续查询都会从内存中检索结果。 -
Even for a query that runs fast using the cache memory areas, you might still optimize further so that they require less cache memory, making your application more scalable. Scalability means that your application can handle more simultaneous users, larger requests, and so on without experiencing a big drop in performance.
即使对于使用缓存快速运行的查询,仍可以进一步优化,以使它们使用更少的缓存,从而使您的应用程序更具可扩展性。可扩展性意味着您的应用程序可以处理更多的并发用户、更大的请求等,而不会出现性能大幅下降。
-
Deal with locking issues, where the speed of your query might be affected by other sessions accessing the tables at the same time.
处理锁问题,查询速度可能会受到其他会话在同一时间访问表的影响。
1 WHERE Clause Optimization (WHERE 子句优化)
This section discusses optimizations that can be made for processing WHERE
clauses. The examples use SELECT
statements, but the same optimizations apply for WHERE
clauses in DELETE
and UPDATE
statements.
本节讨论可用于处理 WHERE
子句的优化。这些示例使用 SELECT
语句,但同样适用于 DELETE
和 UPDATE
语句中的 WHERE
子句。
You might be tempted to rewrite your queries to make arithmetic operations faster, while sacrificing readability. Because MySQL does similar optimizations automatically, you can often avoid this work, and leave the query in a more understandable and maintainable form. Some of the optimizations performed by MySQL follow:
您可能会尝试重写查询语句以使算法运算更快,同时牺牲可读性。因为 MySQL 会自动执行类似的优化,所以您通常可以避免这项工作,并使用更易于理解和维护的形式。 MySQL 执行的一些优化如下:
-
Removal of unnecessary parentheses(删除不必要的括号):
((a AND b) AND c OR (((a AND b) AND (c AND d)))) -> (a AND b AND c) OR (a AND b AND c AND d)
-
Constant folding(常量合并):
(a<b AND b=c) AND a=5 -> b>5 AND b=c AND a=5
-
Constant condition removal(常量条件去除):
(b>=5 AND b=5) OR (b=6 AND 5=5) OR (b=7 AND 5=6) -> b=5 OR b=6
This takes place during preparation rather than during the optimization phase, which helps in simplification of joins. See Section 8.2.1.9, “Outer Join Optimization”, for further information and examples.
这发生在准备阶段而不是优化阶段,这有助于简化连接。有关更多信息和示例,请参阅第 8.2.1.9 节“外连接优化”。
-
Constant expressions used by indexes are evaluated only once(索引使用的常量表达式仅计算一次).
-
Comparisons of columns of numeric types with constant values are checked and folded or removed for invalid or out-of-rage values:
比较数字类型列与常量值,并合并或删除无效或超出范围的值:
# CREATE TABLE t (c TINYINT UNSIGNED NOT NULL); SELECT * FROM t WHERE c ≪ 256; -≫ SELECT * FROM t WHERE 1;
See Section 8.2.1.14, “Constant-Folding Optimization”, for more information.
有关详细信息,请参阅第 8.2.1.14 节“常量合并优化”。
-
COUNT(*)
on a single table without aWHERE
is retrieved directly from the table information forMyISAM
andMEMORY
tables. This is also done for anyNOT NULL
expression when used with only one table.针对单表的没有 WHERE 的 COUNT(*) 会直接从 MyISAM 和 MEMORY 表的表信息中检索。这也适用于任何针对单表的 NOT NULL 语句。
-
Early detection of invalid constant expressions. MySQL quickly detects that some
SELECT
statements are impossible and returns no rows.及早检测无效常量表达式。 MySQL 可以快速检测到某些
SELECT
语句是无效的的并且不返回任何行。 -
HAVING
is merged withWHERE
if you do not useGROUP BY
or aggregate functions (COUNT()
,MIN()
, and so on).如果不使用 GROUP BY 或聚合函数(COUNT()、MIN() 等),HAVING 将与 WHERE 合并。
-
For each table in a join, a simpler
WHERE
is constructed to get a fastWHERE
evaluation for the table and also to skip rows as soon as possible.对于连接中的每个表,构造一个更简单的 WHERE,以对表 WHERE 进行快速评估,并尽快跳过行。
-
All constant tables are read first before any other tables in the query. A constant table is any of the following:
在查询中首先读取所有常量表。常量表是以下任意一种:
-
An empty table or a table with one row.
一张空表或一张只有一行的表。
-
A table that is used with a
WHERE
clause on aPRIMARY KEY
or aUNIQUE
index, where all index parts are compared to constant expressions and are defined asNOT NULL
.WHERE
使用主键或唯一性索引的表,其中所有索引部分都与常量表达式进行比较并定义为非空。
All of the following tables are used as constant tables:
以下所有表均视为常量表:
SELECT * FROM t WHERE primary_key=1; SELECT * FROM t1,t2 WHERE t1.primary_key=1 AND t2.primary_key=t1.id;
-
-
The best join combination for joining the tables is found by trying all possibilities. If all columns in
ORDER BY
andGROUP BY
clauses come from the same table, that table is preferred first when joining.通过尝试所有可能性找到连接表的最佳连接组合。如果
ORDER BY
和GROUP BY
子句中的所有列都来自同一个表,则在连接时首选该表。 -
If there is an
ORDER BY
clause and a differentGROUP BY
clause, or if theORDER BY
orGROUP BY
contains columns from tables other than the first table in the join queue, a temporary table is created.如果
ORDER BY
和GROUP BY
子句不同,或者ORDER BY
或GROUP BY
中的列来自连接队列中非第一个个表,则会创建临时表。 -
If you use the
SQL_SMALL_RESULT
modifier, MySQL uses an in-memory temporary table.如果使用
SQL_SMALL_RESULT
修饰符,MySQL 将使用内存临时表。 -
Each table index is queried, and the best index is used unless the optimizer believes that it is more efficient to use a table scan. At one time, a scan was used based on whether the best index spanned more than 30% of the table, but a fixed percentage no longer determines the choice between using an index or a scan. The optimizer now is more complex and bases its estimate on additional factors such as table size, number of rows, and I/O block size.
每张表都要查询索引,并使用最适合的索引,除非优化器认为表扫描更有效。过去曾经根据最佳索引是否覆盖超过表的 30% 以上来决定使用扫描,但现在已经不再用固定百分比来决定是使用索引还是扫描。优化器现在更加复杂,根据其他因素进行评估,例如表大小、行数和 I/O 块大小。
-
In some cases, MySQL can read rows from the index without even consulting the data file. If all columns used from the index are numeric, only the index tree is used to resolve the query.
在某些情况下,MySQL 可以从索引中读取行,而无需读取数据文件。如果索引中使用的所有列都是数字,则仅使用索引树来解析查询。
Some examples of queries that are very fast(一些非常快的查询示例):
SELECT COUNT(*) FROM tbl_name;
SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;
SELECT MAX(key_part2) FROM tbl_name
WHERE key_part1=constant;
SELECT ... FROM tbl_name
ORDER BY key_part1,key_part2,... LIMIT 10;
SELECT ... FROM tbl_name
ORDER BY key_part1 DESC, key_part2 DESC, ... LIMIT 10;
MySQL resolves the following queries using only the index tree, assuming that the indexed columns are numeric:
MySQL 仅使用索引树解析以下查询,假设索引列是数字:
SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;
SELECT COUNT(*) FROM tbl_name
WHERE key_part1=val1 AND key_part2=val2;
SELECT MAX(key_part2) FROM tbl_name GROUP BY key_part1;
The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass:
以下查询使用索引按排序顺序检索行,而无需进行单独的排序:
SELECT ... FROM tbl_name
ORDER BY key_part1,key_part2,... ;
SELECT ... FROM tbl_name
ORDER BY key_part1 DESC, key_part2 DESC, ... ;
2 Range Optimization (范围查询优化)
The range
access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections describe conditions under which the optimizer uses range access.
范围访问方法使用单个索引来检索包含在一个或多个索引值区间内的表行的子集。它可用于单部分或多部分索引。以下部分描述了优化器使用范围访问的条件。
- Range Access Method for Single-Part Indexes 单区间索引范围查询
- Range Access Method for Multiple-Part Indexes 多索引范围查询
- Equality Range Optimization of Many-Valued Comparisons 相等范围查询
- Skip Scan Range Access Method 跳过扫描范围查询
- Range Optimization of Row Constructor Expressions 行构造函数表达式的范围优化
- Limiting Memory Use for Range Optimization 限制内存范围优化
Range Access Method for Single-Part Indexes
For a single-part index, index value intervals can be conveniently represented by corresponding conditions in the WHERE
clause, denoted as range conditions rather than “intervals.”
对于单部分索引,索引值区间可以方便地用 WHERE 子句中的相应条件表示,表示为范围条件而不是“区间”。
The definition of a range condition for a single-part index is as follows:
单部分索引的范围条件的定义如下:
-
For both
BTREE
andHASH
indexes, comparison of a key part with a constant value is a range condition when using the=
,<=>
,IN()
,IS NULL
, orIS NOT NULL
operators.对于 BTREE 和 HASH 索引,使用
=
、<=>
、IN()
、IS NULL
或IS NOT NULL
运算符时,与常量值的比较范围。 -
Additionally, for
BTREE
indexes, comparison of a key part with a constant value is a range condition when using the>
,<
,>=
,<=
,BETWEEN
,!=
, or<>
operators, orLIKE
comparisons if the argument toLIKE
is a constant string that does not start with a wildcard character. -
For all index types, multiple range conditions combined with
OR
orAND
form a range condition.