### 环境准备

$git clone https://github.com/datacharmer/test_db.git$ mysql -u username -p < employees.sql


mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.01 sec)

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+


### 预备知识

An index is a distinct structure in the database that is built using the create index statement. It requires its own disk space and holds a copy of the indexed table data. That means that an index is pure redundancy. Creating an index does not change the table data; it just creates a new data structure that refers to the table. A database index is, after all, very much like the index at the end of a book: it occupies its own space, it is highly redundant, and it refers to the actual information stored in a different place.

The database combines two data structures to meet the challenge: a doubly linked list and a search tree. These two structures explain most of the database’s performance characteristics.

Databases use doubly linked lists to connect the so-called index leaf nodes. Each leaf node is stored in a database block or page; that is, the database’s smallest storage unit. All index blocks are of the same size —typically a few kilobytes. The database uses the space in each block to the extent possible and stores as many index entries as possible in each block. That means that the index order is maintained on two different levels: the index entries within each leaf node, and the leaf nodes among each other using a doubly linked list.

An index is pure redundancy. It contains only data that is also stored in the table. During write operations, the database must keep those redundancies consistent. Specifically, it means that insert , delete and update not only affect the table but also the indexes that hold a copy of the affected data.

An update statement must relocate the changed index entries to maintain the index order. For that, the database must remove the old entry and add the new one at the new location. The response time is basically the same as for the respective delete and insert statements together.

#### EXPLAIN

EXPLAIN 语句可以提供 MySQL 执行时的相关信息，可以从中得出是否有使用索引
PS 我这里使用 mycli 了，因为 mysql clint 用起来有点别扭

mysql> explain select first_name from employees where emp_no=20392\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | const
possible_keys | PRIMARY
key           | PRIMARY
key_len       | 4
ref           | constze
rows          | 1
filtered      | 100.0
Extra         | <null>


type 字段从好到差依次为 system,const,eq_ref,ref,fulltext,ref_or_null,index_merge,unique_subquery,index_subquery,range,index,ALL 除了 ALL 之外，其他的有用到到索引。比如 const 代表使用唯一索引(UNIQUE)或者主键(PRIMARY KEY);rows 是执行计划中估算的扫描行数

### Concatenated Indexes

CREATE INDEX employee_name ON employees (first_name, last_name)

mysql> explain select hire_date from employees where first_name='Vishu'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | ref
possible_keys | employee_name
key           | employee_name
key_len       | 16
ref           | const
rows          | 240
filtered      | 100.0
Extra         | <null>


mysql> explain select hire_date from employees where last_name='Azumi'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | ALL
possible_keys | <null>
key           | <null>
key_len       | <null>
ref           | <null>
rows          | 299468
filtered      | 10.0
Extra         | Using where


The other option is to use two separate indexes, one for each column. Then the database must scan both indexes first and then combine the results. The duplicate index lookup alone already involves more effort because the database has to traverse two index trees. Additionally, the database needs a lot of memory and CPU time to combine the intermediate results.

#### Most Selective First Myth

The index definition should then mention the more selective column first so it can be used with an access predicate. That might be the origin of the “most selective first” myth but this rule only holds true if you cannot avoid a filter predicate.

### Using function in query

where 谓词条件中如果对 column 使用函数，则不会发挥处索引的效果

mysql> explain select hire_date from employees where left(first_name, 5)='Vishu'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | ALL
possible_keys | <null>
key           | <null>
key_len       | <null>
ref           | <null>
rows          | 299468
filtered      | 100.0
Extra         | Using where


1) 尽量转换输入参数而不是 column.(Do not convert the table column, instead convert the search term.)

TO_CHAR(sale_Date, 'YYYY-MM-DD') = '1970-01-01'

sale_date = TO_DATE('1970-01-01', 'YYYY-MM-DD')


2) 可借助 function-based-index 来解决，不过要注意此函数需要是纯函数，即对于相同的参数，函数产生的结果都是相同的，换句话说就是 deterministic function

CREATE INDEX emp_name ON employees (left(first_name, 5))


MySQL 没有 function-based-index 但是在 5.7 提供了 Generated Column 可以在此 column 上添加索引

mysql> create table t( id varchar(6) not null );
mysql> create index t_i on t (id);
mysql> insert into  t (id) values ('2');


mysql> explain select * from t where id = 2\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | t
partitions    | <null>
type          | index
possible_keys | t_i
key           | t_i
key_len       | 8
ref           | <null>
rows          | 1
filtered      | 100.0
Extra         | Using where; Using index


mysql> explain select * from t where id = '2'\G;
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | t
partitions    | <null>
type          | ref
possible_keys | t_i
key           | t_i
key_len       | 8
ref           | const
rows          | 1
filtered      | 100.0
Extra         | Using index


It is because converting a string to a number always gives an unambiguous result. This is not true the other way around. A number, formatted as text, can contain spaces, punctation, and leading zeros. A single value can be written in many ways: 42
042
0042
00042
The TOCHAR function returns only one string representation of the number. It will therefore only match the first of above listed strings. If we use TONUMBER , it matches all of them. That means there is not only a performance difference between the two variants but also a semantic difference!

SELECT ... FROM ... WHERE numeric_number = '42'


The database will consistently transform the string into a number. It does not apply a function on the potentially indexed column: a regular index will therefore work.

Use numeric types to store numbers.

### Like

mysql> explain select * from employees where first_name like '%Vishu'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | ALL
possible_keys | <null>
key           | <null>
key_len       | <null>
ref           | <null>
rows          | 299468
filtered      | 11.11
Extra         | Using where

mysql> explain select * from employees where first_name like 'Vi%'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | range
possible_keys | employee_name
key           | employee_name
key_len       | 16
ref           | <null>
rows          | 2429
filtered      | 100.0
Extra         | Using index condition


LIKE filters can only use the characters before the first wildcard during tree traversal.The remaining characters are just filter predicates that do not narrow the scanned index range. A single LIKE expression can therefore contain two predicate types: (1) the part before the first wildcard as an access predicate; (2) the other characters as a filter predicate. The more selective the prefix before the first wildcard is, the smaller the scanned index range becomes.

### Clustering Data (Second power of indexing)

#### Index Filter Predicates Used Intentionally

SELECT first_name, last_name, subsidiary_id, phone_number
FROM employees
WHERE subsidiary_id = ?
AND
UPPER(last_name) LIKE '%INA%';


#### Index-Only Scan

The index-only scan is one of the most powerful tuning methods of all. It not only avoids accessing the table to evaluate the where clause, but avoids accessing the table completely if the database can find the selected columns in the index itself. If an index prevents a table access it is also called a covering index. The performance advantage of an index-only scans depends on the number of accessed rows and the index clustering factor. The relevant factor is not how many rows the query delivers but how many rows the database must inspect to find them.

mysql> explain select last_name from employees where first_name='Vishu'\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | ref
possible_keys | employee_name
key           | employee_name
key_len       | 16
ref           | const
rows          | 240
filtered      | 100.0
Extra         | Using index


Extra 显示的 Using index 含义如下

The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.

### Sort (Third power of indexing)

mysql> explain select hire_date from employees order by first_name\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | ALL
possible_keys | <null>
key           | <null>
key_len       | <null>
ref           | <null>
rows          | 299468
filtered      | 100.0
Extra         | Using filesort


In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.

mysql> explain select hire_date from employees order by first_name limit 100\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | index
possible_keys | <null>
key           | employee_name
key_len       | 34
ref           | <null>
rows          | 100
filtered      | 100.0
Extra         | <null>


select hire_date from employees where first_name like 'V%' order by last_name\G


mysql> explain select hire_date from employees order by first_name, last_name DESC limit 1\G
***************************[ 1. row ]***************************
id            | 1
select_type   | SIMPLE
table         | employees
partitions    | <null>
type          | ALL
possible_keys | <null>
key           | <null>
key_len       | <null>
ref           | <null>
rows          | 299468
filtered      | 100.0
Extra         | Using filesort


### Odd operation

WHERE ADDTIME(date_column, time_column)

WHERE ADDTIME(date_column, time_column)