1. Let's create a simple table:
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL auto_increment,
`birth_date` date NOT NULL,
`joined_date` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci AUTO_INCREMENT=1 ;
2. We need to insert some random data. To make it a more real-life case I inserted 1 000 000 rows, hope my notebook (which has the test mysql running) won't explode ;-)
INSERT INTO `example_table` (`name`, `birth_date`, `joined_date`) VALUES
[a few pretty much random records]
3. What we need to acomplish is the result of a simple sql query:
SELECT AVG(YEAR(`birth_date`)) FROM `users`
WHERE joined_date >= '2006-01-01 00:00:00'
First I'll try to calculate the average using the AVG() function without any indexes except the PK index. The explain plan looks something like this:
select type: simple,
table: users,
type: all,
possible keys: null,
key_len: null,
ref: null,
rows: 1000000,
extra: Using where
The "null" after the possible_keys indicates that there are no indexes to be used with this select. There is pretty much no difference if the query cache is primed or not if you have to fetch and search for the data making disk I/Os. Let us add two indexes: one for the `birth_year` field and one for the `joined_date` column and see how it looks on the explain plan:
ALTER TABLE `test`.`users` ADD INDEX `i_birth` ( `birth_date` );
ALTER TABLE `test`.`users` ADD INDEX `i_joined` ( `joined_date` );
Explain plan:
You can see that MySQL tried to find a fitting index to execute the query but still it took forever with an empty query cache, an eternity. That's clearly the wrong way to go.
Now we will create a covering index for the said query.
ALTER TABLE `test`.`users` ADD INDEX `i_birth_joined` ( `birth_date`, `joined_date` );
The index has all the fields the query needs to be executed so the explain plan :
[a few pretty much random records]
3. What we need to acomplish is the result of a simple sql query:
SELECT AVG(YEAR(`birth_date`)) FROM `users`
WHERE joined_date >= '2006-01-01 00:00:00'
First I'll try to calculate the average using the AVG() function without any indexes except the PK index. The explain plan looks something like this:
select type: simple,
table: users,
type: all,
possible keys: null,
key_len: null,
ref: null,
rows: 1000000,
extra: Using where
The "null" after the possible_keys indicates that there are no indexes to be used with this select. There is pretty much no difference if the query cache is primed or not if you have to fetch and search for the data making disk I/Os. Let us add two indexes: one for the `birth_year` field and one for the `joined_date` column and see how it looks on the explain plan:
ALTER TABLE `test`.`users` ADD INDEX `i_birth` ( `birth_date` );
ALTER TABLE `test`.`users` ADD INDEX `i_joined` ( `joined_date` );
Explain plan:
select type: simple,
table: users,
type: all,
possible keys: i_joined,
key_len: null,
ref: null,
rows: 1000000,
extra: Using where
You can see that MySQL tried to find a fitting index to execute the query but still it took forever with an empty query cache, an eternity. That's clearly the wrong way to go.
Now we will create a covering index for the said query.
ALTER TABLE `test`.`users` ADD INDEX `i_birth_joined` ( `birth_date`, `joined_date` );
The index has all the fields the query needs to be executed so the explain plan :
select type: simple,
table: users,
type: index,
possible keys: i_birth_joined,
key_len: 7,
ref: null,
rows: 1000000,
extra: Using where; Using index
The changes are quite clear. The covering index we created caused MySQL to use it not only in the search operation but also as a data provider. That speeds up thing a whole lot but it also slows down every INSERT/UPDATE operation with rebuilding a large index.
Summarizing:
While creating a covering indexes is a good practice to speed up select queries, you have to take into account the downside of building/rebuilding large indexes that cover a lot of data, the obvious extra space the indexes take up in memory and disk space, or finally the temptation to make too much indexes, instead of taking some slower queries to the background via e.g. a cron process that fills a cache.
3 comments:
just plain english please ;-)
Great post. You point out the insertion and disk space load induced by covering indexes. You might take a look at TokuDB, a storage engine for MySQL that supports very fast insertions into indexes, as well as compression. This combination makes covering indexes especially useful, because you can keep a bunch of them. Click here for a discussion of covering indexes, and some performance numbers for TokuDB. Furthermore, TokuDB supports a special kind of covering index, the clustering index, which is a clustering index on a different sort order than the primary key, and thus a covering index for any query that benefits from the chosen sort order.
Post a Comment