MySQL Indexes is one of the most efficient SQL tuning methods but it is neglected by many during development.
SQL performance problem is as old as SQL itself.
SQL queries are so simple and they are like reading an English sentence that doesn’t require any knowledge about the internal working of database. This level of abstraction becomes a drawback when we talk about SQL performance. This article explains about using MySQL Indexes for performance optimization
Let’s consider a simple example:
We create a table ‘employee’ and insert thousands of records.
CREATE TABLE `employee` ( `emp_id` int(11) unsigned NOT NULL AUTO_INCREMENT, `emp_name` VARCHAR(11) DEFAULT NULL, `emp_age` int(11) DEFAULT NULL, `emp_address` VARCHAR(11) DEFAULT NULL, CONSTRAINT employees_pk PRIMARY KEY (emp_id) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The above query creates a table ‘employee’ with the respective fields.
Now let’s look at a query, where we retrieve details of all employees named ‘John’.
SELECT * FROM employee WHERE emp_name = 'John'
Once we run the above query, what exactly happens behind the scenes?
The query should ideally return all the rows whose ‘emp_name’ field is ‘John‘. Our database would have to look at every single row in the ‘employee’ table to find all the rows wherein the ‘emp_name’ field is ‘John’. This implies we cannot stop once we find the first matching row but rather have to look through all the thousands of rows in this table and find all the rows with the name ‘John’. This method is called Full Table Scan.
Now you must be thinking, doing a Full table scan seems inefficient. It is almost like going through each row in the table and returning all matching rows.
We all know that if we use index then the query processing time will reduce. An index makes the query faster.
- What is an index?
- Why should we use index ?
- How does index work ?
- How does index improve query performance??
This blog post will try and answer the first two questions.
If you can answer all the above questions, you can skip this post.
Still There!! Good. Let’s discuss and start using Indexes.
What is an index ?
As the name denotes, index, in simple terms, is like an index of a book. Let’s consider you have a book on Java and you want to learn about “Inheritance”, then why would you flip through the entire book to find the topic. Instead, you can just goto the index of the book, which will tell you the exact pages where you can find information on “Inheritance”.
Technically, an index is a data structure that is built using ‘CREATE INDEX’ statement. Index requires its own disk space and holds a copy of the indexed table data which means it is purely redundant (like the index of a book).
Why should we use index ?
Let’s kick things off with a simple example. Let’s insert few rows (16 rows) to employee table.
INSERT INTO `employee` (`emp_id`, `emp_name`, `emp_age`, `emp_address`) VALUES (NULL, "employee1",23,""), (NULL, "employee2",25,""), .. .. (NULL, "employee16",61,"");
As discussed, since ‘emp_id’ is a primary key, index(Unique index) will be created when we create employee table.
Now let’s imagine we want to retrieve all employees with name ‘employee3’. The field ‘emp_name’ does not have an index yet.
Let’s see what happens when we create a select statement like this:
SELECT * FROM employee where emp_name = 'employee3';
The problem when you execute the above query is that MySQL checks every single record to see if ‘emp_name’ is ‘employee3’. This may be fast in this situation because we only have 16 records, but if the number of records increase, this method is extremely inefficient and slow.
Let’s discuss more in detail.
The important part here is the number of rows that were checked for when we ran the query. You guessed it right, 16 rows have been checked. Suppose, if you have 1000 rows, MySQL will have to check 1000 rows. 🙁
Now let’s try creating index for ‘emp_name’ field.
CREATE INDEX indx_name ON employee (emp_name);
Let’s run our query again now and check what goes on behind the scenes:
That’s right. Only 1 row has been examined and MySQL was able to give us all we needed because we created an index on ‘emp_name‘. If you compare the explain statements between the 2 queries, you can see that there is a change in ‘type’ column. We’ll discuss that in the next post. No Hurry! 🙂
Using index on join statements:
Let’s consider a scenario where an employee can write reviews about his company. One employee can write N number of reviews and N number of reviews will belong to one employee. (1 : N relation)
CREATE TABLE `review` ( `review_id` int(11) unsigned NOT NULL AUTO_INCREMENT, `emp_id` int(11) unsigned NOT NULL, `review_desc` VARCHAR(11) DEFAULT NULL, `created_at` DATETIME DEFAULT NULL, `updated_at` DATETIME DEFAULT NULL, PRIMARY KEY (`review_id`), KEY `review_FK` (`emp_id`), CONSTRAINT `review_FK` FOREIGN KEY (`emp_id`) REFERENCES `employee` (`id`) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now let’s get all reviews written by employee1.
SELECT * FROM employee e, review r where r.emp_id = e.emp_id and e.emp_id = 1
This query joins 2 different tables on ‘emp_id‘. But, the point to note is that ‘review’ table doesn’t have an index on ‘emp_id‘, thereby making the join inefficient. We need a way so that MySQL can easily join these tables without having to loop through all records.
CREATE INDEX indx_emp_review ON review (emp_id);
This will increase the execution speed of the previous select query when you have more number of rows.
The explain statements clearly denotes the use of index. The number of rows scanned in the latter case is much lesser. Dont worry! As mentioned periously, we will discuss about this in detail in next post.
When should I create index?
- Create an index if you frequently want to retrieve less than about 15% of the rows in a large table.
- When you want to perform join operation, it’s better to index columns that are used to join.
Which columns should I Index ?
- Only index those columns that are required in WHERE and ORDER BY clauses.
- You should create indexes on columns that are used frequently to join tables.
- You should create indexes on columns that contain mostly unique values or if there are few duplicates.
- Columns that contain wide range of values should use regular index.
- Columns that contain small range of values should use bitmap index.(Note: We’ll discuss about regular and bitmap index in next post.)
When should indexes be avoided?
Although indexes enhance a database’s performance, there are times when they should be avoided.
- Indexes should not be used on columns that contain a high number of NULL values.
- You want to maximize insert performance. Every index on a table reduces insert and update performance because they must be updated each time the data changes.
- You are selecting a large (>10-20%) number of the rows in the table.
Remember that an index slows down inserts, updates and delete because the indexes need to be updated whenever the table is. So, the best practice is to add index for fields that are often used for a search and whose values do not change much.
Stick to this basic example:
An index on a student register number is better than one on the grades.
Hope this article gave you the basic understanding of an index. In a later article, let’s discuss about anatomy and types of index.