29 April 2016

SQL Performance: Indexes

We all have found performance issues doing queries more or less complex. It’s usual when we are facing this kind of issues the first idea is that are missing some indexes, but it’s always the problem? If we look at the definition of index:

“An alphabetized list of names, places, and subjects treated in a printed work, giving the page or pages on which each item is mentioned.”, probably we would say “definitely”, although the database’s indexes work on the same way which book’s indexes, if we don’t apply them correctly, instead of having performance improvement we might have the risk of don’t have any improvement or even a performance decrease. Looks awkward? Lets understand a little bit better how the indexes work.

B-tree Indexes

The B-TREE indexes, like the name says, have a balanced tree structure, this tree has an ordered list of values, divided into ranges, this is an excellent option for a wide range of queries that you usually do, for instance to find range of values (dates, ages, etc.) or even exact match. This is the index which is created when you execute the command “CREATE INDEX MY_INDEX_NAME ON MY_TABLE (PRIMARY_KEY)”.

You can look to this kind of indexes, like an inverted tree, this tree has three blocks: root, branches and leafs.

The upper block of the tree is the root, the root is connected to a several branches, and this branches are connected to several leafs. The leafs in this kind of index have the value of the index, for instance the primary key, and the ROWID. This allows knowing the direct localization of the row through the index.

You can say that this index is balanced because all the blocks of leafs have the same depth, this do that the retrieval of any record from anywhere take approximately the same amount of time.

One of the problems regarding the use of this type of indexes is that when you update the table, this requires the index maintenance, what it is an operation that have a high cost, because this is an ordered index and for that, if you want insert a new record in a leaf which don’t have more free space, it would be necessary create a new leaf and redistribute all the entries of the previous leaf to the new leaf and create a new branch, which also couldn’t have free space and you would need to create a new branch and update all the pointers. This is one of the main reasons that why you should disable all the indexes when you are doing ETL processes, once that the performance of DML operation can be highly affected with indexes’ maintenance. Although, if the index is being applied to a number sequence, for instance numeric primary keys, the maintenance of this kind of indexes is low, because, the new values will be always higher than the old ones, so the tree will grow up only to the right, it isn’t needed to do reorders.

BITMAP index

The bitmap indexes are a lot different of b-tree. These indexes create a bi-dimensional structure where each row represents all the distinct values within the index and the columns represent all the rows of the table. The following picture shows an indexed column through bitmap index:

When you try to get the all the values “AAA”, the database is able to identify easily which lines have the value that you are looking for, and retrieve the values. This index unlike the b-tree is able to index null values.

In conclusion, this index should be used when you have a low cardinality. You can measure the low cardinality if the distinct values on the column that you want index are lower than 1% of the total of rows on the table, but if you have values duplicated more than 100 times, you should take in consideration use this index.

The update of this index is heavier than the b-tree, in consequence of this when you try to update one table which is using this index, you will have some performance issues, so you shouldn’t use this type of index in tables with high volatility.

B-tree vs bitmap

To compare these two indexes was created the table “Employees”, having four columns: “EMPNO”, “ENAME”, “SAL” e “GENDER”:

Was inserted one million of records (randomly, the gender only contains the values “m”, “f” or “null”), to verify in a more efficient way the utilization of this indexes, because if you use a table with few records, you wouldn’t be able to see too many differences.

The first step is a creation of a index on the field gender. To do comparisions in parallel was create the table employees2, which is a copy of the first one, having the same values. This field has a low cardinality, so it is an excellent candidate to have a bitmap index, but lets see the differences of index this field with a bitmap index and a b-tree index.

Lets verify the space which these indexes are using:

 

Like you can see, the bitmap index is using a lot less space that b-tree, because the b-tree creates an entire tree, to index only 2 distinct values (yes, only 2, because the b-tree doesn’t index null values).

Lets see the performance:

> BITMAP Index

> B-TREE Index

You can verify that we are able to get the same results faster when you use the bitmap index, like the theory above shows.

On the next step, lets use the ENAME column,

If you see the space used by each index, you can conclude that in this case the bitmap index uses more space than the b-tree.

This happens because we have a bi-dimensional table of 1 million rows over 1 million columns, because you have 1 million distinct records on the column ename.

In terms of performance you can see that the results are almost the same. Here is better use b-tree, because of the space used by each index and because DML operations.

 

> Índice BITMAP:

> Indice B-TREE

Lets see how the indexes can influence the DML opeartions:

> BITMAP Index

Were inserted 1 million of records on the table which has the bitmap indexes, and you can verify that the insertion took more than 7 minutes.

 

> Índice B-TREE

If you do the same insertion on the table which has the b-tree index, you can verify that the insertion took a lot less time, only 2 minutes.

Conclusion

The indexes can improve the performance, but only when are correctly used. It is important evaluate which index should be used and to do some performance tests to verify that exists advantages in having indexes. If you don’t have a good increase of performance, you shouldn’t create an index, because the DML operation will take more time and you are using unnecessary space.

.

.

.

.

      Gonçalo Grilo
        Consultant
Blog