Clustered Index in SQL Server

What is a clustered index?

The clustered index is a special index which sorts and stores the data in the table or view based on their key values. There can be only one clustered index per table because the data rows themselves can be stored in only one order. If we talk in term of B-Tree design, the leaf nodes of the index store the data for the rest of the columns in the table so when a lookup is performed on this type of index there are no other structures that need to be referenced.

Why create clustered indexes?

Having clustered index in the table is beneficial; first, when querying the table, there is only required IO operations occurred except extra lookup to get any/all of the non-key column data. Second, it gives us a way to recognize the table data. If there is no clustered index on the table then data stored disorganized in a heap structure. When our data becomes defragmented due to DML operations then the only one way to fix this fragmentation is to reload all the data into a new table. With a clustered index in the table, we can run index reorganized or rebuild to address the fragmentation.

When clustered indexes should be created?

Generally, we use an index on those columns which heavily used in WHERE clause portion in the query. As we know query performs huge range scans on those columns in the WHERE clause can be beneficial from having a clustered index on these columns. In both cases, we will use fewer disk IO and in some cases (depending on the ORDER BY clause) we can save ourselves memory and CPU by not having to perform a sort since data is already ordered.

How to create a clustered index

A clustered index can be created on the table either through a primary key constraint or simply using create index statement. In both cases, we simply have to add the CLUSTERED keyword to either the ALTER TABLE or CREATE INDEX statement as shown in the below TSQL examples.

-- When we create a primary key

ALTERTABLESales.BookInventoryADDCONSTRAINTPK_BookInventory_BookID

PRIMARYKEYCLUSTERED (BookIDASC);

-- Without constraints

CREATECLUSTEREDINDEXIX_BookInventory_BookIDONSales.BookInventory(BookIDASC);

Here I will highlight of some important thing on the clustered index and will analysis of myth about the clustered index.

1- It is important to know; only one clustered index can be defined on each table in your database.

2- Changing the clustered index form one column to another column may be a bit complicated and does require some downtime. Since all non-clustered indexes on table reference the row data in the clustered index if we simply drop the current clustered index and create a new one then all of the non-clustered indexes will end up being updated twice.

3- When clustered index will be dropped, all data of the columns will move into heap it will be updating seconds time when new the new clustered index is created and all the column data move into the structure.

4- Dropped and create new index may probably increase the fragmentation which is poison for index. So we have to rebuild all the indexes.

5- Many people or new DBA think, we can create clustered index only on primary key column(s), this is not true, you can create clustered index on another column too which probably used more in WHERE clause. Suppose you have created a table and create an index on primary key which used rarely in WHERE clause while DateTime columns use mostly in the query, so it would be better if you create clustered index on DATETIME column.

6- Remember, it would be better if the clustered index column has unique data and sort in length.