SQL Server Indexes


This post is a discussion of the basics of indexes in SQL Server. It is based partly on a article by Robert Sheldon on 25 November 2008. That’s almost 10 years ago but when it comes to indexes, not much has changed, and the basics have not changed. The article is called SQL Server Index Basics.

Microsoft Docs Online has a list of the index types in an article called Indexes.

Indexes are SQL Server’s mechanisms for optimizing access to data. The indexes of underlying relational tables are very similar in purpose to an index section at the back of a book. There is also a cost associated with indexes; that is, indexes slow down insert, update, and delete operations.

Indexes are created on columns in tables or views. For example, if you create an index on the primary key and then search for a row of data based on one of the primary key values, SQL Server:

  1. finds that value in the index
  2. and then uses the index to quickly locate the entire row of data

Without the index, a table scan would have to be performed in order to locate the row, which can have a significant effect on performance.

You can create indexes on most columns in a table or a view. The exceptions are primarily those columns configured with large object (LOB) data types, such as image, text, and varchar(max).

An index is made up of a set of pages (index nodes) that are organized in a B-tree structure. This structure is hierarchical in nature, with the root node at the top of the hierarchy and the leaf nodes at the bottom, as shown.

When a query is issued against an indexed column, the query engine starts at the root node and navigates down through the intermediate nodes, with each layer of the intermediate level more granular than the one above. The leaf node will contain either the entire row of data or a pointer to that row, depending on whether the index is clustered or nonclustered.

A clustered index stores the actual data rows at the leaf level of the index. A table that has a clustered index is referred to as a clustered table. A table that has no clustered index is referred to as a heap. An important characteristic of the clustered index is that the indexed values are sorted in either ascending or descending order. As a result, there can be only one clustered index on a table or view. Microsoft states that “With few exceptions, every table should have a clustered index”.

Unlike a clustered indexed, the leaf nodes of a nonclustered index contain only the values from the indexed columns and row locators that point to the actual data rows, rather than contain the data rows themselves. This means that the query engine must take an additional step in order to locate the actual data. A nonclustered index has no effect on how the data in the table will be stored, it exists separately from the table. You can create multiple nonclustered indexes on a table.

Index Types

An index can be clustered or nonclustered. In addition, an index can be composite or unique. A composite index contains more than one column. Both clustered and nonclustered indexes can be composite indexes. A unique index is an index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns. For example, if you were to create an index on the FirstName and LastName columns in a table, the names together must be unique, but the individual names can be duplicated.

When you define a primary key constraint on one or more columns, SQL Server by default creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behavior and define a unique, nonclustered index on the primary key.

When you define a unique constraint, SQL Server automatically creates a unique, nonclustered index, unless you specify that a unique clustered index be created with the CLUSTERED keyword. This will only be successful if a clustered index does not already exist on the table. There can be at most one clustered index on a table or view.

A heap is a table that has no clustered index. The structure is called a heap because the data is not organized in any order, it is unordered. It is laid out as a bunch of pages and extents.

Nonclustered Index – from Books Online

Here is Microsoft’s definintion of a nonclustered index. Please read it slowly and carefully. “A nonclustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table.”

Clustered Index – from Books Online

Microsoft says: “A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.”