The SQL Post: SQL Server : Clustered Index Structures

Friday, September 16, 2011

SQL Server : Clustered Index Structures

In SQL Server, indexes are organized in the form of Binary trees (B-Tree), As a typical binary tree structure each data page is organized in the node. These nodes can be categorized in three different categories as

1. The Top Level node or Root Level node
2. Intermediate Level nodes (any node between root level and leaf level) and
3. The Bottom Level node or Leaf Level node.

In a clustered index, the leaf nodes contain the data pages while the root level and the intermediate level nodes contain index pages, which keeps the index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.

By Default a Clustered Index has only one partition but it can have multiple partitions, each partition of the SQL Server contains one B-Tree. You can see the entry in the SYS.PARTITIONS system view.

Depending on the data types in the clustered index, each clustered index structure will have one or more allocation units in which to store and manage the data for a specific partition.

Partition can have multiple different type of the allocation unit depends upon the datatype used in the clustered index. Each clustered index will have at least one IN_ROW_DATA allocation unit per partition.

Here understanding allocation is important, An allocation unit is a collection of pages within a heap or B-tree used to manage data based on their page type.

There are three types of allocation units available in SQL Server Architecture, these are:

IN_ROW_DATA -- It can contain the data or index rows that contain all data, except large object (LOB) data.
LOB_DATA -- It can contain the large object data stored in one or more of these data types: text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined types (CLR UDT).
ROW_OVERFLOW_DATA --It can contain variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060 byte row size limit.

The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows.

Information about the Index such as FileGroup, total no of allocated pages, used pages, data pages etc. can be found in the system view "SYS.SYSTEM_INTERNALS_ALLOCATION_UNITS"

SQL Server moves down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.

The above image shows the structure of a clustered index.

Note: This article has been written with the help of reference material available in Microsoft MSDN

The SQL Post

Pages

Friday, September 16, 2011

SQL Server : Clustered Index Structures

No comments:

Post a Comment