Understanding Index Internal in SQL Server

Discussion in 'SQL Server' started by MinalS, Jun 12, 2015.

  1. SQL Server B-tree indexes

    The indexes in SQL Server are organized in B-tree structure. B-tree means balanced tree. In B-tree, at the top there is a root node, leaf node at bottom. The indexes have two components as leaf level and non leaf level. The non leaf levels are useful for navigation. Every leaf level index has an entry for row in the table.

    They are stored in index key order. The non leaf level node at the higher level is small than the lower level. It has a pointer to the page.

    Analyzing indexes



    The dm_db_index_physical_stats DMV

    The dm_db_index_physical_stats provides user to check has clustered index, number of non clustered indexes, large object data, etc. The complete structure of the tree can be viewed. The DMV has five parameters with default value. 21 columns are returned if the filters are not applied to row and columns and the parameter are set to defaults.

    Code:
    SELECT * FROM sys.dm_db_index_physical_stats (NULL,NULL,NULL,NULL,NULL);
    
    The parameters considered by the DMV are as mentioned:
    1. database_id: It is the first parameter to specify the number. It is embedded to specify the database name. The value NULL indicates the function returns information about databases.
    2. object_id: The id must be a numeric value. If the value NULL is specified, all the objects information is extracted.
    3. index_id: The index id for a specific table is declared. The NULL value states all the indexes are needed.
    4. partition_number: It shows the partition number. NULL value indicates all the partition information is displayed
    5. mode: It shows the level of information user wants to return during function query. The three values are:
      • LIMITED: It is fast and scans the number of pages. It scans non – leaf level of an index. Index Allocation Maps are used for heap scan.
      • SAMPLED: It returns physical characteristics on a 1 percent of all the pages in index.
      • DETAILED: It scans all the pages and returns physical characteristics for all level of the index.
    The sys.dm_db_database_page_allocations

    The DBCC IND command is replaced using the sys.dm_db_database_page_allocations. The functions has five parameters as database_id, object_id, index_id, partition_number, and mode. The mode parameter has values as LIMITED and DETAILED.

    Index creation



    The CREATE INDEX command is used for creating B-tree indexes. The syntax is:

    Code:
    
    CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
    ON <object> ( column [ ASC | DESC ] [ ,…n ] )
    [ INCLUDE ( column_name [ ,…n ] ) ]
    [ WHERE <filter_predicate> ] 
    
    
    The index name, key definition, and index for the corresponding table are mandatory fields. The INCLUDE keyword is used for adding non key columns. The index is defined over the row set. The filter WHERE <filter_predicate> is used for limiting the rows defined.

    Structure of B-tree index

    The indexed pages for B-tree are similar to the data pages. The index rows are stored. The paged have fixed size as 8KB. A non clustered index has three allocation units as IN_ROW_DATA, ROW_OVERFLOW_DATA, and LOB_DATA.

    Clustered index structure

    Once the clustered index is created, the data is sorted in the order dependent on clustering key. The row structure is similar to heap. The server must guarantee unique values.

    The unique identifier is added only to manage the uniqueness. Create a table with fixed columns and add a non unique index on it.
    Code:
    CREATE TABLE Clustered_Data
    ( Srno 		INT 	NOT NULL, 
      GENDER CHAR(2)  NOT NULL,
      Age 		INT 	NOT NULL);
    GO
    CREATE CLUSTERED INDEX C1_data_Srno ON Clustered_data (Srno);
    
    Non - clustered index structure

    The non clustered index depends on factors as definition of non clustered index key, base table structure, featured like included columns or filtered indexes.

    Non - clustered index on heap

    Consider two tables ad Result and Resultheap. The Result table contains PRIMARY KEY constraint on the StudID column. The heap table does not have the clustered index.

    Consider a Resultheap table containing various columns.

    Code:
    CREATE TABLE Resultheap
    ( StudID 	INT 		NOT NULL  IDENTITY PRIMARY KEY NONCLUSTERED,
      Name 	NCHAR(20) 	NOT NULL,
      Age		INT 		NOT NULL,
    Marks		INT		NOT NULL);
    GO
    
    The data rows in the resultheap table are data rows are similar to the Result table. The sys.dm_db_index_physical_stats function is used to view the physical size.

    Indexes on computed columns and indexed views

    The computed column is not saved in the table data. Every time it is recomputed and a row is accessed. The view maintains a SELECT statement used for accessing the data every time.

    There are some constraints that need to be followed before creating indexes on computed columns or views. The values returned from the computed columns or rows in a view must be same. For assuring the similar values, they are differentiated into three types:
    1. The tables must satisfy condition on which the view is created
    2. The session level option number must be set to specific value
    3. The functions used in the computed column or view has certain limitations.
    SET OPTIONS

    The SET options can affect the result values of an expression or predicate. The values are SET CONCAT_NULL_YIELDS_NULL ON, SET QUOTED_IDENTIFIER ON, SET ANSI_NULLS ON, SET ANSI_PADDING ON, SET ANSI_WARNINGS_ON, and SET NUMERIC_ROUNDABORT OFF.

    User can check whether the SET options are appropriately set. The function SESSIONPROPERTY is used to test settings for the connection. The value 1 indicates the setting is ON, 0 states OFF.

    Indexes on computed columns

    The indexes are created on deterministic columns. The LOB data types as text, varchar(max), or XML cannot be used as column’s data type. Once the index on computed column, the six SET options must have correct values set.

    Consider an example to set the ANSI_NULLS.

    Code:
    	SET ANSI_NULLS OFF;
    	GO
    	CREATE TABLE data(no INT, amount int);
    	GO
    	CREATE INDEX in1 on data (no);
    	GO
    
    Indexed Views

    The indexed views are useful for summarizing the aggregates of large tables. Every time user refers a view, the aggregation for producing grouped results is re-executed. When an index is created on view, the aggregated data is saved in the leaf nodes.
    The clustered index must be created on a view. The index contains the data at the leaf level, it does not implement on the view.

    Creating an indexed view

    The view is created before building an index on a view.

    Code:
    CREATE VIEW VTotalCost
    WITH SCHEMABINDING
    AS SELECT SUM ( Price * Quantity) AS Amount,
    	OrderID
    FROM OrderDetails
    GROUP BY OrderID;
    
    For creating an indexed view, a unique clustered index is created. The indexed view is as shown below:

    Code:
    CREATE UNIQUE CLUSTERED INDEX VTotalCost_Idx ON VTotalCost (OrderID);
    
    Using Indexed view

    The advantage of the indexed view is that the queries do not refer to the view directly for using the index on view. We have created VTotalCost view. Use the SELECT statement as:

    Code:
    	SELECT OrderID, totol_amount = SUM ( Price * Quantity) 
    	FROM OrderDetails
    	GROUP BY OrderID;
    
    The query optimizer checks for the calculated sum of the values for OrderID in the created VTotalCost view.

    Managing B – tree structures

    The server manages the indexes. They are inserted to the correct position in a table with clustered index. Leaf – level nodes are added to the non clustered indexes.

    Dropping indexes



    The DROP INDEX command is used to remove the indexes created using the CREATE INDEX command. For dropping the index that has constraints, use the ALTER TABLE command. For removing the PRIMARY KEY constraint, remove the FOREIGN KEY constraint.

    Rebuilding an index



    The DBCC DBREINDEX command is replaced with the replacement of index. The option helps indexes to be partitioned or moved.

    Disabling an index



    Once the index is disabled, all the changes to the data are not maintained. Only one command is used for disabling index or indexes. The REBUILD option of the ALTER INDEX command is used for re-enabling it.

    Fragmentation

    Fragmentation is the effect useful for showing the changes occurring on the indexes. Internal and external are two different types of fragmentation. Internal fragmentation means the index pages are not useful at the leaf and non leaf level. The index requires more space than required. More pages are processed and more memory of the buffer pool is used. The advantage of internal fragmentation is rows are added on pages without page splits.

    External fragmentation states that the pages or extents contain leaf level of clustered and non clustered index. The index leaf – level page has row with the next index key. The fragmentation causes page splits and the efficiency of the scans of clustered and non clustered index is reduced.

    Detecting fragmentation

    The sys.dm_db_index_physical_stats is used for returning row for every level of an index. Once the table is partitioned, every partition is considered as a table. The DMV returns row for each level of partition of each index.

    The following columns provide fragmentation information.
    1. Forwarded_record_count: The forwarded records are present only on a heap. They are found only when update needs a row of variable length columns to increase the size that does not accommodate the original location.
    2. Ghost_record_count: The ghost records contain rows that exist on a page but are removed logically. The ghost records are cleaned by the background processes. No new records can be added till the process is on. The more ghost records are present, there is internal fragmentation.
    Removing fragmentation

    The fragmentation has an affect on the performance of the query. User can remove the fragmentation. Defragmentation is used for removing the logical fragmentation from leaf level of an index. The server provides an Intent Exclusive lock on the B-tree index. The locks are on pages that are modified. The ALTER INDEX command is used for defragmentation.

    Code:
    ALTER INDEX { index_name | ALL }
    	ON <object>
    	REORGANIZE
    	[ PARTITION = partition_number ] 
    	[ WITH ( LOB_COMPACTION = { ON | OFF } ) ]
    
    The partitioned indexes are supported with REORGANIZE option. One partition can be defragment by the user.

    The fillfactor is associated with every index. The initial fillfactor value is saved with the index metadata. The server checks for the value when the defragmentation is requested. If the current fillfactor value is greater than initial value, the server reestablishes the initial fillfactor value. The fullness of the page is not reduced by the server. The compaction algorithm is used to move the pages.
     

Share This Page

  1. This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
    By continuing to use this site, you are consenting to our use of cookies.
    Dismiss Notice