indexes

Number of pages in a table

How to find data file pages for a table:

SELECT
        T.NAME AS table_name,
        P.rows AS no_rows,
        SUM(AU.total_pages) AS total_pages,
        SUM(AU.used_pages) AS used_pages,
        (SUM(AU.total_pages) - SUM(AU.used_pages)) AS unused_pages
FROM
        sys.tables T
        INNER JOIN sys.indexes I ON T.OBJECT_ID = I.object_id
        INNER JOIN sys.partitions P ON I.object_id = P.OBJECT_ID AND I.index_id = P.index_id
        INNER JOIN sys.allocation_units AU ON P.partition_id = AU.container_id
GROUP BY
        T.name, P.rows
ORDER BY total_pages DESC

Using Indexes Effectively

Adding indexes on a specific table doesn't guarantee you the queries will use these indexes effectively. These are some of the query design rules you should follow to improve the use of indexes:

Avoid nonsargable search conditions
A sargable predicate in a query is one in which an index can be used.

Sargable =, >, >=, <, <=
BETWEEN
LIKE 'text%'
Nonsargable <>, !=
NOT EXISTS
NOT IN
OR
LIKE '%text'

Avoid arithmetic operators on the WHERE clause column
For example, assuming an index has been created on NoWeeks column, the use of the multiplication operator column in the first query prevents the optimizer from using (choosing) the index on the column:

-- the index will not be used
WHERE NoWeeks * 7 > @NoDays

-- the index will be used
WHERE NoWeeks > @NoDays / 7

Avoid functions on the WHERE clause column

-- the index will not be taken into account
WHERE SUBSTRING(CustomerName, 1, 1) = 'F' ;

-- the index will be taken into account
WHERE CustomerName LIKE 'F%' ;

Index fragmentation resolutions

After you detected fragmentation of an index, you can "resolve" fragmentation in an index by rearranging the index rows and pages so that their physical and logical orders match.

You achieve this through the following techniques:
1. Dropping and re-creating the index
If the index being dropped is a clustered index, then all the nonclustered indexes on the table have to be rebuilt after the cluster is dropped.


2. Creating the index with the DROP_EXISTING clause


3. Executing the ALTER INDEX REBUILD statement on the index
To be performed in case the fragmentation is over 30%.

ALTER INDEX <IX_IndexName> ON <TableName>
REBUILD;

or rebuild all indexes of a specified table:

ALTER INDEX ALL ON <TableName>
REBUILD;


4. Executing the ALTER INDEX REORGANIZE statement on the index
To be performed in case the index fragmentation is between 5% and 30%

ALTER INDEX <IX_IndexName> ON <TableName>
REORGANIZE;


Characteristics of Four Defragmentation Techniques

Drop and Create Index Create Index with DROP_ EXISTING ALTER INDEX REBUILD ALTER INDEX REORGANIZE
Blocking High High Medium Low
Defragment index with constraints Highly complex Moderately Easy Easy
Defragment multiple indexes together No No Yes Yes
Degree of defragmentation High High High Moderate to low
Statistics are updated Yes Yes Yes No
Apply new fill factor Yes Yes Yes No

Factors affecting index performance

Indexes and query performance are strongly related. It’s essential to ensure that the factors that affect the efficiency of an index are optimal:

  • Fill Factor
  • Statistics
  • Fragmentation
  • I/O Sub-System
  • Compression

Find missing indexes using DMVs

The following query determines which are the missing indexes and displays their column details based on dm_db_missing_index_group_stats DMV.

USE <DatabaseName>;

SELECT 
        REPLACE(ID.[Statement], '[' + DB_NAME(ID.database_id) + '].', '') as table_name,
        CAST(S.avg_total_user_cost * S.avg_user_impact * S.user_seeks as INT) as [score],
        ID.equality_columns, ID.inequality_columns, ID.included_columns,               
        'CREATE INDEX '
        + 'IX_' + OBJECT_NAME(ID.object_id) + '_' + CAST(ID.index_handle as VARCHAR)
        + ' ON ' + REPLACE(ID.[Statement], '[' + DB_NAME(ID.database_id) + '].', '')
        + ' (' + ISNULL (ID.equality_columns,'')
        + CASE
                WHEN ID.equality_columns IS NOT NULL AND ID.inequality_columns IS NOT NULL
                THEN ', ' + ID.inequality_columns
                ELSE ''
          END
        + ')'
        + ISNULL (' INCLUDE (' + ID.included_columns + ')', '') AS create_index_statement
FROM
        sys.dm_db_missing_index_details ID
        INNER JOIN sys.dm_db_missing_index_groups G
                ON ID.index_handle = G.index_handle
        INNER JOIN sys.dm_db_missing_index_group_stats S
                ON G.index_group_handle = S.group_handle
WHERE
        ID.database_id = DB_ID()
        AND OBJECTPROPERTY(ID.[object_id], 'IsMsShipped') = 0  
ORDER BY score DESC;

Improve performance using filtered indexes

A filtered index is a non-clustered index that contains only a subset of the number of rows contained in a table. You add a WHERE clause to reduce the number of rows that are stored at the leaf level.

CREATE NONCLUSTERED INDEX <Index_Name>
ON <Table Name>(<Column(s)>)
WHERE <Filter Condition>

A good real example it is use is on Status column, as you might have a large number of rows with Closed status that are not part of your current searches (the business wants to query for Open, Processing, Invoicing, etc)

CREATE NONCLUSTERED INDEX [IX_Orders_IdStatus_Filterered]
ON [dbo].[Orders] ([IdStatus])
WHERE ([IdStatus] IN ('O', 'P', 'I'))

Another well-suited scenarios is where you frequently have to filter out NULL values.

Pros:

  • By having fewer rows in an index, less I/O is done when that index is used.
  • Index size is dramatically reduced.
  • The statistics on these filtered indexes are filtered as well, which typically results in them being more accurate.
  • Cons:

  • Use certain expressions, such as BETWEEN, NOT IN, or a CASE statement.
  • Parameterized queries doesn't take advantage of filtered indexes.
  • Index design recommendation - how to choose optimal indexes

    Indexes should be considered on all columns that are frequently accessed by the WHERE, ORDER BY, GROUP BY, TOP, and DISTINCT clauses.

    Clustered index recommendations
    - It is highly recommended that every table should have at least a clustered index. However, there is also a very interesting topic to be taken into account on use-the-index-luke.com.
    - Consider having a clustered indexes when the data retrieval needs to be sorted.
    - Avoid creating a clustered index on columns that are highly updatable as row locator of all the non clustered indexes will be updated accordingly.
    - Create the clustered index first. In case the clustered index is created later => all non clustered indexes need to be rebuilt.
    - Keep clustered indexes narrow as this directly impacts index size on disk. This is very important as all non clustered indexes store the clustered keys as their row locator, or simpler said:
    clustered index row width = non clustered index column width + clustered index column width

    Nonclustered Index Recommendations
    - A nonclustered index is most useful when all you want to do is retrieve a small number of rows and columns from a large table.
    - Consider using INCLUDE clause in order to create covering indexes.
    - To improve the performance of a query, SQL Server can use multiple indexes on a table. Therefore, instead of creating wide index keys, consider creating multiple narrow indexes.
    - Foreign keys columns are good index candidates.

    Disk consideration
    - Place the table and index on separate disks.
    - Take into account index compression i.e. fewer pages and fewer index levels are needed to store the index.

    Column order matters in a composite index
    Using the most selective column first will help filter the index rows more efficiently.

    Avoid indexes on columns with lower selectivity
    If selectivity ratio > 0.85 => a table scan will be preferred.

    Take into account column uniqueness
    Creating an index on columns with a very low range of possible unique values (such as gender) will not benefit performance, because the query optimizer will not be able to use the index to effectively narrow down the rows to be returned.

    The data type of an index matters
    Unless they are absolutely necessary, minimize the use of wide data type columns with large sizes in an index. A large index key size increases the number of index pages, thereby increasing the amount of memory and disk activities required for the index.

    Indexed or materialized views
    - A database view can be materialized on the disk by creating a unique clustered index on the view. After a unique clustered index is created on the view, the view’s result set is materialized immediately and persisted in physical storage in the database, saving the overhead of performing costly operations during query execution.
    - Aggregations can be precomputed and persisted in the indexed view to minimize expensive computations during query execution.
    - Available only in Enterprise Edition.

    Other aspects to be taken into account
    - Considering using the Database Engine Tuning Advisor tool provided by SQL Server that helps you determine the correct indexes in a database for a given SQL workload ( a trace file or a table or, new with SQL Server 2012, you can use the queries that exist in the plan cache).
    - ColumnStore indexes. Used to index information by columns rather than by rows.
    - Spatial indexes.
    - XML indexes.
    - Index computed columns.
    - Consider using filtered indexes.
    - Resolve Key or RID lookups by using a clustered index or by using a covering index or by using a index join (i.e. an index intersection between two or more indexes to fully cover a query).

    Find out the disk size of an index

    SELECT
            OBJECT_NAME(I.OBJECT_ID) AS table_name,
            I.name AS index_name,  
            I.type_desc,
            AU.used_pages,
        8 * SUM(AU.used_pages) AS 'index_size_KB',         
        CAST(8 * SUM(AU.used_pages) / 1024.0 AS DECIMAL(18,2)) AS 'Index size (MB)'
    FROM
        sys.indexes I
            JOIN sys.partitions P ON P.OBJECT_ID = I.OBJECT_ID AND P.index_id = I.index_id
            JOIN sys.allocation_units AU ON AU.container_id = P.partition_id
    --WHERE
    --      OBJECT_NAME(I.OBJECT_ID) = '<tableName>'        
    GROUP BY
        I.object_id, I.name, I.type_desc, AU.used_pages
    ORDER BY
        table_name;

    Counting all the rows in a database

    A quick count of all rows in all tables in a database can be done using sysindexes table, a system table containing one row for each index and table in the database.

    The following query shows the rows count for each table in database. List of tables can be found by interrogating sysobjects system table (it contains one row for each object created within a database: constraint, default, log, rule, stored procedure, and so on) . The 'rows' column in the system table 'sysindexes' holds the number of committed rows in the table.

    SELECT SCHEMA_NAME(O.schema_id) as schema_name, O.name as table_name, I.[rows] as no_rows
    FROM
        sys.objects O
        INNER JOIN sysindexes I ON O.object_id = I.id
    WHERE
            I.indid < 2
            AND O.is_ms_shipped = 0
    ORDER BY no_rows DESC

    I added indid < 2 condition for restricting the query to looking at the clustered index or heap info, and ignore subsequent indices. Note that only the results of committed transactions are held in the rows column (uncommitted transactions are not).

    When statistics really change?

    Stats change:

    • When a table with no rows gets a row
    • When a table has fewer than 500 rows and is increased by 500 or more rows
    • Through SQL Server's auto-updating mechanism based on database option AUTO_UPDATE_STATISTICS (which by default is enabled and it performs synchronously), statistics get automatically changed when 500 + 20% of the table rows change. E.g. for a table with 100k rows, stats will get invalidated when 20500 rows are updated. Read more about this.
    • In addition to statistics on indexes, SQL Server can build statistics on columns with no indexes. See AUTO_CREATE_STATISTICS option.
    • If trace flag 2371 is turned on, then updates are dynamic Read more about this.
    • When you rebuild an index (equivalent to FULLSCAN).
    • Or you can decide to manually update the statistics.

    Pages

    Subscribe to RSS - indexes