Index design recommendation - how to choose optimal indexes

Indexes should be considered on all columns that are frequently accessed by the WHERE, ORDER BY, GROUP BY, TOP, and DISTINCT clauses.

Clustered index recommendations
- It is highly recommended that every table should have at least a clustered index. However, there is also a very interesting topic to be taken into account on
- Consider having a clustered indexes when the data retrieval needs to be sorted.
- Avoid creating a clustered index on columns that are highly updatable as row locator of all the non clustered indexes will be updated accordingly.
- Create the clustered index first. In case the clustered index is created later => all non clustered indexes need to be rebuilt.
- Keep clustered indexes narrow as this directly impacts index size on disk. This is very important as all non clustered indexes store the clustered keys as their row locator, or simpler said:
clustered index row width = non clustered index column width + clustered index column width

Nonclustered Index Recommendations
- A nonclustered index is most useful when all you want to do is retrieve a small number of rows and columns from a large table.
- Consider using INCLUDE clause in order to create covering indexes.
- To improve the performance of a query, SQL Server can use multiple indexes on a table. Therefore, instead of creating wide index keys, consider creating multiple narrow indexes.
- Foreign keys columns are good index candidates.

Disk consideration
- Place the table and index on separate disks.
- Take into account index compression i.e. fewer pages and fewer index levels are needed to store the index.

Column order matters in a composite index
Using the most selective column first will help filter the index rows more efficiently.

Avoid indexes on columns with lower selectivity
If selectivity ratio > 0.85 => a table scan will be preferred.

Take into account column uniqueness
Creating an index on columns with a very low range of possible unique values (such as gender) will not benefit performance, because the query optimizer will not be able to use the index to effectively narrow down the rows to be returned.

The data type of an index matters
Unless they are absolutely necessary, minimize the use of wide data type columns with large sizes in an index. A large index key size increases the number of index pages, thereby increasing the amount of memory and disk activities required for the index.

Indexed or materialized views
- A database view can be materialized on the disk by creating a unique clustered index on the view. After a unique clustered index is created on the view, the view’s result set is materialized immediately and persisted in physical storage in the database, saving the overhead of performing costly operations during query execution.
- Aggregations can be precomputed and persisted in the indexed view to minimize expensive computations during query execution.
- Available only in Enterprise Edition.

Other aspects to be taken into account
- Considering using the Database Engine Tuning Advisor tool provided by SQL Server that helps you determine the correct indexes in a database for a given SQL workload ( a trace file or a table or, new with SQL Server 2012, you can use the queries that exist in the plan cache).
- ColumnStore indexes. Used to index information by columns rather than by rows.
- Spatial indexes.
- XML indexes.
- Index computed columns.
- Consider using filtered indexes.
- Resolve Key or RID lookups by using a clustered index or by using a covering index or by using a index join (i.e. an index intersection between two or more indexes to fully cover a query).

Find out the disk size of an index

        OBJECT_NAME(I.OBJECT_ID) AS table_name, AS index_name,  
    8 * SUM(AU.used_pages) AS 'index_size_KB',         
    CAST(8 * SUM(AU.used_pages) / 1024.0 AS DECIMAL(18,2)) AS 'Index size (MB)'
    sys.indexes I
        JOIN sys.partitions P ON P.OBJECT_ID = I.OBJECT_ID AND P.index_id = I.index_id
        JOIN sys.allocation_units AU ON AU.container_id = P.partition_id
--      OBJECT_NAME(I.OBJECT_ID) = '<tableName>'        
    I.object_id,, I.type_desc, AU.used_pages

Counting all the rows in a database

A quick count of all rows in all tables in a database can be done using sysindexes table, a system table containing one row for each index and table in the database.

The following query shows the rows count for each table in database. List of tables can be found by interrogating sysobjects system table (it contains one row for each object created within a database: constraint, default, log, rule, stored procedure, and so on) . The 'rows' column in the system table 'sysindexes' holds the number of committed rows in the table.

SELECT SCHEMA_NAME(O.schema_id) as schema_name, as table_name, I.[rows] as no_rows
    sys.objects O
    INNER JOIN sysindexes I ON O.object_id =
        I.indid < 2
        AND O.is_ms_shipped = 0

I added indid < 2 condition for restricting the query to looking at the clustered index or heap info, and ignore subsequent indices. Note that only the results of committed transactions are held in the rows column (uncommitted transactions are not).

When statistics really change?

Stats change:

  • When a table with no rows gets a row
  • When a table has fewer than 500 rows and is increased by 500 or more rows
  • Through SQL Server's auto-updating mechanism based on database option AUTO_UPDATE_STATISTICS (which by default is enabled and it performs synchronously), statistics get automatically changed when 500 + 20% of the table rows change. E.g. for a table with 100k rows, stats will get invalidated when 20500 rows are updated. Read more about this.
  • In addition to statistics on indexes, SQL Server can build statistics on columns with no indexes. See AUTO_CREATE_STATISTICS option.
  • If trace flag 2371 is turned on, then updates are dynamic Read more about this.
  • When you rebuild an index (equivalent to FULLSCAN).
  • Or you can decide to manually update the statistics.

Get statistics info about an index

DBCC SHOW_STATISTICS returns info about 3 components:
- header
- density vector
- histogram = is a statistical construct that shows how often data falls into varying categories. The histogram stored by SQL Server consists of a sampling of data distribution for a column or an index key of up to 200 rows.

DBCC SHOW_STATISTICS('<tablename>', '<indexname>');

You can limit the statistics information displayed by using WITH HISTOGRAM clause:

DBCC SHOW_STATISTICS('<tablename>', '<indexname>')

Best practices for creating a computed column in SQL Server table

A computed column is computed from an expression that can use other columns in the same table.

ALTER TABLE mytable ADD fullname AS firstname + ' ' + lastname

By marking a computed column as PERSISTED, you can create an index it.

ALTER TABLE mytable ADD fullname AS firstname + ' ' + lastname PERSISTED
CREATE INDEX IX_mytable_fullname ON MyTable (fullname)

Detecting fragmentation for all indexes in SQL Server

The first step in deciding which defragmentation method to use is to analyze the index to determine the degree of fragmentation. By using sys.dm_db_index_physical_stats DMV, you can detect fragmentation in a specific index, all indexes on a table or indexed view, all indexes in a database, or all indexes in all databases. For partitioned indexes, sys.dm_db_index_physical_stats also provides fragmentation information for each partition.

The result set returned by the sys.dm_db_index_physical_stats dynamic management function includes the following columns:
avg_fragmentation_in_percent: the percent of logical fragmentation (out-of-order pages in the index)
fragment_count: the number of fragments (physically consecutive leaf pages) in the index
avg_fragment_size_in_pages: average number of pages in one fragment in an index
avg_record_size_in_bytes: This number simply represents a useful measure for the amount of data stored within the record (index or heap).

     OBJECT_NAME(S.OBJECT_ID) as TableName, as IndexName,
     sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) as S
     INNER JOIN sys.indexes AS I ON S.OBJECT_ID = I.OBJECT_ID AND S.index_id = I.index_id
     S.database_id = DB_ID()
     TableName, IndexName

After the degree of fragmentation is known, check how you can "resolve" the fragmentation.

Index usability - tune your index strategy

The view sys.dm_db_index_usage_stats returns statistical counts of the various index operations that have occurred to an index over time.

     object_name(i.object_id) as TableName
     , as [Index]
     , user_seeks + user_scans + user_lookups + user_updates as usability
     , user_seeks, user_scans, user_lookups, user_updates
     , i.index_id, i.type_desc, i.is_unique, i.is_primary_key, i.is_unique
     sys.indexes i    
     INNER JOIN sys.dm_db_index_usage_stats s ON s.index_id = i.index_id AND s.object_id = i.object_id
     AND is not NULL
     AND s.database_id = DB_ID() -- current DB
     --AND object_name(i.object_id) = '<TableName>'
     usability, [Index], TableName;


Subscribe to RSS - indexes