T-SQL

Remove duplicates

De-duplication of data is a common need, especially when dealing with data-quality issues in environments that end up with duplicate rows due to lack of enforcement of uniqueness with constraints. As an example, the following code prepares sample data with duplicate orders in a table called MyItem:

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'MyItems')
        DROP TABLE dbo.MyItems
GO
               
               
CREATE TABLE dbo.MyItems
(
        IdItem INT IDENTITY(1,1),
        Name VARCHAR(50),
        Description VARCHAR(256)
)
GO

INSERT INTO dbo.MyItems(Name, Description)
VALUES
        ('Item1', 'Description 1'),
        ('Item1', 'Description 1'),
        ('Item2', 'Description 2'),
        ('Item3', 'Description 3'),
        ('Item3', 'Description 3'),
        ('Item3', 'Description 3 - different'),
        ('Item4', 'Description 4')
       
SELECT *
FROM dbo.MyItems
GO

An option is to filter either just the distinct rows or all but the distinct rows. You compute both ROW_NUMBER and RANK based on Name + Description ordering, like so:

SELECT
        IdItem, Name, Description,
        ROW_NUMBER() OVER (ORDER BY Name, Description) as row_num,
        RANK() OVER (ORDER BY Name, Description) as rnk
FROM dbo.MyItems
GO

Suppose that you need to de-duplicate the data, keeping only one occurrence of each unique Name+Description combination.

WITH cte
AS
(
        SELECT
                IdItem, Name, Description,
                ROW_NUMBER() OVER (ORDER BY Name, Description) as row_num,
                RANK() OVER (ORDER BY Name, Description) as rnk
        FROM dbo.MyItems
)
DELETE FROM cte
WHERE row_num != rnk

SELECT IdItem, Name, Description
FROM dbo.MyItems

When you’re done, run the following code for cleanup:

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'MyItems')
        DROP TABLE dbo.MyItems
GO

Pagination in SQL Server

Solution for SQL Server 2008
The ROW_NUMBER function can be used for paging purposes. You assign row numbers to the result rows based on the desired ordering, and then filter the right range of row numbers based on given page-number and page-size arguments.

DECLARE
        @pagenum AS INT = 2,
        @pagesize AS INT = 10;
       
WITH cte AS
(
        SELECT ROW_NUMBER() OVER( ORDER BY Column1) AS rownum, Column1, Column2, Column3
        FROM dbo.MyTable
)
SELECT Column1, Column2, Column3
FROM cte
WHERE rownum BETWEEN (@pagenum - 1) * @pagesize + 1 AND @pagenum * @pagesize
ORDER BY rownum;

For optimal performance, you want to have an index defined on the window ordering elements as the index keys (and eventually include in the index the rest of the attributes that appear in the query for coverage purposes).

Solution for SQL Server 2012 or SQL Server 2014
An alternative solution to paging is to use the new OFFSET/FETCH filtering option:

DECLARE
        @pagenum AS INT = 2,
        @pagesize AS INT = 10;

SELECT Column1, Column2, Column3
FROM dbo.MyTable
ORDER BY Column1
OFFSET (@pagenum - 1) * @pagesize ROWS FETCH NEXT @pagesize ROWS ONLY;

Islands problem - identifying consecutive ranges

I will present a T-SQL query that uses ROW_NUMBER for identifying consecutive ranges of existing values (also known as islands problem).
Given the below script, the expected ranges are: [1,2], [11-14], [25], [32-33].

SET NOCOUNT ON;
IF OBJECT_ID('dbo.MyTable') IS NOT NULL
        DROP TABLE dbo.MyTable;
GO

CREATE TABLE dbo.MyTable
(
col1 INT NOT NULL
CONSTRAINT PK_MyTable PRIMARY KEY
);

INSERT INTO dbo.MyTable(col1)
VALUES (1),(2),(11),(12),(13),(14),(25),(31),(32),(33);
GO

Step 1: use the ROW_NUMBER function to calculate row numbers based on col1 ordering

SELECT col1, ROW_NUMBER() OVER (ORDER BY col1) as row_num
FROM dbo.MyTable

Step 2: identify islands
The difference between two columns is constant and unique for each island as shown in this query:

SELECT col1, col1 - ROW_NUMBER() OVER (ORDER BY col1) as grp
FROM dbo.MyTable

Step 3: complete solution

;WITH cte
AS
(
        SELECT col1, col1 - ROW_NUMBER() OVER (ORDER BY col1) as grp
        FROM dbo.MyTable
)
SELECT MIN(col1) as start_range, MAX(col1) as end_range
FROM cte
GROUP BY grp

DMVs used to analyze the execution plan cache

You can obtain a lot of information about the execution plans in the procedure cache by accessing various dynamic management objects.

sys.dm_exec_cached_plans: obtain information about the execution plans in the plan cache.

sys.dm_exec_query_plan(plan_handle): retrieve the XML representation of the plan.

sys.dm_exec_sql_text(plan_handle): retrieve the original query text.

sys.dm_exec_query_stats: get aggregate performance metrics about the cached plan.

sys.dm_exec_requests: see execution plans for queries that are currently executing.

Find locks in SQL Server

dm_tran_locks contains info about all current locks, granted or pending.

select *
from sys.dm_tran_locks

Finding queries that are using table scans

The T-SQL script used to discover queries that are using table scans by searching within query plan for TableScan tag:

SELECT TOP 20
        DB_NAME(ST.dbid)AS database_name
        , ST.text as sql_Statement
        , CP.usecounts
        , QP.query_plan
FROM
        sys.dm_exec_cached_plans CP
        CROSS APPLY sys.dm_exec_sql_text(CP.plan_handle) ST
        CROSS APPLY sys.dm_exec_query_plan(CP.plan_handle) QP
WHERE
        ST.dbid = DB_ID()
        AND CAST(QP.query_plan AS NVARCHAR(MAX)) LIKE '%<TableScan%'
ORDER BY CP.usecounts DESC

Finding the queries that use the most I/O

Running the SQL script given in the following listing will identify the top 20 queries that use the most I/O.

SELECT TOP 20
        DB_NAME(QT.dbid) as database_name
        , QS.total_logical_reads + QS.total_logical_writes as total_io
        , QS.execution_count   
        , SUBSTRING(QT.text, (qs.statement_start_offset/2)+1,
                ((      CASE qs.statement_end_offset
                                WHEN -1 THEN DATALENGTH(QT.text)
                                ELSE qs.statement_end_offset
                        END - QS.statement_start_offset)/2) + 1) as statement_text
        , QT.text as parent_statement_text     
        , O.name
        , O.type_desc
        , QP.query_plan
FROM
        sys.dm_exec_query_stats QS
        CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) QT
        CROSS APPLY sys.dm_exec_query_plan(QS.plan_handle) QP
        LEFT JOIN sys.objects O ON O.object_id = QT.objectid
WHERE QT.dbid = DB_ID()
ORDER BY total_io DESC

Find missing indexes using DMVs

The following query determines which are the missing indexes and displays their column details based on dm_db_missing_index_group_stats DMV.

USE <DatabaseName>;

SELECT 
        REPLACE(ID.[Statement], '[' + DB_NAME(ID.database_id) + '].', '') as table_name,
        CAST(S.avg_total_user_cost * S.avg_user_impact * S.user_seeks as INT) as [score],
        ID.equality_columns, ID.inequality_columns, ID.included_columns,               
        'CREATE INDEX '
        + 'IX_' + OBJECT_NAME(ID.object_id) + '_' + CAST(ID.index_handle as VARCHAR)
        + ' ON ' + REPLACE(ID.[Statement], '[' + DB_NAME(ID.database_id) + '].', '')
        + ' (' + ISNULL (ID.equality_columns,'')
        + CASE
                WHEN ID.equality_columns IS NOT NULL AND ID.inequality_columns IS NOT NULL
                THEN ', ' + ID.inequality_columns
                ELSE ''
          END
        + ')'
        + ISNULL (' INCLUDE (' + ID.included_columns + ')', '') AS create_index_statement
FROM
        sys.dm_db_missing_index_details ID
        INNER JOIN sys.dm_db_missing_index_groups G
                ON ID.index_handle = G.index_handle
        INNER JOIN sys.dm_db_missing_index_group_stats S
                ON G.index_group_handle = S.group_handle
WHERE
        ID.database_id = DB_ID()
        AND OBJECTPROPERTY(ID.[object_id], 'IsMsShipped') = 0  
ORDER BY score DESC;

Get database restore history

Returns last 10 restores performed on your server:

SELECT TOP 10
        RH.destination_database_name AS [Database],
        RH.user_name AS [Restored By],
        RH.restore_date AS [Restore Started],
        BMF.physical_device_name AS [Restored From],
        RF.destination_phys_name AS [Restored To],
        RH.*
FROM
        msdb.dbo.restorehistory RH
        INNER JOIN msdb.dbo.backupset BS ON RH.backup_set_id = BS.backup_set_id
        INNER JOIN msdb.dbo.restorefile RF ON RH.restore_history_id = RF.restore_history_id
        INNER JOIN msdb.dbo.backupmediafamily BMF ON BMF.media_set_id = BS.media_set_id
--WHERE destination_database_name = '<DatabaseName>'
ORDER BY RH.restore_history_id DESC
GO

Execute a SSIS package using T-SQL

DECLARE @execution_id BIGINT;
DECLARE @use32bitruntime BIT = CAST(0 AS BIT);

-- A new execution operation is created.
EXEC catalog.create_execution
@folder_name = N'<FolderName>',
@project_name = N'<ProjectName>',
@package_name = N'<PackageName.dtsx>',
@use32bitruntime = @use32bitruntime,
@reference_id = NULL,
@execution_id = @execution_id OUTPUT;

-- Set execution properties
EXEC catalog.set_execution_parameter_value
@execution_id,
@object_type = 50,
@parameter_name = N'LOGGING_LEVEL',
@parameter_value = 1;

-- Execution is started asynchronous
EXEC catalog.start_execution @execution_id;
GO

Pages

Subscribe to RSS - T-SQL