Wednesday, February 22, 2012

Best Practices are for paging large datasets for SQL Server based databases.

by Ashsh Upadhyay


Paging is one of the common scenario in the industry for any projects dealing large records and have a UI to pool data for view or some other purpose. It used be a very challenging job for any developer working in SQL Server 2000 or earlier version but then Microsoft introduced ROW_NUMBER() in SQL server 2005 which has ability generate record no for any given set of data based on the order provided (ASC/DESC). This became quite handy for a database developer to return records based on the paging out of a large record set. Let’s discuss about the paging in detail and evaluate available techniques/options in SQL Server.


1.
Local Temporary table

This technique is not very efficient because of the scope of the temporary table is only for the current session as soon as your connection is closed your table will be dropped. If your application has use non-connection pooled environments and you can use a dedicated connection, in that case you can, keep your connection open and create a #Temp table an IDENTITY column and your Key column for the result. Then do paging using this temporary table, this technique will give you fastest output.



2. Global Temporary table
In today’s world most of the applications use a common connection pool environment so the another option for the paging would be to use global temporary table (##Temp table), the only problem with this approach is that, this table will be available across the sessions. So if your data is bit sensitive I would suggest not to use this table but if your data is not sensitive you can opt this option. You can use the same method to generate data in the table as discussed above.


3. Intermediate table
Intermediate table is nothing but a permanent table that you can create in your own database with some generic naming pattern so that those tables can be identified easily. Suppose you are working in an environment when you need to show a huge record set on the UI but you don’t want to put load on your temp table (may be due to tempdb is already overloaded due to high volume of transactions/query processing), in this scenario you can create a table in the same way we discussed for the temporary table (keep an IDENTITY column and a key column out of your result set) and use the paging mechanism to return data as per demand.

Obviously being to many users in the database, multiple copy of these intermediate table can be created, I would suggest to run a nightly job to dropped all such table, you may increase or decrease frequency of the cleanup job as per your requirement.


4. ROW_Number() with CTE
Another option would be to generate the RowID with the help of ROW_NUMBER() function and a common table expression and return the request dataset based on PageNo and pageSize. Keep in mind that this method will be expensive specially if you are dealing with large set of records as every time you will be executing your query which might be an expensive query itself. Another important aspect is if you are working in OLTP environment and your application is inserting new records your result set might get affected as new records can be a part of the query that you have submitted, this in-turn can affect your paging in other words can return the records for a page which might not be expected for that page based on previous execution.



Here is the example to simulate how paging can be implemented using ROW_NUMBER function using a derived table and Common table expression (CTE).



-- Create the test tabl

CREATE TABLE Test

(

TestID INT IDENTITY(1,1) PRIMARY KEY,

TestRecord VARCHAR(25),

CreatedOn DATETIME

)

GO

-- Generate some test record

DECLARE @i INT = 1

WHILE @i < 10001

BEGIN

INSERT INTO Test (TestRecord, CreatedOn) SELECT 'Test Record : '+ CAST(@i as VARCHAR), GETDATE()

SET @i = @i + 1

END

GO

SELECT * from Test



-- Create the a stored procedure to get record for paging

CREATE PROCEDURE [dbo].[USP_Get_RecordForPaging]

(

@PageNo INT,

@PageSize INT

)

AS

SET NOCOUNT ON

SELECT TOP(@PageSize) RowID,TestRecord, CreatedOn,TotalRows FROM

(

SELECT RowID = ROW_NUMBER() OVER (ORDER BY CreatedOn DESC),

TestRecord, CreatedOn, TotalRows = COUNT(*) OVER()

FROM Test

)

A WHERE A.RowID > ((@PageNo-1)*@PageSize)

SET NOCOUNT OFF

--Test Script to get record

EXEC USP_Get_RecordForPaging @PageNo = 1, @PageSize = 50

-- Create the a stored procedure to get record for paging

CREATE PROCEDURE [dbo].[USP_Get_RecordForPaging_CTE]

(

@PageNo INT,

@PageSize INT

)

AS

SET NOCOUNT ON

;WITH CTE_Paging AS

(

SELECT RowID = ROW_NUMBER() OVER (ORDER BY CreatedOn DESC),

TestRecord, CreatedOn, TotalRows = COUNT(*) OVER()

FROM Test

)

SELECT TOP(@PageSize) RowID,TestRecord, CreatedOn,TotalRows FROM CTE_Paging

WHERE RowID > ((@PageNo-1)*@PageSize)

SET NOCOUNT OFF

--Test Script to get record

EXEC USP_Get_RecordForPaging_CTE @PageNo = 1, @PageSize = 50


Above script is good for SQL Server 2005, 2008, 2008 R2 and 2012.

Microsoft has introduced a new and smarter way to handle the paging situation. Take a look at the below script.

SELECT *
FROM Test
ORDER BY CreatedOn
OFFSET 200 ROWS
FETCH NEXT 50 ROWS ONLY;


In the above script you can see that The OFFSET keyword and FETCH NEXT keyword allow the you to only retrieve certain range of data from the database. Basically OFFSET  gives query a start index while FETCH NEXT allow you to provise the no of records you want to retrive.

No comments:

Post a Comment