'Break up a SQL Server 2008 query into batches
I'm trying to prepare some data for deletion by a 3rd party, and unfortunately they can only process data in batches of 2000 records. I have 100k records and may need to divide-and-export this data several more times, so I'd like to automate the process somehow.
Is there a reasonably easy way to do this using SQL Server 2008? I'm not running a complex query -- it's not too far off from SELECT PKID FROM Sometable ORDER BY PKID -- and while I can probably do this using a cursor, I'd like to know if there's a better way.
Solution 1:[1]
I think you can take advantage of using ROW_NUMBER and then using BETWEEN to specify a range of rows that you like. Alternatively you could use PKID if you knew there wasn't gaps, or didn't care about the gaps
e.g.
SELECT ...
FROM
(SELECT ...
ROW_NUMBER() OVER(ORDER BY PKID ) as RowNum
FROM Sometable e
) t
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1
This is often used for paging results. 4GuysFromRolla have a good article on it
Solution 2:[2]
You could work out the ranges in a while @@ROWCOUNT loop to target the rows required. It may work better than ROW_NUMBER() which would have to keep numbering from the start.
declare @startid int
declare @endid int
-- get one range, these are efficient as they go over the PKID key by range
select top(1) @startid = pkid from sometable order by pkid -- 1 key visited
select top(2000) @endid = pkid from sometable order by pkid -- 2000 keys visited
-- note: top 2000 may end up with the 514th id if that is the last one
while @@ROWCOUNT > 0
begin
insert otherdb.dbo.backupcopy
select * from sometable
where pkid between @startid and @endid
select top(1) @startid = pkid from sometable
WHERE pkid > @endid -- binary locate
order by pkid
select top(2000) @endid = pkid from sometable
WHERE pkid > @endid -- binary locate, then forward range lookup, max 2000 keys
order by pkid
end
Solution 3:[3]
I ended up using a combination of the approaches provided by cyberkiwi and Adam. I didn't need to use ROW_NUMBER only because I used an IDENTITY column in a table data type instead.
Here's a redacted version of the code I used -- it worked like a charm. Thanks again to everyone for all the help!
use Testing
GO
SET NOCOUNT ON
declare
@now datetime = GETDATE(),
@batchsize int = 2000,
@bcpTargetDir varchar(500) = '\\SomeServer\Upload\',
@csvQueryServer varchar(500) = '.\SQLExpress',
@rowcount integer,
@nowstring varchar(100),
@batch_id int,
@startid int,
@endid int,
@oidCSV varchar(max),
@csvQuery varchar(max),
@bcpFilename varchar(200),
@bcpQuery varchar(1000)
declare @tblBatchRanges table (
batch_id integer NOT NULL IDENTITY(1,1) PRIMARY KEY,
oid_start integer NOT NULL,
oid_end integer NOT NULL,
csvQuery varchar(max)
)
-- Create a unique timestamp-based string, which will be used to name the exported files.
select @nowstring = CONVERT(varchar, @now, 112) + '-' + REPLACE(CONVERT(varchar, @now, 114), ':', '')
--
select top(1) @startid = oid from Testing..MyObjectIds order by oid
select top(@batchsize) @endid = oid from Testing..MyObjectIds order by oid
select @rowcount = @@ROWCOUNT
while (@rowcount > 0) begin
-- Create a CSV of all object IDs in the batch, using the STUFF() function
select @csvQuery = 'select stuff((select distinct '','' + CAST(oid as varchar) from Testing..MyObjectIds where oid between ' + CAST(@startid as varchar) + ' and ' + CAST(@endid as varchar) + ' order by '','' + CAST(oid as varchar) for xml path('''')),1,1,'''')'
-- Log the info and get the batch ID.
insert into @tblBatchRanges (oid_start, oid_end, csvQuery)
values (@startid, @endid, @oidCSV, @csvQuery)
select @batch_id = @@IDENTITY
-- Advance @startid and @endid so that they point to the next batch
select top(1) @startid = oid
from Testing..MyObjectIds
where oid > @endid
order by oid
select top(@batchsize) @endid = oid
from Testing..MyObjectIds
where oid > @endid
order by oid
select @rowcount = @@ROWCOUNT
-- Export the current batch to a file.
select @bcpFilename = 'MyExport-' + @nowstring + '-' + cast(@batch_id as varchar) + '.txt'
select @bcpQuery = 'bcp "' + @csvQuery + '" QUERYOUT "' + @bcpTargetDir + @bcpFilename + '" -S ' + @csvQueryServer + ' -T -c'
exec master..xp_cmdshell @bcpquery
end
SET NOCOUNT OFF
--Check all of the logged info.
select oid_start, oid_end, csvQuery from @tblBatchRanges
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Conrad Frix |
| Solution 2 | RichardTheKiwi |
| Solution 3 |
