Estimate the size of a dataset returned from SQL

We've got a system that seems to be consuming a lot of data. It uses Dapper for database queries and Seq for logging. I was wondering whether, other than with SQL Profiler, there is a way to add logging to Dapper that records the size of the returned dataset in MB, so we can flag large datasets for review.

This question was asked a while ago, but I was wondering whether there is now a way of doing it without Wireshark, and ideally without iterating over the rows/cells.



Solution 1:[1]

Not really a complete answer, but might help you out.

sys.dm_exec_query_stats and sys.dm_exec_connections might help you trace large result sets. For example:

SELECT * FROM sys.dm_exec_query_stats 
CROSS APPLY sys.dm_exec_sql_text(sql_handle)

(The read/write counters in these DMVs are measured in 8 KB pages.)

This gives you roughly what you're using Wireshark for at the moment (kinda :)

SELECT * FROM sys.dm_exec_connections
CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
ORDER BY num_writes DESC

https://msdn.microsoft.com/en-us/library/ms189741.aspx

https://msdn.microsoft.com/en-AU/library/ms181509.aspx

Solution 2:[2]

You can estimate the size required by a row by summing up each column's type size, then multiplying by the number of rows. This is reasonably accurate as long as your query contains no variable-length columns such as TEXT or VARCHAR:

int rowSize = 0;
foreach (DataColumn dc in Dataset1.Tables[0].Columns)
{
    // sizeof() cannot be applied to a runtime System.Type, so use
    // Marshal.SizeOf(Type) instead. Note it only works for blittable
    // value types (int, double, ...); it throws for string and for
    // auto-layout types such as DateTime.
    rowSize += System.Runtime.InteropServices.Marshal.SizeOf(dc.DataType);
}
int dataSize = rowSize * Dataset1.Tables[0].Rows.Count;
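Since the original question asks for the size in MB so that large datasets can be flagged, a small helper like the following can sit on top of either estimate (the `IsLarge` name and the 10 MB threshold are illustrative, not part of any library):

```csharp
static class SizeFlag
{
    const double BytesPerMb = 1024 * 1024;

    // Returns true when an estimated payload exceeds the review threshold.
    // thresholdMb is an example default; tune it for your own system.
    public static bool IsLarge(long estimatedBytes, double thresholdMb = 10)
        => estimatedBytes / BytesPerMb > thresholdMb;
}
```

You could then log a warning to Seq whenever `IsLarge(dataSize)` returns true.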

In case you need a more accurate figure, sum up the size of each individual value using Marshal.SizeOf:

int dataSize = 0;
foreach (DataRow dr in Dataset1.Tables[0].Rows)
{
    int rowSize = 0;
    for (int i = 0; i < Dataset1.Tables[0].Columns.Count; i++)
    {
        // Marshal.SizeOf works on boxed value types, but throws for
        // strings, so count those by character size instead.
        rowSize += dr[i] is string s
            ? sizeof(char) * s.Length
            : System.Runtime.InteropServices.Marshal.SizeOf(dr[i]);
    }
    dataSize += rowSize;
}

Ideas for performance gains if high accuracy is not a concern:

  1. Compute the size of just a sample. For instance, instead of iterating through all rows, pick one in every 100, then multiply your result by 100 at the end.
  2. Use [Marshal.SizeOf](https://msdn.microsoft.com/en-us/library/y3ybkfb3.aspx) to compute the size of each DataRow dr instead of iterating through all its values. It will give you a higher number, since a DataRow object has additional properties, but that's something you can tweak by subtracting the size of an empty DataRow.
  3. Know the average size of a single row beforehand, from its columns, and just multiply by the number of rows.
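Idea 1 above can be sketched as follows. This is a minimal, hypothetical `EstimateBytes` helper, not a library API; it counts strings by character size, other values via Marshal.SizeOf, and assumes rows are roughly uniform in size so the sample scales up fairly:

```csharp
using System;
using System.Data;
using System.Runtime.InteropServices;

static class SizeEstimator
{
    // Rough size in bytes of a single cell value.
    static int CellSize(object value) =>
        value switch
        {
            null or DBNull => 0,
            string s => sizeof(char) * s.Length,
            _ => Marshal.SizeOf(value) // boxed value types: int, double, ...
        };

    // Estimate total bytes by measuring every 'step'-th row and scaling
    // the sampled total up to the full row count.
    public static long EstimateBytes(DataTable table, int step = 100)
    {
        long sampled = 0;
        int sampledRows = 0;
        for (int r = 0; r < table.Rows.Count; r += step)
        {
            foreach (DataColumn c in table.Columns)
                sampled += CellSize(table.Rows[r][c]);
            sampledRows++;
        }
        if (sampledRows == 0) return 0;
        return sampled * table.Rows.Count / sampledRows;
    }
}
```

With step = 1 this degenerates into the exact per-value loop from the answer; larger steps trade accuracy for speed.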

Solution 3:[3]

We got around this limitation by simply capping/limiting the query size. This saves us from having to worry about result-set size or running double queries. The PR we used, which you could also use, is https://github.com/DapperLib/Dapper/pull/1758/files
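If you can't take that PR, a low-tech variant of the same idea is to cap the SQL text yourself before handing it to Dapper. `CapQuery` below is a hypothetical helper, and it assumes the inner query is a plain SELECT with no ORDER BY or trailing semicolon:

```csharp
static class QueryGuard
{
    // Wrap a query so SQL Server returns at most maxRows rows.
    // Assumes 'sql' is a plain SELECT (no ORDER BY, no trailing ';').
    public static string CapQuery(string sql, int maxRows) =>
        $"SELECT TOP ({maxRows}) * FROM ({sql}) AS capped";
}
```

You would then call something like `connection.Query<User>(QueryGuard.CapQuery("SELECT * FROM Users", 1000))`, where the `User` type and table are placeholders for your own schema.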

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Liesel
Solution 2: Paulo Amaral
Solution 3: Andrew G