Estimate the size of a dataset returned from SQL
We've got a system that seems to be consuming a lot of data. It uses Dapper for database queries and Seq for logging. I was wondering whether, other than with SQL Profiler, there is a way to add logging to Dapper that records the size of the returned dataset in MB, so we can flag large datasets for review.
This question was asked a while ago, but I was wondering whether there is now a way of doing it without Wireshark, and ideally without iterating over the rows/cells.
Solution 1:[1]
Not really a complete answer, but it might help you out.
sys.dm_exec_query_stats and sys.dm_exec_connections might help you trace large result sets. For example:
SELECT * FROM sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
(The read/write counts in these views are in 8 KB pages.)
This sort of gives you what you're using Wireshark for at the moment (kinda :)
SELECT * FROM sys.dm_exec_connections
CROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)
ORDER BY num_writes DESC
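Since the question is about Dapper, these DMV queries can also be run from application code and the page counts converted to MB (8 KB per page). Below is a minimal sketch; the class name, the connection string parameter, and the `Console.WriteLine` stand-in for Seq logging are all illustrative, not part of the original answer:

```csharp
using System;
using System.Data.SqlClient;
using Dapper;

public static class QueryStatsLogger
{
    // One SQL Server page is 8 KB, so MB = pages * 8 / 1024.
    public static double PagesToMb(long pages) => pages * 8.0 / 1024.0;

    // Query the DMV through Dapper and flag statements whose last run
    // read more than thresholdMb worth of pages.
    public static void LogLargeReads(string connectionString, double thresholdMb)
    {
        using var conn = new SqlConnection(connectionString);
        var rows = conn.Query<(long LastLogicalReads, string Text)>(
            @"SELECT qs.last_logical_reads, st.text
              FROM sys.dm_exec_query_stats qs
              CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st");

        foreach (var row in rows)
        {
            double mb = PagesToMb(row.LastLogicalReads);
            if (mb > thresholdMb)
                Console.WriteLine($"{mb:F1} MB read by: {row.Text}"); // replace with your Seq logger
        }
    }
}
```

Note that logical reads measure pages touched by the query, not bytes sent over the wire, so treat this as a rough flag for review rather than an exact payload size.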
Solution 2:[2]
You can estimate the size required by a row by summing up each column's type size, then multiplying by the number of rows. It should be accurate as long as you don't have TEXT / VARCHAR columns in your query:
int rowSize = 0;
foreach (DataColumn dc in Dataset1.Tables[0].Columns) {
    // sizeof() only accepts compile-time type names, so use Marshal.SizeOf
    // to get the size of the column's runtime Type instead
    rowSize += System.Runtime.InteropServices.Marshal.SizeOf(dc.DataType);
}
int dataSize = rowSize * Dataset1.Tables[0].Rows.Count;
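Since C#'s `sizeof` operator only accepts compile-time type names, `Marshal.SizeOf(Type)` is the runtime equivalent for fixed-width value types. The per-column estimate can be packaged as a small helper (class and method names here are illustrative) and checked against an in-memory `DataTable`:

```csharp
using System;
using System.Data;
using System.Runtime.InteropServices;

public static class DataSetSizeEstimator
{
    // Estimated bytes per row from the column types.
    // Only valid for fixed-width value-type columns (int, double, etc.).
    public static int EstimateRowSize(DataTable table)
    {
        int rowSize = 0;
        foreach (DataColumn dc in table.Columns)
            rowSize += Marshal.SizeOf(dc.DataType);
        return rowSize;
    }

    // Bytes per row multiplied by the row count.
    public static int EstimateTableSize(DataTable table) =>
        EstimateRowSize(table) * table.Rows.Count;
}
```

For example, a table with two `int` columns and one `double` column estimates at 4 + 4 + 8 = 16 bytes per row, so 100 rows come out at 1600 bytes.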
If you need a more accurate figure, sum up the size of each individual value using Marshal.SizeOf:
int dataSize = 0;
foreach (DataRow dr in Dataset1.Tables[0].Rows)
{
    int rowSize = 0;
    for (int i = 0; i < Dataset1.Tables[0].Columns.Count; i++)
    {
        // Note: Marshal.SizeOf throws for reference types such as strings,
        // so this only works when every cell holds a value type
        rowSize += System.Runtime.InteropServices.Marshal.SizeOf(dr[i]);
    }
    dataSize += rowSize;
}
Ideas for performance gain if high accuracy is not a concern:
- Compute the size of just a sample. For example, instead of iterating through all rows, pick 1 in every 100, then multiply the result by 100 at the end.
- Use [Marshal.SizeOf](https://msdn.microsoft.com/en-us/library/y3ybkfb3.aspx) to compute the size of each DataRow `dr` instead of iterating through all its values. It will give you a higher number, since a DataRow object has additional properties, but that's something you can tweak by subtracting the size of an empty DataRow.
- Know the average size of a single row beforehand, from its columns, and just multiply by the number of rows.
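The sampling idea in the first bullet can be sketched as below. The string handling (counting UTF-8 bytes) is an assumption added here, because `Marshal.SizeOf` throws for reference types such as strings; the class and method names are illustrative:

```csharp
using System;
using System.Data;
using System.Runtime.InteropServices;
using System.Text;

public static class SampledSizeEstimator
{
    // Size of one cell value in bytes.
    static int ValueSize(object value) =>
        value switch
        {
            string s => Encoding.UTF8.GetByteCount(s), // assumption: count strings as UTF-8 bytes
            DBNull _ => 0,                             // nulls carry no payload
            _ => Marshal.SizeOf(value)                 // fixed-width value types
        };

    // Measure every 'step'-th row, then scale the sampled total
    // back up by totalRows / sampledRows.
    public static long Estimate(DataTable table, int step = 100)
    {
        long sampledBytes = 0;
        int sampledRows = 0;
        for (int i = 0; i < table.Rows.Count; i += step)
        {
            foreach (DataColumn dc in table.Columns)
                sampledBytes += ValueSize(table.Rows[i][dc]);
            sampledRows++;
        }
        if (sampledRows == 0) return 0;
        return sampledBytes * table.Rows.Count / sampledRows;
    }
}
```

With `step = 1` this degenerates to the exact per-value loop above; larger steps trade accuracy for speed, which is fine when the goal is only to flag unusually large result sets.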
Solution 3:[3]
We got around this limitation by simply capping/limiting the query size. This saves us from having to worry about size or doing double queries. The PR we used, which you could also use, is https://github.com/DapperLib/Dapper/pull/1758/files
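If pulling in that PR isn't an option, a similar cap can be applied in the SQL itself so excess rows never leave the server. The helper below is hypothetical (not part of Dapper) and assumes the inner query is a plain `SELECT` whose ordering, if any, is handled inside the subquery:

```csharp
public static class QueryCap
{
    // Wrap an arbitrary SELECT so at most maxRows rows come back (T-SQL syntax).
    public static string Cap(string sql, int maxRows) =>
        $"SELECT TOP ({maxRows}) * FROM ({sql}) AS capped";
}
```

Usage with Dapper would then be `conn.Query<Order>(QueryCap.Cap(orderSql, 10000))`, where `orderSql` and `Order` are placeholders for your own query and type.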
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Liesel |
| Solution 2 | Paulo Amaral |
| Solution 3 | Andrew G |
