'Why is aggregation on a string field faster than aggregation on an int field?
I have a table with the following columns:
.create-merge table events
(
Time: datetime,
SiteId: int,
SiteCode: string,
...
)
Site ID and code both provide unique value for a site, theoretically it does not matter which one to use unless I need certain data type in the output. However I see a noticeable difference in performance between the queries:
events | summarize count() by SiteCode
~ 300 ms on a 150M rows table
events | summarize count() by SiteId
~ 560 ms on a 150M rows table
The difference is small in the absolute value, but the string one is almost two times faster than the integer one (for consistent results, I issue requests from a client in the same region). The string code consists of 10-20 characters and intuitively seems to have larger footprint in the computer memory as opposed to 4-byte integer, hence I would expect longer processing of the string one, but it works conversely.
What could be the reason for that? I am missing something fundamental from ADX internals?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
