'Why is aggregation on a string field faster than aggregation on an int field?

I have a table with the following columns:

.create-merge table events
(
  Time: datetime,
  SiteId: int,
  SiteCode: string,
  ...
)

Site ID and code both provide unique value for a site, theoretically it does not matter which one to use unless I need certain data type in the output. However I see a noticeable difference in performance between the queries:

events | summarize count() by SiteCode
~ 300 ms on a 150M rows table
events | summarize count() by SiteId
~ 560 ms on a 150M rows table

The difference is small in the absolute value, but the string one is almost two times faster than the integer one (for consistent results, I issue requests from a client in the same region). The string code consists of 10-20 characters and intuitively seems to have larger footprint in the computer memory as opposed to 4-byte integer, hence I would expect longer processing of the string one, but it works conversely.

What could be the reason for that? I am missing something fundamental from ADX internals?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source