'What is the performance of bag_unpack in Kusto?

Some questions regarding the performance and use of bag_unpack in Kusto:

  • How performant is using bag_unpack over a large data set on a column where all the top-level properties are the same (so each row has a dynamic bag with the same property name so each resulting column from the bag unpack would have a value)?
  • How performant is using bag_unpack over a large data set on a column where the top-level properties are not all the same, so not garaunteed to have the same top-level properties (so there will be some rows where the unpacked columns have empty/no value)?
  • Does bag_unpack use any methods that are considered slow in terms of performance like extract_json() or parse_json(), or is it much quicker than those methods?

Given the following two examples, which is more optimized or would execute faster for a large dataset? (or would they be about the same?):

//#1
datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| evaluate bag_unpack(d)
//#2
datatable(d:dynamic)
[
    dynamic({"Name": "John", "Age":20}),
    dynamic({"Name": "Dave", "Age":40}),
    dynamic({"Name": "Jasmine", "Age":30}),
]
| project Age = d.Age, Name = d.Name


Solution 1:[1]

The second option is always preferred.

When the set of properties you wish to project is known in advance, it would be more performant to explicitly project only those, and not use the bag_unpack() plugin. (*)

The latter requires 2 passes over the data -

  1. For building the output schema (by analyzing which properties exist in the dynamic payloads across all queried records).

  2. Executing the query with the now-known output schema.


(*) Asides for performance, this has the advantage of the output schema always being stable, which may be required by some automated scenarios - even in case the source data changes, and there are new properties added to the dynamic payload (or old ones removed), that won't affect the output schema.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1