Best way to query across entities in Azure Table Storage

We have stored a large number of data entities in Azure Table Storage, and all current reads and writes go through the partition and row keys. This works well for us, but we now have a requirement to query across all the data entities to extract some statistical data. The queries may need to branch on values stored in the entity columns, so we may need a little processing logic per entity.

What are our options for running such queries? Is there a way to do this without pulling all the data across the network and into the memory of whatever process runs the query? Is there something akin to "MapReduce" that can run a job across all entities in place, or at least without transferring so much data that it drives up costs?

To be clear, speed is not really an issue since this is for statistical purposes; the primary goal is low cost, and the secondary goal is an easy programming model.



Solution 1:[1]

Unfortunately, Azure Table Storage does not support things like joins directly, so all the processing needs to take place in your code. What we've done in the past is use a timer-triggered Azure Function to calculate statistics and write them to a separate table. Putting the Azure Function in the same region as your Storage Account (or perhaps even depending on that Storage Account) gives you the lowest latency.
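
As a rough illustration of that approach, here is a minimal Python sketch of the job body that a timer-triggered function could call. It assumes the `azure-data-tables` package; the table names (`Entities`, `EntityStats`), the `Status` column, and the layout of the summary row are all hypothetical and not taken from our actual setup. The aggregation is a single streaming pass, so only the running counts are held in memory.

```python
from collections import Counter
from datetime import datetime, timezone


def summarize(entities, column="Status"):
    """Count entities per value of `column` in a single streaming pass."""
    counts = Counter()
    for entity in entities:
        # Per-entity logic goes here; entities without the column are bucketed.
        counts[str(entity.get(column, "unknown"))] += 1
    return counts


def to_summary_entity(counts, run_at):
    """Build one row for the statistics table (key layout is an assumption)."""
    entity = {"PartitionKey": "stats", "RowKey": run_at.strftime("%Y%m%dT%H%M%SZ")}
    for value, n in counts.items():
        # Assumes the column values are usable as property-name suffixes.
        entity[f"count_{value}"] = n
    return entity


def run_job(conn_str, source_table="Entities", stats_table="EntityStats"):
    """Entry point a timer-triggered function could call; names are hypothetical."""
    from azure.data.tables import TableClient  # requires azure-data-tables

    source = TableClient.from_connection_string(conn_str, table_name=source_table)
    stats = TableClient.from_connection_string(conn_str, table_name=stats_table)
    counts = summarize(source.list_entities())  # result pages are fetched lazily
    stats.upsert_entity(to_summary_entity(counts, datetime.now(timezone.utc)))
```

Keeping `summarize` and `to_summary_entity` free of any Azure calls makes the statistics logic easy to test locally against plain dictionaries.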

To further reduce bandwidth, you can select only the columns you need in each query, so that the whole row is not loaded.
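
A small sketch of that column selection, again in Python and assuming a `TableClient` from the `azure-data-tables` package, whose `list_entities` accepts a `select` keyword. The `DurationMs` column and the helper names are made up for illustration.

```python
def fetch_projected(table_client, columns):
    """Ask the service to return only the named columns for each entity."""
    return table_client.list_entities(select=list(columns))


def average(entities, column):
    """Streaming mean of a numeric column; entities lacking it are skipped."""
    total, n = 0.0, 0
    for entity in entities:
        value = entity.get(column)
        if value is not None:
            total += float(value)
            n += 1
    return total / n if n else None
```

Usage would look something like `average(fetch_projected(client, ["DurationMs"]), "DurationMs")`, where `client` is a `TableClient` for the source table.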

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Thomas