How to convert a TableClient object to a PySpark DataFrame in an Azure Tables query?
I'm trying to convert a TableClient object from Azure Tables to a PySpark DataFrame, but it doesn't work.
I've tried:
from azure.data.tables import TableClient
table_name = "Tablename"
my_filter = "DateTimeUTC ge datetime'2022-02-28T00:00:00Z' and DateTimeUTC le datetime'2022-02-28T01:59:00Z'"
table_client = TableClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=Accountname;AccountKey=key",
    table_name=table_name,
)
entities = table_client.query_entities(my_filter)
df = spark.read.option("multiline","true").json(entities)
But it didn't work. I can't even get the length of entities; it fails with:
*AttributeError: 'ItemPaged' object has no attribute 'keys'*
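For context, query_entities returns an azure.core.paging.ItemPaged, a lazy iterator that only fetches pages as you consume it, which is why it has neither a length nor the dict-style keys() Spark was looking for. Materializing it first makes it countable, as in this minimal sketch:
results = list(table_client.query_entities(my_filter))  # drain the lazy ItemPaged; it can be iterated only once
print(len(results))  # a plain list supports len(); the ItemPaged does not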
When I iterate over entities and print each row, my data looks like:
{'PartitionKey': '10000', 'RowKey': '20220228091315', 'Acceleration': 0.0, 'Altitude': 971, 'BatteryVoltage': 13.35, 'DateTimeUTC': TablesEntityDatetime(2022, 2, 28, 9, 13, 15, tzinfo=datetime.timezone.utc)}
{'PartitionKey': '10000', 'RowKey': '20220228091820', 'Acceleration': 0.0, 'Altitude': 980, 'BatteryVoltage': 13.35, 'DateTimeUTC': TablesEntityDatetime(2022, 2, 28, 9, 18, 20, tzinfo=datetime.timezone.utc)}
...
I want a PySpark DataFrame so I can apply the usual libraries, functions, and methods.
Solution 1:[1]
If your table is not too big, could you try:
import json

json_rows = [json.dumps(dict(row), default=str) for row in entities]
df = spark.read.json(spark.sparkContext.parallelize(json_rows))
It seems entities is an ItemPaged iterator, and Spark was trying to read the keys of those objects like a dict to infer a schema. Converting each entity to a plain dict should be enough; since spark.read.json expects a path or an RDD of JSON strings rather than Python objects, the dicts are serialized with json.dumps (default=str also takes care of non-serializable values such as TablesEntityDatetime) and distributed with parallelize.
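If the whole result set fits in driver memory, a shorter variant (a sketch, not part of the original answer) skips JSON entirely and hands the dicts straight to createDataFrame:
records = [dict(row) for row in entities]  # materialize the ItemPaged once
df = spark.createDataFrame(records)        # Spark infers the schema from the dicts
Note that createDataFrame may warn that inferring a schema from dicts is deprecated; passing pyspark.sql.Row objects or an explicit schema avoids that.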
I'm also uncertain whether Spark can recognize TablesEntityDatetime objects.
But let's do things one at a time.
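On that point, one defensive follow-up (a sketch, assuming TablesEntityDatetime subclasses datetime.datetime, as recent azure-data-tables releases do) is to normalize datetimes to ISO-8601 strings before schema inference and cast them back inside Spark:
from datetime import datetime
from pyspark.sql import functions as F

# Replace datetime-like values (TablesEntityDatetime included, under the
# subclass assumption above) with ISO-8601 strings before inference.
clean_rows = [
    {k: (v.isoformat() if isinstance(v, datetime) else v) for k, v in dict(e).items()}
    for e in entities
]

# Cast the known timestamp column back to a proper Spark timestamp.
df = spark.createDataFrame(clean_rows).withColumn(
    "DateTimeUTC", F.col("DateTimeUTC").cast("timestamp")
)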
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | scr |
