Read CosmosDb items from PySpark (Databricks) with an inconsistent model
Let's say I have two items in CosmosDb:
{
  "Test": {
    "InconsistentA": 10
  },
  "Common": 1
}
{
  "Test": {
    "InconsistentB": 10
  },
  "Common": 2
}
How can I read this data so that the resulting DataFrame has the following schema?
- Test: string (the JSON string of the inconsistent part of the model)
- Common: int (the consistent part of the model)
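The target projection can be sketched in plain Python, assuming the raw items are already available as parsed dicts rather than Spark rows. The helper `project_item` is hypothetical and only shows the intended mapping: the variable `Test` object is serialized with `json.dumps`, whatever keys it has, and `Common` stays an integer.

```python
import json
from typing import Any, Dict, Optional, Tuple


def project_item(item: Dict[str, Any]) -> Tuple[Optional[str], Optional[int]]:
    """Map one raw item to (Test as a JSON string, Common as an int).

    Hypothetical helper for illustration; assumes `item` is a parsed dict.
    """
    test = item.get("Test")
    # Serialize the inconsistent part as real JSON, regardless of its keys
    test_json = json.dumps(test) if test is not None else None
    common = item.get("Common")
    # Keep the consistent part typed as an integer (both columns are nullable)
    return test_json, (int(common) if common is not None else None)


items = [
    {"Test": {"InconsistentA": 10}, "Common": 1},
    {"Test": {"InconsistentB": 10}, "Common": 2},
]

for row in map(project_item, items):
    print(row)
```

Doing the same inside Spark would need access to each document's raw JSON, for example as a string column, where `pyspark.sql.functions.get_json_object` returns a nested object as JSON text. Whether the connector exposes documents that way is an assumption not checked here.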
I don't know the model in advance, and the Spark CosmosDb connector (com.microsoft.azure.cosmosdb.spark) only reads the first X items in CosmosDb to infer the schema.
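Why sampling breaks here can be shown with a simplified model. This is not the connector's actual inference code; the helper `sampled_test_keys` and its `sample_size` parameter are hypothetical stand-ins for "the first X items". Any key that first appears after the sample is never seen.

```python
from typing import Any, Dict, Iterable, Set


def sampled_test_keys(items: Iterable[Dict[str, Any]], sample_size: int) -> Set[str]:
    """Simplified model, not the connector's code: collect the keys of the
    nested "Test" object from only the first `sample_size` items."""
    keys: Set[str] = set()
    for i, item in enumerate(items):
        if i >= sample_size:
            break  # items beyond the sample are never inspected
        keys.update((item.get("Test") or {}).keys())
    return keys


items = [
    {"Test": {"InconsistentA": 10}, "Common": 1},
    {"Test": {"InconsistentB": 10}, "Common": 2},
]

print(sampled_test_keys(items, sample_size=1))  # {'InconsistentA'}
print(sampled_test_keys(items, sample_size=2))
```

With a sample of one item, `InconsistentB` is missing from the inferred shape of `Test`, which is the situation described above.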
I tried enforcing the following schema instead:
|-- Test: string (nullable = true)
|-- Common: integer (nullable = true)
However, the value in the Test column comes out as:
'{ InconsistentA=10 }'
Instead of:
'{ "InconsistentA": 10 }'
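The difference between the two outputs is that the first is a `key=value` rendering of the nested object, not JSON, so it cannot be parsed back. A small Python check illustrates this. The names `map_style`, `wanted` and `is_json` are illustrative only. `json.dumps` omits the spaces inside the braces, but it encodes the same JSON value as the expected output.

```python
import json

# Text reported for the Test column (key=value style, not JSON)
map_style = "{ InconsistentA=10 }"
# What the question wants: the nested object serialized as JSON
wanted = json.dumps({"InconsistentA": 10})


def is_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(wanted)              # {"InconsistentA": 10}
print(is_json(wanted))     # True
print(is_json(map_style))  # False
```

In Spark, the corresponding step would be `pyspark.sql.functions.to_json` applied to a struct or map column, instead of forcing the column to a string in the schema. Whether the connector can deliver `Test` as such a column is an assumption not verified here.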
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
