Read CosmosDb items from PySpark (Databricks) with an inconsistent model

Let's say I have two items in CosmosDb:

{
  "Test": {
    "InconsistentA": 10
  },
  "Common": 1
}
{
  "Test": {
    "InconsistentB": 10
  },
  "Common": 2
}

How can I read this data so that it has the following schema?

  • Test: string (the JSON string of the inconsistent part of the model)
  • Common: int (the consistent part of the model)
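To pin down the target row shape, here is a minimal pure-Python sketch. It assumes each document has already been parsed into a dict. The free-form `Test` object is serialized back to a JSON string with `json.dumps`, and `Common` is kept as an integer. The helper name `to_target_row` is hypothetical. It only illustrates the desired output and says nothing about how the Cosmos DB connector behaves.

```python
import json

def to_target_row(doc: dict) -> tuple:
    """Hypothetical helper: split a document into (Test, Common).

    Test   -> JSON string of the free-form sub-document (None if absent)
    Common -> int value of the consistent field (None if absent)
    """
    test = doc.get("Test")
    # Default separators give '{"InconsistentA": 10}', i.e. real JSON text
    test_json = json.dumps(test) if test is not None else None
    common = doc.get("Common")
    return test_json, (int(common) if common is not None else None)

docs = [
    {"Test": {"InconsistentA": 10}, "Common": 1},
    {"Test": {"InconsistentB": 10}, "Common": 2},
]

for row in map(to_target_row, docs):
    print(row)
```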

I don't know the model in advance, and the Spark CosmosDb connector (com.microsoft.azure.cosmosdb.spark) only reads the first X items in CosmosDb to infer the schema.
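Sampling breaks here, and a toy simulation in plain Python shows why. It assumes inference simply unions the keys seen under `Test` in the first `sample_size` documents. This is not the connector's actual algorithm, and `inferred_test_fields` and `sample_size` are hypothetical names.

```python
from itertools import islice

def inferred_test_fields(docs, sample_size):
    """Union of the keys found under "Test" in the first `sample_size` documents."""
    fields = set()
    for doc in islice(docs, sample_size):
        fields.update((doc.get("Test") or {}).keys())
    return fields

docs = [
    {"Test": {"InconsistentA": 10}, "Common": 1},
    {"Test": {"InconsistentB": 10}, "Common": 2},
]

# With a sample of one document, InconsistentB is never seen,
# so a schema built only from that sample has no field for it.
print(sorted(inferred_test_fields(docs, sample_size=1)))
print(sorted(inferred_test_fields(docs, sample_size=2)))
```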

What I have tried is enforcing the following schema:

|-- Test: string (nullable = true)
|-- Common: integer (nullable = true)

But the result of the Test column is:

'{ InconsistentA=10 }'

Instead of:

'{ "InconsistentA": 10 }'
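The two strings differ in more than quoting. The first one cannot be parsed as JSON at all, and a quick check with Python's standard `json` module makes this concrete. The helper name `is_valid_json` is only for illustration. The `{key=value}` form resembles the output of Java's `Map.toString()`, but that is an inference and not something the connector documents.

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_valid_json('{ InconsistentA=10 }'))     # False: unquoted key, '=' instead of ':'
print(is_valid_json('{ "InconsistentA": 10 }'))  # True
```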


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
