JOLT shift through properties with different names
I have a JSON:
{
  "relations": {
    "advertiser_id": {
      "9968": {
        "name": "Advance/Unicredit",
        "id": 9968
      },
      "10103": {
        "name": "Advance/ ORIMI",
        "id": 10103
      }
    },
    "campaign_id": {
      "256292": {
        "name": "Interests_Aidata",
        "id": 256292,
        "advertiser_id": 9968
      },
      "257717": {
        "name": "G_14.04",
        "id": 257717,
        "advertiser_id": 10103
      }
    }
  }
}
I thought this would be an easy shift operation, but I'm stuck because all the values sit under arbitrary property names such as "9968". I don't understand how to traverse the JSON when the property names differ like this.
Expected Output:
[
  {
    "name": "Interests_Aidata",
    "id": 256292,
    "advertiser_id": 9968
  },
  {
    "name": "G_14.04",
    "id": 257717,
    "advertiser_id": 10103
  }
]
UPDATE
Is it possible to add the top-level key under relations (advertiser_id or campaign_id) as an additional property, as in this example?
[
  {
    "name": "Interests_Aidata",
    "id": 256292,
    "advertiser_id": 9968,
    "entity_type": "campaign_id"
  },
  {
    "name": "G_14.04",
    "id": 257717,
    "advertiser_id": 10103,
    "entity_type": "campaign_id"
  }
]
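For reference, the reshaping described in the question can be written out in plain Python. This is not a JOLT spec, only a sketch of the intended input-to-output mapping; the function name flatten_entities is hypothetical and the sample data is trimmed from the question.

```python
def flatten_entities(doc, entity_type="campaign_id"):
    """Collect the objects under relations[entity_type], ignoring their keys."""
    entities = doc.get("relations", {}).get(entity_type, {})
    # The keys ("256292", "257717", ...) are arbitrary, so only the values are used.
    # The enclosing key is attached as "entity_type", as asked in the UPDATE.
    return [{**entity, "entity_type": entity_type} for entity in entities.values()]


sample = {
    "relations": {
        "advertiser_id": {
            "9968": {"name": "Advance/Unicredit", "id": 9968},
        },
        "campaign_id": {
            "256292": {"name": "Interests_Aidata", "id": 256292, "advertiser_id": 9968},
            "257717": {"name": "G_14.04", "id": 257717, "advertiser_id": 10103},
        },
    }
}

result = flatten_entities(sample)
```

Iterating over the values rather than the keys is what makes the differing property names irrelevant; a JOLT shift spec would need a wildcard at that level to get the same effect.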
Solution 1:[1]
Since
    def __init__(self, ...):
        ...
    # this variable
    data_lake_paths = GoogleCloudStoragePaths(self.destination_table)
is outside of any method that accepts a self parameter, Python treats it as a class variable rather than an instance variable. The class body runs once, when the class is defined, and no instance exists at that point, so self is not defined there (that's why the Undefined Variable 'self' error occurs).
Put that assignment inside the __init__ method or another method (assigning to self.data_lake_paths if the value is needed later) and it should work.
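The behaviour described above can be reproduced with a small, self-contained sketch. The Paths class is a hypothetical stand-in for GoogleCloudStoragePaths, and Broken and Fixed are illustrative names, not part of the original code.

```python
class Paths:
    """Hypothetical stand-in for GoogleCloudStoragePaths."""

    def __init__(self, table):
        self.table = table


def define_broken_class():
    # A class body is executed as soon as the class statement runs,
    # before any instance exists, so the name `self` is undefined in it.
    class Broken:
        def __init__(self, destination_table):
            self.destination_table = destination_table

        data_lake_paths = Paths(self.destination_table)  # raises NameError

    return Broken


try:
    define_broken_class()
    error = None
except NameError as exc:
    error = exc


class Fixed:
    def __init__(self, destination_table):
        self.destination_table = destination_table
        # Inside a method, `self` is the instance being initialised.
        self.data_lake_paths = Paths(self.destination_table)


fixed = Fixed("sales")
```

The broken class is wrapped in a function only so the NameError can be caught and inspected; at module level the same class statement would fail on import.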
Solution 2:[2]
Variables defined directly in the body of a class cannot use self; only code inside methods can. Instead, just move the definition of that variable into the __init__ function. If you need the variable to exist even when __init__ has not been run, set it to None in the class body.
code:
class MySQLBatchPipeline:
    '''
    Class to generate MySQL batch pipelines that store CSVs
    in GCS then import them into BigQuery
    '''
    export_format = 'CSV'
    # Declare here if necessary
    data_lake_paths = None
    def __init__(
        self,
        dag,
        sql_directory,
        gcp_project_id,
        mysql_connection_id,
        source_schema,
        source_table,
        gcs_connection_id,
        bq_connection_id,
        gcs_bucket,
        destination_staging_schema,
        destination_schema,
        destination_table,
        environment,
        time_delay,
        country,
        max_file_size: int = int(50e6),
    ):
        # No trailing commas on these assignments: `self.dag = dag,`
        # would store a one-element tuple instead of the value itself.
        self.dag = dag
        self.sql_directory = sql_directory
        self.gcp_project_id = gcp_project_id
        self.mysql_connection_id = mysql_connection_id
        self.source_schema = source_schema
        self.source_table = source_table
        self.gcs_connection_id = gcs_connection_id
        self.bq_connection_id = bq_connection_id
        self.gcs_bucket = gcs_bucket
        self.destination_staging_schema = destination_staging_schema
        self.destination_schema = destination_schema
        self.destination_table = destination_table
        self.time_delay = time_delay
        self.environment = environment
        self.max_file_size = max_file_size
        # get_pipeline_queries(), get_schema_files() and get_environment()
        # are methods of the original class that are not shown here.
        self.queries = self.get_pipeline_queries()
        self.schema_file = self.get_schema_files()
        self.country = country
        # Moved out of the class body because it also relies on self
        self.DESTINATION_TABLE_FORMAT = self.get_environment() + '.{dataset}.{table}'
        # Define here
        self.data_lake_paths = GoogleCloudStoragePaths(self.destination_table)
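The "set it to None in the class body" suggestion can be checked in isolation with a minimal sketch. The Pipeline class and the bucket path string below are hypothetical placeholders, not the real GoogleCloudStoragePaths behaviour.

```python
class Pipeline:
    # Class-level default: readable on the class and on any instance,
    # even one whose __init__ has not run.
    data_lake_paths = None

    def __init__(self, destination_table):
        self.destination_table = destination_table
        # The instance attribute shadows the class-level default.
        self.data_lake_paths = f"gs://example-bucket/{destination_table}"


uninitialised = Pipeline.__new__(Pipeline)  # instance created without __init__
pipeline = Pipeline("orders")
```

Here uninitialised.data_lake_paths falls back to the class-level None, while pipeline.data_lake_paths holds the per-instance value and the class attribute itself stays None.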
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | 404kuso |
| Solution 2 | Lecdi |
