Iteration over list in PySpark
I am not sure how I could reproduce this Python code in PySpark, any ideas? It iterates over a list of dicts and, whenever an idx_key appears again, appends the record under that key; it then compares the second list against that mapping. I am wondering whether an inner join could reproduce it, but wouldn't I get a cross (duplicated) result?
```python
def get_new_contacts(b2b_master_data, new_potential_contact_data):
    master_contact_data = {}
    # Generating a dictionary where key is the idx_key and values are
    # contacts associated with the idx_key
    for record in list(b2b_master_data):
        idx_key = record["idx_key"]
        if idx_key not in master_contact_data:
            master_contact_data[idx_key] = [record]
        else:
            master_contact_data[idx_key].append(record)
    for record in list(new_potential_contact_data):
        idx_key = record["idx_key"]
        if idx_key in master_contact_data:
            potential_matches = master_contact_data[idx_key]
    return potential_matches
```
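
For reference, here is roughly what I imagine the join-based version would look like. This is only a sketch, assuming both datasets are DataFrames with an idx_key column; the DataFrame names and example rows below are made up for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical example data; the real DataFrames are assumed to have an idx_key column.
b2b_master_df = spark.createDataFrame(
    [("k1", "alice@acme.com"), ("k1", "bob@acme.com"), ("k2", "carol@corp.com")],
    ["idx_key", "contact"],
)
new_potential_df = spark.createDataFrame(
    [("k1", "dave@acme.com"), ("k3", "eve@other.com")],
    ["idx_key", "contact"],
)

# Equivalent of building master_contact_data: collect the master contacts per idx_key.
master_grouped = b2b_master_df.groupBy("idx_key").agg(
    F.collect_list("contact").alias("potential_matches")
)

# Equivalent of the second loop: keep only new records whose idx_key exists in the master data.
matches = new_potential_df.join(master_grouped, on="idx_key", how="inner")

matches.show(truncate=False)
```

Because the master side is reduced to one row per idx_key before the join, each new record can match at most one grouped row, so (if I understand the join semantics correctly) this should avoid the cross-product duplication I was worried about.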
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow