Storing a billion-row Snowflake query result into a JSON file
I am running a SQL query from a Python script using the Snowflake Python connector and SQLAlchemy. The query returns more than 6 billion rows (each row basically represents a link between two users in a graph). I want to store the results of this query in a JSON file, since this file is the input for another algorithm.
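For clarity, the output file should look roughly like this (illustrative values, the real file just has billions of these entries):

[{"source": 1, "target": 2, "weight": 1},
 {"source": 1, "target": 3, "weight": 1},
 ...]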
First I tried storing the results of the query with a simple for loop but, as expected, the Python script crashed. The code is below (connection credentials replaced with placeholders for posting):
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL
import json

# Placeholder credentials
engine = create_engine(URL(
    account='my_account',
    user='my_user',
    password='my_password',
    database='my_database',
))

links_output_file = 'links_file.json'
friendships = '''select source as source, target as target, 1 as weight
from database;
'''

result = engine.execute(friendships)
links_list = []
for connection in result:
    link_dict = {"source": connection[0], "target": connection[1], "weight": connection[2]}
    links_list.append(link_dict)

# Everything is held in memory before a single write happens
with open(links_output_file, 'w') as f:
    f.write(json.dumps(links_list))
After that, I tried just a simple list comprehension (roughly as sketched below) but, although it was faster, Python still crashed. Does anyone know of a way to obtain this JSON file, or to go directly from the Snowflake query results to the JSON file?
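For reference, the list comprehension version looked roughly like this (same engine, query, and output file as above):

# Same idea built in one expression -- still materializes all ~6 billion dicts in memory
links_list = [
    {"source": row[0], "target": row[1], "weight": row[2]}
    for row in engine.execute(friendships)
]
with open(links_output_file, 'w') as f:
    f.write(json.dumps(links_list))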
Thanks for all your help!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
