'How to divide up a BigQuery table to smaller ones using groupby
Below is a demonstration of what I am aiming for in Pandas.
import pandas as pd
import numpy as np
Create a mock dataframe with 4 columns:
mockdf = pd.DataFrame({'col0':['x','y','y','y','x','x'],
'col1':['a','a','b','b','b','a'],
'col2':np.random.RandomState(42).normal(size=6),
'col3':np.random.RandomState(43).normal(size=6)})
mockdf is:
col0 col1 col2 col3
0 x a 0.496714 0.257400
1 y a -0.138264 -0.908481
2 y b 0.647689 -0.378503
3 y b 1.523030 -0.534916
4 x b -0.234153 0.858073
5 x a -0.234137 -0.413010
I perform a groupby using col0 and col1. and create a dict where the key is the given group keys, the value is the dataframe:
gs = {'_'.join(key): value for key, value in mockdf.groupby(['col0','col1'])}
then, for example, gs['x_a'] is:
col0 col1 col2 col3
0 x a 0.496714 0.25740
5 x a -0.234137 -0.41301
Uploading mockdf to BigQuery via:
mockdf.to_gbq(f"ourtable.mockdf", project_id="ourproject")
Dataframe uploads successfully.
How can I split the BigQuery table into smaller ones using groupby? Ideally, the names of these new tables should reflect which group they belong to.
Similar but different questions:
Split a table into multiple tables in BigQuery SQL: no groupby is used here
How Split a Large events table in google bigquery into multiple table based on event_type?: question not reproducible, answer performs groupby but does not load groups into new tables.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
