How to divide up a BigQuery table into smaller ones using groupby

Below is a demonstration of what I am aiming for in Pandas.

import pandas as pd
import numpy as np

Create a mock dataframe with 4 columns:

mockdf = pd.DataFrame({'col0':['x','y','y','y','x','x'],
                       'col1':['a','a','b','b','b','a'],
                       'col2':np.random.RandomState(42).normal(size=6),
                       'col3':np.random.RandomState(43).normal(size=6)})

mockdf is:

  col0 col1      col2      col3
0    x    a  0.496714  0.257400
1    y    a -0.138264 -0.908481
2    y    b  0.647689 -0.378503
3    y    b  1.523030 -0.534916
4    x    b -0.234153  0.858073
5    x    a -0.234137 -0.413010

I perform a groupby on col0 and col1 and create a dict where each key is the joined group key and each value is the corresponding dataframe:

gs = {'_'.join(key): value for key, value in mockdf.groupby(['col0','col1'])}

Then, for example, gs['x_a'] is:

  col0 col1      col2     col3
0    x    a  0.496714  0.25740
5    x    a -0.234137 -0.41301
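For completeness, the whole split can be driven from this dict: each entry could become its own upload target. Here is a minimal self-contained sketch; the `mockdf_<group>` table-naming scheme is my own assumption for illustration, not something fixed by the question:

```python
import numpy as np
import pandas as pd

mockdf = pd.DataFrame({'col0': ['x', 'y', 'y', 'y', 'x', 'x'],
                       'col1': ['a', 'a', 'b', 'b', 'b', 'a'],
                       'col2': np.random.RandomState(42).normal(size=6),
                       'col3': np.random.RandomState(43).normal(size=6)})

# One sub-frame per (col0, col1) pair, keyed as 'col0_col1'.
gs = {'_'.join(key): value for key, value in mockdf.groupby(['col0', 'col1'])}

# Map each group to a destination table name; uploading would then be
# df.to_gbq(dest, project_id="ourproject") for each pair (not run here).
destinations = {name: f"ourtable.mockdf_{name}" for name in gs}

print(sorted(gs))           # group names derived from the data
print(len(gs['x_a']))       # number of rows in the x_a group
```

With this mock data the loop produces four groups (x_a, x_b, y_a, y_b), and the destination names carry the group key, which is exactly the naming behaviour asked for below.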

Uploading mockdf to BigQuery via:

mockdf.to_gbq("ourtable.mockdf", project_id="ourproject")

Dataframe uploads successfully.

How can I split the BigQuery table into smaller ones using groupby? Ideally, the names of these new tables should reflect which group they belong to.
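For reference, one way to express the same split on the BigQuery side is one `CREATE TABLE ... AS SELECT` statement per group key. The sketch below only builds the statements as strings (the project and table names are the ones from the question; in practice the distinct keys would come from a `SELECT DISTINCT col0, col1` query, which I simulate here with the mock data):

```python
import numpy as np
import pandas as pd

mockdf = pd.DataFrame({'col0': ['x', 'y', 'y', 'y', 'x', 'x'],
                       'col1': ['a', 'a', 'b', 'b', 'b', 'a'],
                       'col2': np.random.RandomState(42).normal(size=6),
                       'col3': np.random.RandomState(43).normal(size=6)})

# One DDL statement per distinct (col0, col1) pair, with the group key
# embedded in the new table's name.
statements = {}
for (col0, col1), _ in mockdf.groupby(['col0', 'col1']):
    name = f"{col0}_{col1}"
    statements[name] = (
        f"CREATE TABLE `ourproject.ourtable.mockdf_{name}` AS\n"
        f"SELECT * FROM `ourproject.ourtable.mockdf`\n"
        f"WHERE col0 = '{col0}' AND col1 = '{col1}'"
    )

print(statements['x_a'])
```

Each statement could then be submitted with a BigQuery client (for example `client.query(sql)` from google-cloud-bigquery), but that part is left out here since it needs credentials and a live project.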

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.