Combining multiple data files with different dimensions and 1D and 2D variables in Python

I have a bunch of NetCDF files that all have different dimensions. Each file contains a certain number of circular contours, and each contour is made from 50 longitude/latitude observations, so day 1 may be (5,50), day 2 (8,50), and day 3 (12,50). I want to combine all of them into a single dataset. Each dataset has the longitudes and latitudes that make up the circle contours and variables describing values within them. Here is what the file looks like in Panoply, and you can see that a few of the variables are Geo2D.

You can't use xr.open_mfdataset easily because of the different dimensions, but I have written this chunk of code which combines them:

import glob

import pandas as pd
import xarray as xr

# List all matching files
files = sorted(glob.glob('/data/watkinson/Spring2022/Satellite/data/eddies/Cyclonic/*.nc'))
num_files = len(files)  # number of Cyclonic eddy files for the loop
#print(files)
#print(num_files)

ds = xr.open_dataset(files[0])  # load the first Cyclonic eddy file from the directory
df = ds.to_dataframe()  # convert the xarray Dataset to a pandas DataFrame
#print(df)

n = 1
while n < num_files:
    # Load a single dataset
    xs = xr.open_dataset(files[n])
    df2 = xs.to_dataframe()

    # Append this file's rows to the combined DataFrame
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    df = pd.concat([df, df2])
    n = n + 1

This combines all of the files, but the Geo2D variables get separated: the 50 different longitudes are put in separate rows for each observation. Here is what the pandas DataFrame looks like after running the code above.

You can see that the DataFrame is 2488300 rows x 24 columns, but I want it to be 49766 x 24 (2488300/50).
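To make the factor of 50 concrete, here is a tiny reproduction with made-up data. The dimension names `obs` and `NbSample` and the variable names `amplitude` and `contour_lon` are my assumptions for illustration, not necessarily what is in the real files.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one file: 3 observations, 50 contour points each.
# Dimension and variable names are hypothetical.
ds = xr.Dataset(
    {
        "amplitude": ("obs", np.arange(3.0)),
        "contour_lon": (("obs", "NbSample"), np.zeros((3, 50))),
    }
)

# to_dataframe() broadcasts every variable over all dimensions, so the
# index becomes an (obs, NbSample) MultiIndex with 3 * 50 rows.
df = ds.to_dataframe()
print(len(df), df.index.names)
```

The 1D variable is repeated on all 50 rows of its observation, which is the same pattern as in my 2488300-row DataFrame.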

How can I combine these files and keep the 2D variables from separating into 50 rows for every observation, so that each row represents one complete eddy observation instead of 50 rows?
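For reference, this is a rough sketch of the result I am after, using synthetic data in place of the real files. It assumes the files share a per-observation dimension (called `obs` here) and a fixed 50-point dimension (called `NbSample` here); those names, the variable names, and the helpers `make_fake_day` and `to_one_row_per_obs` are all hypothetical. The idea is to concatenate along `obs` with `xr.concat`, convert only the 1D variables with `to_dataframe()`, and store each 2D variable as one array per row.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_fake_day(n_obs, n_points=50, seed=0):
    """Build a synthetic stand-in for one daily file (hypothetical names)."""
    rng = np.random.default_rng(seed)
    return xr.Dataset(
        {
            "amplitude": ("obs", rng.random(n_obs)),
            "contour_lon": (("obs", "NbSample"), rng.random((n_obs, n_points))),
            "contour_lat": (("obs", "NbSample"), rng.random((n_obs, n_points))),
        }
    )


def to_one_row_per_obs(ds, obs_dim="obs"):
    """Return a DataFrame with one row per observation.

    1D variables become ordinary columns; each 2D variable becomes a
    column whose cells hold the whole per-observation array.
    """
    one_d = [name for name, var in ds.data_vars.items() if var.dims == (obs_dim,)]
    two_d = [
        name
        for name, var in ds.data_vars.items()
        if var.ndim == 2 and var.dims[0] == obs_dim
    ]

    # Only the 1D variables go through to_dataframe(), so no row is repeated.
    df = ds[one_d].to_dataframe().reset_index(drop=True)

    # list() splits the (n_obs, 50) array into n_obs arrays of length 50.
    for name in two_d:
        df[name] = list(ds[name].values)
    return df


# Days with 5, 8, and 12 observations, as in the example shapes above.
days = [make_fake_day(5, seed=1), make_fake_day(8, seed=2), make_fake_day(12, seed=3)]
combined = xr.concat(days, dim="obs")  # stack along the observation dimension
df = to_one_row_per_obs(combined)
print(df.shape)
```

With real files, the `days` list would presumably be replaced by `[xr.open_dataset(f) for f in files]`. Keeping arrays inside DataFrame cells gives object-dtype columns, so if staying in xarray is an option, the `combined` Dataset may be the more convenient result.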



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
