'Python good practice with NetCDF4 shared dimensions across groups

This question is conceptual in place of a direct error.

I am working with the python netcdf4 api to translate and store binary datagram packets from multiple sensors packaged in a single file. My question is in reference to Scope of dimensions and best use practices. According to the Netcdf4 convention and metadata docs, dimension scope is such that all child groups have access to a dimension defined in the parent group (http://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#_scope).

Context: The multiple sensors datapackets are written to a binary file. Timing adjustments are handled prior to writing the binary file such that we can trust the time stamp of a data packet. Time sampling rates are not synchonious. Sensor 1 samples at say 1Hz. Sensor 2 samples at 100Hz. Sensor 1 and 2 measure a number of different variables.

Questions:

  1. Do I define a single, unlimited time dimension at the root level and create multiple variables using that dimension, or create individual time dimensions at the group level. Psuedo-code below.

In setting up the netcdf I would use the following code:

import netcdf4

data_set = netcdf4.Dataset(file_name, mode='w')
dim_time = data_set.createDimension(dimname='time', size=None)

grp_1 = data_set.createGroup('grp1')
var_time_1 = grp_1.createVariable(varname='sensor_1_time', datatype='f8', dimensions=(time,))
var_time_1[:] = sensor_1_time    # timestamp of data values from sensor_1
var_1 = grp_1.createVariable(varname='sensor_1_data', datatype='f8', 
dimensions=(time,))
var_1[:] = sensor_1_data    # data values from sensor 1

grp_2 = data_set.createGroup('grp2')
var_time_2 = grp_2.createVariable(varname='sensor_2_time', datatype='f8', dimensions=(time,))
var_time_2[:] = sensor_2_time
var_2 = grp_2.createVariable(varname='sensor_2_data', datatype='f8', dimension=(time,))
var_2[:] = sensor_2_data    # data values from sensor 2

  1. The group separation is not necessarily by sensor but by logical data grouping. In the case that data from two sensors falls into multiple groups, is it best to replicate the time array in each group or is it acceptable to reference to other groups using the Scope mechanism.
import netcdf4

data_set = netcdf4.Dataset(file_name, mode='w')
dim_time = data_set.createDimension(dimname='time', size=None)

grp_env = data_set.createGroup('env_data')
sensor_time_1 = grp_env.createVariable(varname='sensor_1_time', datatype='f8', dimensions=(time,))
sensor_time_1[:] = sensor_1_time    # timestamp of data values from sensor_1
env_1 = grp_env.createVariable(varname='sensor_1_data', datatype='f8', dimensions=(time,))
env_1[:] = sensor_1_data    # data values from sensor 1

env_2 = grp_1.createVariable(varname='sensor_2_data', datatype='f8', dimensions=(time,))
env_2.coordinates = "/grp_platform/sensor_time_1"

grp_platform = data_set.createGroup('platform')
sensor_time_2 = grp_platform.createVariable(varname='sensor_2_time', datatype='f8', dimensions=(time,))
sensor_time_2[:] = sensor_2_time
plt_2 = grp_platform.createVariable(varname='sensor_2_data', datatype='f8', dimension=(time,))
var_2[:] = sensor_2_data    # data values from sensor 2

Most examples do not deal with these cross group functionality and I can't seem to find the best practices. I'd love some advice, or even a push in the right direction.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source