How to only select winter months of daily data in xarray?
I have gridded daily temperature data but am only interested in the winter months.
from netCDF4 import Dataset as netcdf_dataset
import numpy as np
import xarray as xr
#open NASA GISS gridded temperature netcdf file
df = xr.open_dataset('BerkeleyEarth.nc')
#pull out temperature variable
air=df.temperature
#select only winter months
WinterAir = air[(air.time.dt.month >= 12) | (air.time.dt.month <= 2)]
When I try to select the months this way I get the following error message: AttributeError: 'DataArray' object has no attribute 'month'. How do I select only winter months?
Solution 1:[1]
I was able to do this by referencing the month variable on the dataset directly:
# select only winter months
WinterAir = air[(df.month >= 12) | (df.month <= 2)]
Solution 2:[2]
The reason this doesn't work for your data specifically is that you don't have a datetime coordinate named time. Instead, you have a dimension time without any coordinate data labeling it, plus data variables holding the individual date components. Because of this, you can reference the month data variable directly and use it to select from your data.
You could always construct a datetime coordinate using the day, month, and year values in your data and assign that as the time coordinate, and then the usual time series functionality built into xarray would work.
As an example, here's a dataset similar to yours in structure (the snippets below also need pandas imported as pd, in addition to numpy and xarray):
In [6]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
   ...:
   ...: ds = xr.Dataset(
   ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
   ...:     data_vars={
   ...:         "day": (("time",), dates.day),
   ...:         "month": (("time",), dates.month),
   ...:         "year": (("time",), dates.year),
   ...:         "temperature": (
   ...:             ("lat", "lon", "time"),
   ...:             np.random.random(size=(2, 4, len(dates))),
   ...:         ),
   ...:     },
   ...: )
In [7]: ds
Out[7]:
<xarray.Dataset>
Dimensions:      (time: 366, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
    month        (time) int64 1 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2308 0.3257 ... 0.3501 0.009162
Note that time is a special "dimension without coordinates". This means that there are no labels on the time dimension, and xarray does not know anything about "time" except that it has a certain length and is the dimension indexing several of your data variables. Importantly, in your data, time is not a datetime type.
Because month is a data variable in the dataset, you need to reference month directly, as you found, and the DatetimeAccessor ds.time.dt is not available:
In [8]: ds.loc[{"time": ds.month == 2}]
Out[8]:
<xarray.Dataset>
Dimensions:      (time: 29, lat: 2, lon: 4)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
Dimensions without coordinates: time
Data variables:
    day          (time) int64 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
    month        (time) int64 2 2 2 2 2 2 2 2 2 2 2 2 ... 2 2 2 2 2 2 2 2 2 2 2
    year         (time) int64 2020 2020 2020 2020 2020 ... 2020 2020 2020 2020
    temperature  (lat, lon, time) float64 0.2821 0.08776 0.2018 ... 0.929 0.4774
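Since the question asks for all three winter months rather than February alone, the same idea can be extended with a mask built from isin. The following is only a minimal sketch on the synthetic dataset from this answer, not on the actual Berkeley Earth file. The names winter_mask, winter and winter_air are made up for illustration, and it assumes your file exposes a month variable along time the way the example does.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the synthetic dataset: "time" has no coordinate, and the date
# components are stored as ordinary data variables.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
    data_vars={
        "day": (("time",), dates.day),
        "month": (("time",), dates.month),
        "year": (("time",), dates.year),
        "temperature": (
            ("lat", "lon", "time"),
            np.random.random(size=(2, 4, len(dates))),
        ),
    },
)

# Boolean mask along "time": True for December, January and February.
winter_mask = ds.month.isin([12, 1, 2])

# Positional selection with the boolean mask keeps only the winter days.
winter = ds.isel(time=winter_mask)
winter_air = winter.temperature
```

Using isin avoids chaining comparisons with | and makes the set of months explicit.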
If the time dimension had a corresponding coordinate of type datetime, e.g. by assigning the previous dates array to the time coord, everything would work as you expect:
In [10]: dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
    ...:
    ...: ds = xr.Dataset(
    ...:     coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45], "time": dates},
    ...:     data_vars={
    ...:         "temperature": (
    ...:             ("lat", "lon", "time"),
    ...:             np.random.random(size=(2, 4, len(dates))),
    ...:         ),
    ...:     },
    ...: )
In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 366)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-01-01 2020-01-02 ... 2020-12-31
Data variables:
    temperature  (lat, lon, time) float64 0.09064 0.5252 ... 0.08733 0.6283
Now the xarray datetime accessors work the way you'd expect:
In [12]: ds.loc[{"time": ds.time.dt.month == 2}]
Out[12]:
<xarray.Dataset>
Dimensions:      (lat: 2, lon: 4, time: 29)
Coordinates:
  * lon          (lon) int64 -135 -45 45 135
  * lat          (lat) int64 -45 45
  * time         (time) datetime64[ns] 2020-02-01 2020-02-02 ... 2020-02-29
Data variables:
    temperature  (lat, lon, time) float64 0.3407 0.6847 0.3073 ... 0.8578 0.1335
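If you would rather keep the datetime functionality, one possible way to build the time coordinate from the separate day, month and year variables is sketched below. It assumes the components are named day, month and year as in the synthetic example, so check the actual variable names in your file. The names time_index, ds_t, winter and winter_djf are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a file whose date components are data variables.
dates = pd.date_range("2020-01-01", "2020-12-31", freq="D")
ds = xr.Dataset(
    coords={"lon": [-135, -45, 45, 135], "lat": [-45, 45]},
    data_vars={
        "day": (("time",), dates.day),
        "month": (("time",), dates.month),
        "year": (("time",), dates.year),
        "temperature": (
            ("lat", "lon", "time"),
            np.random.random(size=(2, 4, len(dates))),
        ),
    },
)

# pandas can assemble datetimes from a frame with year/month/day columns.
time_index = pd.to_datetime(
    pd.DataFrame(
        {
            "year": ds.year.values,
            "month": ds.month.values,
            "day": ds.day.values,
        }
    )
)

# Attach the datetimes as the "time" coordinate and drop the components,
# which are now redundant.
ds_t = ds.assign_coords(time=("time", time_index.to_numpy()))
ds_t = ds_t.drop_vars(["day", "month", "year"])

# With a datetime coordinate, the .dt accessor is available.
winter = ds_t.sel(time=ds_t.time.dt.month.isin([12, 1, 2]))

# Equivalent selection using the built-in meteorological season labels.
winter_djf = ds_t.sel(time=ds_t.time.dt.season == "DJF")
```

Note that within a single calendar year the result combines January and February with the following December, so the three months do not form one contiguous season.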
See xarray's docs on Coordinates and working with time series data for more info.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Adrian Tompkins |
| Solution 2 | Michael Delgado |

