Time series data¶

A major use case for xarray is multi-dimensional time-series data. Accordingly, we’ve copied many of features that make working with time-series data in pandas such a joy to xarray. In most cases, we rely on pandas for the core functionality.

Creating datetime64 data¶

xarray uses the numpy dtypes datetime64[ns] and timedelta64[ns] to represent datetime data, which offer vectorized (if sometimes buggy) operations with numpy and smooth integration with pandas.

To convert to or create regular arrays of datetime64 data, we recommend using pandas.to_datetime() and pandas.date_range():

In [1]: pd.to_datetime(['2000-01-01', '2000-02-02'])
Out[1]: DatetimeIndex(['2000-01-01', '2000-02-02'], dtype='datetime64[ns]', freq=None)

In [2]: pd.date_range('2000-01-01', periods=365)
Out[2]: 
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10',
               ...
               '2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
               '2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
               '2000-12-29', '2000-12-30'],
              dtype='datetime64[ns]', length=365, freq='D')

Alternatively, you can supply arrays of Python datetime objects. These get converted automatically when used as arguments in xarray objects:

In [3]: import datetime

In [4]: xr.Dataset({'time': datetime.datetime(2000, 1, 1)})
Out[4]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    time     datetime64[ns] 2000-01-01

When reading or writing netCDF files, xarray automatically decodes datetime and timedelta arrays using CF conventions (that is, by using a units attribute like 'days since 2000-01-01').

Note

When decoding/encoding datetimes for non-standard calendars or for dates before year 1678 or after year 2262, xarray uses the cftime library. It was previously packaged with the netcdf4-python package under the name netcdftime but is now distributed separately. cftime is an optional dependency of xarray.

You can manual decode arrays in this form by passing a dataset to decode_cf():

In [5]: attrs = {'units': 'hours since 2000-01-01'}

In [6]: ds = xr.Dataset({'time': ('time', [0, 1, 2, 3], attrs)})

In [7]: xr.decode_cf(ds)
Out[7]: 
<xarray.Dataset>
Dimensions:  (time: 4)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-01-01T03:00:00
Data variables:
    *empty*

One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to those that fall between the years 1678 and 2262. When a netCDF file contains dates outside of these bounds, dates will be returned as arrays of cftime.datetime objects and a CFTimeIndex will be used for indexing. CFTimeIndex enables a subset of the indexing functionality of a pandas.DatetimeIndex and is only fully compatible with the standalone version of cftime (not the version packaged with earlier versions netCDF4). See Non-standard calendars and dates outside the Timestamp-valid range for more information.

Datetime indexing¶

xarray borrows powerful indexing machinery from pandas (see Indexing and selecting data).

This allows for several useful and succinct forms of indexing, particularly for datetime64 data. For example, we support indexing with strings for single items and with the slice object:

In [8]: time = pd.date_range('2000-01-01', freq='H', periods=365 * 24)

In [9]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})

In [10]: ds.sel(time='2000-01')
Out[10]: 
<xarray.Dataset>
Dimensions:  (time: 744)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-01-31T23:00:00
Data variables:
    foo      (time) int64 0 1 2 3 4 5 6 7 8 ... 736 737 738 739 740 741 742 743

In [11]: ds.sel(time=slice('2000-06-01', '2000-06-10'))
Out[11]: 
<xarray.Dataset>
Dimensions:  (time: 240)
Coordinates:
  * time     (time) datetime64[ns] 2000-06-01 ... 2000-06-10T23:00:00
Data variables:
    foo      (time) int64 3648 3649 3650 3651 3652 ... 3883 3884 3885 3886 3887

You can also select a particular time by indexing with a datetime.time object:

In [12]: ds.sel(time=datetime.time(12))
Out[12]: 
<xarray.Dataset>
Dimensions:  (time: 365)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01T12:00:00 ... 2000-12-30T12:00:00
Data variables:
    foo      (time) int64 12 36 60 84 108 132 ... 8628 8652 8676 8700 8724 8748

For more details, read the pandas documentation.

Datetime components¶

Similar to pandas, the components of datetime objects contained in a given DataArray can be quickly computed using a special .dt accessor.

In [13]: time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)

In [14]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 4)), 'time': time})

In [15]: ds.time.dt.hour
Out[15]: 
<xarray.DataArray 'hour' (time: 1460)>
array([ 0,  6, 12, ...,  6, 12, 18])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00

In [16]: ds.time.dt.dayofweek
Out[16]: 
<xarray.DataArray 'dayofweek' (time: 1460)>
array([5, 5, 5, ..., 5, 5, 5])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00

The .dt accessor works on both coordinate dimensions as well as multi-dimensional data.

xarray also supports a notion of “virtual” or “derived” coordinates for datetime components implemented by pandas, including “year”, “month”, “day”, “hour”, “minute”, “second”, “dayofyear”, “week”, “dayofweek”, “weekday” and “quarter”:

In [17]: ds['time.month']
Out[17]: 
<xarray.DataArray 'month' (time: 1460)>
array([ 1,  1,  1, ..., 12, 12, 12])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00

In [18]: ds['time.dayofyear']
Out[18]: 
<xarray.DataArray 'dayofyear' (time: 1460)>
array([  1,   1,   1, ..., 365, 365, 365])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00

For use as a derived coordinate, xarray adds 'season' to the list of datetime components supported by pandas:

In [19]: ds['time.season']
Out[19]: 
<xarray.DataArray 'season' (time: 1460)>
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='<U3')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00

In [20]: ds['time'].dt.season
Out[20]: 
<xarray.DataArray 'season' (time: 1460)>
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='<U3')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00

The set of valid seasons consists of ‘DJF’, ‘MAM’, ‘JJA’ and ‘SON’, labeled by the first letters of the corresponding months.

You can use these shortcuts with both Datasets and DataArray coordinates.

In addition, xarray supports rounding operations floor, ceil, and round. These operations require that you supply a rounding frequency as a string argument.

In [21]: ds['time'].dt.floor('D')
Out[21]: 
<xarray.DataArray 'floor' (time: 1460)>
array(['2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
       '2000-01-01T00:00:00.000000000', ..., '2000-12-30T00:00:00.000000000',
       '2000-12-30T00:00:00.000000000', '2000-12-30T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00

Resampling and grouped operations¶

Datetime components couple particularly well with grouped operations (see GroupBy: split-apply-combine) for analyzing features that repeat over time. Here’s how to calculate the mean by time of day:

In [22]: ds.groupby('time.hour').mean()
Out[22]: 
<xarray.Dataset>
Dimensions:  (hour: 4)
Coordinates:
  * hour     (hour) int64 0 6 12 18
Data variables:
    foo      (hour) float64 728.0 729.0 730.0 731.0

For upsampling or downsampling temporal resolutions, xarray offers a resample() method building on the core functionality offered by the pandas method of the same name. Resample uses essentially the same api as resample in pandas.

For example, we can downsample our dataset from hourly to 6-hourly:

In [23]: ds.resample(time='6H')
Out[23]: <xarray.core.resample.DatasetResample at 0x7f21ef78ee80>

This will create a specialized Resample object which saves information necessary for resampling. All of the reduction methods which work with Resample objects can also be used for resampling:

In [24]: ds.resample(time='6H').mean()
Out[24]: 
<xarray.Dataset>
Dimensions:  (time: 1460)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
Data variables:
    foo      (time) float64 0.0 1.0 2.0 3.0 ... 1.457e+03 1.458e+03 1.459e+03

You can also supply an arbitrary reduction function to aggregate over each resampling group:

In [25]: ds.resample(time='6H').reduce(np.mean)
Out[25]: 
<xarray.Dataset>
Dimensions:  (time: 1460)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 ... 2000-12-30T18:00:00
Data variables:
    foo      (time) float64 0.0 1.0 2.0 3.0 ... 1.457e+03 1.458e+03 1.459e+03

For upsampling, xarray provides four methods: asfreq, ffill, bfill, and interpolate. interpolate extends scipy.interpolate.interp1d and supports all of its schemes. All of these resampling operations work on both Dataset and DataArray objects with an arbitrary number of dimensions.

For more examples of using grouped operations on a time dimension, see Toy weather data.

Non-standard calendars and dates outside the Timestamp-valid range¶

Through the standalone cftime library and a custom subclass of pandas.Index, xarray supports a subset of the indexing functionality enabled through the standard pandas.DatetimeIndex for dates from non-standard calendars commonly used in climate science or dates using a standard calendar, but outside the Timestamp-valid range (approximately between years 1678 and 2262).

Note

As of xarray version 0.11, by default, cftime.datetime objects will be used to represent times (either in indexes, as a CFTimeIndex, or in data arrays with dtype object) if any of the following are true:

The dates are from a non-standard calendar
Any dates are outside the Timestamp-valid range.

Otherwise pandas-compatible dates from a standard calendar will be represented with the np.datetime64[ns] data type, enabling the use of a pandas.DatetimeIndex or arrays with dtype np.datetime64[ns] and their full set of associated features.

For example, you can create a DataArray indexed by a time coordinate with dates from a no-leap calendar and a CFTimeIndex will automatically be used:

In [26]: from itertools import product

In [27]: from cftime import DatetimeNoLeap

In [28]: dates = [DatetimeNoLeap(year, month, 1) for year, month in
   ....:          product(range(1, 3), range(1, 13))]
   ....: 

In [29]: da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')

xarray also includes a cftime_range() function, which enables creating a CFTimeIndex with regularly-spaced dates. For instance, we can create the same dates and DataArray we created above using:

In [30]: dates = xr.cftime_range(start='0001', periods=24, freq='MS', calendar='noleap')

In [31]: da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')

For data indexed by a CFTimeIndex xarray currently supports:

Partial datetime string indexing using strictly ISO 8601-format partial datetime strings:

In [32]: da.sel(time='0001')
Out[32]: 
<xarray.DataArray 'foo' (time: 12)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0001-12-01 00:00:00

In [33]: da.sel(time=slice('0001-05', '0002-02'))
Out[33]: 
<xarray.DataArray 'foo' (time: 10)>
array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
Coordinates:
  * time     (time) object 0001-05-01 00:00:00 ... 0002-02-01 00:00:00

Access of basic datetime components via the dt accessor (in this case just “year”, “month”, “day”, “hour”, “minute”, “second”, “microsecond”, “season”, “dayofyear”, and “dayofweek”):

In [34]: da.time.dt.year
Out[34]: 
<xarray.DataArray 'year' (time: 24)>
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [35]: da.time.dt.month
Out[35]: 
<xarray.DataArray 'month' (time: 24)>
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12,  1,  2,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [36]: da.time.dt.season
Out[36]: 
<xarray.DataArray 'season' (time: 24)>
array(['DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA', 'SON', 'SON',
       'SON', 'DJF', 'DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA',
       'SON', 'SON', 'SON', 'DJF'], dtype='<U3')
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [37]: da.time.dt.dayofyear
Out[37]: 
<xarray.DataArray 'dayofyear' (time: 24)>
array([  1,  32,  60,  91, 121, 152, 182, 213, 244, 274, 305, 335,   1,  32,
        60,  91, 121, 152, 182, 213, 244, 274, 305, 335])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

In [38]: da.time.dt.dayofweek
Out[38]: 
<xarray.DataArray 'dayofweek' (time: 24)>
array([1, 4, 4, 0, 2, 5, 0, 3, 6, 1, 4, 6, 2, 5, 5, 1, 3, 6, 1, 4, 0, 2, 5, 0])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

Group-by operations based on datetime accessor attributes (e.g. by month of the year):

In [39]: da.groupby('time.month').sum()
Out[39]: 
<xarray.DataArray 'foo' (month: 12)>
array([12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34])
Coordinates:
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12

Interpolation using cftime.datetime objects:

In [40]: da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])
Out[40]: 
<xarray.DataArray 'foo' (time: 2)>
array([0.451613, 1.5     ])
Coordinates:
  * time     (time) object 0001-01-15 00:00:00 0001-02-15 00:00:00

Interpolation using datetime strings:

In [41]: da.interp(time=['0001-01-15', '0001-02-15'])
Out[41]: 
<xarray.DataArray 'foo' (time: 2)>
array([0.451613, 1.5     ])
Coordinates:
  * time     (time) object 0001-01-15 00:00:00 0001-02-15 00:00:00

Differentiation:

In [42]: da.differentiate('time')
Out[42]: 
<xarray.DataArray 'foo' (time: 24)>
array([3.733572e-07, 3.943755e-07, 3.943755e-07, 3.796819e-07, 3.796819e-07,
       3.796819e-07, 3.796819e-07, 3.733572e-07, 3.796819e-07, 3.796819e-07,
       3.796819e-07, 3.796819e-07, 3.733572e-07, 3.943755e-07, 3.943755e-07,
       3.796819e-07, 3.796819e-07, 3.796819e-07, 3.796819e-07, 3.733572e-07,
       3.796819e-07, 3.796819e-07, 3.796819e-07, 3.858025e-07])
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00

And serialization:

In [43]: da.to_netcdf('example-no-leap.nc')

In [44]: xr.open_dataset('example-no-leap.nc')
Out[44]: 
<xarray.Dataset>
Dimensions:  (time: 24)
Coordinates:
  * time     (time) object 0001-01-01 00:00:00 ... 0002-12-01 00:00:00
Data variables:
    foo      (time) int64 ...

Note

While much of the time series functionality that is possible for standard dates has been implemented for dates from non-standard calendars, there are still some remaining important features that have yet to be implemented, for example:

Resampling along the time dimension for data indexed by a CFTimeIndex (GH2191, GH2458)
Built-in plotting of data with cftime.datetime coordinate axes (GH2164).

For some use-cases it may still be useful to convert from a CFTimeIndex to a pandas.DatetimeIndex, despite the difference in calendar types (e.g. to allow the use of some forms of resample with non-standard calendars). The recommended way of doing this is to use the built-in to_datetimeindex() method:

In [45]: modern_times = xr.cftime_range('2000', periods=24, freq='MS', calendar='noleap')

In [46]: da = xr.DataArray(range(24), [('time', modern_times)])

In [47]: da
Out[47]: 
<xarray.DataArray (time: 24)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23])
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2001-12-01 00:00:00

In [48]: datetimeindex = da.indexes['time'].to_datetimeindex()

In [49]: da['time'] = datetimeindex

In [50]: da.resample(time='Y').mean('time')
Out[50]: 
<xarray.DataArray (time: 2)>
array([ 5.5, 17.5])
Coordinates:
  * time     (time) datetime64[ns] 2000-12-31 2001-12-31

However in this case one should use caution to only perform operations which do not depend on differences between dates (e.g. differentiation, interpolation, or upsampling with resample), as these could introduce subtle and silent errors due to the difference in calendar types between the dates encoded in your data and the dates stored in memory.