Quick overview

Here are some quick examples of what you can do with xarray.DataArray objects. Everything is explained in much more detail in the rest of the documentation.

To begin, import numpy, pandas and xarray using their customary abbreviations:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xarray as xr

Create a DataArray

You can make a DataArray from scratch by supplying data in the form of a numpy array or list, with optional dimensions and coordinates:

In [4]: xr.DataArray(np.random.randn(2, 3))
Out[4]: 
<xarray.DataArray (dim_0: 2, dim_1: 3)>
array([[ 1.643563, -1.469388,  0.357021],
       [-0.6746  , -1.776904, -0.968914]])
Dimensions without coordinates: dim_0, dim_1

In [5]: data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('x', 'y'))

In [6]: data
Out[6]: 
<xarray.DataArray (x: 2, y: 3)>
array([[-1.294524,  0.413738,  0.276662],
       [-0.472035, -0.01396 , -0.362543]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y
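The constructor also accepts coordinates for every dimension, plus an optional `name` and `attrs`. Below is a minimal sketch. The city names, day numbers, temperature values and the `units` attribute are made up for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical data: daily temperatures for two cities over three days
temps = xr.DataArray(
    np.array([[10.0, 12.5, 11.0], [8.0, 9.5, 7.5]]),
    coords={'city': ['Oslo', 'Bergen'], 'day': [1, 2, 3]},
    dims=('city', 'day'),
    name='temperature',
    attrs={'units': 'degC'},
)

print(temps.dims)                              # ('city', 'day')
print(temps.name)                              # temperature
print(temps.sel(city='Bergen', day=2).item())  # 9.5
```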

If you supply a pandas Series or DataFrame, metadata is copied directly:

In [7]: xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))
Out[7]: 
<xarray.DataArray 'foo' (dim_0: 3)>
array([0, 1, 2])
Coordinates:
  * dim_0    (dim_0) object 'a' 'b' 'c'
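A DataFrame works the same way: its index and columns become coordinates. If they are named, those names become the dimension names. Unnamed ones fall back to `dim_0`, `dim_1`, as with the Series above. Here is a small sketch with made-up labels:

```python
import pandas as pd
import xarray as xr

# Hypothetical table whose index and columns both carry names
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.Index(['r1', 'r2'], name='row'),
    columns=pd.Index(['c1', 'c2'], name='col'),
)

arr = xr.DataArray(df)
print(arr.dims)                            # ('row', 'col')
print(arr.coords['col'].values.tolist())   # ['c1', 'c2']
print(arr.sel(row='r2', col='c1').item())  # 3
```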

Here are the key properties for a DataArray:

# like in pandas, values is a numpy array that you can modify in-place
In [8]: data.values
Out[8]: 
array([[-1.295,  0.414,  0.277],
       [-0.472, -0.014, -0.363]])

In [9]: data.dims
Out[9]: ('x', 'y')

In [10]: data.coords
Out[10]: 
Coordinates:
  * x        (x) <U1 'a' 'b'

# you can use this dictionary to store arbitrary metadata
In [11]: data.attrs
Out[11]: OrderedDict()
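This sketch illustrates the two comments above on a fresh array. Writing into `values` changes the DataArray itself, and `attrs` is a dict-like mapping you can add keys to. The `units` key is only an example name:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((2, 3)), coords={'x': ['a', 'b']}, dims=('x', 'y'))

# .values hands back the underlying numpy array, so this edit is in place
arr.values[0, 0] = 5.0
print(arr.sel(x='a').values)   # [5. 0. 0.]

# attrs stores arbitrary metadata
arr.attrs['units'] = 'm/s'
print(arr.attrs['units'])      # m/s
```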

Indexing

xarray supports four kinds of indexing. These operations are just as fast as in pandas, because we borrow pandas’ indexing machinery.

# positional and by integer label, like numpy
In [12]: data[[0, 1]]
Out[12]: 
<xarray.DataArray (x: 2, y: 3)>
array([[-1.294524,  0.413738,  0.276662],
       [-0.472035, -0.01396 , -0.362543]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

# positional and by coordinate label, like pandas
In [13]: data.loc['a':'b']
Out[13]: 
<xarray.DataArray (x: 2, y: 3)>
array([[-1.294524,  0.413738,  0.276662],
       [-0.472035, -0.01396 , -0.362543]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

# by dimension name and integer label
In [14]: data.isel(x=slice(2))
Out[14]: 
<xarray.DataArray (x: 2, y: 3)>
array([[-1.294524,  0.413738,  0.276662],
       [-0.472035, -0.01396 , -0.362543]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

# by dimension name and coordinate label
In [15]: data.sel(x=['a', 'b'])
Out[15]: 
<xarray.DataArray (x: 2, y: 3)>
array([[-1.294524,  0.413738,  0.276662],
       [-0.472035, -0.01396 , -0.362543]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y
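Because `data` has only two rows along `x`, each call above returns the whole array. The sketch below uses a small array with distinct values (made up here). It gives `y` labels that differ from its positions, so you can see all four styles pick out the same element. It also shows that a scalar selection drops that dimension:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={'x': ['a', 'b'], 'y': [10, 20, 30]},
    dims=('x', 'y'),
)

# The element in row 'b', column labelled 30 (position 2), four ways
print(arr[1, 2].item())              # 5  positional, like numpy
print(arr.loc['b', 30].item())       # 5  by coordinate label, like pandas
print(arr.isel(x=1, y=2).item())     # 5  by dimension name and integer position
print(arr.sel(x='b', y=30).item())   # 5  by dimension name and coordinate label

# Selecting a single label removes that dimension
print(arr.sel(x='b').dims)           # ('y',)
```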

Computation

Data arrays work very similarly to numpy ndarrays:

In [16]: data + 10
Out[16]: 
<xarray.DataArray (x: 2, y: 3)>
array([[  8.705476,  10.413738,  10.276662],
       [  9.527965,   9.98604 ,   9.637457]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [17]: np.sin(data)
Out[17]: 
<xarray.DataArray (x: 2, y: 3)>
array([[-0.962079,  0.402035,  0.273146],
       [-0.454699, -0.013959, -0.354653]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [18]: data.T
Out[18]: 
<xarray.DataArray (y: 3, x: 2)>
array([[-1.294524, -0.472035],
       [ 0.413738, -0.01396 ],
       [ 0.276662, -0.362543]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

In [19]: data.sum()
Out[19]: 
<xarray.DataArray ()>
array(-1.4526610277231344)

Unlike numpy, however, aggregation operations can use dimension names instead of axis numbers:

In [20]: data.mean(dim='x')
Out[20]: 
<xarray.DataArray (y: 3)>
array([-0.883279,  0.199889, -0.042941])
Dimensions without coordinates: y
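Naming the dimension is equivalent to passing the matching axis number to numpy, without having to remember which axis that is. Here is a quick check on made-up values:

```python
import numpy as np
import xarray as xr

values = np.arange(6.0).reshape(2, 3)
arr = xr.DataArray(values, dims=('x', 'y'))

# 'x' is axis 0 and 'y' is axis 1
print(arr.mean(dim='x').values)   # [1.5 2.5 3.5]
print(values.mean(axis=0))        # [1.5 2.5 3.5]
print(arr.sum(dim='y').values)    # [ 3. 12.]
```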

Arithmetic operations broadcast based on dimension name. This means you don’t need to insert dummy dimensions for alignment:

In [21]: a = xr.DataArray(np.random.randn(3), [data.coords['y']])

In [22]: b = xr.DataArray(np.random.randn(4), dims='z')

In [23]: a
Out[23]: 
<xarray.DataArray (y: 3)>
array([-0.006154, -0.923061,  0.895717])
Coordinates:
  * y        (y) int64 0 1 2

In [24]: b
Out[24]: 
<xarray.DataArray (z: 4)>
array([ 0.805244, -1.206412,  2.565646,  1.431256])
Dimensions without coordinates: z

In [25]: a + b
Out[25]: 
<xarray.DataArray (y: 3, z: 4)>
array([[ 0.79909 , -1.212565,  2.559492,  1.425102],
       [-0.117817, -2.129472,  1.642585,  0.508195],
       [ 1.700961, -0.310694,  3.461363,  2.326973]])
Coordinates:
  * y        (y) int64 0 1 2
Dimensions without coordinates: z

It also means that in most cases you do not need to worry about the order of dimensions:

In [26]: data - data.T
Out[26]: 
<xarray.DataArray (x: 2, y: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y
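For comparison, here is the plain numpy equivalent of broadcasting by name. In numpy, the dummy axes have to be inserted by hand. The inputs are made up:

```python
import numpy as np
import xarray as xr

a_vals = np.array([1.0, 2.0, 3.0])
b_vals = np.array([10.0, 20.0])

# numpy: add explicit length-1 axes so shapes (3, 1) and (1, 2) broadcast to (3, 2)
manual = a_vals[:, np.newaxis] + b_vals[np.newaxis, :]

# xarray: the dimension names alone determine how the operands broadcast
result = xr.DataArray(a_vals, dims='y') + xr.DataArray(b_vals, dims='z')

print(result.dims)                            # ('y', 'z')
print(np.array_equal(result.values, manual))  # True
```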

Operations also align based on index labels:

In [27]: data[:-1] - data[:1]
Out[27]: 
<xarray.DataArray (x: 1, y: 3)>
array([[ 0.,  0.,  0.]])
Coordinates:
  * x        (x) <U1 'a'
Dimensions without coordinates: y

GroupBy

xarray supports grouped operations using an API very similar to that of pandas:

In [28]: labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')

In [29]: labels
Out[29]: 
<xarray.DataArray 'labels' (y: 3)>
array(['E', 'F', 'E'], 
      dtype='<U1')
Coordinates:
  * y        (y) int64 0 1 2

In [30]: data.groupby(labels).mean('y')
Out[30]: 
<xarray.DataArray (x: 2, labels: 2)>
array([[-0.508931,  0.413738],
       [-0.417289, -0.01396 ]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * labels   (labels) object 'E' 'F'

In [31]: data.groupby(labels).apply(lambda x: x - x.min())
Out[31]: 
<xarray.DataArray (x: 2, y: 3)>
array([[ 0.      ,  0.427698,  1.571185],
       [ 0.822489,  0.      ,  0.931981]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int64 0 1 2
    labels   (y) <U1 'E' 'F' 'E'
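To see what `groupby(...).mean('y')` computes, you can compare it with averaging the matching columns by hand. The sketch below uses made-up values and gives `y` explicit coordinates, as the `labels` array above does:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.array([[1.0, 2.0, 4.0], [8.0, 16.0, 32.0]]),
    coords={'x': ['a', 'b'], 'y': [0, 1, 2]},
    dims=('x', 'y'),
)
labels = xr.DataArray(['E', 'F', 'E'], coords={'y': [0, 1, 2]}, dims='y', name='labels')

grouped = arr.groupby(labels).mean('y')

# By hand: columns 0 and 2 carry the label 'E'
manual_e = arr.values[:, [0, 2]].mean(axis=1)
print(grouped.sel(labels='E').values)   # values 2.5 and 20.0
print(manual_e)                         # values 2.5 and 20.0
```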

pandas

Xarray objects can be easily converted to and from pandas objects:

In [32]: series = data.to_series()

In [33]: series
Out[33]: 
x  y
a  0   -1.294524
   1    0.413738
   2    0.276662
b  0   -0.472035
   1   -0.013960
   2   -0.362543
dtype: float64

# convert back
In [34]: series.to_xarray()
Out[34]: 
<xarray.DataArray (x: 2, y: 3)>
array([[-1.294524,  0.413738,  0.276662],
       [-0.472035, -0.01396 , -0.362543]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 0 1 2
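A round trip through `to_series()` and `to_xarray()` should reproduce the same labels and values. Here is a quick check on made-up data. `y` is given explicit coordinates so both sides carry the same index:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={'x': ['a', 'b'], 'y': [0, 1, 2]},
    dims=('x', 'y'),
)

# One row per (x, y) pair, indexed by a pandas MultiIndex
series = arr.to_series()
print(list(series.index.names))   # ['x', 'y']
print(len(series))                # 6

back = series.to_xarray()
print(back.dims)                                 # ('x', 'y')
print(np.array_equal(back.values, arr.values))   # True
```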

Datasets

xarray.Dataset is a dict-like container of aligned DataArray objects. You can think of it as a multi-dimensional generalization of the pandas.DataFrame:

In [35]: ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})

In [36]: ds
Out[36]: 
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 -1.295 0.4137 0.2767 -0.472 -0.01396 -0.3625
    baz      float64 3.142
    bar      (x) int64 1 2

Use dictionary indexing to pull out Dataset variables as DataArray objects:

In [37]: ds['foo']
Out[37]: 
<xarray.DataArray 'foo' (x: 2, y: 3)>
array([[-1.294524,  0.413738,  0.276662],
       [-0.472035, -0.01396 , -0.362543]])
Coordinates:
  * x        (x) <U1 'a' 'b'
Dimensions without coordinates: y

Variables in datasets can have different dtypes and even different dimensions, but all dimensions are assumed to refer to points in the same shared coordinate system.

If you prefer to work with multiple variables at once, you can do almost everything with Dataset objects that you can do with DataArray objects, including indexing and arithmetic.
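For instance, selection and scalar arithmetic on a Dataset apply to every variable at once. Here is a sketch with made-up variables `foo` and `bar`:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {'foo': (('x', 'y'), np.arange(6.0).reshape(2, 3)), 'bar': ('x', [1, 2])},
    coords={'x': ['a', 'b']},
)

# Selecting x='b' drops the x dimension from both variables
row = ds.sel(x='b')
print(row['foo'].values.tolist())   # [3.0, 4.0, 5.0]
print(row['bar'].item())            # 2

# Arithmetic is applied variable by variable; the coordinates are untouched
doubled = ds * 2
print(doubled['bar'].values.tolist())   # [2, 4]
```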

NetCDF

NetCDF is the recommended binary serialization format for xarray objects. Users from the geosciences will recognize that the Dataset data model looks very similar to a netCDF file (which, in fact, inspired it).

You can write xarray objects to disk with to_netcdf() and read them back with open_dataset() and open_dataarray():

In [38]: ds.to_netcdf('example.nc')

In [39]: xr.open_dataset('example.nc')
Out[39]: 
<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) object 'a' 'b'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 -1.295 0.4137 0.2767 -0.472 -0.01396 -0.3625
    baz      float64 3.142
    bar      (x) int64 1 2
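Below is a self-contained round trip that writes to a temporary directory, so no `example.nc` is left behind. This sketch assumes a netCDF backend, such as the netCDF4 or scipy package, is installed, because xarray needs one of them to write `.nc` files. The variables are made up:

```python
import os
import tempfile

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {'foo': (('x', 'y'), np.arange(6.0).reshape(2, 3)), 'bar': ('x', [1, 2])},
    coords={'x': ['a', 'b']},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.nc')
    ds.to_netcdf(path)
    # load() reads everything into memory, so the file can be closed safely
    with xr.open_dataset(path) as opened:
        loaded = opened.load()

print(loaded['foo'].values.tolist())   # [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
print(loaded['x'].values.tolist())     # ['a', 'b']
```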