Quick overview

Here are some quick examples of what you can do with xray’s DataArray object. Everything is explained in much more detail in the rest of the documentation.

To begin, import numpy, pandas and xray:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: import xray

Create a DataArray

You can make a DataArray from scratch by supplying data in the form of a numpy array or list, with optional dimensions and coordinates:

In [4]: xray.DataArray(np.random.randn(2, 3))
Out[4]: 
<xray.DataArray (dim_0: 2, dim_1: 3)>
array([[-1.34431181,  0.84488514,  1.07576978],
       [-0.10904998,  1.64356307, -1.46938796]])
Coordinates:
  * dim_0    (dim_0) int64 0 1
  * dim_1    (dim_1) int64 0 1 2

In [5]: xray.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])
Out[5]: 
<xray.DataArray (x: 2, y: 3)>
array([[ 0.35702056, -0.6746001 , -1.77690372],
       [-0.96891381, -1.29452359,  0.41373811]])
Coordinates:
  * x        (x) |S1 'a' 'b'
  * y        (y) int64 -2 0 2

You can also pass in pandas data structures directly:

In [6]: df = pd.DataFrame(np.random.randn(2, 3), index=['a', 'b'], columns=[-2, 0, 2])

In [7]: df.index.name = 'x'

In [8]: df.columns.name = 'y'

In [9]: foo = xray.DataArray(df, name='foo')

In [10]: foo
Out[10]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 0.27666171, -0.47203451, -0.01395975],
       [-0.36254299, -0.00615357, -0.92306065]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2

Here are the key properties of a DataArray:

# like in pandas, values is a numpy array that you can modify in-place
In [11]: foo.values
Out[11]: 
array([[ 0.27666171, -0.47203451, -0.01395975],
       [-0.36254299, -0.00615357, -0.92306065]])

In [12]: foo.dims
Out[12]: ('x', 'y')

In [13]: foo.coords['y']
Out[13]: 
<xray.DataArray 'y' (y: 3)>
array([-2,  0,  2])
Coordinates:
  * y        (y) int64 -2 0 2

# you can use this dictionary to store arbitrary metadata
In [14]: foo.attrs
Out[14]: OrderedDict()
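
As a minimal sketch of the two comments above, the snippet below writes into `values` in place and stores a metadata entry in `attrs`. The array contents and the `'units'` key are made up for illustration, and the fallback import is an assumption for environments where only the project's later package name, xarray, is installed.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: fall back to xarray, the name the project later adopted.
    import xarray as xray

# Illustrative data, not the random values used in the session above.
foo = xray.DataArray(np.zeros((2, 3)),
                     [('x', ['a', 'b']), ('y', [-2, 0, 2])],
                     name='foo')

# values is a numpy array; assigning into it changes foo itself.
foo.values[0, 0] = 1.0

# attrs behaves like a dictionary; 'units' is just an example key.
foo.attrs['units'] = 'metres'

print(foo.values)
print(foo.attrs)
```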

Indexing

xray supports four kinds of indexing. These operations are just as fast as in pandas, because we borrow pandas’ indexing machinery.

# positional and by integer label, like numpy
In [15]: foo[[0, 1]]
Out[15]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 0.27666171, -0.47203451, -0.01395975],
       [-0.36254299, -0.00615357, -0.92306065]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2

# positional and by coordinate label, like pandas
In [16]: foo.loc['a':'b']
Out[16]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 0.27666171, -0.47203451, -0.01395975],
       [-0.36254299, -0.00615357, -0.92306065]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2

# by dimension name and integer label
In [17]: foo.isel(x=slice(2))
Out[17]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 0.27666171, -0.47203451, -0.01395975],
       [-0.36254299, -0.00615357, -0.92306065]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2

# by dimension name and coordinate label
In [18]: foo.sel(x=['a', 'b'])
Out[18]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 0.27666171, -0.47203451, -0.01395975],
       [-0.36254299, -0.00615357, -0.92306065]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2
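
The four calls above all return the whole array, so it can be hard to see how they differ. The sketch below selects only the first row in each of the four styles and checks that the results agree. The data are illustrative, and the fallback import is an assumption for environments where only the later package name, xarray, is available.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: fall back to xarray, the name the project later adopted.
    import xarray as xray

foo = xray.DataArray(np.arange(6.0).reshape(2, 3),
                     [('x', ['a', 'b']), ('y', [-2, 0, 2])],
                     name='foo')

first_row = [
    foo[0],          # positional, by integer
    foo.loc['a'],    # positional, by coordinate label
    foo.isel(x=0),   # by dimension name and integer
    foo.sel(x='a'),  # by dimension name and coordinate label
]

# Each selection drops the x dimension and keeps the three y values.
for selected in first_row:
    print(selected.dims, selected.values)
```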

Computation

Data arrays work very similarly to numpy ndarrays:

In [19]: foo + 10
Out[19]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 10.27666171,   9.52796549,   9.98604025],
       [  9.63745701,   9.99384643,   9.07693935]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) object 'a' 'b'

In [20]: np.sin(foo)
Out[20]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 0.27314584, -0.45469925, -0.0139593 ],
       [-0.35465307, -0.00615353, -0.7974521 ]])
Coordinates:
  * y        (y) int64 -2 0 2
  * x        (x) object 'a' 'b'

In [21]: foo.T
Out[21]: 
<xray.DataArray 'foo' (y: 3, x: 2)>
array([[ 0.27666171, -0.36254299],
       [-0.47203451, -0.00615357],
       [-0.01395975, -0.92306065]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2

In [22]: foo.sum()
Out[22]: 
<xray.DataArray 'foo' ()>
array(-1.501089766692079)

However, aggregation operations can use dimension names instead of axis numbers:

In [23]: foo.mean(dim='x')
Out[23]: 
<xray.DataArray 'foo' (y: 3)>
array([-0.04294064, -0.23909404, -0.4685102 ])
Coordinates:
  * y        (y) int64 -2 0 2
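
To make the contrast with axis numbers concrete, the following sketch compares a mean taken by dimension name with the equivalent numpy call, and shows that the named form gives the same answer after a transpose. The data are illustrative, and the fallback import is an assumption for environments where only the later package name, xarray, is available.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: fall back to xarray, the name the project later adopted.
    import xarray as xray

foo = xray.DataArray(np.arange(6.0).reshape(2, 3),
                     [('x', ['a', 'b']), ('y', [-2, 0, 2])],
                     name='foo')

# By name: no need to remember which axis holds 'x'.
by_name = foo.mean(dim='x')

# The numpy equivalent has to look up the axis number first.
by_axis = foo.values.mean(axis=foo.dims.index('x'))

# After transposing, 'x' sits on a different axis, but the named call is unchanged.
by_name_transposed = foo.T.mean(dim='x')

print(by_name.values)
print(by_axis)
print(by_name_transposed.values)
```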

Arithmetic operations broadcast based on dimension name, so you don’t need to insert dummy dimensions for alignment:

In [24]: bar = xray.DataArray(np.random.randn(3), [foo.coords['y']])

In [25]: zzz = xray.DataArray(np.random.randn(4), dims='z')

In [26]: bar
Out[26]: 
<xray.DataArray (y: 3)>
array([ 0.8957173 ,  0.80524403, -1.20641178])
Coordinates:
  * y        (y) int64 -2 0 2

In [27]: zzz
Out[27]: 
<xray.DataArray (z: 4)>
array([ 2.56564595,  1.43125599,  1.34030885, -1.1702988 ])
Coordinates:
  * z        (z) int64 0 1 2 3

In [28]: bar + zzz
Out[28]: 
<xray.DataArray (y: 3, z: 4)>
array([[ 3.46136325,  2.32697329,  2.23602615, -0.27458149],
       [ 3.37088997,  2.23650001,  2.14555288, -0.36505477],
       [ 1.35923416,  0.2248442 ,  0.13389707, -2.37671058]])
Coordinates:
  * y        (y) int64 -2 0 2
  * z        (z) int64 0 1 2 3
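
For comparison, here is a rough sketch of the same outer sum in plain numpy, where the dummy dimensions have to be inserted by hand. The variable names and values are illustrative stand-ins for `bar` and `zzz`.

```python
import numpy as np

bar_values = np.array([1.0, 2.0, 3.0])           # length 3, plays the role of 'y'
zzz_values = np.array([10.0, 20.0, 30.0, 40.0])  # length 4, plays the role of 'z'

# Adding the two 1-D arrays directly fails: shapes (3,) and (4,) do not broadcast.
try:
    bar_values + zzz_values
    direct_sum_failed = False
except ValueError:
    direct_sum_failed = True

# With numpy, each array needs an explicit dummy axis to line up the dimensions.
manual = bar_values[:, np.newaxis] + zzz_values[np.newaxis, :]

print(direct_sum_failed)
print(manual.shape)  # (3, 4)
```

With named dimensions, `bar + zzz` produces the same (y, z) layout without the `np.newaxis` bookkeeping.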

GroupBy

xray supports grouped operations using an API very similar to that of pandas:

In [29]: labels = xray.DataArray(['E', 'F', 'E'], [foo.coords['y']], name='labels')

In [30]: labels
Out[30]: 
<xray.DataArray 'labels' (y: 3)>
array(['E', 'F', 'E'], 
      dtype='|S1')
Coordinates:
  * y        (y) int64 -2 0 2

In [31]: foo.groupby(labels).mean('y')
Out[31]: 
<xray.DataArray 'foo' (x: 2, labels: 2)>
array([[ 0.13135098, -0.47203451],
       [-0.64280182, -0.00615357]])
Coordinates:
  * x        (x) object 'a' 'b'
  * labels   (labels) |S1 'E' 'F'

In [32]: foo.groupby(labels).apply(lambda x: x - x.min())
Out[32]: 
<xray.DataArray 'foo' (x: 2, y: 3)>
array([[ 1.19972237,  0.        ,  0.9091009 ],
       [ 0.56051766,  0.46588094,  0.        ]])
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int64 -2 0 2
    labels   (y) |S1 'E' 'F' 'E'

Convert to pandas

A key feature of xray is robust conversion to and from pandas objects:

In [33]: foo.to_series()
Out[33]: 
x  y 
a  -2    0.276662
    0   -0.472035
    2   -0.013960
b  -2   -0.362543
    0   -0.006154
    2   -0.923061
Name: foo, dtype: float64

In [34]: foo.to_pandas()
Out[34]: 
y        -2         0         2
x                              
a  0.276662 -0.472035 -0.013960
b -0.362543 -0.006154 -0.923061
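
The examples above only show the "to pandas" direction. As a hedged sketch of the way back, the snippet below round-trips an array through a `pandas.Series` using `DataArray.from_series`. It assumes that class method is available in the installed version, uses illustrative data, and falls back to the later package name, xarray, if xray cannot be imported.

```python
import numpy as np

try:
    import xray
except ImportError:
    # Assumption: fall back to xarray, the name the project later adopted.
    import xarray as xray

foo = xray.DataArray(np.arange(6.0).reshape(2, 3),
                     [('x', ['a', 'b']), ('y', [-2, 0, 2])],
                     name='foo')

# to_series() yields a Series indexed by an (x, y) MultiIndex.
series = foo.to_series()

# from_series() rebuilds the dimensions from the MultiIndex levels.
restored = xray.DataArray.from_series(series)

print(restored.dims)
print(restored.values)
```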