Indexing and selecting data

Like pandas objects, xray objects support both integer-based and label-based lookups along each dimension. However, xray objects also have named dimensions, so you can optionally use dimension names instead of relying on the positional ordering of dimensions.

In total, xray supports four different kinds of indexing, as described below and summarized in this table:

Dimension lookup   Index lookup   DataArray syntax       Dataset syntax
Positional         By integer     arr[:, 0]              not available
Positional         By label       arr.loc[:, 'IA']       not available
By name            By integer     arr.isel(space=0)      ds.isel(space=0)
By name            By label       arr.sel(space='IA')    ds.sel(space='IA')

Array indexing

Indexing a DataArray directly works (mostly) just like it does for numpy arrays, except that the returned object is always another DataArray:

In [1]: arr = xray.DataArray(np.random.rand(4, 3),
   ...:                      [('time', pd.date_range('2000-01-01', periods=4)),
   ...:                       ('space', ['IA', 'IL', 'IN'])])
   ...: 

In [2]: arr[:2]
Out[2]: 
<xray.DataArray (time: 2, space: 3)>
array([[ 0.12696983,  0.96671784,  0.26047601],
       [ 0.89723652,  0.37674972,  0.33622174]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
  * space    (space) |S2 'IA' 'IL' 'IN'

In [3]: arr[0, 0]
Out[3]: 
<xray.DataArray ()>
array(0.12696983303810094)
Coordinates:
    space    |S2 'IA'
    time     datetime64[ns] 2000-01-01

In [4]: arr[:, [2, 1]]
Out[4]: 
<xray.DataArray (time: 4, space: 2)>
array([[ 0.26047601,  0.96671784],
       [ 0.33622174,  0.37674972],
       [ 0.12310214,  0.84025508],
       [ 0.44799682,  0.37301223]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IN' 'IL'

xray also supports label-based indexing, just like pandas. Because we use a pandas.Index under the hood, label-based indexing is very fast. To do label-based indexing, use the loc attribute:

In [5]: arr.loc['2000-01-01':'2000-01-02', 'IA']
Out[5]: 
<xray.DataArray (time: 2)>
array([ 0.12696983,  0.89723652])
Coordinates:
    space    |S2 'IA'
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02

You can perform any of the label indexing operations supported by pandas, including indexing with individual labels, slices and arrays of labels, as well as indexing with boolean arrays. Like pandas, label-based indexing in xray is inclusive of both the start and stop bounds.
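The inclusive-bounds convention is easiest to see next to ordinary positional slicing. The following is a small sketch that uses a plain pandas Series rather than an xray object (the series s and its labels are made up for illustration); since xray follows the pandas convention, the same behavior is expected from arr.loc:

```python
import pandas as pd

# A toy series with string labels (illustrative data, not from the text above).
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Label-based slicing includes both the start and the stop label.
by_label = s.loc['a':'c']

# Positional slicing follows the usual Python rule and excludes the stop.
by_position = s.iloc[0:2]

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_position.index))  # ['a', 'b']
```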

Setting values with label based indexing is also supported:

In [6]: arr.loc['2000-01-01', ['IL', 'IN']] = -10

In [7]: arr
Out[7]: 
<xray.DataArray (time: 4, space: 3)>
array([[  0.12696983, -10.        , -10.        ],
       [  0.89723652,   0.37674972,   0.33622174],
       [  0.45137647,   0.84025508,   0.12310214],
       [  0.5430262 ,   0.37301223,   0.44799682]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IL' 'IN'

Indexing with labeled dimensions

With labeled dimensions, we do not have to rely on dimension order and can use them explicitly to slice data with the sel() and isel() methods:

# index by integer array indices
In [8]: arr.isel(space=0, time=slice(None, 2))
Out[8]: 
<xray.DataArray (time: 2)>
array([ 0.12696983,  0.89723652])
Coordinates:
    space    |S2 'IA'
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02

# index by dimension coordinate labels
In [9]: arr.sel(time=slice('2000-01-01', '2000-01-02'))
Out[9]: 
<xray.DataArray (time: 2, space: 3)>
array([[  0.12696983, -10.        , -10.        ],
       [  0.89723652,   0.37674972,   0.33622174]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
  * space    (space) |S2 'IA' 'IL' 'IN'

The arguments to these methods can be any objects that could index the array along the dimension given by the keyword, e.g., labels for an individual value, Python slice() objects or 1-dimensional arrays.

Note

We would love to be able to do indexing with labeled dimension names inside brackets, but Python does not yet support indexing with keyword arguments like arr[space=0]. One alternative we are considering is allowing for indexing with a dictionary, arr[{'space': 0}] (see GH187).

Warning

Do not try to assign values when using isel or sel:

# DO NOT do this
arr.isel(space=0)[:] = 0

Whether such an assignment reaches the original array depends on whether the underlying numpy indexing returns a view or a copy. When it returns a copy, the assignment fails, and it fails silently. Until we support indexing with dictionaries (see the note above), you should explicitly construct a tuple to do positional indexing if you want to do assignment with labeled dimensions:

# this is safer
indexer = tuple(0 if d == 'space' else slice(None) for d in arr.dims)
arr[indexer] = 0
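If you need this pattern in more than one place, the tuple construction can be wrapped in a small helper. This is only a sketch: positional_indexer is a hypothetical name, not part of xray, and it assumes that dims is the ordered sequence of dimension names (for example arr.dims).

```python
def positional_indexer(dims, **indexers):
    """Build a tuple of positional indexers from per-dimension keywords.

    ``dims`` is the ordered sequence of dimension names. Dimensions that
    are not mentioned in ``indexers`` are selected in full.
    This helper is hypothetical and not part of xray.
    """
    unknown = set(indexers) - set(dims)
    if unknown:
        raise ValueError('unknown dimensions: %s' % sorted(unknown))
    return tuple(indexers.get(d, slice(None)) for d in dims)


# For dimensions ('time', 'space'), select position 0 along 'space'.
indexer = positional_indexer(('time', 'space'), space=0)
print(indexer)  # (slice(None, None, None), 0)
```

With the array from the examples above, the assignment would then read arr[positional_indexer(arr.dims, space=0)] = 0.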

Dataset indexing

We can also use these methods to index all variables in a dataset simultaneously, returning a new dataset:

In [10]: ds = arr.to_dataset()

In [11]: ds.isel(space=[0], time=[0])
Out[11]: 
<xray.Dataset>
Dimensions:  (space: 1, time: 1)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01
  * space    (space) |S2 'IA'
Variables:
    None     (time, space) float64 0.127

In [12]: ds.sel(time='2000-01-01')
Out[12]: 
<xray.Dataset>
Dimensions:  (space: 3)
Coordinates:
    time     datetime64[ns] 2000-01-01
  * space    (space) |S2 'IA' 'IL' 'IN'
Variables:
    None     (space) float64 0.127 -10.0 -10.0

Positional indexing on a dataset is not supported because the ordering of dimensions in a dataset is somewhat ambiguous (it can vary between different arrays).

Indexing details

Like pandas, whether array indexing returns a view or a copy of the underlying data depends entirely on numpy:

  • Indexing with a single label or a slice returns a view.
  • Indexing with a vector or array of labels returns a copy.
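The two cases in the list above can be checked directly on a plain numpy array, which is where the behavior comes from. This sketch uses positional indexers on a made-up array (data) instead of labels on an xray object; it shows the numpy rule, not xray internals.

```python
import numpy as np

data = np.arange(12.0).reshape(4, 3)

sliced = data[:2]       # basic slicing: a view onto the same memory
picked = data[[0, 1]]   # integer-array indexing: an independent copy

shares_slice = np.shares_memory(data, sliced)
shares_picked = np.shares_memory(data, picked)

# Writing through the view changes the original array ...
sliced[0, 0] = -1.0
# ... while writing to the copy leaves it untouched.
picked[0, 1] = -2.0

print(shares_slice, shares_picked)  # True False
print(data[0, 0], data[0, 1])       # -1.0 1.0
```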

Attributes are persisted in array indexing:

In [13]: arr2 = arr.copy()

In [14]: arr2.attrs['units'] = 'meters'

In [15]: arr2[0, 0].attrs
Out[15]: OrderedDict([('units', 'meters')])

Indexing with xray objects has one important difference from indexing numpy arrays: you can only use one-dimensional arrays to index xray objects, and each indexer is applied “orthogonally” along independent axes, instead of using numpy’s advanced broadcasting. This means you can do indexing like this, which would require slightly more awkward syntax with numpy arrays:

In [16]: arr[arr['time.day'] > 1, arr['space'] != 'IL']
Out[16]: 
<xray.DataArray (time: 3, space: 2)>
array([[ 0.89723652,  0.33622174],
       [ 0.45137647,  0.12310214],
       [ 0.5430262 ,  0.44799682]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IN'

This is a much simpler model than numpy’s advanced indexing, and is basically the only model that works for labeled arrays. If you would like to do numpy-style advanced indexing, you can always index .values directly instead:

In [17]: arr.values[arr.values > 0.5]
Out[17]: array([ 0.89723652,  0.84025508,  0.5430262 ])
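For comparison, here is roughly what orthogonal indexing looks like in plain numpy. The sketch below uses a made-up integer array (values) and two boolean masks shaped like the time and space conditions used earlier. In numpy, np.ix_ is one way to apply each indexer along its own axis, whereas passing index arrays directly pairs them up element by element.

```python
import numpy as np

values = np.arange(12).reshape(4, 3)

# One boolean mask per axis, analogous to the time and space conditions.
time_mask = np.array([False, True, True, True])
space_mask = np.array([True, False, True])

# np.ix_ applies each mask independently along its own axis.
orthogonal = values[np.ix_(time_mask, space_mask)]
print(orthogonal.shape)  # (3, 2)

# Without np.ix_, numpy pairs the index arrays pointwise:
# this picks values[1, 0] and values[2, 2] only.
pointwise = values[[1, 2], [0, 2]]
print(pointwise)  # [3 8]
```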

Align and reindex

xray’s reindex, reindex_like and align impose a DataArray or Dataset onto a new set of coordinates corresponding to dimensions. The original values are subset to the index labels still found in the new labels, and values corresponding to new labels not found in the original object are in-filled with NaN.
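The subset-and-fill rule is the same one pandas applies along a single index, so a one-dimensional pandas sketch may help before looking at the xray methods. The series s below is made up for illustration; only Series.reindex from pandas is used.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['IA', 'IL', 'IN'])

# 'IA' exists in the original index and keeps its value;
# 'CA' is new and is filled with NaN; 'IL' and 'IN' are dropped.
r = s.reindex(['IA', 'CA'])

print(list(r.index))     # ['IA', 'CA']
print(r['IA'])           # 1.0
print(np.isnan(r['CA'])) # True
```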

To reindex a particular dimension, use reindex():

In [18]: arr.reindex(space=['IA', 'CA'])
Out[18]: 
<xray.DataArray (time: 4, space: 2)>
array([[ 0.12696983,         nan],
       [ 0.89723652,         nan],
       [ 0.45137647,         nan],
       [ 0.5430262 ,         nan]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'CA'

The reindex_like() method is a useful shortcut. To demonstrate, we will make a subset DataArray with new values:

In [19]: foo = arr.rename('foo')

In [20]: baz = (10 * arr[:2, :2]).rename('baz')

In [21]: baz
Out[21]: 
<xray.DataArray 'baz' (time: 2, space: 2)>
array([[   1.26969833, -100.        ],
       [   8.97236524,    3.76749716]])
Coordinates:
  * space    (space) |S2 'IA' 'IL'
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02

Reindexing foo with baz selects out the first two values along each dimension:

In [22]: foo.reindex_like(baz)
Out[22]: 
<xray.DataArray 'foo' (time: 2, space: 2)>
array([[  0.12696983, -10.        ],
       [  0.89723652,   0.37674972]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
  * space    (space) object 'IA' 'IL'

The opposite operation asks us to reindex to a larger shape, so we fill in the missing values with NaN:

In [23]: baz.reindex_like(foo)
Out[23]: 
<xray.DataArray 'baz' (time: 4, space: 3)>
array([[   1.26969833, -100.        ,           nan],
       [   8.97236524,    3.76749716,           nan],
       [          nan,           nan,           nan],
       [          nan,           nan,           nan]])
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) object 'IA' 'IL' 'IN'

The align() function lets us perform more flexible database-like 'inner', 'outer', 'left' and 'right' joins:

In [24]: xray.align(foo, baz, join='inner')
Out[24]: 
(<xray.DataArray 'foo' (time: 2, space: 2)>
 array([[  0.12696983, -10.        ],
        [  0.89723652,   0.37674972]])
 Coordinates:
   * time     (time) datetime64[ns] 2000-01-01 2000-01-02
   * space    (space) object 'IA' 'IL',
 <xray.DataArray 'baz' (time: 2, space: 2)>
 array([[   1.26969833, -100.        ],
        [   8.97236524,    3.76749716]])
 Coordinates:
   * time     (time) datetime64[ns] 2000-01-01 2000-01-02
   * space    (space) object 'IA' 'IL')

In [25]: xray.align(foo, baz, join='outer')
Out[25]: 
(<xray.DataArray 'foo' (time: 4, space: 3)>
 array([[  0.12696983, -10.        , -10.        ],
        [  0.89723652,   0.37674972,   0.33622174],
        [  0.45137647,   0.84025508,   0.12310214],
        [  0.5430262 ,   0.37301223,   0.44799682]])
 Coordinates:
   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
   * space    (space) object 'IA' 'IL' 'IN',
 <xray.DataArray 'baz' (time: 4, space: 3)>
 array([[   1.26969833, -100.        ,           nan],
        [   8.97236524,    3.76749716,           nan],
        [          nan,           nan,           nan],
        [          nan,           nan,           nan]])
 Coordinates:
   * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
   * space    (space) object 'IA' 'IL' 'IN')
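The join modes can be understood as set operations on the coordinate labels of each shared dimension. The sketch below reproduces the 'space' labels from the outputs above with plain pandas.Index objects. It illustrates the set semantics only and makes no claim about how align() is implemented inside xray; the variable names are made up.

```python
import pandas as pd

foo_space = pd.Index(['IA', 'IL', 'IN'])
baz_space = pd.Index(['IA', 'IL'])

inner = foo_space.intersection(baz_space)  # labels present in both
outer = foo_space.union(baz_space)         # labels present in either
left = foo_space                           # labels of the first object
right = baz_space                          # labels of the second object

print(sorted(inner))  # ['IA', 'IL']
print(sorted(outer))  # ['IA', 'IL', 'IN']
```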

Both reindex_like and align work interchangeably between DataArray and Dataset objects, and with any number of matching dimension names:

In [26]: ds
Out[26]: 
<xray.Dataset>
Dimensions:  (space: 3, time: 4)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IL' 'IN'
Variables:
    None     (time, space) float64 0.127 -10.0 -10.0 0.8972 0.3767 0.3362 0.4514 0.8403 ...

In [27]: ds.reindex_like(baz)
Out[27]: 
<xray.Dataset>
Dimensions:  (space: 2, time: 2)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02
  * space    (space) object 'IA' 'IL'
Variables:
    None     (time, space) float64 0.127 -10.0 0.8972 0.3767

In [28]: other = xray.DataArray(['a', 'b', 'c'], dims='other')

# this is a no-op, because there are no shared dimension names
In [29]: ds.reindex_like(other)
Out[29]: 
<xray.Dataset>
Dimensions:  (space: 3, time: 4)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
  * space    (space) |S2 'IA' 'IL' 'IN'
Variables:
    None     (time, space) float64 0.127 -10.0 -10.0 0.8972 0.3767 0.3362 0.4514 0.8403 ...