Terminology
Xarray terminology differs slightly from CF, mathematical conventions, and pandas, so we’ve put together a glossary of its terms. Here, arr refers to an xarray DataArray in the examples. For more complete examples, please consult the relevant documentation.
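As a rough orientation for the entries that follow, the sketch below builds one possible arr, reads back some of the properties the glossary refers to, and wraps the array in a Dataset. The data values, dimension names, coordinate labels, and the name "example" are made up for illustration and are not part of the original text.

```python
import numpy as np
import xarray as xr

# Hypothetical example data: a 2 x 3 array with named dimensions.
arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
    name="example",                 # setting a name makes this a named DataArray
    attrs={"units": "arbitrary"},
)

print(arr.dims)      # dimension names, in order
print(arr.name)      # the optional name
print(arr.variable)  # the underlying low-level Variable

# Without explicit dimension names, xarray falls back to dim_0, dim_1, ...
unnamed = xr.DataArray(np.zeros((2, 3)))
print(unnamed.dims)

# Variables broadcast by dimension name rather than by axis position.
a = xr.Variable(("x",), [1, 2])
b = xr.Variable(("y",), [10, 20, 30])
total = a + b  # result has both dimensions, "x" and "y"

# A Dataset collects DataArray objects that share aligned dimensions.
ds = xr.Dataset({"example": arr})
print(list(ds.data_vars))
```

The array is kept small so that printing arr or ds shows every dimension, coordinate, and attribute at a glance.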
- DataArray
A multi-dimensional array with labeled or named dimensions. DataArray objects add metadata such as dimension names, coordinates, and attributes (defined below) to underlying “unlabeled” data structures such as numpy and Dask arrays. If its optional name property is set, it is a named DataArray.
- Dataset
A dict-like collection of DataArray objects with aligned dimensions. Thus, most operations that can be performed on the dimensions of a single DataArray can be performed on a dataset. Datasets have data variables (see Variable below), dimensions, coordinates, and attributes.
- Variable
A NetCDF-like variable consisting of dimensions, data, and attributes which describe a single array. The main functional difference between variables and numpy arrays is that numerical operations on variables implement array broadcasting by dimension name. Each DataArray has an underlying variable that can be accessed via arr.variable. However, a variable is not fully described outside of either a Dataset or a DataArray.
Note
The Variable class is a low-level interface and can typically be ignored. However, the word “variable” appears often enough in the code and documentation that it is useful to understand.
- Dimension
In mathematics, the dimension of data is loosely the number of degrees of freedom for it. A dimension axis is a set of all points in which all but one of these degrees of freedom are fixed. We can think of each dimension axis as having a name, for example the “x dimension”. In xarray, a DataArray object’s dimensions are its named dimension axes, and the name of the i-th dimension is arr.dims[i]. If an array is created without dimension names, the default dimension names are dim_0, dim_1, and so forth.
- Coordinate
An array that labels a dimension or set of dimensions of another DataArray. In the usual one-dimensional case, the coordinate array’s values can loosely be thought of as tick labels along a dimension. There are two types of coordinate arrays: dimension coordinates and non-dimension coordinates (see below). A coordinate named x can be retrieved from arr.coords[x]. A DataArray can have more coordinates than dimensions because a single dimension can be labeled by multiple coordinate arrays. However, only one coordinate array can be assigned as a particular dimension’s dimension coordinate array. As a consequence, len(arr.dims) <= len(arr.coords) in general.
- Dimension coordinate
A one-dimensional coordinate array assigned to arr with both a name and dimension name in arr.dims. Dimension coordinates are used for label-based indexing and alignment, like the index found on a pandas.DataFrame or pandas.Series. In fact, dimension coordinates use pandas.Index objects under the hood for efficient computation. Dimension coordinates are marked by * when printing a DataArray or Dataset.
- Non-dimension coordinate
A coordinate array assigned to arr with a name in arr.coords but not in arr.dims. These coordinate arrays can be one-dimensional or multidimensional, and they are useful for auxiliary labeling. As an example, multidimensional coordinates are often used in geoscience datasets when the data’s physical coordinates (such as latitude and longitude) differ from their logical coordinates. However, non-dimension coordinates are not indexed, and any operation on non-dimension coordinates that leverages indexing will fail. Printing arr.coords will print all of arr’s coordinate names, with the corresponding dimension(s) in parentheses. For example, coord_name (dim_name) 1 2 3 ...
- Index
An index is a data structure optimized for efficient selecting and slicing of an associated array. Xarray creates indexes for dimension coordinates so that operations along dimensions are fast, while non-dimension coordinates are not indexed. Under the hood, indexes are implemented as pandas.Index objects. The index associated with dimension name x can be retrieved by arr.indexes[x]. By construction, len(arr.dims) == len(arr.indexes) when every dimension has a dimension coordinate; a dimension without a coordinate has no index.
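The coordinate and index entries above can be illustrated with a small sketch. The array, its labels, and the non-dimension coordinate name x_label are hypothetical and chosen only to show how the two kinds of coordinates differ. Every dimension is given a dimension coordinate here, so each one has an index.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={
        "x": [10, 20],                      # dimension coordinate for "x"
        "y": ["a", "b", "c"],               # dimension coordinate for "y"
        "x_label": ("x", ["low", "high"]),  # non-dimension coordinate along "x"
    },
)

# Dimension coordinates share their name with a dimension; the rest do not.
dim_coords = [name for name in arr.coords if name in arr.dims]
non_dim_coords = [name for name in arr.coords if name not in arr.dims]

# Each dimension coordinate is backed by a pandas.Index.
x_index = arr.indexes["x"]

# Label-based selection uses the indexes of the dimension coordinates.
selected = arr.sel(x=20, y="b")

# Printing the coordinates marks dimension coordinates with "*".
print(arr.coords)
```

Because x_label is attached to the x dimension without being its dimension coordinate, the array ends up with three coordinates but only two dimensions and two indexes.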
- name
The names of dimensions, coordinates, DataArray objects and data variables can be anything as long as they are hashable. However, it is preferred to use str-typed names.
- scalar
By definition, a scalar is not an array and when converted to one, it has 0 dimensions. That means that, e.g., int, float, and str objects are “scalar” while list or tuple are not.
- duck array
Duck arrays are array implementations that behave like numpy arrays. They have to define the shape, dtype and ndim properties. For integration with xarray, the __array__, __array_ufunc__ and __array_function__ protocols are also required.
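The scalar and duck array entries above can be checked informally with numpy alone. The helper looks_like_duck_array below is a hypothetical illustration that only tests for the attributes and protocols named in the glossary; it is not xarray's own check, and passing it does not guarantee that an object behaves correctly inside xarray.

```python
import numpy as np

# Attributes and protocols named in the duck array entry.
REQUIRED = (
    "shape",
    "dtype",
    "ndim",
    "__array__",
    "__array_ufunc__",
    "__array_function__",
)


def looks_like_duck_array(obj):
    """Hypothetical heuristic: does obj expose every required attribute?"""
    return all(hasattr(obj, attr) for attr in REQUIRED)


# Scalars have 0 dimensions once converted to an array.
scalar_ndims = [np.asarray(value).ndim for value in (1, 2.5, "text")]

# Lists and tuples convert to arrays with at least one dimension.
sequence_ndims = [np.asarray(value).ndim for value in ([1, 2], (1, 2))]

print(scalar_ndims, sequence_ndims)
print(looks_like_duck_array(np.zeros(3)), looks_like_duck_array([1, 2, 3]))
```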