xarray.Dataset

class xarray.Dataset(data_vars=None, coords=None, attrs=None, compat='broadcast_equals')

A multi-dimensional, in-memory array database.

A dataset resembles an in-memory representation of a NetCDF file, and consists of variables, coordinates and attributes which together form a self-describing dataset.

Dataset implements the mapping interface with keys given by variable names and values given by DataArray objects for each variable name.

One-dimensional variables with a name equal to their dimension are index coordinates used for label-based indexing.
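
A minimal construction sketch (the names "temperature", "time" and "x" are illustrative, not part of the API):

    import numpy as np
    import pandas as pd
    import xarray as xr

    ds = xr.Dataset(
        data_vars={"temperature": (("time", "x"), np.random.rand(3, 4))},
        coords={
            "time": pd.date_range("2000-01-01", periods=3),
            "x": [10, 20, 30, 40],
        },
    )

    # Mapping interface: keys are variable names, values are DataArray objects.
    temp = ds["temperature"]

    # "time" and "x" are one-dimensional variables named after their dimension,
    # so they act as index coordinates and support label-based indexing:
    row = ds.sel(time="2000-01-02")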

__init__(data_vars=None, coords=None, attrs=None, compat='broadcast_equals')

To load data from a file or file-like object, use the open_dataset function.

Parameters:
data_vars : dict-like, optional

A mapping from variable names to DataArray objects, Variable objects or tuples of the form (dims, data[, attrs]) which can be used as arguments to create a new Variable. Each dimension must have the same length in all variables in which it appears.

coords : dict-like, optional

Another mapping in the same form as the variables argument, except that each item is saved on the dataset as a “coordinate”. These variables have an associated meaning: they describe constant/fixed/independent quantities, unlike the varying/measured/dependent quantities that belong in variables. Coordinate values may be given by 1-dimensional arrays or scalars, in which case dims do not need to be supplied: 1D arrays will be assumed to give index values along the dimension with the same name.

attrs : dict-like, optional

Global attributes to save on this dataset.

compat : {‘broadcast_equals’, ‘equals’, ‘identical’}, optional

String indicating how to compare variables of the same name for potential conflicts when initializing this dataset:

  • ‘broadcast_equals’: all values must be equal when variables are broadcast against each other to ensure common dimensions.
  • ‘equals’: all values and dimensions must be the same.
  • ‘identical’: all values, dimensions and attributes must be the same.
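
A hedged sketch of these constructor arguments (the variable names and values are invented for illustration):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset(
        data_vars={
            # Tuple form (dims, data[, attrs]), as described under data_vars.
            "precip": (("lat", "lon"), np.zeros((2, 3)), {"units": "mm"}),
        },
        coords={
            # 1D arrays named after a dimension become index coordinates.
            "lat": [-10.0, 10.0],
            "lon": [30.0, 40.0, 50.0],
            # Scalars are valid coordinate values; no dims need to be supplied.
            "reference_time": np.datetime64("2000-01-01"),
        },
        attrs={"title": "example dataset"},
    )

    print(ds.attrs["title"])             # global attribute
    print(ds["precip"].attrs["units"])   # per-variable attribute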

Methods

__init__([data_vars, coords, attrs, compat]) To load data from a file or file-like object, use the open_dataset function.
all([dim]) Reduce this Dataset’s data by applying all along some dimension(s).
any([dim]) Reduce this Dataset’s data by applying any along some dimension(s).
apply(func[, keep_attrs, args]) Apply a function over the data variables in this dataset.
argmax([dim, skipna]) Reduce this Dataset’s data by applying argmax along some dimension(s).
argmin([dim, skipna]) Reduce this Dataset’s data by applying argmin along some dimension(s).
argsort([axis, kind, order]) Returns the indices that would sort this array.
assign([variables]) Assign new data variables to a Dataset, returning a new object with all the original variables in addition to the new ones.
assign_attrs(*args, **kwargs) Assign new attrs to this object.
assign_coords(**kwargs) Assign new coordinates to this object.
astype(dtype[, order, casting, subok, copy]) Copy of the array, cast to a specified type.
bfill(dim[, limit]) Fill NaN values by propagating values backward.
broadcast_equals(other) Two Datasets are broadcast equal if they are equal after broadcasting all variables against each other.
chunk([chunks, name_prefix, token, lock]) Coerce all arrays in this dataset into dask arrays with the given chunks.
clip([min, max, out]) Return an array whose values are limited to [min, max].
close() Close any files linked to this object
combine_first(other) Combine two Datasets, default to data_vars of self.
compute(**kwargs) Manually trigger loading of this dataset’s data from disk or a remote source into memory and return a new dataset.
conj() Complex-conjugate all elements.
conjugate() Return the complex conjugate, element-wise.
copy([deep, data]) Returns a copy of this dataset.
count([dim]) Reduce this Dataset’s data by applying count along some dimension(s).
cumprod([dim, skipna]) Apply cumprod along some dimension of Dataset.
cumsum([dim, skipna]) Apply cumsum along some dimension of Dataset.
diff(dim[, n, label]) Calculate the n-th order discrete difference along the given dimension.
differentiate(coord[, edge_order, datetime_unit]) Differentiate with the second order accurate central differences.
drop(labels[, dim]) Drop variables or index labels from this dataset.
dropna(dim[, how, thresh, subset]) Returns a new dataset with dropped labels for missing values along the provided dimension.
dump_to_store(store, **kwargs) Store dataset contents to a backends.*DataStore object.
equals(other) Two Datasets are equal if they have matching variables and coordinates, all of which are equal.
expand_dims(dim[, axis]) Return a new object with an additional axis (or axes) inserted at the corresponding position in the array shape.
ffill(dim[, limit]) Fill NaN values by propagating values forward.
fillna(value) Fill missing values in this object.
filter_by_attrs(**kwargs) Returns a Dataset with variables that match specific conditions.
from_dataframe(dataframe) Convert a pandas.DataFrame into an xarray.Dataset
from_dict(d) Convert a dictionary into an xarray.Dataset.
get(k[, d])
get_index(key) Get an index for a dimension, with fall-back to a default RangeIndex
groupby(group[, squeeze]) Returns a GroupBy object for performing grouped operations.
groupby_bins(group, bins[, right, labels, …]) Returns a GroupBy object for performing grouped operations.
identical(other) Like equals, but also checks all dataset attributes and the attributes on all variables and coordinates.
info([buf]) Concise summary of a Dataset’s variables and attributes.
interp([coords, method, assume_sorted, kwargs]) Multidimensional interpolation of Dataset.
interp_like(other[, method, assume_sorted, …]) Interpolate this object onto the coordinates of another object, filling the out of range values with NaN.
interpolate_na([dim, method, limit, …]) Interpolate values according to different methods.
isel([indexers, drop]) Returns a new dataset with each array indexed along the specified dimension(s).
isel_points([dim]) Returns a new dataset with each array indexed pointwise along the specified dimension(s).
isin(test_elements) Tests each value in the array for whether it is in the supplied list.
isnull(*args, **kwargs)
items()
keys()
load(**kwargs) Manually trigger loading of this dataset’s data from disk or a remote source into memory and return this dataset.
load_store(store[, decoder]) Create a new dataset from the contents of a backends.*DataStore object
max([dim, skipna]) Reduce this Dataset’s data by applying max along some dimension(s).
mean([dim, skipna]) Reduce this Dataset’s data by applying mean along some dimension(s).
median([dim, skipna]) Reduce this Dataset’s data by applying median along some dimension(s).
merge(other[, inplace, overwrite_vars, …]) Merge the arrays of two datasets into a single dataset.
min([dim, skipna]) Reduce this Dataset’s data by applying min along some dimension(s).
notnull(*args, **kwargs)
persist(**kwargs) Trigger computation, keeping data as dask arrays
pipe(func, *args, **kwargs) Apply func(self, *args, **kwargs)
prod([dim, skipna]) Reduce this Dataset’s data by applying prod along some dimension(s).
quantile(q[, dim, interpolation, …]) Compute the qth quantile of the data along the specified dimension.
rank(dim[, pct, keep_attrs]) Ranks the data.
reduce(func[, dim, keep_attrs, …]) Reduce this dataset by applying func along some dimension(s).
reindex([indexers, method, tolerance, copy]) Conform this object onto a new set of indexes, filling in missing values with NaN.
reindex_like(other[, method, tolerance, copy]) Conform this object onto the indexes of another object, filling in missing values with NaN.
rename([name_dict, inplace]) Returns a new object with renamed variables and dimensions.
reorder_levels([dim_order, inplace]) Rearrange index levels using input order.
resample([indexer, skipna, closed, label, …]) Returns a Resample object for performing resampling operations.
reset_coords([names, drop, inplace]) Given names of coordinates, reset them to become variables
reset_index(dims_or_levels[, drop, inplace]) Reset the specified index(es) or multi-index level(s).
roll([shifts, roll_coords]) Roll this dataset by an offset along one or more dimensions.
rolling([dim, min_periods, center]) Rolling window object.
round(*args, **kwargs)
sel([indexers, method, tolerance, drop]) Returns a new dataset with each array indexed by tick labels along the specified dimension(s).
sel_points([dim, method, tolerance]) Returns a new dataset with each array indexed pointwise by tick labels along the specified dimension(s).
set_coords(names[, inplace]) Given names of one or more variables, set them as coordinates
set_index([indexes, append, inplace]) Set Dataset (multi-)indexes using one or more existing coordinates or variables.
shift([shifts]) Shift this dataset by an offset along one or more dimensions.
sortby(variables[, ascending]) Sort object by labels or values (along an axis).
squeeze([dim, drop, axis]) Return a new object with squeezed data.
stack([dimensions]) Stack any number of existing dimensions into a single new dimension.
std([dim, skipna]) Reduce this Dataset’s data by applying std along some dimension(s).
sum([dim, skipna]) Reduce this Dataset’s data by applying sum along some dimension(s).
swap_dims(dims_dict[, inplace]) Returns a new object with swapped dimensions.
to_array([dim, name]) Convert this dataset into an xarray.DataArray
to_dask_dataframe([dim_order, set_index]) Convert this dataset into a dask.dataframe.DataFrame.
to_dataframe() Convert this dataset into a pandas.DataFrame.
to_dict() Convert this dataset to a dictionary following xarray naming conventions.
to_netcdf([path, mode, format, group, …]) Write dataset contents to a netCDF file.
to_zarr([store, mode, synchronizer, group, …]) Write dataset contents to a zarr group.
transpose(*dims) Return a new Dataset object with all array dimensions transposed.
unstack([dim]) Unstack existing dimensions corresponding to MultiIndexes into multiple new dimensions.
update(other[, inplace]) Update this dataset’s variables with those from another dataset.
values()
var([dim, skipna]) Reduce this Dataset’s data by applying var along some dimension(s).
where(cond[, other, drop]) Filter elements from this object according to a condition.
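
A short sketch exercising a few of the methods above on an invented dataset (names and values are illustrative):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset(
        {"t": (("time", "x"), np.arange(12.0).reshape(3, 4))},
        coords={"time": [1, 2, 3], "x": [10, 20, 30, 40]},
    )

    subset = ds.sel(time=2)           # label-based selection along "time"
    averaged = ds.mean(dim="time")    # reduce along a dimension
    masked = ds.where(ds["t"] > 5.0)  # replace non-matching elements with NaN
    df = ds.to_dataframe()            # convert to a pandas.DataFrame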

Attributes

attrs Dictionary of global attributes on this dataset
chunks Block dimensions for this dataset’s data or None if it’s not a dask array.
coords Dictionary of xarray.DataArray objects corresponding to coordinate variables
data_vars Dictionary of xarray.DataArray objects corresponding to data variables
dims Mapping from dimension names to lengths.
encoding Dictionary of global encoding attributes on this dataset
imag
indexes OrderedDict of pandas.Index objects used for label-based indexing
loc Attribute for location-based indexing.
nbytes
real
sizes Mapping from dimension names to lengths.
variables Low-level interface to Dataset contents as a dict of Variable objects.
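
A brief sketch of attribute access, reusing the same kind of invented dataset:

    import numpy as np
    import xarray as xr

    ds = xr.Dataset(
        {"t": (("time", "x"), np.zeros((3, 4)))},
        coords={"time": [1, 2, 3], "x": [10, 20, 30, 40]},
    )

    print(dict(ds.dims))       # {'time': 3, 'x': 4}: dimension names to lengths
    print(list(ds.coords))     # ['time', 'x']: coordinate variables
    print(list(ds.data_vars))  # ['t']: data variables
    print(ds.nbytes)           # total bytes of all variables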