xarray.open_mfdataset

xarray.open_mfdataset(paths, chunks=None, concat_dim='_not_supplied', compat='no_conflicts', preprocess=None, engine=None, lock=None, data_vars='all', coords='different', combine='_old_auto', autoclose=None, parallel=False, **kwargs)

Open multiple files as a single dataset.

If combine=’by_coords’ then the function combine_by_coords is used to combine the datasets into one before returning the result, and if combine=’nested’ then combine_nested is used. The filepaths must be structured according to which combining function is used, the details of which are given in the documentation for combine_by_coords and combine_nested. By default the old (now deprecated) auto_combine will be used, please specify either combine='by_coords' or combine='nested' in future. Requires dask to be installed. See documentation for details on dask [1]. Attributes from the first dataset file are used for the combined dataset.

Parameters
pathsstr or sequence

Either a string glob in the form “path/to/my/files/*.nc” or an explicit list of files to open. Paths can be given as strings or as pathlib Paths. If concatenation along more than one dimension is desired, then paths must be a nested list-of-lists (see manual_combine for details). (A string glob will be expanded to a 1-dimensional list.)

chunksint or dict, optional

Dictionary with keys given by dimension names and values given by chunk sizes. In general, these should divide the dimensions of each dataset. If int, chunk each dimension by chunks. By default, chunks will be chosen to load entire input files into memory at once. This has a major impact on performance: please see the full documentation for more details [2].

concat_dimstr, or list of str, DataArray, Index or None, optional

Dimensions to concatenate files along. You only need to provide this argument if any of the dimensions along which you want to concatenate is not a dimension in the original datasets, e.g., if you want to stack a collection of 2D arrays along a third dimension. Set concat_dim=[..., None, ...] explicitly to disable concatenation along a particular dimension.

combine{‘by_coords’, ‘nested’}, optional

Whether xarray.combine_by_coords or xarray.combine_nested is used to combine all the data. If this argument is not provided, xarray.auto_combine is used, but in the future this behavior will switch to use xarray.combine_by_coords by default.

compat{‘identical’, ‘equals’, ‘broadcast_equals’,

‘no_conflicts’}, optional

String indicating how to compare variables of the same name for potential conflicts when merging:

  • ‘broadcast_equals’: all values must be equal when variables are broadcast against each other to ensure common dimensions.

  • ‘equals’: all values and dimensions must be the same.

  • ‘identical’: all values, dimensions and attributes must be the same.

  • ‘no_conflicts’: only values which are not null in both datasets must be equal. The returned dataset then contains the combination of all non-null values.

preprocesscallable, optional

If provided, call this function on each dataset prior to concatenation. You can find the file-name from which each dataset was loaded in ds.encoding['source'].

engine{‘netcdf4’, ‘scipy’, ‘pydap’, ‘h5netcdf’, ‘pynio’, ‘cfgrib’}, optional

Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for ‘netcdf4’.

lockFalse or duck threading.Lock, optional

Resource lock to use when reading data from disk. Only relevant when using dask or another form of parallelism. By default, appropriate locks are chosen to safely read and write files with the currently active dask scheduler.

data_vars{‘minimal’, ‘different’, ‘all’ or list of str}, optional
These data variables will be concatenated together:
  • ‘minimal’: Only data variables in which the dimension already appears are included.

  • ‘different’: Data variables which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears). Beware: this option may load the data payload of data variables into memory if they are not already loaded.

  • ‘all’: All data variables will be concatenated.

  • list of str: The listed data variables will be concatenated, in addition to the ‘minimal’ data variables.

coords{‘minimal’, ‘different’, ‘all’ or list of str}, optional
These coordinate variables will be concatenated together:
  • ‘minimal’: Only coordinates in which the dimension already appears are included.

  • ‘different’: Coordinates which are not equal (ignoring attributes) across all datasets are also concatenated (as well as all for which dimension already appears). Beware: this option may load the data payload of coordinate variables into memory if they are not already loaded.

  • ‘all’: All coordinate variables will be concatenated, except those corresponding to other dimensions.

  • list of str: The listed coordinate variables will be concatenated, in addition the ‘minimal’ coordinates.

parallelbool, optional

If True, the open and preprocess steps of this function will be performed in parallel using dask.delayed. Default is False.

**kwargsoptional

Additional arguments passed on to xarray.open_dataset().

Returns
xarray.Dataset

Notes

open_mfdataset opens files with read-only access. When you modify values of a Dataset, even one linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file on disk is never touched.

References

1

http://xarray.pydata.org/en/stable/dask.html

2

http://xarray.pydata.org/en/stable/dask.html#chunking-and-performance