xarray.Dataset.to_zarr

Dataset.to_zarr(store=None, chunk_store=None, mode=None, synchronizer=None, group=None, encoding=None, compute=True, consolidated=None, append_dim=None, region=None, safe_chunks=True, storage_options=None)

Write dataset contents to a zarr group.

Zarr chunks are determined in the following way:

  • From the chunks attribute in each variable’s encoding

  • If the variable is a Dask array, from the dask chunks

  • If neither Dask chunks nor encoding chunks are present, chunks will be determined automatically by Zarr

  • If both Dask chunks and encoding chunks are present, encoding chunks will be used, provided that there is a many-to-one relationship between encoding chunks and Dask chunks (i.e. each Dask chunk spans a whole number of encoding chunks, so no two Dask chunks write to the same Zarr chunk); otherwise a ValueError is raised. This restriction ensures that no synchronization / locks are required when writing. To disable this restriction, use safe_chunks=False.
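The many-to-one condition can be illustrated along a single dimension with a small pure-Python check. This is a simplified sketch, not xarray's actual implementation: the function name `is_many_to_one` is hypothetical, and allowing the last Dask chunk to be shorter is an assumption of the sketch.

```python
def is_many_to_one(dask_chunks, zarr_chunk):
    """Return True if no two Dask chunks would write to the same Zarr chunk.

    dask_chunks: tuple of Dask chunk lengths along one dimension.
    zarr_chunk: the encoding (Zarr) chunk length along that dimension.
    """
    # Every Dask chunk except the last must cover whole Zarr chunks.
    # The last one may be shorter because no other chunk follows it
    # (an assumption of this sketch).
    return all(size % zarr_chunk == 0 for size in dask_chunks[:-1])


# Dask chunks of 4, 4, 2 line up with Zarr chunks of length 2.
print(is_many_to_one((4, 4, 2), 2))
# A Dask chunk boundary at index 3 would split a Zarr chunk of length 2.
print(is_many_to_one((3, 3, 4), 2))
```

When a check like this fails, two Dask tasks could write to the same Zarr chunk at once, which is the situation that safe_chunks=True is meant to rule out.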

Parameters
  • store (MutableMapping, str or Path, optional) – Store or path to directory in local or remote file system.

  • chunk_store (MutableMapping, str or Path, optional) – Store or path to directory in local or remote file system only for Zarr array chunks. Requires zarr-python v2.4.0 or later.

  • mode ({"w", "w-", "a", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist); “r+” means modify existing array values only (raise an error if any metadata or shapes would change). The default mode is “a” if append_dim is set; otherwise, it is “r+” if region is set and “w-” if not.

  • synchronizer (object, optional) – Zarr array synchronizer.

  • group (str, optional) – Group path. (a.k.a. path in zarr terminology.)

  • encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1}, ...}

  • compute (bool, optional) – If True write array data immediately, otherwise return a dask.delayed.Delayed object that can be computed to write array data later. Metadata is always updated eagerly.

  • consolidated (bool, optional) – If True, apply zarr’s consolidate_metadata function to the store after writing metadata and read existing stores with consolidated metadata; if False, do not. The default (consolidated=None) means write consolidated metadata and attempt to read consolidated metadata for existing stores (falling back to non-consolidated).

  • append_dim (hashable, optional) – If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.

  • region (dict, optional) – Optional mapping from dimension names to integer slices along dataset dimensions to indicate the region of existing zarr array(s) in which to write this dataset’s data. For example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.

    Two restrictions apply to the use of region:

    • If region is set, _all_ variables in a dataset must have at least one dimension in common with the region. Other variables should be written in a separate call to to_zarr().

    • Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use a separate call to to_zarr() with compute=False. See “Appending to existing Zarr stores” in the reference documentation for full details.

  • safe_chunks (bool, optional) – If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set to False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. This option may be useful in combination with compute=False to initialize a Zarr store from an existing Dataset with arbitrary chunk structure.

  • storage_options (dict, optional) – Any additional parameters for the storage backend (ignored for local paths).
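The default-mode rule and the two region restrictions described above can be sketched in plain Python. The helper names `default_mode` and `check_region` and the `variable_dims` mapping are hypothetical and exist only for illustration; xarray's own validation is more involved.

```python
def default_mode(append_dim=None, region=None):
    """Pick the mode used when mode=None, following the rule stated above."""
    if append_dim is not None:
        return "a"
    if region is not None:
        return "r+"
    return "w-"


def check_region(region, variable_dims, append_dim=None):
    """Raise ValueError if a documented restriction on region is violated.

    region: mapping from dimension name to slice.
    variable_dims: mapping from variable name to its tuple of dimension names.
    """
    # A dimension cannot appear in both region and append_dim.
    if append_dim is not None and append_dim in region:
        raise ValueError(f"{append_dim!r} is in both region and append_dim")
    # Every variable must share at least one dimension with the region.
    unrelated = [
        name for name, dims in variable_dims.items()
        if not set(dims) & set(region)
    ]
    if unrelated:
        raise ValueError(f"variables with no region dimension: {unrelated}")


print(default_mode())
print(default_mode(region={"x": slice(0, 1000)}))
print(default_mode(append_dim="time"))
check_region({"x": slice(0, 1000)}, {"t": ("time", "x")})
```

Variables that fail the second check are the ones the region description says to write in a separate call to to_zarr().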

References

https://zarr.readthedocs.io/

Notes

Zarr chunking behavior:

If chunks are found in the encoding argument or attribute corresponding to any DataArray, those chunks are used. If a DataArray is a Dask array and no encoding chunks are given, it is written with its Dask chunks. If no other chunks are found, Zarr uses its own heuristics to choose automatic chunk sizes.

encoding:

The encoding attribute (if it exists) of each DataArray will be used. To override any existing encodings, pass the encoding kwarg.

See also

Zarr

The I/O user guide, with more details and examples.