xarray.Dataset.to_zarr

Dataset.to_zarr(store=None, chunk_store=None, mode=None, synchronizer=None, group=None, encoding=None, compute=True, consolidated=False, append_dim=None, region=None, safe_chunks=True)

Write dataset contents to a zarr group.
Zarr chunks are determined in the following way:

- From the chunks attribute in each variable's encoding.
- If the variable is a Dask array, from the Dask chunks.
- If neither Dask chunks nor encoding chunks are present, chunks will be determined automatically by Zarr.
- If both Dask chunks and encoding chunks are present, encoding chunks will be used, provided that there is a many-to-one relationship between encoding chunks and Dask chunks (i.e. Dask chunks are bigger than and evenly divide encoding chunks); otherwise a ValueError is raised. This restriction ensures that no synchronization / locks are required when writing. To disable this restriction, use safe_chunks=False.
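The many-to-one requirement above can be illustrated with a small stand-alone check (an illustrative sketch, not xarray's actual implementation): along each dimension, every Dask chunk except possibly the last must be an exact multiple of the encoding (Zarr) chunk size, so each Dask chunk maps onto whole Zarr chunks and no two parallel writers touch the same Zarr chunk.

```python
def chunks_compatible(dask_chunks, encoding_chunk):
    """Sketch of the safe_chunks divisibility rule for one dimension
    (illustrative only, not xarray's actual code).

    dask_chunks: tuple of Dask chunk sizes along the dimension
    encoding_chunk: the Zarr chunk size from the variable's encoding
    """
    # Every Dask chunk except the last must evenly divide into Zarr chunks.
    for size in dask_chunks[:-1]:
        if size % encoding_chunk != 0:
            return False
    return True

# Dask chunks of 20 map cleanly onto Zarr chunks of 10 (many-to-one):
print(chunks_compatible((20, 20, 20), 10))  # True
# Dask chunks of 15 straddle Zarr chunk boundaries, so writes could race:
print(chunks_compatible((15, 15, 15), 10))  # False
```

With safe_chunks=True (the default), a configuration like the second one raises a ValueError instead of risking corrupted writes.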
- Parameters

  store (MutableMapping, str or Path, optional) – Store or path to directory in file system.

  chunk_store (MutableMapping, str or Path, optional) – Store or path to directory in file system, used only for Zarr array chunks. Requires zarr-python v2.4.0 or later.

  mode ({"w", "w-", "a", None}, optional) – Persistence mode: "w" means create (overwrite if exists); "w-" means create (fail if exists); "a" means override existing variables (create if does not exist). If append_dim is set, mode can be omitted as it is internally set to "a". Otherwise, mode will default to "w-" if not set.

  synchronizer (object, optional) – Zarr array synchronizer.

  group (str, optional) – Group path (a.k.a. path in Zarr terminology).

  encoding (dict, optional) – Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g., {"my_variable": {"dtype": "int16", "scale_factor": 0.1}, ...}.

  compute (bool, optional) – If True, write array data immediately; otherwise return a dask.delayed.Delayed object that can be computed to write array data later. Metadata is always updated eagerly.

  consolidated (bool, optional) – If True, apply Zarr's consolidate_metadata function to the store after writing metadata.

  append_dim (hashable, optional) – If set, the dimension along which the data will be appended. All other dimensions on overridden variables must remain the same size.

  region (dict, optional) – Optional mapping from dimension names to integer slices along dataset dimensions, indicating the region of existing Zarr array(s) in which to write this dataset's data. For example, {'x': slice(0, 1000), 'y': slice(10000, 11000)} would indicate that values should be written to the region 0:1000 along x and 10000:11000 along y.

  Two restrictions apply to the use of region:

  - If region is set, all variables in a dataset must have at least one dimension in common with the region. Other variables should be written in a separate call to to_zarr().
  - Dimensions cannot be included in both region and append_dim at the same time. To create empty arrays to fill in with region, use a separate call to to_zarr() with compute=False. See "Appending to existing Zarr stores" in the reference documentation for full details.

  safe_chunks (bool, optional) – If True, only allow writes when there is a many-to-one relationship between Zarr chunks (specified in encoding) and Dask chunks. Set to False to override this restriction; however, data may become corrupted if Zarr arrays are written in parallel. This option may be useful in combination with compute=False to initialize a Zarr store from an existing Dataset with an arbitrary chunk structure.
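The two region restrictions above can be sketched as a stand-alone validation function (illustrative only, not xarray's actual implementation; the argument shapes here are hypothetical simplifications of what xarray checks internally):

```python
def validate_region(region, append_dim, variable_dims):
    """Sketch of the two region restrictions (not xarray's actual code).

    region: mapping of dimension name -> slice
    append_dim: dimension being appended along, or None
    variable_dims: mapping of variable name -> tuple of its dimensions
    """
    # Restriction 2: a dimension may not appear in both region and append_dim.
    if append_dim is not None and append_dim in region:
        raise ValueError(
            f"cannot list {append_dim!r} in both region and append_dim"
        )
    # Restriction 1: every variable must share a dimension with the region.
    for name, dims in variable_dims.items():
        if not set(dims) & set(region):
            raise ValueError(
                f"variable {name!r} has no dimension in common with region; "
                "write it in a separate to_zarr() call"
            )

# A dataset whose variables all include dimension 'x' passes:
validate_region({"x": slice(0, 1000)}, None, {"temp": ("x", "y")})
```

A variable with no dimension in the region, or a region that names the append_dim, would raise a ValueError under this sketch, mirroring the restrictions listed above.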
Notes

- Zarr chunking behavior: If chunks are found in the encoding argument or attribute corresponding to any DataArray, those chunks are used. If a DataArray is a Dask array, it is written with those chunks. If no other chunks are found, Zarr uses its own heuristics to choose automatic chunk sizes.
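The precedence order in the note above (encoding chunks first, then Dask chunks, then Zarr's automatic heuristics) can be summarized as a small sketch (illustrative only, not xarray's actual implementation):

```python
def resolve_chunks(encoding_chunks, dask_chunks):
    """Sketch of the chunk-precedence rules for one variable
    (not xarray's actual code). Returns (source, chunks)."""
    if encoding_chunks is not None:
        # Chunks from the variable's encoding take priority.
        return ("encoding", encoding_chunks)
    if dask_chunks is not None:
        # Otherwise a Dask-backed variable is written with its Dask chunks.
        return ("dask", dask_chunks)
    # With neither present, Zarr picks chunk sizes automatically.
    return ("zarr-auto", None)

print(resolve_chunks((10,), (20,)))  # ('encoding', (10,))
print(resolve_chunks(None, (20,)))   # ('dask', (20,))
print(resolve_chunks(None, None))    # ('zarr-auto', None)
```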
See also

- Zarr: The I/O user guide, with more details and examples.