What’s New
v0.11.1 (29 December 2018)
This minor release includes a number of enhancements and bug fixes, and two (slightly) breaking changes.
Warning
This is the last xarray release that will support Python 2.7. Future releases will be Python 3 only, but older versions of xarray will always be available for Python 2.7 users. For more details, see:
Breaking changes
- Minimum rasterio version increased from 0.36 to 1.0 (for open_rasterio).
- Time bounds variables are now also decoded according to CF conventions (GH2565). The previous behavior was to decode them only if they had specific time attributes; now these attributes are copied automatically from the corresponding time coordinate. This might break downstream code that was relying on these variables not being decoded. By Fabien Maussion.
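The pure-Python sketch below shows what it means to decode a bounds variable using its time coordinate's attributes. It is not xarray's decoder. The dict layout for variables and the helpers decode_cf_times and decode_with_bounds are hypothetical. Only the standard calendar and a few CF units are handled, and only the units attribute is copied (real CF metadata may also carry a calendar).

```python
from datetime import datetime, timedelta

# Seconds per unit for the "<unit> since <reference>" strings this sketch handles.
_UNIT_SECONDS = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}


def decode_cf_times(values, units):
    """Decode offsets like 'days since 2000-01-01' into datetime objects.

    Only the standard calendar and an ISO-format reference date are handled.
    """
    unit, sep, reference = units.partition(" since ")
    if not sep:
        raise ValueError(f"not a CF time unit string: {units!r}")
    ref = datetime.fromisoformat(reference.strip())
    step = _UNIT_SECONDS[unit.strip()]
    return [ref + timedelta(seconds=v * step) for v in values]


def decode_with_bounds(time_var, bounds_var):
    """Decode a time coordinate and its bounds variable together.

    The bounds inherit 'units' from the time coordinate when they lack it,
    so they are decoded even without time attributes of their own.
    """
    bounds_attrs = dict(bounds_var["attrs"])  # copy: do not mutate the input
    bounds_attrs.setdefault("units", time_var["attrs"]["units"])
    times = decode_cf_times(time_var["values"], time_var["attrs"]["units"])
    bounds = [decode_cf_times(pair, bounds_attrs["units"]) for pair in bounds_var["values"]]
    return times, bounds


# Daily means stamped at noon, with bounds that carry no units of their own.
time = {"values": [0.5, 1.5], "attrs": {"units": "days since 2000-01-01"}}
time_bnds = {"values": [[0, 1], [1, 2]], "attrs": {}}
times, bounds = decode_with_bounds(time, time_bnds)
```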
Enhancements
- Ability to read and write consolidated metadata in zarr stores (GH2558). By Ryan Abernathey.
- CFTimeIndex uses slicing for string indexing when possible (like pandas.DatetimeIndex), which avoids unnecessary copies. By Stephan Hoyer.
- Enable passing rasterio.io.DatasetReader or rasterio.vrt.WarpedVRT to open_rasterio instead of a file path string. Allows for in-memory reprojection (see GH2588). By Scott Henderson.
- Like pandas.DatetimeIndex, CFTimeIndex now supports “dayofyear” and “dayofweek” accessors (GH2597). Note this requires a version of cftime greater than 1.0.2. By Spencer Clark.
- The option 'warn_for_unclosed_files' (False by default) has been added to allow users to enable a warning when files opened by xarray are deallocated but were not explicitly closed. This is mostly useful for debugging; we recommend enabling it in your test suites if you use xarray for IO. By Stephan Hoyer.
- Support Dask HighLevelGraphs by Matthew Rocklin.
- DataArray.resample() and Dataset.resample() now support the loffset kwarg just like pandas. By Deepak Cherian.
- Datasets are now guaranteed to have a 'source' encoding, so the source file name is always stored (GH2550). By Tom Nicholas.
- The apply methods for DatasetGroupBy, DataArrayGroupBy, DatasetResample and DataArrayResample now support passing positional arguments to the applied function as a tuple to the args argument. By Matti Eskelinen.
- 0d slices of ndarrays are now obtained directly through indexing, rather than extracting and wrapping a scalar, avoiding unnecessary copying. By Daniel Wennberg.
- Added support for fill_value with DataArray.shift() and Dataset.shift(). By Maximilian Roos.
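A warning like 'warn_for_unclosed_files' can be implemented with a finalizer that checks whether the resource was closed before it was deallocated. The class below is a hypothetical illustration, not xarray's file manager. The class name, the class-level flag (standing in for the global option) and the warning text are all assumptions.

```python
import warnings


class TrackedFile:
    """Hypothetical file wrapper that warns if it is deallocated while open."""

    warn_for_unclosed_files = True  # stand-in for a global option

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        if not self.closed and self.warn_for_unclosed_files:
            warnings.warn(
                f"deallocating {self.name!r}, but file is not already closed",
                RuntimeWarning,
            )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    f = TrackedFile("data.nc")
    del f  # CPython frees the object here, so __del__ runs immediately
    g = TrackedFile("other.nc")
    g.close()
    del g  # closed explicitly, so no warning is emitted
```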
Bug fixes
- Ensure files are automatically closed, if possible, when no longer referenced by a Python variable (GH2560). By Stephan Hoyer.
- Fixed possible race conditions when reading/writing to disk in parallel (GH2595). By Stephan Hoyer.
- Fix h5netcdf saving scalars with filters or chunks (GH2563). By Martin Raspaud.
- Fix parsing of _Unsigned attribute set by OPeNDAP servers (GH2583). By Deepak Cherian.
- Fix failure in time encoding when exporting to netCDF with versions of pandas less than 0.21.1 (GH2623). By Spencer Clark.
- Fix MultiIndex selection to update label and level (GH2619). By Keisuke Fujii.
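A minimal sketch of the _Unsigned convention behind this fix. The classic netCDF format has no unsigned integer types, so unsigned bytes are stored as signed values and flagged with _Unsigned = "true". The helper apply_unsigned and its nbits parameter are hypothetical. The reverse direction (unsigned storage flagged as signed) is left out.

```python
def apply_unsigned(values, attrs, nbits=8):
    """Reinterpret stored signed integers as unsigned when _Unsigned is "true".

    In two's complement, a stored value v of width nbits represents the
    unsigned value v mod 2**nbits (e.g. -1 -> 255 for bytes). The flag may
    be a string or a boolean.
    """
    flag = str(attrs.get("_Unsigned", "false")).strip().lower()
    if flag != "true":
        return list(values)
    modulus = 1 << nbits
    return [v % modulus for v in values]  # Python's % is non-negative here


raw_bytes = [-128, -1, 0, 5, 127]
decoded = apply_unsigned(raw_bytes, {"_Unsigned": "true"})
untouched = apply_unsigned(raw_bytes, {})
```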
v0.11.0 (7 November 2018)
Breaking changes
Finished deprecations (changed behavior with this release):
- Dataset.T has been removed as a shortcut for Dataset.transpose(). Call Dataset.transpose() directly instead.
- Iterating over a Dataset now includes only data variables, not coordinates. Similarly, calling len and bool on a Dataset now includes only data variables.
- DataArray.__contains__ (used by Python’s in operator) now checks array data, not coordinates.
- The old resample syntax from before xarray 0.10, e.g., data.resample('1D', dim='time', how='mean'), is no longer supported and will raise an error in most cases. You need to use the new resample syntax instead, e.g., data.resample(time='1D').mean() or data.resample({'time': '1D'}).mean().
New deprecations (behavior will be changed in xarray 0.12):
- Reduction of DataArray.groupby() and DataArray.resample() without a dimension argument will change in the next release. For now, a FutureWarning is issued. By Keisuke Fujii.
- The inplace kwarg of a number of DataArray and Dataset methods is being deprecated and will be removed in the next release. By Deepak Cherian.
Refactored storage backends:
Xarray’s storage backends now automatically open and close files when necessary, rather than requiring opening a file with autoclose=True. A global least-recently-used cache is used to store open files; the default limit of 128 open files should suffice in most cases, but can be adjusted if necessary with xarray.set_options(file_cache_maxsize=...). The autoclose argument to open_dataset and related functions has been deprecated and is now a no-op.
This change, along with an internal refactor of xarray’s storage backends, should significantly improve performance when reading and writing netCDF files with Dask, especially when working with many files or using Dask Distributed. By Stephan Hoyer.
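The global file cache can be pictured as a least-recently-used map from paths to open handles. This is a hypothetical stand-in, not xarray's implementation. FileCache, FakeFile and the opener callable are illustrative names, and maxsize plays the role of file_cache_maxsize.

```python
from collections import OrderedDict


class FileCache:
    """Least-recently-used cache of open file handles (illustrative only)."""

    def __init__(self, opener, maxsize=128):
        self._opener = opener        # callable: path -> object with .close()
        self._maxsize = maxsize
        self._files = OrderedDict()  # path -> handle, least recently used first

    def acquire(self, path):
        if path in self._files:
            self._files.move_to_end(path)  # mark as most recently used
            return self._files[path]
        handle = self._opener(path)
        self._files[path] = handle
        if len(self._files) > self._maxsize:
            _, oldest = self._files.popitem(last=False)
            oldest.close()  # a later acquire() of that path simply reopens it
        return handle


class FakeFile:
    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        self.closed = True


cache = FileCache(FakeFile, maxsize=2)
a = cache.acquire("a.nc")
b = cache.acquire("b.nc")
cache.acquire("a.nc")      # "a.nc" becomes the most recently used entry
c = cache.acquire("c.nc")  # exceeds maxsize, so "b.nc" is evicted and closed
```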
Support for non-standard calendars used in climate science:
- Xarray will now always use cftime.datetime objects, rather than by default trying to coerce them into np.datetime64[ns] objects. A CFTimeIndex will be used for indexing along time coordinates in these cases.
- A new method to_datetimeindex() has been added to aid in converting from a CFTimeIndex to a pandas.DatetimeIndex for the remaining use-cases where using a CFTimeIndex is still a limitation (e.g. for resample or plotting).
- Setting the enable_cftimeindex option is now a no-op and emits a FutureWarning.
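Non-standard calendars contain dates that np.datetime64 cannot represent, such as 30 February in the 360_day calendar, which is why cftime.datetime objects are needed. The hypothetical helper below computes a day of year in two fixed-length CF calendars. It ignores calendars that have leap years.

```python
# Month lengths in two fixed-length CF calendars (neither ever has a leap day).
_MONTH_LENGTHS = {
    "noleap": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
    "360_day": [30] * 12,
}


def dayofyear(year, month, day, calendar):
    """1-based day of year for a date in a fixed-length non-standard calendar."""
    lengths = _MONTH_LENGTHS[calendar]
    if not (1 <= month <= 12 and 1 <= day <= lengths[month - 1]):
        raise ValueError(
            f"{year:04d}-{month:02d}-{day:02d} does not exist in the {calendar!r} calendar"
        )
    return sum(lengths[: month - 1]) + day


feb30 = dayofyear(2000, 2, 30, "360_day")      # valid here, impossible in np.datetime64
mar1_noleap = dayofyear(2000, 3, 1, "noleap")  # no 29 February, even in 2000
```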
Enhancements
- xarray.DataArray.plot.line() can now accept multidimensional coordinate variables as input. hue must be a dimension name in this case. (GH2407) By Deepak Cherian.
- Added support for Python 3.7. (GH2271). By Joe Hamman.
- Added support for plotting data with pandas.Interval coordinates, such as those created by groupby_bins(). By Maximilian Maahn.
- Added shift() for shifting the values of a CFTimeIndex by a specified frequency. (GH2244). By Spencer Clark.
- Added support for using cftime.datetime coordinates with DataArray.differentiate(), Dataset.differentiate(), DataArray.interp(), and Dataset.interp(). By Spencer Clark.
- There is now a global option to either always keep or always discard dataset and dataarray attrs upon operations. The option is set with xarray.set_options(keep_attrs=True), and the default is to use the old behaviour. By Tom Nicholas.
- Added a new backend for the GRIB file format based on ECMWF cfgrib python driver and ecCodes C-library. (GH2475) By Alessandro Amici, sponsored by ECMWF.
- Resample now supports a dictionary mapping from dimension to frequency as its first argument, e.g., data.resample({'time': '1D'}).mean(). This is consistent with other xarray functions that accept either dictionaries or keyword arguments. By Stephan Hoyer.
- The preferred way to access tutorial data is now to load it lazily with xarray.tutorial.open_dataset(). xarray.tutorial.load_dataset() calls Dataset.load() prior to returning (and is now deprecated). This was changed in order to facilitate using tutorial datasets with dask. By Joe Hamman.
- DataArray can now use xr.set_options(keep_attrs=True) and retain attributes in binary operations, such as (+, -, *, /). Default behaviour is unchanged (attributes will be dismissed). By Michael Blaschek.
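A global option such as keep_attrs is consulted when an operation runs, and a context manager makes temporary overrides safe. The sketch is hypothetical: OPTIONS, set_options and Labeled are stand-ins. Unlike this sketch, xarray's own set_options can also be called without a with block to set options globally. Keeping the left operand's attributes is an assumption of this sketch.

```python
from contextlib import contextmanager

# Hypothetical global options table.
OPTIONS = {"keep_attrs": False}


@contextmanager
def set_options(**kwargs):
    """Temporarily override global options; restore the old values on exit."""
    unknown = set(kwargs) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    old = {key: OPTIONS[key] for key in kwargs}
    OPTIONS.update(kwargs)
    try:
        yield
    finally:
        OPTIONS.update(old)


class Labeled:
    """Tiny labeled value whose + operator consults OPTIONS['keep_attrs']."""

    def __init__(self, value, attrs=None):
        self.value = value
        self.attrs = dict(attrs or {})

    def __add__(self, other):
        other_value = other.value if isinstance(other, Labeled) else other
        attrs = self.attrs if OPTIONS["keep_attrs"] else {}
        return Labeled(self.value + other_value, attrs)


x = Labeled(1.0, {"units": "m"})
dropped = x + 1           # default: attributes are dropped
with set_options(keep_attrs=True):
    kept = x + 1          # attributes of the left operand are kept
```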
Bug fixes
- FacetGrid now properly uses the cbar_kwargs keyword argument. (GH1504, GH1717) By Deepak Cherian.
- Addition and subtraction operators used with a CFTimeIndex now preserve the index’s type. (GH2244). By Spencer Clark.
- We now properly handle arrays of datetime.datetime and datetime.timedelta provided as coordinates. (GH2512) By Deepak Cherian.
- xarray.DataArray.roll correctly handles multidimensional arrays. (GH2445) By Keisuke Fujii.
- xarray.plot() now properly accepts a norm argument and does not override the norm’s vmin and vmax. (GH2381) By Deepak Cherian.
- xarray.DataArray.std() now correctly accepts the ddof keyword argument. (GH2240) By Keisuke Fujii.
- Restore matplotlib’s default of plotting dashed negative contours when a single color is passed to DataArray.contour(), e.g. colors='k'. By Deepak Cherian.
- Fix a bug that caused some indexing operations on arrays opened with open_rasterio to raise an error (GH2454). By Stephan Hoyer.
- Subtracting one CFTimeIndex from another now returns a pandas.TimedeltaIndex, analogous to the behavior for DatetimeIndexes (GH2484). By Spencer Clark.
- Adding a TimedeltaIndex to, or subtracting a TimedeltaIndex from, a CFTimeIndex is now allowed (GH2484). By Spencer Clark.
- Avoid use of Dask’s deprecated get= parameter in tests by Matthew Rocklin.
- An OverflowError is now accurately raised and caught during the encoding process if a reference date is used that is so distant that the dates must be encoded using cftime rather than NumPy (GH2272). By Spencer Clark.
- Chunked datasets can now roundtrip to Zarr storage continually with to_zarr and open_zarr (GH2300). By Lily Wang.
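The OverflowError fix follows a try-then-fall-back pattern. Dates are encoded with fixed-width integer nanoseconds when possible, and the encoder switches to arbitrary-range date arithmetic when that overflows. In this hypothetical sketch, Python's datetime stands in for cftime and an explicit int64 range check stands in for NumPy's overflow. The function names are illustrative.

```python
from datetime import datetime

INT64_MAX = 2**63 - 1
NS_PER_DAY = 86_400 * 10**9


def encode_days_ns(dates, reference):
    """Fast path: whole days computed via int64 nanoseconds, like datetime64[ns].

    Raises OverflowError when an offset does not fit in a signed 64-bit integer.
    """
    encoded = []
    for d in dates:
        delta = d - reference
        ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
        if not -INT64_MAX - 1 <= ns <= INT64_MAX:
            raise OverflowError(f"{d} is too far from {reference} for int64 nanoseconds")
        encoded.append(ns // NS_PER_DAY)
    return encoded


def encode_days(dates, reference):
    """Encode dates as whole days since `reference`, reporting the path used."""
    try:
        return encode_days_ns(dates, reference), "int64-ns"
    except OverflowError:
        # Slow path: Python datetime arithmetic (standing in for cftime) has no ns limit.
        return [(d - reference).days for d in dates], "python-datetime"


recent = encode_days([datetime(2000, 1, 2)], datetime(2000, 1, 1))
ancient = encode_days([datetime(2000, 1, 1)], datetime(1, 1, 1))
```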
v0.10.9 (21 September 2018)
This minor release contains a number of backwards compatible enhancements.
Announcements of note:
- Xarray is now a NumFOCUS fiscally sponsored project! Read the announcement for more details.
- We have a new Development roadmap that outlines our future development plans.
- Dataset.apply now properly documents the way func is called. By Matti Eskelinen.
Enhancements
- DataArray.differentiate() and Dataset.differentiate() are newly added. (GH1332) By Keisuke Fujii.
- Default colormap for sequential and divergent data can now be set via set_options() (GH2394) By Julius Busecke.
- min_count option is newly supported in DataArray.sum(), DataArray.prod() and Dataset.sum(), and Dataset.prod(). (GH2230) By Keisuke Fujii.
- plot() now accepts the kwargs xscale, yscale, xlim, ylim, xticks, yticks just like pandas. Also xincrease=False, yincrease=False now use matplotlib’s axis inverting methods instead of setting limits. By Deepak Cherian. (GH2224)
- DataArray coordinates and Dataset coordinates and data variables are now displayed as a b … y z rather than a b c d …. (GH1186) By Seth P.
- A new CFTimeIndex-enabled cftime_range() function for use in generating dates from standard or non-standard calendars. By Spencer Clark.
- When interpolating over a datetime64 axis, you can now provide a datetime string instead of a datetime64 object. E.g. da.interp(time='1991-02-01') (GH2284) By Deepak Cherian.
- A clear error message is now displayed if a set or dict is passed in place of an array (GH2331) By Maximilian Roos.
- Applying unstack to a large DataArray or Dataset is now much faster if the MultiIndex has not been modified after stacking the indices. (GH1560) By Maximilian Maahn.
- You can now control whether or not to offset the coordinates when using the roll method. The current behavior (coordinates rolled by default) raises a deprecation warning unless the keyword argument is set explicitly. (GH1875) By Andrew Huang.
- You can now call unstack without arguments to unstack every MultiIndex in a DataArray or Dataset. By Julia Signell.
- Added the ability to pass a data kwarg to copy to create a new object with the same metadata as the original object but using new values. By Julia Signell.
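The finite differences behind a derivative along a coordinate can be sketched as below: centered differences at interior points and one-sided differences at the two ends. This is a simplified, hypothetical helper. For evenly spaced coordinates it agrees with numpy.gradient's default, but numpy.gradient weights uneven spacing differently.

```python
def differentiate(values, coord):
    """Finite-difference derivative of `values` with respect to `coord`.

    Interior points use (y[i+1] - y[i-1]) / (x[i+1] - x[i-1]);
    the two end points use one-sided first differences.
    """
    n = len(values)
    if n != len(coord) or n < 2:
        raise ValueError("need at least two points and matching lengths")
    out = [(values[1] - values[0]) / (coord[1] - coord[0])]
    for i in range(1, n - 1):
        out.append((values[i + 1] - values[i - 1]) / (coord[i + 1] - coord[i - 1]))
    out.append((values[-1] - values[-2]) / (coord[-1] - coord[-2]))
    return out


x = [0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]  # exact derivative is 2x
dy_dx = differentiate(y, x)
```

The interior values match 2x exactly for this quadratic; the end points are only first-order accurate.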
Bug fixes
- xarray.plot.imshow() correctly uses the origin argument. (GH2379) By Deepak Cherian.
- Fixed DataArray.to_iris() failure while creating DimCoord by falling back to creating AuxCoord. Fixed dependency on var_name attribute being set. (GH2201) By Thomas Voigt.
- Fixed a bug in zarr backend which prevented use with datasets with invalid chunk size encoding after reading from an existing store (GH2278). By Joe Hamman.
- Tests can be run in parallel with pytest-xdist. By Tony Tung.
- Follow up the renamings in dask; from dask.ghost to dask.overlap. By Keisuke Fujii.
- Now raises a ValueError when there is a conflict between dimension names and level names of MultiIndex. (GH2299) By Keisuke Fujii.
- xr.apply_ufunc() now raises a ValueError when the size of input_core_dims is inconsistent with the number of arguments. (GH2341) By Keisuke Fujii.
- Fixed Dataset.filter_by_attrs() behavior not matching netCDF4.Dataset.get_variables_by_attributes(). When more than one key=value is passed into Dataset.filter_by_attrs() it will now return a Dataset with variables which pass all the filters. (GH2315) By Andrew Barna.
v0.10.8 (18 July 2018)
Breaking changes
Xarray no longer supports Python 3.4. Additionally, the minimum supported versions of the following dependencies have been updated and/or clarified:
- Pandas: 0.18 -> 0.19
- NumPy: 1.11 -> 1.12
- Dask: 0.9 -> 0.16
- Matplotlib: unspecified -> 1.5
(GH2204). By Joe Hamman.
Enhancements
- DataArray.interp_like() and Dataset.interp_like() methods are newly added. (GH2218) By Keisuke Fujii.
- Added support for curvilinear and unstructured generic grids to to_cdms2() and from_cdms2() (GH2262). By Stephane Raynaud.
Bug fixes
- Fixed a bug in zarr backend which prevented use with datasets with incomplete chunks in multiple dimensions (GH2225). By Joe Hamman.
- Fixed a bug in to_netcdf() which prevented writing datasets when the arrays had different chunk sizes (GH2254). By Mike Neish.
- Fixed masking during the conversion to cdms2 objects by to_cdms2() (GH2262). By Stephane Raynaud.
- Fixed a bug in 2D plots which incorrectly raised an error when 2D coordinates weren’t monotonic (GH2250). By Fabien Maussion.
- Fixed warning raised in to_netcdf() due to deprecation of effective_get in dask (GH2238). By Joe Hamman.
v0.10.7 (7 June 2018)
Enhancements
- Plot labels now make use of metadata that follow CF conventions (GH2135). By Deepak Cherian and Ryan Abernathey.
- Line plots now support faceting with row and col arguments (GH2107). By Yohai Bar Sinai.
- DataArray.interp() and Dataset.interp() methods are newly added. See interpolating values with interp for details. (GH2079) By Keisuke Fujii.
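To illustrate interpolation at a date given as a string, the hypothetical helper below parses an ISO date and interpolates linearly between the two surrounding samples. It assumes sorted times and returns NaN outside the sampled range. It is not xarray's interp, which supports more methods and dimensions.

```python
import math
from bisect import bisect_right
from datetime import datetime


def interp_linear_time(times, values, target):
    """Linearly interpolate `values` (sampled at sorted `times`) at `target`.

    `target` may be a datetime or an ISO date string such as '1991-02-01'.
    """
    if isinstance(target, str):
        target = datetime.fromisoformat(target)
    if not times[0] <= target <= times[-1]:
        return math.nan  # no extrapolation in this sketch
    i = bisect_right(times, target)
    if i == len(times):  # target equals the last sample
        return float(values[-1])
    t0, t1 = times[i - 1], times[i]
    frac = (target - t0) / (t1 - t0)  # timedelta / timedelta -> float
    return values[i - 1] + frac * (values[i] - values[i - 1])


times = [datetime(1991, 1, 1), datetime(1991, 3, 1)]
values = [0.0, 59.0]  # 59 days separate the two samples
result = interp_linear_time(times, values, "1991-02-01")  # 31 days in
```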
Bug fixes
- Fixed a bug in rasterio backend which prevented use with distributed. The rasterio backend now returns pickleable objects (GH2021). By Joe Hamman.
v0.10.6 (31 May 2018)
This minor release includes a number of bug-fixes and backwards compatible enhancements.
Enhancements
- New PseudoNetCDF backend for many atmospheric data formats including GEOS-Chem, CAMx, NOAA arlpacked bit and many others. See Formats supported by PseudoNetCDF for more details. By Barron Henderson.
- The Dataset constructor now aligns DataArray arguments in data_vars to indexes set explicitly in coords, where previously an error would be raised. (GH674) By Maximilian Roos.
- sel(), isel() & reindex() (and their Dataset counterparts) now support supplying a dict as a first argument, as an alternative to the existing approach of supplying kwargs. This allows for more robust behavior of dimension names which conflict with other keyword names, or are not strings. By Maximilian Roos.
- rename() now supports supplying **kwargs, as an alternative to the existing approach of supplying a dict as the first argument. By Maximilian Roos.
- cumsum() and cumprod() now support aggregation over multiple dimensions at the same time. This is the default behavior when dimensions are not specified (previously this raised an error). By Stephan Hoyer.
- DataArray.dot() and dot() are partly supported with older dask<0.17.4. (related to GH2203) By Keisuke Fujii.
- Xarray now uses Versioneer to manage its version strings. (GH1300). By Joe Hamman.
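The dict-or-keyword convention can be sketched with a small helper. dict_or_kwargs and sel below are hypothetical stand-ins, not xarray's code. With a dict, a dimension can share its name with another parameter (here method) or be a non-string such as 0.

```python
def dict_or_kwargs(positional, keyword, func_name):
    """Return the mapping given either as one dict or as **kwargs, not both."""
    if positional is None:
        return keyword
    if keyword:
        raise ValueError(f"cannot specify both keyword and positional arguments to .{func_name}")
    if not isinstance(positional, dict):
        raise TypeError(f"the first argument to .{func_name} must be a dictionary")
    return positional


def sel(indexers=None, method=None, **indexers_kwargs):
    """Hypothetical selection signature that echoes the pattern."""
    indexers = dict_or_kwargs(indexers, indexers_kwargs, "sel")
    return indexers, method


by_kwargs = sel(time="2000-01-01")
# A dimension literally named 'method' (or keyed by 0) only works via a dict.
by_dict = sel({"method": 3, 0: 1})
try:
    sel({"x": 1}, y=2)
except ValueError as err:
    mixed_error = str(err)
```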
Bug fixes
- Fixed a regression in 0.10.4, where explicitly specifying dtype='S1' or dtype=str in encoding with to_netcdf() raised an error (GH2149). By Stephan Hoyer.
- apply_ufunc() now directly validates output variables (GH1931). By Stephan Hoyer.
- Fixed a bug where to_netcdf(..., unlimited_dims='bar') yielded NetCDF files with spurious 0-length dimensions (i.e. b, a, and r) (GH2134). By Joe Hamman.
- Removed spurious warnings with Dataset.update(Dataset) (GH2161) and array.equals(array) when array contains NaT (GH2162). By Stephan Hoyer.
- Aggregations with Dataset.reduce() (including mean, sum, etc) no longer drop unrelated coordinates (GH1470). Also fixed a bug where non-scalar data-variables that did not include the aggregation dimension were improperly skipped. By Stephan Hoyer.
- Fix stack() with non-unique coordinates on pandas 0.23 (GH2160). By Stephan Hoyer.
- Selecting data indexed by a length-1 CFTimeIndex with a slice of strings now behaves as it does when using a length-1 DatetimeIndex (i.e. it no longer falsely returns an empty array when the slice includes the value in the index) (GH2165). By Spencer Clark.
- Fix DataArray.groupby().reduce() mutating coordinates on the input array when grouping over dimension coordinates with duplicated entries (GH2153). By Stephan Hoyer.
- Fix Dataset.to_netcdf() being unable to create a group with engine="h5netcdf" (GH2177). By Stephan Hoyer.
v0.10.4 (16 May 2018)
This minor release includes a number of bug-fixes and backwards compatible enhancements. A highlight is CFTimeIndex, which offers support for non-standard calendars used in climate modeling.
Documentation
- New FAQ entry, faq.other_projects. By Deepak Cherian.
- Assigning values with indexing now includes examples on how to select and assign values to a DataArray with .loc. By Chiara Lepore.
Enhancements
- Add an option for using a CFTimeIndex for indexing times with non-standard calendars and/or outside the Timestamp-valid range; this index enables a subset of the functionality of a standard pandas.DatetimeIndex. See Non-standard calendars and dates outside the Timestamp-valid range for full details. (GH789, GH1084, GH1252) By Spencer Clark with help from Stephan Hoyer.
- Allow for serialization of cftime.datetime objects (GH789, GH1084, GH2008, GH1252) using the standalone cftime library. By Spencer Clark.
- Support writing lists of strings as netCDF attributes (GH2044). By Dan Nowacki.
- to_netcdf() with engine='h5netcdf' now accepts h5py encoding settings compression and compression_opts, along with the NetCDF4-Python style settings gzip=True and complevel. This allows using any compression plugin installed in hdf5, e.g. LZF (GH1536). By Guido Imperiale.
- dot() on dask-backed data will now call dask.array.einsum(). This greatly boosts speed and allows chunking on the core dims. The function now requires dask >= 0.17.3 to work on dask-backed data (GH2074). By Guido Imperiale.
- plot.line() learned new kwargs: xincrease, yincrease that change the direction of the respective axes. By Deepak Cherian.
- Added the parallel option to open_mfdataset(). This option uses dask.delayed to parallelize the open and preprocessing steps within open_mfdataset. This is expected to provide performance improvements when opening many files, particularly when used in conjunction with dask’s multiprocessing or distributed schedulers (GH1981). By Joe Hamman.
- New compute option in to_netcdf(), to_zarr(), and save_mfdataset() to allow for the lazy computation of netCDF and zarr stores. This feature is currently only supported by the netCDF4 and zarr backends. (GH1784). By Joe Hamman.
Bug fixes
- ValueError is raised when coordinates with the wrong size are assigned to a DataArray. (GH2112) By Keisuke Fujii.
- Fixed a bug in rolling() with bottleneck. Also, fixed a bug in rolling an integer dask array. (GH2113) By Keisuke Fujii.
- Fixed a bug where the keep_attrs=True flag was neglected if apply_ufunc() was used with Variable. (GH2114) By Keisuke Fujii.
- When assigning a DataArray to Dataset, any conflicting non-dimensional coordinates of the DataArray are now dropped. (GH2068) By Keisuke Fujii.
- Better error handling in open_mfdataset (GH2077). By Stephan Hoyer.
- plot.line() does not call autofmt_xdate() anymore. Instead it changes the rotation and horizontal alignment of labels without removing the x-axes of any other subplots in the figure (if any). By Deepak Cherian.
- Colorbar limits are now determined by excluding ±Infs too. By Deepak Cherian. By Joe Hamman.
- Fixed to_iris to maintain lazy dask array after conversion (GH2046). By Alex Hilson and Stephan Hoyer.
v0.10.3 (13 April 2018)
This minor release includes a number of bug-fixes and backwards compatible enhancements.
Enhancements
- Added DataArray.isin() and Dataset.isin() methods, which test each value in the array for whether it is contained in the supplied list, returning a bool array. See Selecting values with isin for full details. Similar to the np.isin function. By Maximilian Roos.
- Some speed improvement to construct DataArrayRolling object (GH1993) By Keisuke Fujii.
- Handle variables with different values for missing_value and _FillValue by masking values for both attributes; previously this resulted in a ValueError. (GH2016) By Ryan May.
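A minimal sketch of masking with both attributes: every value equal to any _FillValue or missing_value entry becomes NaN. The helper is hypothetical and works on plain lists. It would not match NaN sentinels, because NaN never compares equal to itself.

```python
import math


def mask_fill_values(values, attrs):
    """Replace values equal to _FillValue *or* missing_value with NaN.

    Either attribute may be a scalar or a list (CF allows a vector
    missing_value); every listed sentinel is masked.
    """
    sentinels = set()
    for key in ("_FillValue", "missing_value"):
        raw = attrs.get(key)
        if raw is None:
            continue
        sentinels.update(raw if isinstance(raw, (list, tuple)) else [raw])
    return [math.nan if v in sentinels else float(v) for v in values]


masked = mask_fill_values([1, -999, 3, -1], {"_FillValue": -999, "missing_value": -1})
multi = mask_fill_values([0, 1, 2], {"missing_value": [0, 2]})
```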
Bug fixes
- Fixed decode_cf function to operate lazily on dask arrays (GH1372). By Ryan Abernathey.
- Fixed labeled indexing with slice bounds given by xarray objects with datetime64 or timedelta64 dtypes (GH1240). By Stephan Hoyer.
- Attempting to convert an xarray.Dataset into a numpy array now raises an informative error message. By Stephan Hoyer.
- Fixed a bug in decode_cf_datetime where int32 arrays weren’t parsed correctly (GH2002). By Fabien Maussion.
- When calling xr.auto_combine() or xr.open_mfdataset() with a concat_dim, the resulting dataset will have that one-element dimension (it was silently dropped, previously) (GH1988). By Ben Root.
v0.10.2 (13 March 2018)
This minor release includes a number of bug-fixes and enhancements, along with one possibly backwards incompatible change.
Backwards incompatible changes
- The addition of __array_ufunc__ for xarray objects (see below) means that NumPy ufunc methods (e.g., np.add.reduce) that previously worked on xarray.DataArray objects by converting them into NumPy arrays will now raise NotImplementedError instead. In all cases, the work-around is simple: convert your objects explicitly into NumPy arrays before calling the ufunc (e.g., with .values).
Enhancements
- Added dot(), equivalent to np.einsum(). Also, DataArray.dot() now supports the dims option, which specifies the dimensions to sum over. (GH1951) By Keisuke Fujii.
- Support for writing xarray datasets to netCDF files (netcdf4 backend only) when using the dask.distributed scheduler (GH1464). By Joe Hamman.
- Support lazy vectorized-indexing. After this change, flexible indexing such as orthogonal/vectorized indexing becomes possible for all the backend arrays. Also, lazy transpose is now also supported. (GH1897) By Keisuke Fujii.
- Implemented NumPy’s __array_ufunc__ protocol for all xarray objects (GH1617). This enables using NumPy ufuncs directly on xarray.Dataset objects with recent versions of NumPy (v1.13 and newer):
    In [1]: ds = xr.Dataset({'a': 1})

    In [2]: np.sin(ds)
    Out[2]:
    <xarray.Dataset>
    Dimensions:  ()
    Data variables:
        a        float64 0.8415
  This obviates the need for the xarray.ufuncs module, which will be deprecated in the future when xarray drops support for older versions of NumPy. By Stephan Hoyer.
- Improve rolling() logic. The DataArrayRolling() object now supports a construct() method that returns a view of the DataArray / Dataset object with the rolling-window dimension added to the last axis. This enables more flexible operation, such as strided rolling, windowed rolling, ND-rolling, short-time FFT and convolution. (GH1831, GH1142, GH819) By Keisuke Fujii.
- line() learned to make plots with data on x-axis if so specified. (GH575) By Deepak Cherian.
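The window-dimension idea behind construct() can be sketched in plain Python. Each position gets a list holding its trailing window, padded at the start, and strided rolling is just a slice of those windows. The helper names and the NaN-skipping mean are assumptions of this sketch. By default, xarray's rolling(...).mean() requires full windows (min_periods equals the window size), so its first value would be NaN rather than 1.0 here.

```python
import math


def rolling_construct(values, window, stride=1, fill=math.nan):
    """Trailing windows of length `window`, one per position, left-padded with `fill`.

    Keeping every `stride`-th window gives strided rolling for free.
    """
    padded = [fill] * (window - 1) + list(values)
    windows = [padded[i:i + window] for i in range(len(values))]
    return windows[::stride]


def _nanmean(xs):
    vals = [x for x in xs if not math.isnan(x)]
    return sum(vals) / len(vals) if vals else math.nan


windows = rolling_construct([1.0, 2.0, 3.0, 4.0], window=2)
means = [_nanmean(w) for w in windows]  # reduce over the window "dimension"
strided = rolling_construct([1.0, 2.0, 3.0, 4.0], window=2, stride=2)
```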
Bug fixes
- Raise an informative error message when using apply_ufunc with numpy v1.11 (GH1956). By Stephan Hoyer.
- Fix the precision drop after indexing datetime64 arrays (GH1932). By Keisuke Fujii.
- Silenced irrelevant warnings issued by open_rasterio (GH1964). By Stephan Hoyer.
- Fix kwarg colors clashing with auto-inferred cmap (GH1461) By Deepak Cherian.
- Fix imshow() error when passed an RGB array with size one in a spatial dimension. By Zac Hatfield-Dodds.
v0.10.1 (25 February 2018)
This minor release includes a number of bug-fixes and backwards compatible enhancements.
Documentation
- Added a new guide on Contributing to xarray (GH640) By Joe Hamman.
- Added apply_ufunc example to Toy weather data (GH1844). By Liam Brannigan.
- New entry Why don’t aggregations return Python scalars? in the Frequently Asked Questions (GH1726). By 0x0L.
Enhancements
New functions and methods:
- Added DataArray.to_iris() and DataArray.from_iris() for converting data arrays to and from Iris Cubes with the same data and coordinates (GH621 and GH37). By Neil Parley and Duncan Watson-Parris.
- Experimental support for using Zarr as storage layer for xarray (GH1223). By Ryan Abernathey and Joe Hamman.
- New rank() on arrays and datasets. Requires bottleneck (GH1731). By 0x0L.
- .dt accessor can now ceil, floor and round timestamps to specified frequency. By Deepak Cherian.
Plotting enhancements:
- xarray.plot.imshow() now handles RGB and RGBA images. Saturation can be adjusted with vmin and vmax, or with robust=True. By Zac Hatfield-Dodds.
- contourf() learned to contour 2D variables that have both a 1D coordinate (e.g. time) and a 2D coordinate (e.g. depth as a function of time) (GH1737). By Deepak Cherian.
- plot() rotates x-axis ticks if x-axis is time. By Deepak Cherian.
- line() can draw multiple lines if provided with a 2D variable. By Deepak Cherian.
Other enhancements:
- Reduce methods such as DataArray.sum() now handle object-type arrays.
    In [3]: da = xr.DataArray(np.array([True, False, np.nan], dtype=object), dims='x')

    In [4]: da.sum()
    Out[4]:
    <xarray.DataArray ()>
    array(1)
  (GH1866) By Keisuke Fujii.
- Reduce methods such as DataArray.sum() now accept dtype arguments. (GH1838) By Keisuke Fujii.
- Added nodatavals attribute to DataArray when using open_rasterio(). (GH1736). By Alan Snow.
- Use pandas.Grouper class in xarray resample methods rather than the deprecated pandas.TimeGrouper class (GH1766). By Joe Hamman.
- Experimental support for parsing ENVI metadata to coordinates and attributes in xarray.open_rasterio(). By Matti Eskelinen.
- Reduce memory usage when decoding a variable with a scale_factor, by converting 8-bit and 16-bit integers to float32 instead of float64 (PR1840), and keeping float16 and float32 as float32 (GH1842). Correspondingly, encoded variables may also be saved with a smaller dtype. By Zac Hatfield-Dodds.
- Speed of reindexing/alignment with dask array is orders of magnitude faster when inserting missing values (GH1847). By Stephan Hoyer.
- Fix axis keyword ignored when applying np.squeeze to DataArray (GH1487). By Florian Pinault.
- netcdf4-python has moved its time handling in the netcdftime module to a standalone package (netcdftime). As such, xarray now considers netcdftime an optional dependency. One benefit of this change is that it allows for encoding/decoding of datetimes with non-standard calendars without the netcdf4-python dependency (GH1084). By Joe Hamman.
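To see the memory effect of the float32 rule, the sketch below unpacks CF packed data (value * scale_factor + add_offset) into a typed array.array buffer. Typecode 'f' is a 4-byte C float and 'd' an 8-byte C double on common platforms. The helpers are hypothetical and apply the rule exactly as this entry states it.

```python
from array import array

# Stored dtypes that decode to float32 under the rule above; all others go to float64.
_FLOAT32_SOURCES = {"int8", "uint8", "int16", "uint16", "float16", "float32"}


def decoded_typecode(stored_dtype):
    """array.array typecode: 'f' (C float) or 'd' (C double)."""
    return "f" if stored_dtype in _FLOAT32_SOURCES else "d"


def unpack(values, stored_dtype, scale_factor, add_offset=0.0):
    """Apply CF unpacking, value * scale_factor + add_offset, into a typed buffer."""
    code = decoded_typecode(stored_dtype)
    return array(code, (v * scale_factor + add_offset for v in values))


temps = unpack([0, 100, 200], "int16", scale_factor=0.01, add_offset=273.15)
big = unpack([0, 100, 200], "int32", scale_factor=0.01, add_offset=273.15)
```

The trade-off is precision: float32 keeps roughly seven significant digits, so 274.15 is stored only approximately in temps.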
Bug fixes
- Rolling aggregation with center=True option now gives the same result with pandas including the last element (GH1046). By Keisuke Fujii.
- Support indexing with a 0d-np.ndarray (GH1921). By Keisuke Fujii.
- Added warning in api.py of a netCDF4 bug that occurs when the filepath has 88 characters (GH1745). By Liam Brannigan.
- Fixed encoding of multi-dimensional coordinates in to_netcdf() (GH1763). By Mike Neish.
- Fixed chunking with non-file-based rasterio datasets (GH1816) and refactored rasterio test suite. By Ryan Abernathey.
- Bug fix in open_dataset(engine='pydap') (GH1775) By Keisuke Fujii.
- Bug fix in vectorized assignment (GH1743, GH1744). Now item assignment to DataArray.__setitem__() checks coordinates of target, destination and keys. If there are any conflicts among these coordinates, IndexError will be raised. By Keisuke Fujii.
- Properly point DataArray.__dask_scheduler__() to dask.threaded.get. By Matthew Rocklin.
- Bug fixes in DataArray.plot.imshow(): all-NaN arrays and arrays with size one in some dimension can now be plotted, which is good for exploring satellite imagery (GH1780). By Zac Hatfield-Dodds.
- Fixed UnboundLocalError when opening netCDF file (GH1781). By Stephan Hoyer.
- The variables, attrs, and dimensions properties have been deprecated as part of a bug fix addressing an issue where backends were unintentionally loading the datastore's data and attributes repeatedly during writes (GH1798). By Joe Hamman.
- Compatibility fixes to plotting module for NumPy 1.14 and pandas 0.22 (GH1813). By Joe Hamman.
- Bug fix in encoding coordinates with {'_FillValue': None} in netCDF metadata (GH1865). By Chris Roth.
- Fix indexing with lists for arrays loaded from netCDF files with engine='h5netcdf' (GH1864). By Stephan Hoyer.
- Corrected a bug with incorrect coordinates for non-georeferenced geotiff files (GH1686). Internally, we now use the rasterio coordinate transform tool instead of doing the computations ourselves. A parse_coordinates kwarg has been added to open_rasterio() (set to True by default). By Fabien Maussion.
- The colors of discrete colormaps are now the same regardless of whether seaborn is installed (GH1896). By Fabien Maussion.
- Fixed dtype promotion rules in where() and concat() to match pandas (GH1847). A combination of strings/numbers or unicode/bytes now promotes to object dtype, instead of strings or unicode. By Stephan Hoyer.
- Fixed bug where isnull() was loading data stored as dask arrays (GH1937). By Joe Hamman.
v0.10.0 (20 November 2017)
This is a major release that includes bug fixes, new features and a few backwards incompatible changes. Highlights include:
- Indexing now supports broadcasting over dimensions, similar to NumPy’s vectorized indexing (but better!).
- resample() has a new groupby-like API like pandas.
- apply_ufunc() facilitates wrapping and parallelizing functions written for NumPy arrays.
- Performance improvements, particularly for dask and open_mfdataset().
Breaking changes¶
xarray now supports a form of vectorized indexing with broadcasting, where the result of indexing depends on dimensions of indexers, e.g.,
array.sel(x=ind)
withind.dims == ('y',)
. Alignment between coordinates on indexed and indexing objects is also now enforced. Due to these changes, existing uses of xarray objects to index other xarray objects will break in some cases.The new indexing API is much more powerful, supporting outer, diagonal and vectorized indexing in a single interface. The
isel_points
andsel_points
methods are deprecated, since they are now redundant with theisel
/sel
methods. See Vectorized Indexing for the details (GH1444, GH1436). By Keisuke Fujii and Stephan Hoyer.A new resampling interface to match pandas’ groupby-like API was added to
Dataset.resample()
andDataArray.resample()
(GH1272). Timeseries resampling is fully supported for data with arbitrary dimensions as is both downsampling and upsampling (including linear, quadratic, cubic, and spline interpolation).Old syntax:
In [5]: ds.resample('24H', dim='time', how='max') Out[5]: <xarray.Dataset> [...]
New syntax:
In [6]: ds.resample(time='24H').max() Out[6]: <xarray.Dataset> [...]
Note that both versions are currently supported, but using the old syntax will produce a warning encouraging users to adopt the new syntax. By Daniel Rothenberg.
Calling repr() or printing xarray objects at the command line or in a Jupyter Notebook will no longer automatically compute dask variables or load data on arrays lazily loaded from disk (GH1522). By Guido Imperiale.

Supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument is no longer supported. This behavior was deprecated in version 0.9 but will now raise an error (GH727).

Several existing features have been deprecated and will change to new behavior in xarray v0.11. If you use any of them with xarray v0.10, you should see a FutureWarning that describes how to update your code:

- Dataset.T has been deprecated as an alias for Dataset.transpose() (GH1232). In the next major version of xarray, it will provide shortcut lookup for variables or attributes with name 'T'.
- DataArray.__contains__ (e.g., key in data_array) currently checks for membership in DataArray.coords. In the next major version of xarray, it will check membership in the array data found in DataArray.values instead (GH1267).
- Direct iteration over and counting a Dataset (e.g., [k for k in ds], ds.keys(), ds.values(), len(ds) and if ds) currently includes all variables, both data and coordinates. For improved usability and consistency with pandas, in the next major version of xarray these will change to only include data variables (GH884). Use ds.variables, ds.data_vars or ds.coords as alternatives.
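The alternatives recommended above can be sketched as follows, assuming a recent xarray; the dataset and its variable names are hypothetical. Spelling out data_vars, coords or variables keeps code independent of what plain iteration over a Dataset yields (in current releases, iterating yields only data variables).

```python
import xarray as xr

ds = xr.Dataset(
    {"foo": ("x", [1, 2])},
    coords={"x": [10, 20], "label": ("x", ["a", "b"])},
)

# Explicit, unambiguous ways to list variables
data_names = list(ds.data_vars)    # data variables only
coord_names = sorted(ds.coords)    # coordinates only
all_names = sorted(ds.variables)   # data variables and coordinates
print(data_names, coord_names, all_names)
```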
Changes to minimum versions of dependencies:
- Old numpy < 1.11 and pandas < 0.18 are no longer supported (GH1512). By Keisuke Fujii.
- The minimum supported version of bottleneck has increased to 1.1 (GH1279). By Joe Hamman.
Enhancements¶
New functions/methods
New helper function apply_ufunc() for wrapping functions written to work on NumPy arrays to support labels on xarray objects (GH770). apply_ufunc also supports automatic parallelization for many functions with dask. See Wrapping custom computation and Automatic parallelization for details. By Stephan Hoyer.

Added new method Dataset.to_dask_dataframe(), which converts a dataset into a dask dataframe. This allows lazy loading of data from a dataset containing dask arrays (GH1462). By James Munroe.

New function where() for conditionally switching between values in xarray objects, like numpy.where():

In [7]: import xarray as xr

In [8]: arr = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=('x', 'y'))

In [9]: xr.where(arr % 2, 'odd', 'even')
Out[9]:
<xarray.DataArray (x: 2, y: 3)>
array([['odd', 'even', 'odd'],
       ['even', 'odd', 'even']], dtype='<U4')
Dimensions without coordinates: x, y

Similarly, the where() method now also supports the other argument, for filling with a value other than NaN (GH576). By Stephan Hoyer.

Added show_versions() function to aid in debugging (GH1485). By Joe Hamman.
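A minimal sketch of apply_ufunc wrapping a label-unaware NumPy function, assuming a recent xarray; row_mean is a hypothetical helper written for illustration. Declaring "y" in input_core_dims tells apply_ufunc to move that dimension to the last axis and to drop it from the result.

```python
import numpy as np
import xarray as xr

def row_mean(values):
    # Plain NumPy code with no knowledge of labels; apply_ufunc
    # guarantees the core dimension is the last axis
    return np.mean(values, axis=-1)

arr = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=("x", "y"))

# "y" is consumed by row_mean, so only "x" remains in the output
result = xr.apply_ufunc(row_mean, arr, input_core_dims=[["y"]])
print(result.dims, result.values)
```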
Performance improvements
- concat() was computing variables that aren’t in memory (e.g. dask-based) multiple times; open_mfdataset() was loading them multiple times from disk. Now, both functions will instead load them at most once and, if they do, store them in memory in the concatenated array/dataset (GH1521). By Guido Imperiale.
- Speed-up (x 100) of decode_cf_datetime(). By Christian Chwala.
IO related improvements
Unicode strings (str on Python 3) are now round-tripped successfully even when written as character arrays (e.g., as netCDF3 files or when using engine='scipy') (GH1638). This is controlled by the _Encoding attribute convention, which is also understood directly by the netCDF4-Python interface. See String encoding for full details. By Stephan Hoyer.

Support for the data_vars and coords keywords from concat() added to open_mfdataset() (GH438). Using these keyword arguments can significantly reduce memory usage and increase speed. By Oleksandr Huziy.

Support for pathlib.Path objects added to open_dataset(), open_mfdataset(), to_netcdf(), and save_mfdataset() (GH799):

In [10]: from pathlib import Path  # In Python 2, use pathlib2!

In [11]: data_dir = Path("data/")

In [12]: one_file = data_dir / "dta_for_month_01.nc"

In [13]: xr.open_dataset(one_file)
Out[13]:
<xarray.Dataset>
[...]

By Willi Rath.
You can now explicitly disable any default _FillValue (NaN for floating point values) by passing the encoding {'_FillValue': None} (GH1598). By Stephan Hoyer.

More attributes are available in the attrs dictionary when raster files are opened with open_rasterio(). By Greg Brener.

Support for NetCDF files using an _Unsigned attribute to indicate that a signed integer data type should be interpreted as unsigned bytes (GH1444). By Eric Bruning.

Support using an existing, opened netCDF4 Dataset with NetCDF4DataStore. This permits creating a Dataset from a netCDF4 Dataset that has been opened using other means (GH1459). By Ryan May.

Changed PydapDataStore to take a Pydap dataset. This permits opening OPeNDAP datasets that require authentication, by instantiating a Pydap dataset with a session object. Also added xarray.backends.PydapDataStore.open(), which takes a URL and session object (GH1068). By Philip Graae.

Support reading and writing unlimited dimensions with h5netcdf (GH1636). By Joe Hamman.
Other improvements
- Added _ipython_key_completions_ to xarray objects, to enable autocompletion for dictionary-like access in IPython, e.g., ds['tem + tab -> ds['temperature'] (GH1628). By Keisuke Fujii.
- Support passing keyword arguments to load, compute, and persist methods. Any keyword arguments supplied to these methods are passed on to the corresponding dask function (GH1523). By Joe Hamman.
- Encoding attributes are now preserved when xarray objects are concatenated. The encoding is copied from the first object (GH1297). By Joe Hamman and Gerrit Holl.
- Support applying rolling window operations using bottleneck’s moving window functions on data stored as dask arrays (GH1279). By Joe Hamman.
- Experimental support for the Dask collection interface (GH1674). By Matthew Rocklin.
Bug fixes¶
- Suppress RuntimeWarning issued by numpy for “invalid value comparisons” (e.g. NaN). Xarray now behaves similarly to pandas in its treatment of binary and unary operations on objects with NaNs (GH1657). By Joe Hamman.
- Unsigned int support for reduce methods with skipna=True (GH1562). By Keisuke Fujii.
- Fixes to ensure xarray works properly with pandas 0.21:
  - Fix isnull() method (GH1549).
  - to_series() and to_dataframe() should not return a pandas.MultiIndex for 1D data (GH1548).
  - Fix plotting with datetime64 axis labels (GH1661).

  By Stephan Hoyer.
- open_rasterio() method now shifts the rasterio coordinates so that they are centered in each pixel (GH1468). By Greg Brener.
- rename() method now doesn’t throw errors if some Variable is renamed to the same name as another Variable as long as that other Variable is also renamed (GH1477). This method now does throw when two Variables would end up with the same name after the rename (since one of them would get overwritten in this case). By Prakhar Goel.
- Fix xarray.testing.assert_allclose() to actually use atol and rtol arguments when called on DataArray objects (GH1488). By Stephan Hoyer.
- xarray quantile methods now properly raise a TypeError when applied to objects with data stored as dask arrays (GH1529). By Joe Hamman.
- Fix positional indexing to allow the use of unsigned integers (GH1405). By Joe Hamman and Gerrit Holl.
- Creating a Dataset now raises MergeError if a coordinate shares a name with a dimension but is comprised of arbitrary dimensions (GH1120). By Joe Hamman.
- open_rasterio() method now skips rasterio’s crs attribute if its value is None (GH1520). By Leevi Annala.
- Fix xarray.DataArray.to_netcdf() to return bytes when no path is provided (GH1410). By Joe Hamman.
- Fix xarray.save_mfdataset() to properly raise an informative error when objects other than Dataset are provided (GH1555). By Joe Hamman.
- Fix xarray.Dataset.copy(), which did not preserve the encoding property (GH1586). By Guido Imperiale.
- Fix xarray.concat(), which would eagerly load dask variables into memory if the first argument was a numpy variable (GH1588). By Guido Imperiale.
- Fix bug in to_netcdf() when writing in append mode (GH1215). By Joe Hamman.
- Fix netCDF4 backend to properly roundtrip the shuffle encoding option (GH1606). By Joe Hamman.
- Fix bug when using pytest class decorators to skip certain unit tests. The previous behavior unintentionally caused additional tests to be skipped (GH1531). By Joe Hamman.
- Fix pynio backend for upcoming release of pynio with Python 3 support (GH1611). By Ben Hillman.
- Fix seaborn import warning for Seaborn versions 0.8 and newer, in which the apionly module was deprecated (GH1633). By Joe Hamman.
- Fix fragile MultiIndex checking (GH1833). By Florian Pinault.
- Fix rasterio backend for Rasterio versions 1.0alpha10 and newer (GH1641). By Chris Holden.
Bug fixes after rc1¶
- Suppress warning in IPython autocompletion, related to the deprecation of .T attributes (GH1675). By Keisuke Fujii.
- Fix a bug in lazily indexing netCDF arrays (GH1688). By Keisuke Fujii.
- (Internal bug) MemoryCachedArray now supports orthogonal indexing. Also made some internal cleanups around array wrappers (GH1429). By Keisuke Fujii.
- (Internal bug) MemoryCachedArray now always wraps np.ndarray by NumpyIndexingAdapter (GH1694). By Keisuke Fujii.
- Fix importing xarray when running Python with -OO (GH1706). By Stephan Hoyer.
- Saving a netCDF file with coordinates that have spaces in their names now raises an appropriate warning (GH1689). By Stephan Hoyer.
- Fix two bugs that were preventing dask arrays from being specified as coordinates in the DataArray constructor (GH1684). By Joe Hamman.
- Fixed apply_ufunc with dask='parallelized' for scalar arguments (GH1697). By Stephan Hoyer.
- Fix “Chunksize cannot exceed dimension size” error when writing netCDF4 files loaded from disk (GH1225). By Stephan Hoyer.
- Validate the shape of coordinates with names matching dimensions in the DataArray constructor (GH1709). By Stephan Hoyer.
- Raise NotImplementedError when attempting to save a MultiIndex to a netCDF file (GH1547). By Stephan Hoyer.
- Remove netCDF dependency from rasterio backend tests. By Matti Eskelinen.
Bug fixes after rc2¶
- Fixed unexpected behavior in Dataset.set_index() and DataArray.set_index() introduced by pandas 0.21.0. Setting a new index with a single variable resulted in a 1-level pandas.MultiIndex instead of a simple pandas.Index (GH1722). By Benoit Bovy.
- Fixed unexpected memory loading of backend arrays after print (GH1720). By Keisuke Fujii.
v0.9.6 (8 June 2017)¶
This release includes a number of backwards compatible enhancements and bug fixes.
Enhancements¶
- New sortby() method to Dataset and DataArray that enables sorting along dimensions (GH967). See the docs for examples. By Chun-Wei Yuan and Kyle Heuton.
- Add .dt accessor to DataArrays for computing datetime-like properties for the values they contain, similar to pandas.Series (GH358). By Daniel Rothenberg.
- Renamed internal dask arrays created by open_dataset to match new dask conventions (GH1343). By Ryan Abernathey.
- as_variable() is now part of the public API (GH1303). By Benoit Bovy.
- align() now supports join='exact', which raises an error instead of aligning when indexes to be aligned are not equal. By Stephan Hoyer.
- New function open_rasterio() for opening raster files with the rasterio library. See the docs for details. By Joe Hamman, Nic Wayand and Fabien Maussion.
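A short sketch of two of the enhancements listed above, sortby() and the .dt accessor, assuming a recent xarray and pandas; the arrays and dates are invented for illustration.

```python
import pandas as pd
import xarray as xr

# Values paired with an unsorted coordinate
da = xr.DataArray([30, 10, 20], coords={"x": [3, 1, 2]}, dims="x")
sorted_da = da.sortby("x")  # reorder along x by coordinate value

# Datetime-like properties computed element-wise via the .dt accessor
times = xr.DataArray(pd.date_range("2000-01-30", periods=3), dims="time")
print(sorted_da.values, times.dt.month.values, times.dt.day.values)
```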
Bug fixes¶
- Fix error from repeated indexing of datasets loaded from disk (GH1374). By Stephan Hoyer.
- Fix a bug where .isel_points wrongly assigned unselected coordinates to data_vars. By Keisuke Fujii.
- Tutorial datasets are now checked against a reference MD5 sum to confirm successful download (GH1392). By Matthew Gidden.
- DataArray.chunk() now accepts dask-specific kwargs like Dataset.chunk() does. By Fabien Maussion.
- Support for engine='pydap' with recent releases of Pydap (3.2.2+), including on Python 3 (GH1174).
Documentation¶
- A new gallery makes it possible to add interactive examples to the documentation. By Fabien Maussion.
Testing¶
- Fix test suite failure caused by changes to the pandas.cut function (GH1386). By Ryan Abernathey.
- Enhanced the test suite with a @network decorator, which is controlled via the --run-network-tests command line argument to py.test (GH1393). By Matthew Gidden.
v0.9.5 (17 April, 2017)¶
Remove an inadvertently introduced print statement.
v0.9.3 (16 April, 2017)¶
This minor release includes bug-fixes and backwards compatible enhancements.
Enhancements¶
- New persist() method to Datasets and DataArrays to enable persisting data in distributed memory when using Dask (GH1344). By Matthew Rocklin.
- New expand_dims() method for DataArray and Dataset (GH1326). By Keisuke Fujii.
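A minimal sketch of expand_dims(), assuming a recent xarray; the dimension names are arbitrary. By default the new length-1 dimension is inserted at the front.

```python
import xarray as xr

da = xr.DataArray([1, 2], dims="x")

# Add a new length-1 dimension "y" ahead of the existing "x"
expanded = da.expand_dims("y")
print(expanded.dims, expanded.shape)
```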
Bug fixes¶
- Fix .where() with drop=True when arguments do not have indexes (GH1350). This bug, introduced in v0.9, resulted in xarray producing incorrect results in some cases. By Stephan Hoyer.
- Fixed writing to file-like objects with to_netcdf() (GH1320). By Stephan Hoyer.
- Fixed explicitly setting engine='scipy' with to_netcdf when not providing a path (GH1321). By Stephan Hoyer.
- Fixed open_dataarray not properly passing its parameters to open_dataset (GH1359). By Stephan Hoyer.
- Ensure test suite works when run from an installed version of xarray (GH1336). Use @pytest.mark.slow instead of a custom flag to mark slow tests. By Stephan Hoyer.
v0.9.2 (2 April 2017)¶
This minor release includes bug-fixes and backwards compatible enhancements.
Enhancements¶
- .rolling() on Dataset is now supported (GH859). By Keisuke Fujii.
- When bottleneck version 1.1 or later is installed, use bottleneck for rolling var, argmin, argmax, and rank computations. Also, rolling median now accepts a min_periods argument (GH1276). By Joe Hamman.
- When .plot() is called on a 2D DataArray and only one dimension is specified with x= or y=, the other dimension is now guessed (GH1291). By Vincent Noel.
- Added new method assign_attrs() to DataArray and Dataset, a chained-method compatible implementation of the dict.update method on attrs (GH1281). By Henry S. Harrison.
- Added new autoclose=True argument to open_mfdataset() to explicitly close opened files when not in use, preventing OS errors caused by too many open files (GH1198). Note, the default is autoclose=False, which is consistent with previous xarray behavior. By Phillip J. Wolfram.
- The repr() of Dataset and DataArray attributes uses a similar format to coordinates and variables, with vertically aligned entries truncated to fit on a single line (GH1319). Hopefully this will stop people writing data.attrs = {} and discarding metadata in notebooks for the sake of cleaner output. The full metadata is still available as data.attrs. By Zac Hatfield-Dodds.
- Enhanced the test suite with @slow and @flaky decorators, which are controlled via the --run-flaky and --skip-slow command line arguments to py.test (GH1336). By Stephan Hoyer and Phillip J. Wolfram.
- New aggregation on rolling objects, DataArray.rolling(...).count(), which provides a rolling count of valid values (GH1138).
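A minimal sketch of the rolling count aggregation listed above, assuming a recent xarray; the input values are made up. Windows are trailing (right-aligned), and min_periods=1 keeps any window holding at least one valid value; with the default min_periods (the window size) the first two positions here would be NaN instead.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0, 4.0], dims="x")

# Number of non-missing values in each trailing window of length 2
counts = da.rolling(x=2, min_periods=1).count()
print(counts.values)
```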
Bug fixes¶
- Rolling operations now preserve the original dimension order (GH1125). By Keisuke Fujii.
- Fixed sel with method='nearest' on Python 2.7 and 64-bit Windows (GH1140). By Stephan Hoyer.
- Fixed where with drop=True for empty masks (GH1341). By Stephan Hoyer and Phillip J. Wolfram.
v0.9.1 (30 January 2017)¶
Renamed the “Unindexed dimensions” section in the Dataset and DataArray repr (added in v0.9.0) to “Dimensions without coordinates” (GH1199).
v0.9.0 (25 January 2017)¶
This major release includes five months’ worth of enhancements and bug fixes from 24 contributors, including some significant changes that are not fully backwards compatible. Highlights include:
- Coordinates are now optional in the xarray data model, even for dimensions.
- Changes to caching, lazy loading and pickling to improve xarray’s experience for parallel computing.
- Improvements for accessing and manipulating pandas.MultiIndex levels.
- Many new methods and functions, including quantile(), cumsum(), cumprod(), combine_first(), set_index(), reset_index(), reorder_levels(), full_like(), zeros_like(), ones_like(), open_dataarray(), compute(), Dataset.info(), testing.assert_equal(), testing.assert_identical(), and testing.assert_allclose().
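A short sketch exercising a few of the new functions named above (zeros_like(), full_like(), cumsum() and testing.assert_equal()), assuming a recent xarray; the array is invented for illustration.

```python
import xarray as xr

da = xr.DataArray([1, 2, 3], coords={"x": [10, 20, 30]}, dims="x")

zeros = xr.zeros_like(da)      # same dims and coords, filled with 0
sevens = xr.full_like(da, 7)   # same dims and coords, filled with 7
running = da.cumsum("x")       # cumulative sum along x

# The testing helpers raise AssertionError if the objects differ
expected = xr.DataArray([1, 3, 6], coords={"x": [10, 20, 30]}, dims="x")
xr.testing.assert_equal(running, expected)
print(zeros.values, sevens.values, running.values)
```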
Breaking changes¶
Index coordinates for each dimension are now optional, and no longer created by default (GH1017). You can identify such dimensions without coordinates by their appearance in the list of “Dimensions without coordinates” in the Dataset or DataArray repr:

In [14]: xr.Dataset({'foo': (('x', 'y'), [[1, 2]])})
Out[14]:
<xarray.Dataset>
Dimensions:  (x: 1, y: 2)
Dimensions without coordinates: x, y
Data variables:
    foo      (x, y) int64 1 2

This has a number of implications:

- align() and reindex() can now raise an error if dimension labels are missing and dimensions have different sizes.
- Because pandas does not support missing indexes, methods such as to_dataframe/from_dataframe and stack/unstack no longer roundtrip faithfully on all inputs. Use reset_index() to remove undesired indexes.
- Dataset.__delitem__ and drop() no longer delete/drop variables that have dimensions matching a deleted/dropped variable.
- DataArray.coords.__delitem__ is now allowed on variables matching dimension names.
- .sel and .loc now handle indexing along a dimension without coordinate labels by doing integer-based indexing. See Missing coordinate labels for an example.
- indexes is no longer guaranteed to include all dimension names as keys. The new method get_index() has been added to get an index for any dimension, falling back to a default RangeIndex if necessary.
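The implications above can be checked with a small sketch, assuming a recent xarray and pandas; it reuses the dataset from the repr example. No index exists for "y", get_index() falls back to a default RangeIndex, and .sel on the unlabeled dimension uses integer positions.

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset({"foo": (("x", "y"), [[1, 2]])})

has_y_index = "y" in ds.indexes   # no index is created for "y"
idx = ds.get_index("y")           # default RangeIndex fallback
selected = ds.sel(y=1)            # no labels, so position 1 is selected
print(has_y_index, isinstance(idx, pd.RangeIndex), list(idx), selected["foo"].values)
```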
The default behavior of merge is now compat='no_conflicts', so some merges will now succeed in cases that previously raised xarray.MergeError. Set compat='broadcast_equals' to restore the previous default. See Merging with ‘no_conflicts’ for more details.

Reading values no longer always caches values in a NumPy array (GH1128). Caching of .values on variables read from netCDF files on disk is still the default when open_dataset() is called with cache=True. By Guido Imperiale and Stephan Hoyer.

Pickling a Dataset or DataArray linked to a file on disk no longer caches its values into memory before pickling (GH1128). Instead, pickle stores file paths and restores objects by reopening file references. This enables preliminary, experimental use of xarray for opening files with dask.distributed. By Stephan Hoyer.

Coordinates used to index a dimension are now loaded eagerly into pandas.Index objects, instead of loading the values lazily. By Guido Imperiale.

Automatic levels for 2D plots are now guaranteed to land on vmin and vmax when these kwargs are explicitly provided (GH1191). The automated level selection logic also slightly changed. By Fabien Maussion.

DataArray.rename() behavior changed to strictly change the DataArray.name if called with a string argument, or strictly change coordinate names if called with a dict-like argument. By Markus Gonser.

By default, to_netcdf() adds a _FillValue = NaN attribute to float types. By Frederic Laliberte.

repr on DataArray objects uses a shortened display for NumPy array data that is less likely to overflow onto multiple pages (GH1207). By Stephan Hoyer.

xarray no longer supports Python 3.3, versions of dask prior to v0.9.0, or versions of bottleneck prior to v1.0.
Deprecations¶
- Renamed the Coordinate class from xarray’s low level API to IndexVariable. Variable.to_variable and Variable.to_coord have been renamed to to_base_variable() and to_index_variable().
- Deprecated supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument. The old behavior encouraged relying on the iteration order of dictionaries, which is a bad practice (GH727).
- Removed a number of methods deprecated since v0.7.0 or earlier: load_data, vars, drop_vars, dump, dumps and the variables keyword argument to Dataset.
- Removed the dummy module that enabled import xray.
Enhancements¶
- Added new method combine_first() to DataArray and Dataset, based on the pandas method of the same name (see Combine). By Chun-Wei Yuan.
- Added the ability to change default automatic alignment (arithmetic_join='inner') for binary operations via set_options() (see Automatic alignment). By Chun-Wei Yuan.
- Add checking of attr names and values when saving to netCDF, raising useful error messages if they are invalid (GH911). By Robin Wilson.
- Added the ability to save DataArray objects directly to netCDF files using to_netcdf(), and to load directly from netCDF files using open_dataarray() (GH915). These remove the need to convert a DataArray to a Dataset before saving as a netCDF file, and deal with names to ensure a perfect ‘roundtrip’ capability. By Robin Wilson.
- Multi-index levels are now accessible as “virtual” coordinate variables, e.g., ds['time'] can pull out the 'time' level of a multi-index (see Coordinates). sel also accepts providing multi-index levels as keyword arguments, e.g., ds.sel(time='2000-01') (see Multi-level indexing). By Benoit Bovy.
- Added set_index, reset_index and reorder_levels methods to easily create and manipulate (multi-)indexes (see Set and reset index). By Benoit Bovy.
- Added the compat option 'no_conflicts' to merge, allowing the combination of xarray objects with disjoint (GH742) or overlapping (GH835) coordinates as long as all present data agrees. By Johnnie Gray. See Merging with ‘no_conflicts’ for more details.
- It is now possible to set concat_dim=None explicitly in open_mfdataset() to disable inferring a dimension along which to concatenate. By Stephan Hoyer.
- Added methods DataArray.compute(), Dataset.compute(), and Variable.compute() as a non-mutating alternative to load(). By Guido Imperiale.
- Added DataArray and Dataset methods cumsum() and cumprod(). By Phillip J. Wolfram.
- New properties Dataset.sizes and DataArray.sizes for providing consistent access to dimension length on both Dataset and DataArray (GH921). By Stephan Hoyer.
- New keyword argument drop=True for sel(), isel() and squeeze() for dropping scalar coordinates that arise from indexing a DataArray (GH242). By Stephan Hoyer.
- New top-level functions full_like(), zeros_like(), and ones_like(). By Guido Imperiale.
- Overriding a preexisting attribute with register_dataset_accessor() or register_dataarray_accessor() now issues a warning instead of raising an error (GH1082). By Stephan Hoyer.
- Options for axes sharing between subplots are exposed to FacetGrid and plot(), so axes sharing can be disabled for polar plots. By Bas Hoonhout.
- New utility functions assert_equal(), assert_identical(), and assert_allclose() for asserting relationships between xarray objects, designed for use in a pytest test suite.
- figsize, size and aspect plot arguments are now supported for all plots (GH897). See Controlling the figure size for more details. By Stephan Hoyer and Fabien Maussion.
- New info() method to summarize Dataset variables and attributes. The method prints to a buffer (e.g. stdout) with output similar to what the command line utility ncdump -h produces (GH1150). By Joe Hamman.
- Added the ability to write unlimited netCDF dimensions with the scipy and netcdf4 backends via the new encoding attribute or via the unlimited_dims argument to to_netcdf(). By Joe Hamman.
- New quantile() method to calculate quantiles from DataArray objects (GH1187). By Joe Hamman.
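A minimal sketch of merging with compat='no_conflicts', assuming a recent xarray; the two partial arrays are hypothetical. They overlap at x=1, where only one of them has a value, so there is no conflict and the missing values are filled from the other object. join='outer' is passed explicitly because default join behavior differs between xarray versions.

```python
import numpy as np
import xarray as xr

# Two partially overlapping pieces of the same variable "v"
a = xr.DataArray([1.0, np.nan], coords={"x": [0, 1]}, dims="x", name="v")
b = xr.DataArray([2.0, 3.0], coords={"x": [1, 2]}, dims="x", name="v")

# Non-missing values must agree wherever both objects have data
merged = xr.merge([a, b], compat="no_conflicts", join="outer")
print(merged["x"].values, merged["v"].values)
```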
Bug fixes¶
- groupby_bins now restores empty bins by default (GH1019). By Ryan Abernathey.
- Fix issues for dates outside the valid range of pandas timestamps (GH975). By Mathias Hauser.
- Unstacking produced flipped array after stacking decreasing coordinate values (GH980). By Stephan Hoyer.
- Setting dtype via the encoding parameter of to_netcdf failed if the encoded dtype was the same as the dtype of the original array (GH873). By Stephan Hoyer.
- Fix issues with variables where both attributes _FillValue and missing_value are set to NaN (GH997). By Marco Zühlke.
- .where() and .fillna() now preserve attributes (GH1009). By Fabien Maussion.
- Applying broadcast() to an xarray object based on the dask backend won’t accidentally convert the array from dask to numpy anymore (GH978). By Guido Imperiale.
- Dataset.concat() now preserves variable order (GH1027). By Fabien Maussion.
- Fixed an issue with pcolormesh (GH781). A new infer_intervals keyword gives control over whether the cell intervals should be computed or not. By Fabien Maussion.
- Grouping over a dimension with non-unique values with groupby gives correct groups. By Stephan Hoyer.
- Fixed accessing coordinate variables with non-string names from .coords. By Stephan Hoyer.
- rename() now simultaneously renames the array and any coordinate with the same name, when supplied via a dict (GH1116). By Yves Delley.
- Fixed sub-optimal performance in certain operations with object arrays (GH1121). By Yves Delley.
- Fix .groupby(group) when group has datetime dtype (GH1132). By Jonas Sølvsteen.
- Fixed a bug with facetgrid (the norm keyword was ignored, GH1159). By Fabien Maussion.
- Resolved a concurrency bug that could cause Python to crash when simultaneously reading and writing netCDF4 files with dask (GH1172). By Stephan Hoyer.
- Fix to make .copy() actually copy dask arrays, which will be relevant for future releases of dask in which dask arrays will be mutable (GH1180). By Stephan Hoyer.
- Fix opening NetCDF files with multi-dimensional time variables (GH1229). By Stephan Hoyer.
Performance improvements¶
isel_points() and sel_points() now use vectorised indexing in numpy and dask (GH1161), which can result in a speedup of several orders of magnitude. By Jonathan Chambers.
v0.8.2 (18 August 2016)¶
This release includes a number of bug fixes and minor enhancements.
Breaking changes¶
broadcast() and concat() now auto-align inputs, using join='outer'. Previously, these functions raised ValueError for non-aligned inputs. By Guido Imperiale.
Enhancements¶
- New documentation on Transitioning from pandas.Panel to xarray. By Maximilian Roos.
- New Dataset and DataArray methods to_dict() and from_dict() to allow easy conversion between dictionaries and xarray objects (GH432). See dictionary IO for more details. By Julia Signell.
- Added exclude and indexes optional parameters to align(), and exclude optional parameter to broadcast(). By Guido Imperiale.
- Better error message when assigning variables without dimensions (GH971). By Stephan Hoyer.
- Better error message when reindex/align fails due to duplicate index values (GH956). By Stephan Hoyer.
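A minimal round-trip sketch for to_dict() and from_dict(), assuming a recent xarray; the array, its name and its attrs are invented. With default arguments the data are stored as plain Python lists, so the dictionary can be serialized with standard tools such as json.

```python
import xarray as xr

da = xr.DataArray(
    [1, 2], coords={"x": [10, 20]}, dims="x", name="v", attrs={"units": "m"}
)

d = da.to_dict()                       # nested dict of lists and strings
roundtrip = xr.DataArray.from_dict(d)  # rebuild the DataArray
xr.testing.assert_identical(da, roundtrip)
print(d["data"])
```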
Bug fixes¶
- Ensure xarray works with h5netcdf v0.3.0 for arrays with dtype=str (GH953). By Stephan Hoyer.
- Fix Dataset.__dir__() (i.e. the method Python calls to get autocomplete options), which failed if one of the dataset’s keys was not a string (GH852). By Maximilian Roos.
- Dataset constructor can now take arbitrary objects as values (GH647). By Maximilian Roos.
- Clarified copy argument for reindex() and align(), which now consistently always return new xarray objects (GH927).
- Fix open_mfdataset with engine='pynio' (GH936). By Stephan Hoyer.
- Fix groupby_bins sorting bin labels as strings (GH952). By Stephan Hoyer.
- Fix bug introduced by v0.8.0 that broke assignment to datasets when both the left and right side have the same non-unique index values (GH956).
v0.8.1 (5 August 2016)¶
Bug fixes¶
- Fix bug in v0.8.0 that broke assignment to Datasets with non-unique indexes (GH943). By Stephan Hoyer.
v0.8.0 (2 August 2016)¶
This release includes four months of new features and bug fixes, including several breaking changes.
Breaking changes¶
- Dropped support for Python 2.6 (GH855).
- Indexing on a multi-index now drops levels, which is consistent with pandas. It also changes the name of the dimension / coordinate when the multi-index is reduced to a single index (GH802).
- Contour plots no longer add a colorbar by default (GH866). Filled contour plots are unchanged.
- DataArray.values and .data now always return a NumPy array-like object, even for 0-dimensional arrays with object dtype (GH867). Previously, .values returned native Python objects in such cases. To convert the values of scalar arrays to Python objects, use the .item() method.
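A minimal sketch of the difference between .values and .item() on a 0-dimensional object-dtype array, assuming a recent xarray; a Decimal is used here purely as an example of a Python object with no native NumPy dtype.

```python
from decimal import Decimal

import numpy as np
import xarray as xr

# A 0-dimensional array wrapping an arbitrary Python object (object dtype)
scalar = xr.DataArray(np.array(Decimal("1.5"), dtype=object))

wrapped = scalar.values   # a 0-d NumPy array, not the Decimal itself
native = scalar.item()    # the underlying Python object
print(type(wrapped).__name__, wrapped.dtype, repr(native))
```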
Enhancements¶
- Groupby operations now support grouping over multidimensional variables. A new method called groupby_bins() has also been added to allow users to specify bins for grouping. The new features are described in Multidimensional Grouping and Working with Multidimensional Coordinates. By Ryan Abernathey.
- DataArray and Dataset method where() now supports a drop=True option that clips coordinate elements that are fully masked. By Phillip J. Wolfram.
- New top level merge() function allows for combining variables from any number of Dataset and/or DataArray variables. See Merge for more details. By Stephan Hoyer.
- DataArray and Dataset method resample() now supports the keep_attrs=False option that determines whether variable and dataset attributes are retained in the resampled object. By Jeremy McGibbon.
- Better multi-index support in DataArray and Dataset sel() and loc() methods, which now behave more closely to pandas and which also accept dictionaries for indexing based on given level names and labels (see Multi-level indexing). By Benoit Bovy.
- New (experimental) decorators register_dataset_accessor() and register_dataarray_accessor() for registering custom xarray extensions without subclassing. They are described in the new documentation page on xarray Internals. By Stephan Hoyer.
- Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF formats would raise an error since netCDF does not have a bool datatype. This feature reads/writes a dtype attribute to boolean variables in netCDF files. By Joe Hamman.
- 2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs), allowing more control over the colorbar (GH872). By Fabien Maussion.
- New Dataset method filter_by_attrs(), akin to netCDF4.Dataset.get_variables_by_attributes, to easily filter data variables using their attributes. By Filipe Fernandes.
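A minimal sketch contrasting where() with and without drop=True, assuming a recent xarray; the array and labels are made up. Without drop, masked entries become NaN and the shape is kept; with drop=True, coordinate labels whose entries are all masked are removed.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4), coords={"x": [10, 20, 30, 40]}, dims="x")

masked = da.where(da > 1)              # same shape; failing entries become NaN
clipped = da.where(da > 1, drop=True)  # fully masked labels are dropped
print(masked.values, clipped["x"].values, clipped.values)
```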
Bug fixes¶
- Attributes were being retained by default for some resampling operations when they should not have been. With the keep_attrs=False option, they will no longer be retained by default. This may be backwards-incompatible with some scripts, but the attributes may be kept by adding the keep_attrs=True option. By Jeremy McGibbon.
- Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex preserves the nature of the index (GH875). By Stephan Hoyer.
- Fixed bug in arithmetic operations on DataArray objects whose dimensions are numpy structured arrays or recarrays (GH861, GH837). By Maciek Swat.
- decode_cf_timedelta now accepts arrays with ndim > 1 (GH842). This fixes issue GH665. By Filipe Fernandes.
- Fix a bug where xarray.ufuncs that take two arguments would incorrectly use numpy functions instead of dask.array functions (GH876). By Stephan Hoyer.
- Support for pickling functions from xarray.ufuncs (GH901). By Stephan Hoyer.
- Variable.copy(deep=True) no longer converts MultiIndex into a base Index (GH769). By Benoit Bovy.
- Fixes for groupby on dimensions with a multi-index (GH867). By Stephan Hoyer.
- Fix printing datasets with unicode attributes on Python 2 (GH892). By Stephan Hoyer.
- Fixed incorrect test for dask version (GH891). By Stephan Hoyer.
- Fixed dim argument for isel_points/sel_points when a pandas.Index is passed. By Stephan Hoyer.
- contour() now plots the correct number of contours (GH866). By Fabien Maussion.
v0.7.2 (13 March 2016)¶
This release includes two new, entirely backwards compatible features and several bug fixes.
Enhancements¶
New DataArray method DataArray.dot() for calculating the dot product of two DataArrays along shared dimensions. By Dean Pospisil.

Rolling window operations on DataArray objects are now supported via a new DataArray.rolling() method. For example:

In [15]: import xarray as xr
    ...: import numpy as np

In [16]: arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=('x', 'y'))

In [17]: arr
Out[17]:
<xarray.DataArray (x: 3, y: 5)>
array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
       [ 2.5,  3. ,  3.5,  4. ,  4.5],
       [ 5. ,  5.5,  6. ,  6.5,  7. ]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2 3 4

In [18]: arr.rolling(y=3, min_periods=2).mean()
Out[18]:
<xarray.DataArray (x: 3, y: 5)>
array([[  nan,  0.25,  0.5 ,  1.  ,  1.5 ],
       [  nan,  2.75,  3.  ,  3.5 ,  4.  ],
       [  nan,  5.25,  5.5 ,  6.  ,  6.5 ]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2 3 4

See Rolling window operations for more details. By Joe Hamman.
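A minimal sketch of DataArray.dot(), assuming a recent xarray; the arrays are invented. The product is summed over the dimension the two arrays share ("y"), leaving only "x".

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"))
b = xr.DataArray([1, 1, 1], dims="y")

# Sum of element-wise products over the shared dimension "y"
result = a.dot(b)
print(result.dims, result.values)
```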
Bug fixes¶
- Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted by the inference of the axis interval breaks. This change chooses not to modify the coordinate variables when the axes have the attribute projection, allowing Cartopy to handle the extent of pcolormesh plots (GH781). By Joe Hamman.
- 2D plots now better handle additional coordinates which are not DataArray dimensions (GH788). By Fabien Maussion.
v0.7.1 (16 February 2016)
This is a bug fix release that includes two small, backwards compatible enhancements. We recommend that all users upgrade.
Enhancements
Bug fixes
- Restore checks for shape consistency between data and coordinates in the DataArray constructor (GH758).
- Single dimension variables no longer transpose as part of a broader .transpose. This behavior was causing pandas.PeriodIndex dimensions to lose their type (GH749).
- Dataset labels remain as their native type on .to_dataset. Previously they were coerced to strings (GH745).
- Fixed a bug where replacing a DataArray index coordinate would improperly align the coordinate (GH725).
- DataArray.reindex_like now maintains the dtype of complex numbers when reindexing leads to NaN values (GH738).
- Dataset.rename and DataArray.rename support the old and new names being the same (GH724).
- Fix from_dataframe() for DataFrames with a Categorical column and a MultiIndex index (GH737).
- Fixes to ensure xarray works properly after the upcoming pandas v0.18 and NumPy v1.11 releases.
Acknowledgments
The following individuals contributed to this release:
- Edward Richards
- Maximilian Roos
- Rafael Guedes
- Spencer Hill
- Stephan Hoyer
v0.7.0 (21 January 2016)
This major release includes a redesign of DataArray internals, as well as new methods for reshaping, rolling and shifting data. It includes preliminary support for pandas.MultiIndex, as well as a number of other features and bug fixes, several of which offer improved compatibility with pandas.
New name
The project formerly known as “xray” is now “xarray”, pronounced “x-array”! This avoids a namespace conflict with the entire field of x-ray science. Renaming our project seemed like the right thing to do, especially because some scientists who work with actual x-rays are interested in using this project in their work. Thanks for your understanding and patience in this transition. You can now find our documentation and code repository at new URLs:
To ease the transition, we have simultaneously released v0.7.0 of both xray and xarray on the Python Package Index. These packages are identical. For now, import xray still works, except it issues a deprecation warning. This will be the last xray release. Going forward, we recommend switching your import statements to import xarray as xr.
Breaking changes
- The internal data model used by DataArray has been rewritten to fix several outstanding issues (GH367, GH634, this stackoverflow report). Internally, DataArray is now implemented in terms of ._variable and ._coords attributes instead of holding variables in a Dataset object.

  This refactor ensures that if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.

  In practice, this means that creating a DataArray with the same name as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here's the old behavior:

    In [19]: xray.DataArray([4, 5, 6], dims='x', name='x')
    Out[19]:
    <xray.DataArray 'x' (x: 3)>
    array([4, 5, 6])
    Coordinates:
      * x        (x) int64 4 5 6

  and the new behavior (compare the values of the x coordinate):

    In [20]: xray.DataArray([4, 5, 6], dims='x', name='x')
    Out[20]:
    <xray.DataArray 'x' (x: 3)>
    array([4, 5, 6])
    Coordinates:
      * x        (x) int64 0 1 2
- It is no longer possible to convert an unnamed DataArray to a Dataset with xray.DataArray.to_dataset(); this now raises ValueError. To convert an unnamed array, supply the name argument.
Enhancements
- Basic support for MultiIndex coordinates on xray objects, including indexing, stack() and unstack():

    In [21]: df = pd.DataFrame({'foo': range(3),
       ....:                    'x': ['a', 'b', 'b'],
       ....:                    'y': [0, 0, 1]})
       ....:

    In [22]: s = df.set_index(['x', 'y'])['foo']

    In [23]: arr = xray.DataArray(s, dims='z')

    In [24]: arr
    Out[24]:
    <xray.DataArray 'foo' (z: 3)>
    array([0, 1, 2])
    Coordinates:
      * z        (z) object ('a', 0) ('b', 0) ('b', 1)

    In [25]: arr.indexes['z']
    Out[25]:
    MultiIndex(levels=[[u'a', u'b'], [0, 1]],
               labels=[[0, 1, 1], [0, 0, 1]],
               names=[u'x', u'y'])

    In [26]: arr.unstack('z')
    Out[26]:
    <xray.DataArray 'foo' (x: 2, y: 2)>
    array([[  0.,  nan],
           [  1.,   2.]])
    Coordinates:
      * x        (x) object 'a' 'b'
      * y        (y) int64 0 1

    In [27]: arr.unstack('z').stack(z=('x', 'y'))
    Out[27]:
    <xray.DataArray 'foo' (z: 4)>
    array([  0.,  nan,   1.,   2.])
    Coordinates:
      * z        (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1)
See Stack and unstack for more details.
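The reshaping in the MultiIndex example can be mimicked with a dictionary keyed by (x, y) tuples. This is a plain-Python sketch of the idea, not how xray/xarray implements stack() and unstack(). The helpers `unstack` and `stack` are hypothetical. The sketch assumes labels can be sorted, which gives the same order as the MultiIndex levels in this example. It shows two behaviors from Out[26] and Out[27]. First, the missing ('a', 1) combination becomes NaN, so the integer data turns into floats. Second, stacking the dense grid back produces four entries rather than the original three.

```python
import math


def unstack(pairs):
    """Hypothetical helper: turn {(x, y): value} into a dense 2-D grid.

    Labels along each new dimension are the sorted unique values of that
    level; combinations absent from `pairs` are filled with NaN.
    """
    xs = sorted({x for x, _ in pairs})
    ys = sorted({y for _, y in pairs})
    grid = [[float(pairs.get((x, y), math.nan)) for y in ys] for x in xs]
    return xs, ys, grid


def stack(xs, ys, grid):
    """Hypothetical helper: flatten a 2-D grid into {(x, y): value}.

    Every (x, y) combination is kept, including those holding NaN.
    """
    return {(x, y): grid[i][j] for i, x in enumerate(xs) for j, y in enumerate(ys)}


# Same data as the example: z = [('a', 0), ('b', 0), ('b', 1)], foo = [0, 1, 2]
foo = {('a', 0): 0, ('b', 0): 1, ('b', 1): 2}

xs, ys, grid = unstack(foo)
print(xs, ys, grid)

restacked = stack(xs, ys, grid)
print(restacked)
```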
  Warning
  xray's MultiIndex support is still experimental, and we have a long to-do list of desired additions (GH719), including better display of multi-index levels when printing a Dataset, and support for saving datasets with a MultiIndex to a netCDF file. User contributions in this area would be greatly appreciated.
- Support for reading GRIB, HDF4 and other file formats via PyNIO. See Formats supported by PyNIO for more details.
- Better error message when a variable is supplied with the same name as one of its dimensions.
- Plotting: more control over colormap parameters (GH642). vmin and vmax are no longer silently ignored. Setting center=False prevents automatic selection of a divergent colormap.
- New shift() and roll() methods for shifting/rotating datasets or arrays along a dimension:

    In [28]: array = xray.DataArray([5, 6, 7, 8], dims='x')

    In [29]: array.shift(x=2)
    Out[29]:
    <xarray.DataArray (x: 4)>
    array([nan, nan,  5.,  6.])
    Dimensions without coordinates: x

    In [30]: array.roll(x=2)