Toy weather data¶

Here is an example of how to easily manipulate a toy weather dataset using xarray and other recommended Python libraries:

Examine a dataset with pandas and seaborn
Probability of freeze by calendar month
Monthly averaging
Calculate monthly anomalies
Calculate standardized monthly anomalies
Fill missing values with climatology

Shared setup:

import numpy as np
import pandas as pd
import seaborn as sns  # noqa, pandas aware plotting library

import xarray as xr

np.random.seed(123)

times = pd.date_range("2000-01-01", "2001-12-31", name="time")
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 365.25 - 0.28))

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 3 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 3 * np.random.randn(annual_cycle.size, 3)

ds = xr.Dataset(
    {
        "tmin": (("time", "location"), tmin_values),
        "tmax": (("time", "location"), tmax_values),
    },
    {"time": times, "location": ["IA", "IN", "IL"]},
)

Examine a dataset with pandas and seaborn ¶

In [1]: ds
Out[1]: 
<xarray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
  * location  (location) <U2 'IA' 'IN' 'IL'
Data variables:
    tmin      (time, location) float64 -8.037 -1.788 -3.932 ... -1.346 -4.544
    tmax      (time, location) float64 12.98 3.31 6.779 ... 6.636 3.343 3.805

In [2]: df = ds.to_dataframe()

In [3]: df.head()
Out[3]: 
                          tmin       tmax
location time                            
IA       2000-01-01  -8.037369  12.980549
         2000-01-02  -9.341157   0.447856
         2000-01-03 -12.139719   5.322699
         2000-01-04  -7.492914   1.889425
         2000-01-05  -0.447129   0.791176

In [4]: df.describe()
Out[4]: 
              tmin         tmax
count  2193.000000  2193.000000
mean      9.975426    20.108232
std      10.963228    11.010569
min     -13.395763    -3.506234
25%      -0.040347     9.853905
50%      10.060403    19.967409
75%      20.083590    30.045588
max      33.456060    43.271148

In [5]: ds.mean(dim='location').to_dataframe().plot()
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7fae2826f6d8>

In [6]: sns.pairplot(df.reset_index(), vars=ds.data_vars)
Out[6]: <seaborn.axisgrid.PairGrid at 0x7fae06ecc550>

Probability of freeze by calendar month ¶

In [7]: freeze = (ds['tmin'] <= 0).groupby('time.month').mean('time')

In [8]: freeze
Out[8]: 
<xarray.DataArray 'tmin' (month: 12, location: 3)>
array([[0.952, 0.887, 0.935],
       [0.842, 0.719, 0.772],
       [0.242, 0.129, 0.161],
       ...,
       [0.   , 0.016, 0.   ],
       [0.333, 0.35 , 0.233],
       [0.935, 0.855, 0.823]])
Coordinates:
  * location  (location) <U2 'IA' 'IN' 'IL'
  * month     (month) int64 1 2 3 4 5 6 7 8 9 10 11 12

In [9]: freeze.to_pandas().plot()
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0x7fae30dc3780>

Monthly averaging ¶

In [10]: monthly_avg = ds.resample(time='1MS').mean()

In [11]: monthly_avg.sel(location='IA').to_dataframe().plot(style='s-')
Out[11]: <matplotlib.axes._subplots.AxesSubplot at 0x7fae06df05c0>

../_images/examples_tmin_tmax_plot_mean.png

Note that MS here refers to Month-Start; M labels Month-End (the last day of the month).

Calculate monthly anomalies ¶

In climatology, “anomalies” refer to the difference between observations and typical weather for a particular season. Unlike observations, anomalies should not show any seasonal cycle.

In [12]: climatology = ds.groupby('time.month').mean('time')

In [13]: anomalies = ds.groupby('time.month') - climatology

In [14]: anomalies.mean('location').to_dataframe()[['tmin', 'tmax']].plot()
Out[14]: <matplotlib.axes._subplots.AxesSubplot at 0x7fae2b41a048>

Calculate standardized monthly anomalies ¶

You can create standardized anomalies where the difference between the observations and the climatological monthly mean is divided by the climatological standard deviation.

In [15]: climatology_mean = ds.groupby('time.month').mean('time')

In [16]: climatology_std = ds.groupby('time.month').std('time')

In [17]: stand_anomalies = xr.apply_ufunc(
   ....:                                  lambda x, m, s: (x - m) / s,
   ....:                                  ds.groupby('time.month'),
   ....:                                  climatology_mean, climatology_std)
   ....: 

In [18]: stand_anomalies.mean('location').to_dataframe()[['tmin', 'tmax']].plot()
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0x7fae2b3892b0>

../_images/examples_standardized_anomalies_plot.png

Fill missing values with climatology ¶

The fillna() method on grouped objects lets you easily fill missing values by group:

 # throw away the first half of every month
In [19]: some_missing = ds.tmin.sel(time=ds['time.day'] > 15).reindex_like(ds)

In [20]: filled = some_missing.groupby('time.month').fillna(climatology.tmin)

In [21]: both = xr.Dataset({'some_missing': some_missing, 'filled': filled})

In [22]: both
Out[22]: 
<xarray.Dataset>
Dimensions:       (location: 3, time: 731)
Coordinates:
  * time          (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2001-12-31
  * location      (location) object 'IA' 'IN' 'IL'
    month         (time) int64 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 12
Data variables:
    some_missing  (time, location) float64 nan nan nan ... 2.063 -1.346 -4.544
    filled        (time, location) float64 -5.163 -4.216 ... -1.346 -4.544

In [23]: df = both.sel(time='2000').mean('location').reset_coords(drop=True).to_dataframe()

In [24]: df[['filled', 'some_missing']].plot()
Out[24]: <matplotlib.axes._subplots.AxesSubplot at 0x7fae2b363710>

Toy weather data¶

Examine a dataset with pandas and seaborn¶

Probability of freeze by calendar month¶

Monthly averaging¶

Calculate monthly anomalies¶

Calculate standardized monthly anomalies¶

Fill missing values with climatology¶