xskillscore.sign_test

xskillscore.sign_test(forecasts1, forecasts2, observations=None, time_dim='time', dim=[], alpha=0.05, metric=None, orientation='negative')

Returns the DelSole and Tippett sign test over the given time dimension.

The sign test can be applied to a wide class of measures of forecast quality, including ordered (ranked) categorical data. It makes no distributional assumptions about the forecast errors. This differs from alternatives based on measures like correlation and mean square error, which assume that the metrics were computed from independent samples. Skill metrics computed over a common period with a common set of observations are not independent: for example, different forecasts tend to bust for the same event. The procedure is equivalent to testing whether a coin is fair based on the frequency of heads. The null hypothesis is that the difference between the median scores is zero.
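The coin-flip analogy can be sketched with NumPy alone. The helper below is hypothetical (`sign_test_walk` and its `lower_is_better` argument are not part of xskillscore) and assumes a normal approximation for the confidence bound of a fair random walk; it illustrates the idea and is not necessarily how the library computes its result. Ties contribute a step of 0 but are still counted in the number of steps.

```python
from statistics import NormalDist

import numpy as np


def sign_test_walk(score1, score2, alpha=0.05, lower_is_better=True):
    """Hypothetical helper: random walk and confidence bound for two score series."""
    score1 = np.asarray(score1, dtype=float)
    score2 = np.asarray(score2, dtype=float)
    # Positive difference means forecast 1 scored better at that time step.
    diff = score2 - score1 if lower_is_better else score1 - score2
    # +1 when forecast 1 wins, -1 when forecast 2 wins, 0 for a tie.
    steps = np.sign(diff).astype(int)
    walk = np.cumsum(steps)
    # Under the null hypothesis each step is a fair coin flip, so after n
    # steps the walk has variance n (assumed normal approximation).
    n = np.arange(1, len(steps) + 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    confidence = z * np.sqrt(n)
    significant = np.abs(walk) > confidence
    return significant, walk, confidence


# Illustrative absolute errors: forecast 1 has the smaller error every time.
err1 = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1])
err2 = err1 + 1.0
significant, walk, confidence = sign_test_walk(err1, err2)
print(walk)         # [1 2 3 4 5 6]
print(significant)  # [False False False  True  True  True]
```

With alpha=0.05 the bound grows as roughly 1.96 times the square root of the number of steps, so a forecast that wins every comparison first leaves the bound at the fourth step.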

Parameters:

forecasts1 (xarray.Dataset or xarray.DataArray) – forecasts1 to be compared to observations
forecasts2 (xarray.Dataset or xarray.DataArray) – forecasts2 to be compared to observations
observations (xarray.Dataset or xarray.DataArray or None) – observations to be compared to both forecasts. Only used if metric is provided; otherwise it is assumed that both forecasts have already been compared to observations, and this input is ignored. Please adjust orientation accordingly. Defaults to None.
time_dim (str) – time dimension over which to compute the random walk. This dimension is not reduced, unlike in other xskillscore functions. Defaults to 'time'.
dim (str or list of str) – dimensions to apply metric to if metric is provided. Cannot contain time_dim. Ignored if metric is None. Defaults to [].
alpha (float) – significance level for the random walk. Defaults to 0.05.
metric (callable, str, optional) – metric used to compare forecasts1 and forecasts2 with observations. If metric is None, it is assumed that both forecasts have already been compared to observations before calling sign_test; make sure to adjust orientation in that case. Use metric='categorical' if the winning forecast should only be rewarded a point when it exactly equals the observations. Strings are converted to xskillscore.{metric}. Defaults to None.
orientation (str) – One of ['positive', 'negative']. Which skill values correspond to better skill? Smaller values ('negative') or larger values ('positive')? Defaults to 'negative'. Ignored if metric='categorical'.
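The categorical rule described for metric can be illustrated without the library. The sketch below assumes that a forecast earns a point only when it matches the observed category exactly, and that each step of the walk is the difference between the two forecasts' points; the arrays are made-up values, and the library's internal implementation may differ.

```python
import numpy as np

# Ordered categories (e.g. 0 = below, 1 = near, 2 = above normal); illustrative values.
obs = np.array([0, 1, 2, 1, 0, 2])
fc1 = np.array([0, 1, 1, 1, 2, 2])
fc2 = np.array([0, 2, 2, 0, 0, 1])

# A forecast scores a point only on an exact match with the observation.
hit1 = (fc1 == obs).astype(int)
hit2 = (fc2 == obs).astype(int)

# +1 if only forecast 1 hits, -1 if only forecast 2 hits, 0 if both or neither hit.
steps = hit1 - hit2
walk = np.cumsum(steps)
print(steps)  # [ 0  1 -1  1 -1  1]
print(walk)   # [0 1 0 1 0 1]
```

Because a hit is always the better outcome, there is no direction to choose in this case, which is consistent with orientation being ignored for the categorical metric.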

Returns:

xarray.DataArray or xarray.Dataset – boolean indicating whether forecasts1 is significantly different from forecasts2.
xarray.DataArray or xarray.Dataset – walk values showing how often forecasts1 is better than forecasts2.
xarray.DataArray or xarray.Dataset – confidence boundary for a random walk at significance level alpha.

Examples

>>> import numpy as np
>>> import xarray as xr
>>> import xskillscore as xs
>>> f1 = xr.DataArray(np.random.normal(size=(30)), coords=[("time", np.arange(30))])
>>> f2 = f1 + 2
>>> o = xr.DataArray(np.random.normal(size=(30)), coords=[("time", np.arange(30))])
>>> significantly_different, walk, confidence = xs.sign_test(
...     f1, f2, o, time_dim="time", metric="mae", orientation="negative"
... )
>>> walk.plot()
[<matplotlib.lines.Line2D object at 0x...>]
>>> confidence.plot(color="gray")
[<matplotlib.lines.Line2D object at 0x...>]
>>> (-1 * confidence).plot(color="gray")
[<matplotlib.lines.Line2D object at 0x...>]
>>> walk
<xarray.DataArray (time: 30)> Size: 240B
array([ 1,  0,  1,  2,  1,  2,  3,  4,  5,  6,  5,  6,  7,  6,  7,  8,  9,
       10,  9, 10, 11, 12, 13, 12, 11, 12, 13, 14, 15, 14])
Coordinates:
  * time     (time) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
>>> significantly_different
<xarray.DataArray (time: 30)> Size: 30B
array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True])
Coordinates:
  * time     (time) int64 240B 0 1 2 3 4 5 6 7 8 ... 21 22 23 24 25 26 27 28 29
    alpha    float64 8B 0.05

References

DelSole, T., & Tippett, M. K. (2016). Forecast Comparison Based on Random Walks. Monthly Weather Review, 144(2), 615–626. doi: 10/f782pf