xskillscore.halfwidth_ci_test(forecasts1, forecasts2, observations=None, metric=None, dim=None, time_dim='time', alpha=0.05, **kwargs)

Returns the result of the Jolliffe and Ebert significance test.

Tests whether forecasts1 and forecasts2 differ in their distance from the observations at significance level alpha. See https://www.cawcr.gov.au/projects/verification/CIdiff/FAQ-CIdiff.html


alpha is the desired significance level, i.e. the maximum acceptable risk of falsely rejecting the null hypothesis. The smaller the value of alpha, the greater the strength of the test. The confidence level of the test is defined as 1 - alpha and is often expressed as a percentage. For example, a significance level of 0.05 is equivalent to a 95% confidence level. Source: NIST/SEMATECH e-Handbook of Statistical Methods. https://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
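The relation between alpha and the confidence level can be checked with a few lines of standard-library Python. This is only an illustrative sketch: the helper names confidence_level and two_sided_critical_value are hypothetical, and the use of a two-sided standard-normal critical value is an assumption made here, not something this page states about the test.

```python
from statistics import NormalDist


def confidence_level(alpha: float) -> float:
    """Confidence level that corresponds to significance level alpha."""
    return 1.0 - alpha


def two_sided_critical_value(alpha: float) -> float:
    """Standard-normal quantile leaving alpha / 2 in each tail.

    Assumption: a two-sided test with a normal approximation.
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


alpha = 0.05
print(f"confidence level: {confidence_level(alpha):.0%}")          # 95%
print(f"two-sided z value: {two_sided_critical_value(alpha):.3f}")  # 1.960
```

A smaller alpha gives a larger critical value, so a larger score difference is needed before the null hypothesis is rejected.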

  • forecasts1 (xarray.Dataset or xarray.DataArray) – first forecast to be compared to the observations.

  • forecasts2 (xarray.Dataset or xarray.DataArray) – second forecast to be compared to the observations.

  • observations (xarray.Dataset or xarray.DataArray, optional) – observations to be compared to both forecasts. If None, forecasts1 and forecasts2 are assumed to already be MAEs. Defaults to None.

  • metric (str, optional) – name of the distance metric function used to compute the error between forecasts and observations. It can be any of the xskillscore distance metric functions except mape. Valid metrics are me, rmse, mse, mae, median_absolute_error and smape. Note that if metric is None, observations must also be None. Defaults to None.

  • time_dim (str, optional) – time dimension, i.e. the dimension over which to compute the temporal correlation. Defaults to 'time'.

  • dim (str or list of str, optional) – dimensions to apply the metric function to. Cannot contain time_dim. Defaults to None, which is converted to [] because dim=None must not be passed to metric functions.

  • alpha (float, optional) – significance level at which to test whether forecasts1 differs from forecasts2. Defaults to 0.05.

  • **kwargs (dict, optional) – optional keyword arguments passed directly on to the metric call, excluding dim.
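The constraints stated in the parameter list (metric=None requires observations=None, mape is excluded, dim must not contain time_dim, and dim=None becomes []) can be summarised in a small check. The function check_args below is hypothetical and is not part of xskillscore; it only mirrors the rules as documented here, and the library's own validation and error messages may differ.

```python
# Metric names listed as valid in the parameter description above.
VALID_METRICS = {"me", "rmse", "mse", "mae", "median_absolute_error", "smape"}


def check_args(metric=None, observations=None, dim=None, time_dim="time"):
    """Hypothetical pre-flight check; returns dim normalised to a list."""
    # Documented rule: if metric is None, observations must also be None.
    if metric is None and observations is not None:
        raise ValueError("metric=None requires observations=None")
    # Documented rule: only the listed distance metrics are valid.
    if metric is not None and metric not in VALID_METRICS:
        raise ValueError(f"metric must be one of {sorted(VALID_METRICS)}")
    # Documented rule: dim=None is converted to an empty list.
    if dim is None:
        dim = []
    elif isinstance(dim, str):
        dim = [dim]
    else:
        dim = list(dim)
    # Documented rule: dim cannot contain time_dim.
    if time_dim in dim:
        raise ValueError(f"dim must not contain time_dim={time_dim!r}")
    return dim


print(check_args(metric="mae", observations=object(), dim="lon"))
```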


  • xarray.DataArray or xarray.Dataset – boolean indicating whether the difference in scores (score(f2) - score(f1)) is significant.

  • xarray.DataArray or xarray.Dataset – difference in scores (score(f2) - score(f1)) reduced by dim and time_dim.

  • xarray.DataArray or xarray.Dataset – half-width of the confidence interval at the significance level alpha.
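How the three return values relate can be illustrated with a plain-Python sketch of a paired-difference confidence interval. It is not the xskillscore implementation: the function paired_halfwidth_test is hypothetical, it assumes a two-sided normal approximation, and it treats the per-time-step errors as independent in time. The decision rule it uses, a difference being significant when its absolute value exceeds the half-width, is the one described in the example on this page.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_halfwidth_test(err1, err2, alpha=0.05):
    """Return (significant, diff, hwci) for two per-time-step error series.

    Assumptions: normal approximation, two-sided interval, no
    adjustment for autocorrelation in time.
    """
    if len(err1) != len(err2) or len(err1) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    # Per-time-step difference, ordered as score(f2) - score(f1).
    d = [e2 - e1 for e1, e2 in zip(err1, err2)]
    diff = mean(d)
    # Two-sided critical value for significance level alpha.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Half-width of the confidence interval around the mean difference.
    hwci = z * stdev(d) / sqrt(len(d))
    return abs(diff) > hwci, diff, hwci


# Made-up absolute errors of two forecasts at six time steps.
err_f1 = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
err_f2 = [1.0, 1.3, 1.1, 1.2, 1.0, 1.2]
significant, diff, hwci = paired_halfwidth_test(err_f1, err_f2)
print(significant, round(diff, 3), round(hwci, 3))
```

Working on the paired differences accounts for the correlation between the two error series, which is why a dedicated time dimension is needed.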


>>> import numpy as np
>>> import xarray as xr
>>> import xskillscore as xs
>>> f1 = xr.DataArray(np.random.normal(size=(30)),
...                   coords=[('time', np.arange(30))])
>>> f2 = xr.DataArray(np.random.normal(size=(30)),
...                   coords=[('time', np.arange(30))])
>>> o = xr.DataArray(np.random.normal(size=(30)),
...                  coords=[('time', np.arange(30))])
>>> significantly_different, diff, hwci = xs.halfwidth_ci_test(
...    f1, f2, o, "mae", time_dim='time', dim=[], alpha=0.05
... )
>>> significantly_different
<xarray.DataArray ()>
>>> diff
<xarray.DataArray ()>
>>> hwci
<xarray.DataArray ()>
>>> # absolute magnitude of difference is smaller than half-width of
>>> # confidence interval, therefore not significant at level alpha=0.05
>>> # now comparing against an offset f2, the difference in MAE is significant
>>> significantly_different, diff, hwci = xs.halfwidth_ci_test(
...    f1, f2 + 2., o, "mae", time_dim='time', dim=[], alpha=0.05
... )
>>> significantly_different
<xarray.DataArray ()>