
xskillscore.mae_test(forecasts1, forecasts2, observations=None, dim=[], time_dim='time', alpha=0.05)

Returns the Jolliffe and Ebert MAE significance test.

Tests whether forecasts1 and forecasts2 have different mean absolute error (MAE) at significance level alpha. https://www.cawcr.gov.au/projects/verification/CIdiff/FAQ-CIdiff.html


alpha is the desired significance level, i.e. the maximum acceptable risk of falsely rejecting the null hypothesis. The smaller the value of alpha, the greater the strength of the test. The confidence level of the test is defined as 1 - alpha and is often expressed as a percentage; for example, a significance level of 0.05 is equivalent to a 95% confidence level. Source: NIST/SEMATECH e-Handbook of Statistical Methods. https://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
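The relation between alpha and the confidence level can be checked with a few lines of standard-library Python. This is only an illustrative sketch: the helper names confidence_level and two_sided_critical_value are hypothetical, and the use of a two-sided standard-normal critical value is an assumption, not a statement about how xskillscore computes its interval.

```python
from statistics import NormalDist


def confidence_level(alpha):
    """Confidence level corresponding to significance level alpha."""
    return 1.0 - alpha


def two_sided_critical_value(alpha):
    """Standard-normal quantile for a two-sided test (assumed here)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


alpha = 0.05
print(confidence_level(alpha))                    # 0.95
print(round(two_sided_critical_value(alpha), 2))  # 1.96
```

A smaller alpha gives a larger critical value and therefore a wider confidence interval, so a larger difference is needed before it is called significant.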

  • forecasts1 (xarray.Dataset or xarray.DataArray) – first forecast to be compared to the observations

  • forecasts2 (xarray.Dataset or xarray.DataArray) – second forecast to be compared to the observations

  • observations (xarray.Dataset or xarray.DataArray or None) – observations to be compared to both forecasts. If None, assumes that the arguments forecasts1 and forecasts2 are already MAEs. Defaults to None.

  • time_dim (str) – time dimension over which to compute the temporal correlation. Defaults to 'time'.

  • dim (str or list of str) – dimensions to apply MAE to. Cannot contain time_dim. Defaults to [].

  • alpha (float) – significance level at which to test whether the MAE of forecasts1 differs from the MAE of forecasts2. Defaults to 0.05.


  • xarray.DataArray or xarray.Dataset – boolean indicating whether the difference in MAE is significant at level alpha.

  • xarray.DataArray or xarray.Dataset – Difference in xs.mae reduced by dim and time_dim

  • xarray.DataArray or xarray.Dataset – half-width of the confidence interval at the significance level alpha.
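How the three return values relate to each other can be sketched in plain Python. This is a minimal illustration under stated assumptions, not xskillscore's implementation: the function name mae_difference_sketch is hypothetical, the sign convention (first forecast minus second) is assumed, and the half-width uses a paired normal approximation on the per-time-step absolute errors.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mae_difference_sketch(f1, f2, obs, alpha=0.05):
    """Return (significant, diff, hwci) for two forecast series.

    Assumption: paired normal approximation; sign of diff is f1 minus f2.
    """
    # Absolute error of each forecast at every time step.
    ae1 = [abs(a - o) for a, o in zip(f1, obs)]
    ae2 = [abs(b - o) for b, o in zip(f2, obs)]
    # Paired differences; their mean equals the difference in MAE.
    d = [x - y for x, y in zip(ae1, ae2)]
    diff = fmean(d)
    # Half-width of the (1 - alpha) confidence interval around diff.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hwci = z * stdev(d) / sqrt(len(d))
    # Significant when the interval around diff excludes zero.
    return abs(diff) > hwci, diff, hwci


obs = [float(t) for t in range(10)]
f1 = [o + 0.1 for o in obs]
f2 = [o + (2.0 if t % 2 == 0 else 2.2) for t, o in enumerate(obs)]
significant, diff, hwci = mae_difference_sketch(f1, f2, obs)
print(significant, diff, hwci)
```

The decision rule is the same one described in the example comments further down: the difference is reported as significant only when its absolute magnitude exceeds the half-width of the confidence interval.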


>>> np.random.seed(42)
>>> f1 = xr.DataArray(np.random.normal(size=(30)),
...      coords=[('time', np.arange(30))])
>>> f2 = xr.DataArray(np.random.normal(size=(30)),
...      coords=[('time', np.arange(30))])
>>> o = xr.DataArray(np.random.normal(size=(30)),
...      coords=[('time', np.arange(30))])
>>> significantly_different, diff, hwci = mae_test(f1, f2, o, time_dim='time',
...      dim=[], alpha=0.05)
>>> significantly_different
<xarray.DataArray ()>
>>> diff
<xarray.DataArray ()>
>>> hwci
<xarray.DataArray ()>
>>> # absolute magnitude of difference is smaller than half-width of
>>> # confidence interval, therefore not significant at level alpha=0.05
>>> # now comparing against an offset f2, the difference in MAE is significant
>>> significantly_different, diff, hwci = mae_test(f1, f2 + 2., o, time_dim='time',
...      dim=[], alpha=0.05)
>>> significantly_different
<xarray.DataArray ()>
