xskillscore.effective_sample_size

xskillscore.effective_sample_size(a, b, dim='time', skipna=False, keep_attrs=False)

Effective sample size for temporally correlated data.

Note

This metric should only be applied over the time dimension, since it is designed for temporal autocorrelation. Weights are not included due to the reliance on temporal autocorrelation.

The effective sample size estimates the number of independent samples in the two time series being correlated. It is derived from the magnitude of the lag-1 autocorrelation coefficient of each series. Higher autocorrelation yields a lower effective sample size, which in turn raises the correlation coefficient required to reach a given p value.

\[N_{eff} = N\left( \frac{1 - \rho_{f}\rho_{o}}{1 + \rho_{f}\rho_{o}} \right),\]

where \(N\) is the number of samples along dim, and \(\rho_{f}\) and \(\rho_{o}\) are the lag-1 autocorrelation coefficients for the forecast and observations, respectively.
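The formula can be sketched in plain NumPy for two 1-D series. This is a minimal illustration, not the library's implementation: the helper names `lag1_autocorr` and `effective_sample_size_1d` are hypothetical, and the lag-1 autocorrelation is assumed here to be the Pearson correlation between a series and itself shifted by one step. The library's result may differ in detail, for example through rounding or NaN handling.

```python
import numpy as np


def lag1_autocorr(x):
    """Assumed estimator: Pearson correlation of x[:-1] with x[1:]."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]


def effective_sample_size_1d(a, b):
    """N_eff = N * (1 - rho_a * rho_b) / (1 + rho_a * rho_b) for 1-D input."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of equal length")
    n = a.size
    # Product of the two lag-1 autocorrelation coefficients.
    rho_ab = lag1_autocorr(a) * lag1_autocorr(b)
    return n * (1.0 - rho_ab) / (1.0 + rho_ab)


# A linear trend is perfectly autocorrelated at lag 1 under this estimator.
trend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# This series has zero lag-1 autocorrelation under this estimator.
wave = np.array([1.0, 0.0, -1.0, 0.0, 1.0])

n_eff_trend = effective_sample_size_1d(trend, trend)
n_eff_mixed = effective_sample_size_1d(wave, trend)
print(n_eff_trend, n_eff_mixed)
```

When both series are strongly autocorrelated, the product of the coefficients approaches 1 and the effective sample size approaches 0. When either coefficient is 0, the product vanishes and the effective sample size equals N.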

Parameters
  • a (xarray.Dataset or xarray.DataArray) – Labeled array(s) over which to apply the function.

  • b (xarray.Dataset or xarray.DataArray) – Labeled array(s) over which to apply the function.

  • dim (str, list) – The dimension(s) to apply the function along. Note that this dimension will be reduced as a result. Defaults to 'time'.

  • skipna (bool) – If True, skip NaNs when computing the function. Defaults to False.

  • keep_attrs (bool) – If True, the attributes (attrs) will be copied from the first input to the new one. If False (default), the new object will be returned without attributes.

Returns

Effective sample size.

Return type

xarray.Dataset or xarray.DataArray

References

  • Bretherton, Christopher S., et al. “The effective number of spatial degrees of freedom of a time-varying field.” Journal of Climate 12.7 (1999): 1990-2009.

  • Wilks, Daniel S. Statistical Methods in the Atmospheric Sciences. Vol. 100. Academic Press, 2011.

Examples

>>> import numpy as np
>>> import xarray as xr
>>> import xskillscore as xs
>>> a = xr.DataArray(np.random.rand(5, 3, 3),
...                  dims=['time', 'x', 'y'])
>>> b = xr.DataArray(np.random.rand(5, 3, 3),
...                  dims=['time', 'x', 'y'])
>>> xs.effective_sample_size(a, b, dim='time')
<xarray.DataArray (x: 3, y: 3)>
array([[4., 0., 4.],
       [3., 4., 4.],
       [3., 4., 2.]])
Dimensions without coordinates: x, y