xskillscore.rps¶

xskillscore.rps(observations, forecasts, category_edges, dim=None, fair=False, weights=None, keep_attrs=False, member_dim='member')¶

Calculate Ranked Probability Score.

\[RPS = \sum_{m=1}^{M}[(\sum_{k=1}^{m} y_k) - (\sum_{k=1}^{m} o_k)]^{2}\]

where y and o are forecast and observation probabilities in M categories.

Note

Takes the sum over all categories as in Weigel et al. 2007 and not the mean as in https://www.cawcr.gov.au/projects/verification/verif_web_page.html#RPS. Therefore RPS has no upper boundary.

Parameters

observations (xarray.Dataset or xarray.DataArray) – The observations of the event. Further requirements are specified based on category_edges.
forecasts (xarray.Dataset or xarray.DataArray) – The forecast of the event with dimension specified by member_dim. Further requirements are specified based on category_edges.
category_edges (array_like, xr.Dataset, xr.DataArray, None) –
Edges (left-edge inclusive) of the bins used to calculate the cumulative density function (cdf). Note that here the bins have to include the full range of observations and forecasts data. Effectively, negative infinity is appended to the left side of category_edges, and positive infinity is appended to the right side. Thus, N category edges produces N+1 bins. For example, specifying category_edges = [0,1] will compute the RPS for bins [-inf, 0), [0, 1) and [1, inf), which results in CDF bins [-inf, 0), [-inf, 1) and [-inf, inf). Note that the edges are right-edge exclusive. Forecasts, observations and category_edge are expected in absolute units or probabilities consistently.
- np.array (1d): will be internally converted and broadcasted to observations. Use this if you wish to use the same category edges for all elements of both forecasts and observations.
- xr.Dataset/xr.DataArray: edges of the categories provided as dimension category_edge with optional category labels as category_edge coordinate. Use xr.Dataset/xr.DataArray if edges multi-dimensional and vary across dimensions. Use this if your category edges vary across dimensions of forecasts and observations, but are the same for both.
- tuple of np.array/xr.Dataset/xr.DataArray: same as above, where the first item is taken as category_edges for observations and the second item for category_edges for forecasts. Use this if your category edges vary across dimensions of forecasts and observations, and are different for each.
- None: expect than observations and forecasts are already CDFs containing category_edge dimension. Use this if your category edges vary across dimensions of forecasts and observations, and are different for each.
dim (str or list of str, optional) – Dimension over which to mean after computing rps. This represents a mean over multiple forecasts-observations pairs. Defaults to None implying averaging over all dimensions.
fair (boolean) – Apply ensemble member-size adjustment for unbiased, fair metric; see Ferro (2013).
weights (xr.DataArray with dimensions from dim, optional) – Weights for weighted.mean(dim). Defaults to None, such that no weighting is applied.
keep_attrs (bool) – If True, the attributes (attrs) will be copied from the first input to the new one. If False (default), the new object will be returned without attributes.
member_dim (str, optional) – Name of ensemble member dimension. By default, ‘member’.

Returns

ranked probability score with coords forecasts_category_edge and observations_category_edge as str

Return type

xarray.Dataset or xarray.DataArray

Examples

>>> observations = xr.DataArray(np.random.random(size=(3, 3)),
...                             coords=[('x', np.arange(3)),
...                                     ('y', np.arange(3))])
>>> forecasts = xr.DataArray(np.random.random(size=(3, 3, 3)),
...                          coords=[('x', np.arange(3)),
...                                  ('y', np.arange(3)),
...                                  ('member', np.arange(3))])
>>> category_edges = np.array([.33, .66])
>>> xs.rps(observations, forecasts, category_edges, dim='x')
<xarray.DataArray (y: 3)>
array([0.14814815, 0.7037037 , 1.51851852])
Coordinates:
  * y                           (y) int64 0 1 2
    forecasts_category_edge     <U38 '[-np.inf, 0.33), [0.33, 0.66), [0.66, np.inf]'
    observations_category_edge  <U38 '[-np.inf, 0.33), [0.33, 0.66), [0.66, np.inf]'

You can also define multi-dimensional category_edges, e.g. with xr.quantile. However, you still need to ensure that category_edges covers the forecasts and observations distributions.

>>> category_edges = observations.quantile(
...     q=[.33, .66]).rename({'quantile': 'category_edge'}),
>>> xs.rps(observations, forecasts, category_edges, dim='x')
<xarray.DataArray (y: 3)>
array([1.18518519, 0.85185185, 0.40740741])
Coordinates:
  * y                           (y) int64 0 1 2
    forecasts_category_edge     <U38 '[-np.inf, 0.33), [0.33, 0.66), [0.66, np.inf]'
    observations_category_edge  <U38 '[-np.inf, 0.33), [0.33, 0.66), [0.66, np.inf]'

References

Weigel, A. P., Liniger, M. A., & Appenzeller, C. (2007). The Discrete Brier and Ranked Probability Skill Scores. Monthly Weather Review, 135(1), 118–124. doi: 10/b59qz5
C. A. T. Ferro. Fair scores for ensemble forecasts. Q.R.J. Meteorol. Soc., 140: 1917–1923, 2013. doi: 10.1002/qj.2270.
https://www-miklip.dkrz.de/about/problems/