xskillscore.rps

xskillscore.rps(observations, forecasts, category_edges, dim=None, fair=False, weights=None, keep_attrs=False, member_dim='member', input_distributions=None)

Calculate Ranked Probability Score.

\[RPS = \sum_{m=1}^{M}[(\sum_{k=1}^{m} y_k) - (\sum_{k=1}^{m} o_k)]^{2}\]

where y and o are forecast and observation probabilities in M categories.

Note

Takes the sum over all categories as in Weigel et al. 2007 and not the mean as in https://www.cawcr.gov.au/projects/verification/verif_web_page.html#RPS. Therefore RPS has no upper boundary.

Parameters:

observations (xarray.Dataset or xarray.DataArray) – The observations of the event. Further requirements are specified based on category_edges.
forecasts (xarray.Dataset or xarray.DataArray) – The forecast of the event with dimension specified by member_dim. Further requirements are specified based on category_edges.
category_edges (array_like, xr.Dataset, xr.DataArray, None) –
If forecasts and observations are probabilistic, use category_edges=None and set input_distributions. If forecasts are deterministic and given in absolute units, set category_edges and leave input_distributions=None.

Edges (left-edge inclusive) of the bins used to calculate the cumulative density function (cdf). Note that here the bins have to include the full range of observations and forecasts data. Effectively, negative infinity is appended to the left side of category_edges, and positive infinity is appended to the right side. Thus, N category edges produces N+1 bins. For example, specifying category_edges = [0,1] will compute the RPS for bins [-inf, 0), [0, 1) and [1, inf), which results in CDF bins [-inf, 0), [-inf, 1) and [-inf, inf). Note that the edges are right-edge exclusive. Forecasts, observations and category_edge are expected in absolute units or probabilities consistently. dtypes are handled accordingly:
- np.ndarray (1d): will be internally converted and broadcasted to observations. Use this if you wish to use the same category edges for all elements of both forecasts and observations.
- xr.Dataset/xr.DataArray: edges of the categories provided as dimension category_edge with optional category labels as category_edge coordinate. Use xr.Dataset/xr.DataArray if edges multi-dimensional and vary across dimensions. Use this if your category edges vary across dimensions of forecasts and observations, but are the same for both.
- tuple of np.ndarray/xr.Dataset/xr.DataArray: same as above, where the first item is taken as category_edges for observations and the second item for category_edges for forecasts. Use this if your category edges vary across dimensions of forecasts and observations, and are different for each.
- None: expect that observations and forecasts are already cumulative distribution functions (c) or probability density functions (p), as specified by the input_distributions arg. In this case, the inputs observations and forecasts must contain a dimension named category with the bin values of the density function.
dim (str or list of str, optional) – Dimension over which to mean after computing rps. This represents a mean over multiple forecasts-observations pairs. Defaults to None implying averaging over all dimensions.
fair (boolean) – Apply ensemble member-size adjustment for unbiased, fair metric; see Ferro (2013). If fair==True and category_edges==None, require forecasts to have number of members in coordinates as forecasts[member_dim].
weights (xr.DataArray with dimensions from dim, optional) – Weights for weighted.mean(dim). Defaults to None, such that no weighting is applied.
keep_attrs (bool) – If True, the attributes (attrs) will be copied from the first input to the new one. If False (default), the new object will be returned without attributes.
member_dim (str, optional) – Name of ensemble member dimension. By default, ‘member’.
input_distributions (str or None) – Indicates whether observations and forecasts are probability distributions by p or cumulative distributions by c. Only valid if category_edges is None. Defaults: None.

Returns:

ranked probability score with coords forecasts_category_edge and observations_category_edge as str

Return type:

xarray.Dataset or xarray.DataArray

Examples

In the examples below o is used for observations and f for forecasts.

>>> o = xr.DataArray(
...     np.random.random(size=(3, 3)),
...     coords=[("x", np.arange(3)), ("y", np.arange(3))],
... )
>>> f = xr.DataArray(
...     np.random.random(size=(3, 3, 3)),
...     coords=[("x", np.arange(3)), ("y", np.arange(3)), ("member", np.arange(3))],
... )
>>> category_edges = np.array([0.33, 0.66])
>>> xs.rps(o, f, category_edges, dim="x")
<xarray.DataArray (y: 3)> Size: 24B
array([0.37037037, 0.81481481, 1.        ])
Coordinates:
  * y                           (y) int64 24B 0 1 2
    observations_category_edge  <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0....
    forecasts_category_edge     <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0....

You can also define multi-dimensional category_edges, e.g. with xr.quantile. However, you still need to ensure that category_edges covers the forecasts and observations distributions.

>>> category_edges = o.quantile(q=[0.33, 0.66]).rename({"quantile": "category_edge"})
>>> xs.rps(o, f, category_edges, dim="x")
<xarray.DataArray (y: 3)> Size: 24B
array([0.37037037, 0.81481481, 0.88888889])
Coordinates:
  * y                           (y) int64 24B 0 1 2
    observations_category_edge  <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0....
    forecasts_category_edge     <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0....

If you have probabilistic forecasts, i.e. without a member dimension but different category probabilities, you can also provide probability distribution inputs by specifying category_edges=None and input_distributions:

>>> category_edges = category_edges.rename({"category_edge": "category"})
>>> categories = ["below normal", "normal", "above_normal"]
>>> o_p = xr.concat(
...     [
...         (o < category_edges.isel(category=0)).assign_coords(category=categories[0]),
...         (
...             (o >= category_edges.isel(category=0)) & (o < category_edges.isel(category=1))
...         ).assign_coords(category=categories[1]),
...         (o >= category_edges.isel(category=1)).assign_coords(category=categories[2]),
...     ],
...     "category",
... )
>>> assert (o_p.sum("category") == 1).all()
>>> f_p = xr.concat(
...     [
...         (f < category_edges.isel(category=0)).assign_coords(category=categories[0]),
...         (
...             (f >= category_edges.isel(category=0)) & (f < category_edges.isel(category=1))
...         ).assign_coords(category=categories[1]),
...         (f >= category_edges.isel(category=1)).assign_coords(category=categories[2]),
...     ],
...     "category",
... ).mean("member")
>>> assert (f_p.sum("category") == 1).all()
>>> xs.rps(o_p, f_p, category_edges=None, dim="x", input_distributions="p")
<xarray.DataArray (y: 3)> Size: 24B
array([0.37037037, 0.81481481, 0.88888889])
Coordinates:
  * y        (y) int64 24B 0 1 2

Providing cumulative distribution inputs yields identical results, where highest category equals 1 by default and can be ignored:

>>> o_c = o < category_edges
>>> f_c = (f < category_edges).mean("member")
>>> xs.rps(o_c, f_c, category_edges=None, dim="x", input_distributions="c")
<xarray.DataArray (y: 3)> Size: 24B
array([0.37037037, 0.81481481, 0.88888889])
Coordinates:
  * y        (y) int64 24B 0 1 2

References

Weigel, A. P., Liniger, M. A., & Appenzeller, C. (2007). The Discrete Brier and Ranked Probability Skill Scores. Monthly Weather Review, 135(1), 118–124. doi: 10/b59qz5
C. A. T. Ferro. Fair scores for ensemble forecasts. Q.R.J. Meteorol. Soc., 140: 1917–1923, 2013. doi: 10.1002/qj.2270.
https://www-miklip.dkrz.de/about/problems/