xskillscore.rps
- xskillscore.rps(observations, forecasts, category_edges, dim=None, fair=False, weights=None, keep_attrs=False, member_dim='member', input_distributions=None)
Calculate Ranked Probability Score.
\[RPS = \sum_{m=1}^{M}[(\sum_{k=1}^{m} y_k) - (\sum_{k=1}^{m} o_k)]^{2}\]where
y
ando
are forecast and observation probabilities inM
categories.Note
Takes the sum over all categories as in Weigel et al. 2007 and not the mean as in https://www.cawcr.gov.au/projects/verification/verif_web_page.html#RPS. Therefore RPS has no upper boundary.
- Parameters:
observations (xarray.Dataset or xarray.DataArray) – The observations of the event. Further requirements are specified based on
category_edges
.forecasts (xarray.Dataset or xarray.DataArray) – The forecast of the event with dimension specified by
member_dim
. Further requirements are specified based oncategory_edges
.category_edges (array_like, xr.Dataset, xr.DataArray, None) –
If forecasts and observations are probabilistic, use
category_edges=None
and setinput_distributions
. If forecasts are deterministic and given in absolute units, setcategory_edges
and leaveinput_distributions=None
.Edges (left-edge inclusive) of the bins used to calculate the cumulative density function (cdf). Note that here the bins have to include the full range of observations and forecasts data. Effectively, negative infinity is appended to the left side of category_edges, and positive infinity is appended to the right side. Thus, N category edges produces N+1 bins. For example, specifying category_edges = [0,1] will compute the RPS for bins [-inf, 0), [0, 1) and [1, inf), which results in CDF bins [-inf, 0), [-inf, 1) and [-inf, inf). Note that the edges are right-edge exclusive. Forecasts, observations and category_edge are expected in absolute units or probabilities consistently. dtypes are handled accordingly:
np.ndarray (1d): will be internally converted and broadcasted to observations. Use this if you wish to use the same category edges for all elements of both forecasts and observations.
xr.Dataset/xr.DataArray: edges of the categories provided as dimension
category_edge
with optional category labels ascategory_edge
coordinate. Use xr.Dataset/xr.DataArray if edges multi-dimensional and vary across dimensions. Use this if your category edges vary across dimensions of forecasts and observations, but are the same for both.tuple of np.ndarray/xr.Dataset/xr.DataArray: same as above, where the first item is taken as
category_edges
for observations and the second item forcategory_edges
for forecasts. Use this if your category edges vary across dimensions of forecasts and observations, and are different for each.None: expect that observations and forecasts are already cumulative distribution functions (
c
) or probability density functions (p
), as specified by theinput_distributions
arg. In this case, the inputs observations and forecasts must contain a dimension namedcategory
with the bin values of the density function.
dim (str or list of str, optional) – Dimension over which to mean after computing
rps
. This represents a mean over multiple forecasts-observations pairs. Defaults to None implying averaging over all dimensions.fair (boolean) – Apply ensemble member-size adjustment for unbiased, fair metric; see Ferro (2013). If
fair==True
andcategory_edges==None
, require forecasts to have number of members in coordinates as forecasts[member_dim].weights (xr.DataArray with dimensions from dim, optional) – Weights for weighted.mean(dim). Defaults to None, such that no weighting is applied.
keep_attrs (bool) – If True, the attributes (attrs) will be copied from the first input to the new one. If False (default), the new object will be returned without attributes.
member_dim (str, optional) – Name of ensemble member dimension. By default, ‘member’.
input_distributions (str or None) – Indicates whether observations and forecasts are probability distributions by
p
or cumulative distributions byc
. Only valid if category_edges is None. Defaults: None.
- Returns:
ranked probability score with coords
forecasts_category_edge
andobservations_category_edge
as str- Return type:
Examples
In the examples below o is used for observations and f for forecasts.
>>> o = xr.DataArray( ... np.random.random(size=(3, 3)), ... coords=[("x", np.arange(3)), ("y", np.arange(3))], ... ) >>> f = xr.DataArray( ... np.random.random(size=(3, 3, 3)), ... coords=[("x", np.arange(3)), ("y", np.arange(3)), ("member", np.arange(3))], ... ) >>> category_edges = np.array([0.33, 0.66]) >>> xs.rps(o, f, category_edges, dim="x") <xarray.DataArray (y: 3)> Size: 24B array([0.37037037, 0.81481481, 1. ]) Coordinates: * y (y) int64 24B 0 1 2 observations_category_edge <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0.... forecasts_category_edge <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0....
You can also define multi-dimensional
category_edges
, e.g. with xr.quantile. However, you still need to ensure thatcategory_edges
covers the forecasts and observations distributions.>>> category_edges = o.quantile(q=[0.33, 0.66]).rename( ... {"quantile": "category_edge"} ... ) >>> xs.rps(o, f, category_edges, dim="x") <xarray.DataArray (y: 3)> Size: 24B array([0.37037037, 0.81481481, 0.88888889]) Coordinates: * y (y) int64 24B 0 1 2 observations_category_edge <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0.... forecasts_category_edge <U45 180B '[-np.inf, 0.33), [0.33, 0.66), [0....
If you have probabilistic forecasts, i.e. without a
member
dimension but differentcategory
probabilities, you can also provide probability distribution inputs by specifyingcategory_edges=None
andinput_distributions
:>>> category_edges = category_edges.rename({"category_edge": "category"}) >>> categories = ["below normal", "normal", "above_normal"] >>> o_p = xr.concat( ... [ ... (o < category_edges.isel(category=0)).assign_coords( ... category=categories[0] ... ), ... ( ... (o >= category_edges.isel(category=0)) ... & (o < category_edges.isel(category=1)) ... ).assign_coords(category=categories[1]), ... (o >= category_edges.isel(category=1)).assign_coords( ... category=categories[2] ... ), ... ], ... "category", ... ) >>> assert (o_p.sum("category") == 1).all() >>> f_p = xr.concat( ... [ ... (f < category_edges.isel(category=0)).assign_coords( ... category=categories[0] ... ), ... ( ... (f >= category_edges.isel(category=0)) ... & (f < category_edges.isel(category=1)) ... ).assign_coords(category=categories[1]), ... (f >= category_edges.isel(category=1)).assign_coords( ... category=categories[2] ... ), ... ], ... "category", ... ).mean("member") >>> assert (f_p.sum("category") == 1).all() >>> xs.rps(o_p, f_p, category_edges=None, dim="x", input_distributions="p") <xarray.DataArray (y: 3)> Size: 24B array([0.37037037, 0.81481481, 0.88888889]) Coordinates: * y (y) int64 24B 0 1 2
Providing cumulative distribution inputs yields identical results, where highest category equals 1 by default and can be ignored:
>>> o_c = o < category_edges >>> f_c = (f < category_edges).mean("member") >>> xs.rps(o_c, f_c, category_edges=None, dim="x", input_distributions="c") <xarray.DataArray (y: 3)> Size: 24B array([0.37037037, 0.81481481, 0.88888889]) Coordinates: * y (y) int64 24B 0 1 2
References
Weigel, A. P., Liniger, M. A., & Appenzeller, C. (2007). The Discrete Brier and Ranked Probability Skill Scores. Monthly Weather Review, 135(1), 118–124. doi: 10/b59qz5
C. A. T. Ferro. Fair scores for ensemble forecasts. Q.R.J. Meteorol. Soc., 140: 1917–1923, 2013. doi: 10.1002/qj.2270.