arviz_stats.summary

arviz_stats.summary(data, var_names=None, filter_vars=None, group='posterior', coords=None, sample_dims=None, kind='all', fmt='wide', ci_prob=None, ci_kind=None, round_to='auto', skipna=False)

Create a data frame with summary statistics and/or diagnostics.

Parameters:
data: xarray.DataTree, xarray.Dataset, or InferenceData
var_names: list of str, optional

Names of variables to include in the summary. If None, all variables are included.

filter_vars: {None, “like”, “regex”}, default None

Used for var_names only. If None (default), interpret var_names as the real variable names. If “like”, interpret var_names as substrings of the real variable names. If “regex”, interpret var_names as regular expressions on the real variable names.

group: str

Select the group to summarize. Defaults to “posterior”.

coords: dict, optional

Coordinates defining a subset over the selected group.

sample_dims: str or sequence of hashable, optional

Defaults to rcParams["data.sample_dims"].

kind: {‘all’, ‘stats’, ‘diagnostics’, ‘all_median’, ‘stats_median’, ‘diagnostics_median’, ‘mc_diagnostics’}, default ‘all’
  • all: mean, sd, ci, ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.

  • stats: mean, sd, and ci.

  • diagnostics: ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.

  • all_median: median, mad, ci, ess_median, ess_tail, r_hat, mcse_median.

  • stats_median: median, mad, and ci.

  • diagnostics_median: ess_median, ess_tail, r_hat, mcse_median.

  • mc_diagnostics: mcse_mean, ess_mean, and min_ss.

fmt: {‘wide’, ‘long’, ‘xarray’}

Return format: pandas.DataFrame for ‘wide’ or ‘long’, xarray.Dataset for ‘xarray’.

ci_prob: float, optional

Probability for the credible interval. Defaults to rcParams["stats.ci_prob"].

ci_kind: {“hdi”, “eti”}, optional

Type of credible interval. Defaults to rcParams["stats.ci_kind"]. If kind is stats_median or all_median, ci_kind is forced to “eti”.

round_to: int or {“auto”, “none”}, optional

Rounding specification. Defaults to “auto”. If an integer, it is the number of decimal places to round to. If “none”, no rounding is applied. If “auto” and fmt is “xarray”, defaults to rcParams["stats.round_to"]. If “auto” and fmt is in {“wide”, “long”}, the following rounding rules apply:

  • ESS values (ess_bulk, ess_tail, ess_mean, ess_median, min_ss) are rounded down to int

  • R-hat is always shown with 2 digits after the decimal point.

  • If a column stat and its corresponding mcse_stat are both present, then the MCSE is shown to 2 significant figures and stat is shown with precision based on 2*mcse.

  • All other floating point numbers are shown following rcParams["stats.round_to"].

  • For all floating point numbers except R-hat, trailing zeros are removed and values are converted to string for consistent display.

Note: “auto” is intended for display purposes; using it is not recommended when the output will be used for further numerical computations.

skipna: bool

If True, NaN values are ignored when computing the summary statistics. Defaults to False.

Returns:
pandas.DataFrame or xarray.Dataset

Return type is determined by the fmt argument.

See also

rhat

Compute estimate of rank normalized split R-hat for a set of traces.

ess

Calculate the effective sample size of a set of traces.

mcse

Calculate Markov Chain Standard Error statistic.

plot_ess

Plot quantile, local or evolution of effective sample sizes (ESS).

plot_mcse

Plot quantile, local or evolution of Markov Chain Standard Error (MCSE).

Examples

In [1]: from arviz_base import load_arviz_data
   ...: from arviz_stats import summary
   ...: data = load_arviz_data("non_centered_eight")
   ...: summary(data, var_names=["mu", "tau"])
   ...: 
Out[1]: 
    mean   sd eti89_lb eti89_ub  ess_bulk  ess_tail r_hat mcse_mean mcse_sd
mu   4.3  3.3    -0.75      9.4      2114      1219  1.00     0.072   0.052
tau  3.5  3.2     0.22      9.4       833       712  1.00     0.091    0.13
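
The kind argument controls which columns are computed. As a minimal sketch (output omitted), the summary above restricted to point estimates and the credible interval only would be:

summary(data, var_names=["mu", "tau"], kind="stats")  # mean, sd and ci columns only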

You can use filter_vars to select variables without having to specify all the exact names. Use filter_vars="like" to select based on partial naming:

In [2]: summary(data, var_names=["the"], filter_vars="like")
Out[2]: 
                            mean    sd eti89_lb  ... r_hat  mcse_mean  mcse_sd
theta_t[Choate]             0.33     1     -1.3  ...  1.00      0.021    0.015
theta_t[Deerfield]           0.1  0.95     -1.4  ...  1.00      0.018    0.013
theta_t[Phillips Andover]  -0.08  0.97     -1.6  ...  1.00      0.019    0.013
theta_t[Phillips Exeter]    0.05  0.93     -1.4  ...  1.00      0.019    0.014
theta_t[Hotchkiss]         -0.15  0.91     -1.6  ...  1.00       0.02    0.014
theta_t[Lawrenceville]     -0.03  0.94     -1.6  ...  1.01      0.019    0.014
theta_t[St. Paul's]         0.35  0.99     -1.2  ...  1.00      0.022    0.015
theta_t[Mt. Hermon]         0.05  0.99     -1.6  ...  1.00       0.02    0.015
theta[Choate]                6.1   5.3     -1.1  ...  1.00       0.12     0.12
theta[Deerfield]             4.9   4.7       -2  ...  1.00      0.097      0.1
theta[Phillips Andover]      3.8   5.3     -4.6  ...  1.00       0.12     0.15
theta[Phillips Exeter]       4.6   4.7     -2.7  ...  1.00        0.1    0.091
theta[Hotchkiss]             3.6   4.6     -4.1  ...  1.00        0.1    0.091
theta[Lawrenceville]         4.3   4.8     -2.9  ...  1.00        0.1    0.097
theta[St. Paul's]            6.2   5.2    -0.64  ...  1.00       0.14     0.16
theta[Mt. Hermon]            4.7   5.2     -2.8  ...  1.00       0.12     0.12

[16 rows x 9 columns]

Use filter_vars="regex" to select based on regular expressions, and prefix the variables you want to exclude with ~. Here, we exclude from the summary all variables whose names start with the letter t:

In [3]: summary(data, var_names=["~^t"], filter_vars="regex")
Out[3]: 
   mean   sd eti89_lb eti89_ub  ess_bulk  ess_tail r_hat mcse_mean mcse_sd
mu  4.3  3.3    -0.75      9.4      2114      1219  1.00     0.072   0.052
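
The coords argument restricts the summary to a subset of the selected group. A minimal sketch (output omitted), assuming the school dimension in this dataset is named "school":

summary(data, var_names=["theta"], coords={"school": ["Choate", "Deerfield"]})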
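
The reported credible interval can be adjusted with ci_prob and ci_kind. For instance, a sketch requesting 94% HDI intervals instead of the ETI columns shown above (output omitted):

summary(data, var_names=["mu", "tau"], ci_prob=0.94, ci_kind="hdi")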
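
When the summary will be consumed programmatically rather than read, fmt="xarray" returns the statistics as an xarray.Dataset instead of a pandas.DataFrame. A minimal sketch with the same data (output omitted):

summary(data, var_names=["mu", "tau"], fmt="xarray")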
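
Similarly, because the default round_to="auto" formats values for display (most floats are converted to strings in the wide and long formats), disabling rounding is preferable when the output feeds further numerical computation. A sketch along those lines (output omitted):

summary(data, var_names=["mu", "tau"], round_to="none")  # unrounded values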