arviz_stats.summary
- arviz_stats.summary(data, var_names=None, filter_vars=None, group='posterior', coords=None, sample_dims=None, kind='all', fmt='wide', ci_prob=None, ci_kind=None, round_to='auto', skipna=False)
Create a data frame with summary statistics and/or diagnostics.
- Parameters:
- data
xarray.DataTree, Dataset or InferenceData
- var_names
list of str, optional Names of variables to include in the summary. If None, all variables are included.
- filter_vars: {None, “like”, “regex”}, default None
Used for var_names only. If None (default), interpret var_names as the real variable names. If “like”, interpret var_names as substrings of the real variable names. If “regex”, interpret var_names as regular expressions on the real variable names.
- group: str
Select a group for summary. Defaults to “posterior”.
- coords
dict, optional Coordinates defining a subset over the selected group.
- sample_dims
str or sequence of hashable, optional Defaults to rcParams["data.sample_dims"].
- kind: {‘all’, ‘stats’, ‘diagnostics’, ‘all_median’, ‘stats_median’, ‘diagnostics_median’, ‘mc_diagnostics’}, default ‘all’
all: mean, sd, ci, ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.
stats: mean, sd, and ci.
diagnostics: ess_bulk, ess_tail, r_hat, mcse_mean, mcse_sd.
all_median: median, mad, ci, ess_median, ess_tail, r_hat, mcse_median.
stats_median: median, mad, and ci.
diagnostics_median: ess_median, ess_tail, r_hat, mcse_median.
mc_diagnostics: mcse_mean, ess_mean, and min_ss.
- fmt: {‘wide’, ‘long’, ‘xarray’}, default ‘wide’
Return format: a pandas.DataFrame for ‘wide’ or ‘long’, an xarray.Dataset for ‘xarray’.
- ci_prob
float, optional Probability for the credible interval. Defaults to rcParams["stats.ci_prob"].
- ci_kind: {“hdi”, “eti”}, optional
Type of credible interval. Defaults to rcParams["stats.ci_kind"]. If kind is stats_median or all_median, ci_kind is forced to “eti”.
- round_to
int or {“auto”, “none”}, optional Rounding specification. Defaults to “auto”. If an integer, the number of decimal places to round to. If “none”, no rounding is applied. If “auto” and fmt is “xarray”, defaults to rcParams["stats.round_to"]. If “auto” and fmt is in {“wide”, “long”}, the following rounding rules apply:
ESS values (ess_bulk, ess_tail, ess_mean, ess_median, min_ss) are rounded down to int.
R-hat always shows 2 digits after the decimal point.
If a column stat and mcse_stat are both present, then the mcse is shown to 2 significant figures, and stat is shown with precision based on 2*mcse.
All other floating point numbers are shown following rcParams["stats.round_to"].
For all floating point numbers except R-hat, trailing zeros are removed and values are converted to string for consistent display.
Note: “auto” is intended for display purposes; using it is not recommended when the output will be used for further numerical computations.
- skipna: bool
If True, ignore NaN values when computing the summary statistics. Defaults to False.
- Returns:
pandas.DataFrame or xarray.Dataset Return type determined by the fmt argument.
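As a quick reference, the statistics listed for each kind can be written down as a plain lookup table. The sketch below is transcribed from the parameter description and is not taken from the library source; `KIND_COLUMNS` and `effective_ci_kind` are hypothetical names used only for illustration, and "ci" stands for the pair of interval-bound columns.

```python
# Statistics reported for each `kind`, transcribed from the parameter
# description. "ci" is a placeholder for the two interval-bound columns.
# KIND_COLUMNS is a hypothetical name, not part of arviz_stats.
KIND_COLUMNS = {
    "all": ["mean", "sd", "ci", "ess_bulk", "ess_tail", "r_hat", "mcse_mean", "mcse_sd"],
    "stats": ["mean", "sd", "ci"],
    "diagnostics": ["ess_bulk", "ess_tail", "r_hat", "mcse_mean", "mcse_sd"],
    "all_median": ["median", "mad", "ci", "ess_median", "ess_tail", "r_hat", "mcse_median"],
    "stats_median": ["median", "mad", "ci"],
    "diagnostics_median": ["ess_median", "ess_tail", "r_hat", "mcse_median"],
    "mc_diagnostics": ["mcse_mean", "ess_mean", "min_ss"],
}


def effective_ci_kind(kind, ci_kind):
    """Hypothetical helper: the interval type after the documented override.

    The description states that ci_kind is forced to "eti" when kind is
    stats_median or all_median; any other kind keeps the requested value.
    """
    if kind in ("stats_median", "all_median"):
        return "eti"
    return ci_kind


print(KIND_COLUMNS["stats_median"])
print(effective_ci_kind("all_median", "hdi"))
```

A table like this can help when deciding which kind to request, for example choosing mc_diagnostics when only mcse_mean, ess_mean, and min_ss are needed.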
See also
rhat: Compute estimate of rank normalized split R-hat for a set of traces.
ess: Calculate the effective sample size of a set of traces.
mcse: Calculate Markov Chain Standard Error statistic.
plot_ess: Plot quantile, local or evolution of effective sample sizes (ESS).
plot_mcse: Plot quantile, local or evolution of Markov Chain Standard Error (MCSE).
Examples
In [1]: from arviz_base import load_arviz_data
   ...: from arviz_stats import summary
   ...: data = load_arviz_data("non_centered_eight")
   ...: summary(data, var_names=["mu", "tau"])
   ...:
Out[1]:
     mean   sd  eti89_lb  eti89_ub  ess_bulk  ess_tail  r_hat  mcse_mean  mcse_sd
mu    4.3  3.3     -0.75       9.4      2114      1219   1.00      0.072    0.052
tau   3.5  3.2      0.22       9.4       833       712   1.00      0.091     0.13
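The "auto" rounding rule for a statistic and its MCSE can be illustrated with a small standalone sketch. This is one plausible reading of "precision based on 2*mcse" (keep as many decimals as the leading digit of 2*mcse needs), not the library's implementation. The helper names and the input numbers are hypothetical, and mcse is assumed to be strictly positive.

```python
import math


def decimals_for(mcse):
    """Decimal places implied by the leading digit of 2 * mcse (assumed rule)."""
    return max(0, -math.floor(math.log10(2 * mcse)))


def strip_zeros(text):
    """Drop trailing zeros (and a dangling decimal point) from a number string."""
    return text.rstrip("0").rstrip(".") if "." in text else text


def format_stat_and_mcse(stat, mcse):
    """Return (stat, mcse) as display strings under the assumed rule."""
    stat_text = strip_zeros(f"{stat:.{decimals_for(mcse)}f}")
    mcse_text = f"{mcse:.2g}"  # two significant figures
    return stat_text, mcse_text


# Illustrative, made-up inputs (not values from any real posterior).
print(format_stat_and_mcse(4.3123, 0.0721))
print(format_stat_and_mcse(1.0004, 0.0152))
```

Because the results are strings with trailing zeros removed, output formatted this way is meant for display only; passing round_to="none" is the documented route when the numbers feed further computation.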
You can use filter_vars to select variables without having to specify all the exact names. Use filter_vars="like" to select based on partial naming:
In [2]: summary(data, var_names=["the"], filter_vars="like")
Out[2]:
                            mean    sd  eti89_lb  ...  r_hat  mcse_mean  mcse_sd
theta_t[Choate]             0.33     1      -1.3  ...   1.00      0.021    0.015
theta_t[Deerfield]           0.1  0.95      -1.4  ...   1.00      0.018    0.013
theta_t[Phillips Andover]  -0.08  0.97      -1.6  ...   1.00      0.019    0.013
theta_t[Phillips Exeter]    0.05  0.93      -1.4  ...   1.00      0.019    0.014
theta_t[Hotchkiss]         -0.15  0.91      -1.6  ...   1.00       0.02    0.014
theta_t[Lawrenceville]     -0.03  0.94      -1.6  ...   1.01      0.019    0.014
theta_t[St. Paul's]         0.35  0.99      -1.2  ...   1.00      0.022    0.015
theta_t[Mt. Hermon]         0.05  0.99      -1.6  ...   1.00       0.02    0.015
theta[Choate]                6.1   5.3      -1.1  ...   1.00       0.12     0.12
theta[Deerfield]             4.9   4.7        -2  ...   1.00      0.097      0.1
theta[Phillips Andover]      3.8   5.3      -4.6  ...   1.00       0.12     0.15
theta[Phillips Exeter]       4.6   4.7      -2.7  ...   1.00        0.1    0.091
theta[Hotchkiss]             3.6   4.6      -4.1  ...   1.00        0.1    0.091
theta[Lawrenceville]         4.3   4.8      -2.9  ...   1.00        0.1    0.097
theta[St. Paul's]            6.2   5.2     -0.64  ...   1.00       0.14     0.16
theta[Mt. Hermon]            4.7   5.2      -2.8  ...   1.00       0.12     0.12

[16 rows x 9 columns]
Use filter_vars="regex" to select based on regular expressions, and prefix the variables you want to exclude with ~. Here, we exclude from the summary all the variables starting with the letter t:
In [3]: summary(data, var_names=["~^t"], filter_vars="regex")
Out[3]:
    mean   sd  eti89_lb  eti89_ub  ess_bulk  ess_tail  r_hat  mcse_mean  mcse_sd
mu   4.3  3.3     -0.75       9.4      2114      1219   1.00      0.072    0.052
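The selection semantics of var_names and filter_vars can be sketched in plain Python. The block below is a minimal illustration of the behavior described here, not the library's code: `select_vars` is a hypothetical helper, the list of variable names is assumed from the example outputs, and the sketch assumes the entries of var_names are either all plain or all prefixed with ~.

```python
import re


def select_vars(all_vars, var_names, filter_vars=None):
    """Hypothetical helper mirroring the documented var_names semantics."""
    if var_names is None:
        return list(all_vars)

    def matches(pattern, name):
        if filter_vars == "like":
            return pattern in name  # substring match
        if filter_vars == "regex":
            return re.search(pattern, name) is not None
        return pattern == name  # exact variable name

    # Assumption: a leading "~" on every entry means "exclude the matches".
    negate = all(p.startswith("~") for p in var_names)
    patterns = [p[1:] if negate else p for p in var_names]
    hits = [n for n in all_vars if any(matches(p, n) for p in patterns)]
    if negate:
        return [n for n in all_vars if n not in hits]
    return hits


# Variable names assumed from the example outputs on this page.
posterior_vars = ["mu", "theta_t", "tau", "theta"]
print(select_vars(posterior_vars, ["the"], filter_vars="like"))
print(select_vars(posterior_vars, ["~^t"], filter_vars="regex"))
```

With these names, the "like" call keeps theta_t and theta, and the "regex" call with ~^t leaves only mu, which agrees with the rows shown in the corresponding summaries.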