Once you have built your full specification blueprint and feel comfortable with how the pipeline is executed, you can implement a full multiverse-style analysis. Simply use run_multiverse(<your expanded grid object>):
library(tidyverse)
library(multitool)
# create some data
the_data <-
data.frame(
id = 1:500,
iv1 = rnorm(500),
iv2 = rnorm(500),
iv3 = rnorm(500),
mod = rnorm(500),
dv1 = rnorm(500),
dv2 = rnorm(500),
include1 = rbinom(500, size = 1, prob = .1),
include2 = sample(1:3, size = 500, replace = TRUE),
include3 = rnorm(500)
)
# create a pipeline blueprint
full_pipeline <-
the_data |>
add_filters(include1 == 0, include2 != 3, include3 > -2.5) |>
add_variables(var_group = "ivs", iv1, iv2, iv3) |>
add_variables(var_group = "dvs", dv1, dv2) |>
add_model("linear model", lm({dvs} ~ {ivs} * mod))
# expand the pipeline
expanded_pipeline <- expand_decisions(full_pipeline)
# run the multiverse
multiverse_results <- run_multiverse(expanded_pipeline)
multiverse_results
#> # A tibble: 48 × 4
#> decision specifications model_fitted pipeline_code
#> <chr> <list> <list> <list>
#> 1 1 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 2 2 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 3 3 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 4 4 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 5 5 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 6 6 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 7 7 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 8 8 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 9 9 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 10 10 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> # ℹ 38 more rows
The result is another tibble with various list columns. It will always contain a list column named specifications, containing all the information you generated in your blueprint. Next, there will be a list column containing each fitted model, labelled model_fitted.
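For example, you can pull out the specifications behind a single decision with ordinary dplyr verbs (a quick sketch; the exact columns inside specifications depend on your blueprint):
# inspect the specifications for decision 1 (sketch)
multiverse_results |>
  filter(decision == "1") |>
  pull(specifications)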
There are two main ways to unpack and examine multitool results. The first is by using tidyr::unnest(). Inside the model_fitted column, multitool gives us four columns: model_parameters, model_performance, model_warnings, and model_messages.
multiverse_results |> unnest(model_fitted)
#> # A tibble: 48 × 8
#> decision specifications model_function model_parameters model_performance
#> <chr> <list> <chr> <list> <list>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
The model_parameters column gives you the result of calling parameters::parameters() on each model in your grid, which is a data.frame of model coefficients and their associated standard errors, confidence intervals, test statistics, and p-values.
multiverse_results |>
unnest(model_fitted) |>
unnest(model_parameters)
#> # A tibble: 192 × 20
#> decision specifications model_function parameter unstd_coef se unstd_ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Interce… 0.0469 0.0556 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 0.0386 0.0577 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0511 0.0517 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod -0.0337 0.0538 0.95
#> 5 2 <tibble [1 × 3]> lm (Interce… 0.00139 0.0582 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 0.0394 0.0604 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0678 0.0542 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.00131 0.0563 0.95
#> 9 3 <tibble [1 × 3]> lm (Interce… 0.0493 0.0556 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0111 0.0546 0.95
#> # ℹ 182 more rows
#> # ℹ 13 more variables: unstd_ci_low <dbl>, unstd_ci_high <dbl>, t <dbl>,
#> # df_error <int>, p <dbl>, std_coef <dbl>, std_ci <dbl>, std_ci_low <dbl>,
#> # std_ci_high <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
The model_performance column gives fit statistics, such as r2 or AIC and BIC values, computed by running performance::performance() on each model in your grid.
multiverse_results |>
unnest(model_fitted) |>
unnest(model_performance)
#> # A tibble: 48 × 14
#> decision specifications model_function model_parameters aic aicc bic
#> <chr> <list> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m> 820. 820. 838.
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m> 847. 847. 865.
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m> 820. 820. 839.
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m> 847. 847. 866.
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m> 816. 816. 835.
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m> 847. 847. 865.
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m> 822. 822. 840.
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m> 849. 849. 867.
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m> 823. 823. 841.
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m> 849. 849. 868.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
The model_messages and model_warnings columns contain information provided by the modeling function. If something went wrong or you need to know something about a particular model, these columns will have captured the messages and warnings it printed.
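For instance, you could scan the whole grid for models that produced warnings (a minimal sketch; the columns inside model_warnings depend on what each model emitted):
# unnest captured warnings to flag problem models (sketch)
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_warnings)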
The second is a set of wrappers I wrote around the tidyr::unnest() workflow. The main function is reveal(). Pass a multiverse results object to reveal() and tell it which columns to grab by indicating the column name in the .what argument:
multiverse_results |>
reveal(.what = model_fitted)
#> # A tibble: 48 × 8
#> decision specifications model_function model_parameters model_performance
#> <chr> <list> <chr> <list> <list>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m> <prfrmnc_ [1 × 7]>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
If you want to get straight to a specific result, you can specify a sub-list with the .which argument:
multiverse_results |>
reveal(.what = model_fitted, .which = model_parameters)
#> # A tibble: 192 × 20
#> decision specifications model_function parameter unstd_coef se unstd_ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Interce… 0.0469 0.0556 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 0.0386 0.0577 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0511 0.0517 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod -0.0337 0.0538 0.95
#> 5 2 <tibble [1 × 3]> lm (Interce… 0.00139 0.0582 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 0.0394 0.0604 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0678 0.0542 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.00131 0.0563 0.95
#> 9 3 <tibble [1 × 3]> lm (Interce… 0.0493 0.0556 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0111 0.0546 0.95
#> # ℹ 182 more rows
#> # ℹ 13 more variables: unstd_ci_low <dbl>, unstd_ci_high <dbl>, t <dbl>,
#> # df_error <int>, p <dbl>, std_coef <dbl>, std_ci <dbl>, std_ci_low <dbl>,
#> # std_ci_high <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
reveal_model_*
multitool will run and save anything you put in your pipeline, but most often you will want to look at model parameters and/or performance. To that end, there is a set of convenience functions for getting at the most common multiverse results: reveal_model_parameters(), reveal_model_performance(), reveal_model_messages(), and reveal_model_warnings().
reveal_model_parameters() unpacks the model parameters in your multiverse:
multiverse_results |>
reveal_model_parameters()
#> # A tibble: 192 × 20
#> decision specifications model_function parameter unstd_coef se unstd_ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Interce… 0.0469 0.0556 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 0.0386 0.0577 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0511 0.0517 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod -0.0337 0.0538 0.95
#> 5 2 <tibble [1 × 3]> lm (Interce… 0.00139 0.0582 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 0.0394 0.0604 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0678 0.0542 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.00131 0.0563 0.95
#> 9 3 <tibble [1 × 3]> lm (Interce… 0.0493 0.0556 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0111 0.0546 0.95
#> # ℹ 182 more rows
#> # ℹ 13 more variables: unstd_ci_low <dbl>, unstd_ci_high <dbl>, t <dbl>,
#> # df_error <int>, p <dbl>, std_coef <dbl>, std_ci <dbl>, std_ci_low <dbl>,
#> # std_ci_high <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
reveal_model_performance() unpacks the model performance:
multiverse_results |>
reveal_model_performance()
#> # A tibble: 48 × 14
#> decision specifications model_function model_parameters aic aicc bic
#> <chr> <list> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m> 820. 820. 838.
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m> 847. 847. 865.
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m> 820. 820. 839.
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m> 847. 847. 866.
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m> 816. 816. 835.
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m> 847. 847. 865.
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m> 822. 822. 840.
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m> 849. 849. 867.
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m> 823. 823. 841.
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m> 849. 849. 868.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
You can also choose to expand your decision grid with .unpack_specs to see which decisions produced which result. You have two options for unpacking your decisions: wide or long. If you set .unpack_specs = 'wide', you get one column per decision variable, exactly as your decisions appeared in your grid.
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide")
#> # A tibble: 192 × 27
#> decision ivs dvs include1 include2 include3 model model_meta model_args
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 … include… include… lm(d… linear mo… ""
#> 2 1 iv1 dv1 include1 … include… include… lm(d… linear mo… ""
#> 3 1 iv1 dv1 include1 … include… include… lm(d… linear mo… ""
#> 4 1 iv1 dv1 include1 … include… include… lm(d… linear mo… ""
#> 5 2 iv1 dv2 include1 … include… include… lm(d… linear mo… ""
#> 6 2 iv1 dv2 include1 … include… include… lm(d… linear mo… ""
#> 7 2 iv1 dv2 include1 … include… include… lm(d… linear mo… ""
#> 8 2 iv1 dv2 include1 … include… include… lm(d… linear mo… ""
#> 9 3 iv2 dv1 include1 … include… include… lm(d… linear mo… ""
#> 10 3 iv2 dv1 include1 … include… include… lm(d… linear mo… ""
#> # ℹ 182 more rows
#> # ℹ 18 more variables: model_function <chr>, parameter <chr>, unstd_coef <dbl>,
#> # se <dbl>, unstd_ci <dbl>, unstd_ci_low <dbl>, unstd_ci_high <dbl>, t <dbl>,
#> # df_error <int>, p <dbl>, std_coef <dbl>, std_ci <dbl>, std_ci_low <dbl>,
#> # std_ci_high <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
If you set .unpack_specs = 'long', your decisions get stacked into two columns: decision_set and alternatives. This format is nice for plotting a particular result from a multiverse analysis across different decision alternatives.
multiverse_results |>
reveal_model_performance(.unpack_specs = "long")
#> # A tibble: 336 × 15
#> decision decision_set alternatives model_function model_parameters aic
#> <chr> <chr> <chr> <chr> <list> <dbl>
#> 1 1 ivs "iv1" lm <prmtrs_m> 820.
#> 2 1 dvs "dv1" lm <prmtrs_m> 820.
#> 3 1 include1 "include1 == 0" lm <prmtrs_m> 820.
#> 4 1 include2 "include2 != 3" lm <prmtrs_m> 820.
#> 5 1 include3 "include3 > -2.5" lm <prmtrs_m> 820.
#> 6 1 model "linear model" lm <prmtrs_m> 820.
#> 7 1 model_args "" lm <prmtrs_m> 820.
#> 8 2 ivs "iv1" lm <prmtrs_m> 847.
#> 9 2 dvs "dv2" lm <prmtrs_m> 847.
#> 10 2 include1 "include1 == 0" lm <prmtrs_m> 847.
#> # ℹ 326 more rows
#> # ℹ 9 more variables: aicc <dbl>, bic <dbl>, r2 <dbl>, r2_adjusted <dbl>,
#> # rmse <dbl>, sigma <dbl>, model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
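For example, here is a minimal plotting sketch (ggplot2 is loaded with the tidyverse; the aesthetics and faceting are just one option):
# plot r2 for each decision alternative, one panel per decision set (sketch)
multiverse_results |>
  reveal_model_performance(.unpack_specs = "long") |>
  ggplot(aes(x = r2, y = alternatives)) +
  geom_point(alpha = .5) +
  facet_wrap(vars(decision_set), scales = "free_y")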
Unpacking specifications alongside specific results allows us to examine the effects of our pipeline decisions. A powerful way to organize these results is to summarize a specific results column, say the r2 values of our models, over the entire multiverse. condense() takes a result column and summarizes it with the .how argument, which takes a list in the form of list(<a name you pick> = <summary function>). .how will create a column named <column being condensed>_<summary function name provided>; the raw values are also kept in a <column being condensed>_list column. In this case, we get r2_mean and r2_median.
# model performance r2 summaries
multiverse_results |>
reveal_model_performance() |>
condense(r2, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> r2_mean r2_median r2_list
#> <dbl> <dbl> <list>
#> 1 0.00557 0.00509 <dbl [48]>
# model parameters for our predictor of interest
multiverse_results |>
reveal_model_parameters() |>
filter(str_detect(parameter, "iv")) |>
condense(unstd_coef, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> unstd_coef_mean unstd_coef_median unstd_coef_list
#> <dbl> <dbl> <list>
#> 1 0.00563 -0.00122 <dbl [96]>
In the last example, we filtered our multiverse results to look at our predictors iv* to see what the mean and median effect was (over all combinations of decisions) on our outcomes. However, we had three versions of our predictor and two outcomes, so combining dplyr::group_by() with condense() might be more informative:
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide") |>
filter(str_detect(parameter, "iv")) |>
group_by(ivs, dvs) |>
condense(unstd_coef, list(mean = mean, median = median))
#> # A tibble: 6 × 5
#> # Groups: ivs [3]
#> ivs dvs unstd_coef_mean unstd_coef_median unstd_coef_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 iv1 dv1 -0.00391 -0.00169 <dbl [16]>
#> 2 iv1 dv2 0.0106 0.0163 <dbl [16]>
#> 3 iv2 dv1 -0.00399 -0.00209 <dbl [16]>
#> 4 iv2 dv2 -0.0125 -0.0170 <dbl [16]>
#> 5 iv3 dv1 0.0582 0.0569 <dbl [16]>
#> 6 iv3 dv2 -0.0146 -0.0136 <dbl [16]>
If we are interested in all the terms of the model, we can leverage group_by() further:
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide") |>
group_by(parameter, dvs) |>
condense(unstd_coef, list(mean = mean, median = median))
#> # A tibble: 16 × 5
#> # Groups: parameter [8]
#> parameter dvs unstd_coef_mean unstd_coef_median unstd_coef_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 (Intercept) dv1 0.0495 0.0500 <dbl [24]>
#> 2 (Intercept) dv2 0.0000922 0.00194 <dbl [24]>
#> 3 iv1 dv1 0.0333 0.0346 <dbl [8]>
#> 4 iv1 dv2 0.0475 0.0443 <dbl [8]>
#> 5 iv1:mod dv1 -0.0411 -0.0382 <dbl [8]>
#> 6 iv1:mod dv2 -0.0264 -0.0308 <dbl [8]>
#> 7 iv2 dv1 0.00211 0.00119 <dbl [8]>
#> 8 iv2 dv2 -0.0225 -0.0193 <dbl [8]>
#> 9 iv2:mod dv1 -0.0101 -0.00521 <dbl [8]>
#> 10 iv2:mod dv2 -0.00243 -0.00789 <dbl [8]>
#> 11 iv3 dv1 0.0525 0.0507 <dbl [8]>
#> 12 iv3 dv2 -0.00609 -0.0116 <dbl [8]>
#> 13 iv3:mod dv1 0.0639 0.0614 <dbl [8]>
#> 14 iv3:mod dv2 -0.0231 -0.0232 <dbl [8]>
#> 15 mod dv1 0.0273 0.0262 <dbl [24]>
#> 16 mod dv2 -0.0441 -0.0424 <dbl [24]>