Once you have built your full specification blueprint and feel comfortable with how the pipeline is executed, you can implement a full multiverse-style analysis.
Simply use run_multiverse(<your expanded grid object>):
library(tidyverse)
library(multitool)
# create some data
the_data <-
  data.frame(
    id = 1:500,
    iv1 = rnorm(500),
    iv2 = rnorm(500),
    iv3 = rnorm(500),
    mod = rnorm(500),
    dv1 = rnorm(500),
    dv2 = rnorm(500),
    include1 = rbinom(500, size = 1, prob = .1),
    include2 = sample(1:3, size = 500, replace = TRUE),
    include3 = rnorm(500)
  )
# create a pipeline blueprint
full_pipeline <-
  the_data |>
  add_filters(include1 == 0, include2 != 3, include3 > -2.5) |>
  add_variables(var_group = "ivs", iv1, iv2, iv3) |>
  add_variables(var_group = "dvs", dv1, dv2) |>
  add_model("linear model", lm({dvs} ~ {ivs} * mod))
# expand the pipeline
expanded_pipeline <- expand_decisions(full_pipeline)
# Run the multiverse
multiverse_results <- run_multiverse(expanded_pipeline)
multiverse_results
#> # A tibble: 48 × 5
#> decision specifications model_fitted pipeline_code timing_logs
#> <dbl> <list> <list> <list> <list>
#> 1 1 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 2 2 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 3 3 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 4 4 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 5 5 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 6 6 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 7 7 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 8 8 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 9 9 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> 10 10 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 4]> <tibble [1 × 4]>
#> # ℹ 38 more rows
The result will be another tibble with various list columns.
It will always contain a list column named specifications containing all the information you generated in your blueprint. Next, there will be a list column for your fitted model, labelled model_fitted.
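To see where the 48 rows come from, it can help to count the combinations by hand. The sketch below uses only base R and assumes that each filter contributes two alternatives (apply the filter or keep all rows). That is an assumption about how the grid is expanded, not something stated above, and the name grid_sketch is made up for illustration.

```r
# Hypothetical re-creation of the decision grid size using base R only.
# Assumption: every filter has two alternatives (applied / not applied).
grid_sketch <- expand.grid(
  ivs      = c("iv1", "iv2", "iv3"),
  dvs      = c("dv1", "dv2"),
  include1 = c("applied", "not applied"),
  include2 = c("applied", "not applied"),
  include3 = c("applied", "not applied"),
  model    = "linear model",
  stringsAsFactors = FALSE
)

# 3 predictors x 2 outcomes x 2 x 2 x 2 filter choices x 1 model
nrow(grid_sketch)
#> [1] 48
```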
There are two main ways to unpack and examine multitool
results. The first is by using tidyr::unnest().
Inside the model_fitted column, multitool gives us 5 columns: model_function, model_parameters, model_performance, model_warnings, and model_messages.
multiverse_results |> unnest(model_fitted)
#> # A tibble: 48 × 9
#> decision specifications model_function model_parameters model_performance
#> <dbl> <list> <chr> <list> <list>
#> 1 1 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 2 2 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 3 3 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 4 4 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 5 5 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 6 6 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 7 7 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 8 8 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 9 9 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> 10 10 <tibble [1 × 3]> lm <tibble [4 × 13]> <tibble [1 × 7]>
#> # ℹ 38 more rows
#> # ℹ 4 more variables: model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>, timing_logs <list>
The model_parameters column gives you the result of calling parameters::parameters() on each model in your grid, which is a data.frame of model coefficients and their associated standard errors, confidence intervals, test statistics, and p-values.
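For a sense of what one of these coefficient tables holds, the following sketch fits a single model by hand and assembles similar information with base R. It does not call parameters::parameters(); the data (toy_data), the seed, and the column names are illustrative assumptions.

```r
# A single specification fitted by hand (toy data; seed chosen arbitrarily)
set.seed(1)
toy_data <- data.frame(
  iv1 = rnorm(100),
  mod = rnorm(100),
  dv1 = rnorm(100)
)
fit <- lm(dv1 ~ iv1 * mod, data = toy_data)

# Estimates, standard errors, t statistics, and p-values
coef_table <- as.data.frame(summary(fit)$coefficients)
names(coef_table) <- c("coefficient", "se", "t", "p")

# 95% confidence limits, in the same row order as the coefficient table
ci <- confint(fit, level = 0.95)
coef_table$ci_low  <- ci[, 1]
coef_table$ci_high <- ci[, 2]

# Move the term names from row names into a regular column
coef_table$parameter <- rownames(coef_table)
rownames(coef_table) <- NULL

coef_table
```

Each model in the grid yields one such table with four rows (intercept, predictor, moderator, interaction), which is why unnesting 48 models gives 192 rows.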
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_parameters)
#> # A tibble: 192 × 21
#> decision specifications model_function parameter coefficient se ci
#> <dbl> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Intercept) -0.0826 0.0517 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 0.0121 0.0520 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0385 0.0553 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod 0.00306 0.0544 0.95
#> 5 2 <tibble [1 × 3]> lm (Intercept) 0.00699 0.0612 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 -0.0658 0.0616 0.95
#> 7 2 <tibble [1 × 3]> lm mod 0.00262 0.0655 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.0205 0.0645 0.95
#> 9 3 <tibble [1 × 3]> lm (Intercept) -0.0839 0.0514 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 -0.0146 0.0510 0.95
#> # ℹ 182 more rows
#> # ℹ 14 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> # p <dbl>, std_coefficient <dbl>, std_ci <dbl>, std_ci_low <dbl>,
#> # std_ci_high <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>, timing_logs <list>
The model_performance column gives fit statistics, such as r2 or AIC and BIC values, computed by running performance::performance() on each model in your grid.
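As a rough illustration of the kind of one-row summary stored per model, the sketch below computes comparable fit statistics with base R. It is an approximation under assumed toy data, not a call to performance::performance(), and the names toy_data and fit_stats are hypothetical.

```r
# Toy data and a single model (seed chosen arbitrarily)
set.seed(1)
toy_data <- data.frame(
  iv1 = rnorm(100),
  mod = rnorm(100),
  dv1 = rnorm(100)
)
fit <- lm(dv1 ~ iv1 * mod, data = toy_data)

# One row of fit statistics for this model
fit_stats <- data.frame(
  aic         = AIC(fit),
  bic         = BIC(fit),
  r2          = summary(fit)$r.squared,
  r2_adjusted = summary(fit)$adj.r.squared,
  rmse        = sqrt(mean(residuals(fit)^2)),  # root mean squared residual
  sigma       = sigma(fit)                     # residual standard error
)

fit_stats
```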
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_performance)
#> # A tibble: 48 × 15
#> decision specifications model_function model_parameters aic aicc bic
#> <dbl> <list> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm <tibble [4 × 13]> 750. 750. 769.
#> 2 2 <tibble [1 × 3]> lm <tibble [4 × 13]> 849. 849. 867.
#> 3 3 <tibble [1 × 3]> lm <tibble [4 × 13]> 750. 750. 768.
#> 4 4 <tibble [1 × 3]> lm <tibble [4 × 13]> 849. 849. 867.
#> 5 5 <tibble [1 × 3]> lm <tibble [4 × 13]> 749. 749. 768.
#> 6 6 <tibble [1 × 3]> lm <tibble [4 × 13]> 849. 850. 868.
#> 7 7 <tibble [1 × 3]> lm <tibble [4 × 13]> 752. 753. 771.
#> 8 8 <tibble [1 × 3]> lm <tibble [4 × 13]> 851. 852. 870.
#> 9 9 <tibble [1 × 3]> lm <tibble [4 × 13]> 752. 752. 770.
#> 10 10 <tibble [1 × 3]> lm <tibble [4 × 13]> 851. 852. 870.
#> # ℹ 38 more rows
#> # ℹ 8 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>,
#> # timing_logs <list>
The model_messages and model_warnings columns contain information provided by the modeling function. If something went wrong or you need to know something about a particular model, these columns will have captured messages and warnings printed by the modeling function.
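One general way to collect warnings and messages while still keeping a result is base R's withCallingHandlers(). The helper below, capture_conditions(), is a hypothetical sketch of that idea and is not taken from multitool's source.

```r
# Hypothetical helper: evaluate an expression and collect what it signals.
capture_conditions <- function(expr) {
  warns <- character()
  msgs  <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")  # keep going instead of printing
    },
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, warnings = warns, messages = msgs)
}

# as.numeric() on a non-numeric string returns NA and signals a warning
res <- capture_conditions({
  message("fitting model")
  as.numeric("not a number")
})

res$value
res$warnings
res$messages
```

Storing the captured text next to each result means a problematic specification can be found later by filtering on non-empty warnings.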
The second way is a set of wrappers I wrote around the tidyr::unnest() workflow.
The main function is unpack_results(). Pass a multiverse results object to unpack_results() and tell it which column to grab by giving the column name in the .what argument:
multiverse_results |>
  unpack_results(.what = model_fitted)
#> # A tibble: 48 × 14
#> decision ivs dvs include1 include2 include3 model_meta model_function
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 2 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 3 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> 4 4 iv2 dv2 include1 ==… include… include… linear mo… lm
#> 5 5 iv3 dv1 include1 ==… include… include… linear mo… lm
#> 6 6 iv3 dv2 include1 ==… include… include… linear mo… lm
#> 7 7 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 8 8 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 9 9 iv2 dv1 include1 ==… include… include… linear mo… lm
#> 10 10 iv2 dv2 include1 ==… include… include… linear mo… lm
#> # ℹ 38 more rows
#> # ℹ 6 more variables: model_parameters <list>, model_performance <list>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>,
#> # timing_logs <list>
If you want to get straight to a specific result you can specify a sub-list with the .which argument:
multiverse_results |>
  unpack_results(.what = model_fitted, .which = model_parameters)
#> # A tibble: 192 × 26
#> decision ivs dvs include1 include2 include3 model_meta model_function
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 2 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 3 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 4 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 5 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 6 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 7 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 8 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 9 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> 10 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> # ℹ 182 more rows
#> # ℹ 18 more variables: parameter <chr>, coefficient <dbl>, se <dbl>, ci <dbl>,
#> # ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>, p <dbl>,
#> # std_coefficient <dbl>, std_ci <dbl>, std_ci_low <dbl>, std_ci_high <dbl>,
#> # model_performance <list>, model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>, timing_logs <list>
unpack_model_*
multitool will run and save anything you put in your pipeline, but most often you will want to look at model parameters and/or performance. To that end, there is a set of convenience functions for getting at the most common multiverse results: unpack_model_parameters(), unpack_model_performance(), unpack_model_messages(), and unpack_model_warnings().
unpack_model_parameters unpacks the model parameters in
your multiverse:
multiverse_results |>
  unpack_model_parameters()
#> # A tibble: 192 × 21
#> decision ivs dvs include1 include2 include3 model_meta model_function
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 2 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 3 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 4 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 5 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 6 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 7 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 8 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 9 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> 10 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> # ℹ 182 more rows
#> # ℹ 13 more variables: parameter <chr>, coefficient <dbl>, se <dbl>, ci <dbl>,
#> # ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>, p <dbl>,
#> # std_coefficient <dbl>, std_ci <dbl>, std_ci_low <dbl>, std_ci_high <dbl>
unpack_model_performance unpacks the model performance:
multiverse_results |>
  unpack_model_performance()
#> # A tibble: 48 × 15
#> decision ivs dvs include1 include2 include3 model_meta model_function
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 2 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 3 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> 4 4 iv2 dv2 include1 ==… include… include… linear mo… lm
#> 5 5 iv3 dv1 include1 ==… include… include… linear mo… lm
#> 6 6 iv3 dv2 include1 ==… include… include… linear mo… lm
#> 7 7 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 8 8 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 9 9 iv2 dv1 include1 ==… include… include… linear mo… lm
#> 10 10 iv2 dv2 include1 ==… include… include… linear mo… lm
#> # ℹ 38 more rows
#> # ℹ 7 more variables: aic <dbl>, aicc <dbl>, bic <dbl>, r2 <dbl>,
#> # r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>
You can also choose to expand your decision grid with .unpack_specs to see which decisions produced what result. You have two options for unpacking your decisions: wide or long. If you set .unpack_specs = 'wide', you get one column per decision variable, exactly as your decisions appeared in your grid.
multiverse_results |>
  unpack_model_parameters(.unpack_specs = "wide")
#> # A tibble: 192 × 21
#> decision ivs dvs include1 include2 include3 model_meta model_function
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 2 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 3 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 4 1 iv1 dv1 include1 ==… include… include… linear mo… lm
#> 5 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 6 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 7 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 8 2 iv1 dv2 include1 ==… include… include… linear mo… lm
#> 9 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> 10 3 iv2 dv1 include1 ==… include… include… linear mo… lm
#> # ℹ 182 more rows
#> # ℹ 13 more variables: parameter <chr>, coefficient <dbl>, se <dbl>, ci <dbl>,
#> # ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>, p <dbl>,
#> # std_coefficient <dbl>, std_ci <dbl>, std_ci_low <dbl>, std_ci_high <dbl>
If you set .unpack_specs = 'long', your decisions get stacked into two columns: decision_set and decision_choice (with decision_type recording the kind of decision). This format is nice for plotting a particular result from a multiverse analysis across different decision alternatives.
multiverse_results |>
  unpack_model_performance(.unpack_specs = "long")
#> # A tibble: 432 × 12
#> decision decision_type decision_set decision_choice model_function aic
#> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 1 variables ivs iv1 lm 750.
#> 2 1 variables dvs dv1 lm 750.
#> 3 1 filters include1 include1 == 0 lm 750.
#> 4 1 filters include2 include2 != 3 lm 750.
#> 5 1 filters include3 include3 > -2.5 lm 750.
#> 6 1 models model_meta linear model lm 750.
#> 7 1 models model_coefs_fn parameters::pa… lm 750.
#> 8 1 models model_fit_fn performance::p… lm 750.
#> 9 1 models model_standardiz… parameters::st… lm 750.
#> 10 2 variables ivs iv1 lm 849.
#> # ℹ 422 more rows
#> # ℹ 6 more variables: aicc <dbl>, bic <dbl>, r2 <dbl>, r2_adjusted <dbl>,
#> # rmse <dbl>, sigma <dbl>
Unpacking specifications alongside specific results allows us to examine the effects of our pipeline decisions.
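The relationship between the wide and long layouts can be sketched with tidyr::pivot_longer() on a toy table. This only illustrates the reshaping idea; the data are made up, the column names are borrowed from the printed output above, and nothing here claims to be how the long format is produced internally.

```r
library(tidyr)

# Toy "wide" results: one column per decision, one row per decision id
wide_sketch <- data.frame(
  decision = c(1, 2),
  ivs      = c("iv1", "iv1"),
  dvs      = c("dv1", "dv2"),
  r2       = c(0.004, 0.003)
)

# Stack the decision columns into name/value pairs;
# the result column (r2) is repeated for every stacked decision
long_sketch <- pivot_longer(
  wide_sketch,
  cols      = c(ivs, dvs),
  names_to  = "decision_set",
  values_to = "decision_choice"
)

long_sketch
```

Because each result row is repeated once per decision, the long table grows quickly: 48 models with 9 decision entries each give the 432 rows shown above.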
A powerful way to organize these results is to summarize a specific results column, say the r2 values of our model over the entire multiverse. condense() takes a result column and summarizes it with the .how argument, which takes a list in the form of list(<a name you pick> = <summary function>).
.how will create a column named <column being condensed>_<summary function name provided>. For this case, we have r2_mean and r2_median. The values that went into each summary are kept alongside in a list column, here r2_list.
# model performance r2 summaries
multiverse_results |>
  unpack_model_performance() |>
  condense(r2, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> r2_mean r2_median r2_list
#> <dbl> <dbl> <list>
#> 1 0.00340 0.00318 <dbl [48]>
# model parameters for our predictor of interest
multiverse_results |>
  unpack_model_parameters() |>
  filter(str_detect(parameter, "iv")) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> coefficient_mean coefficient_median coefficient_list
#> <dbl> <dbl> <list>
#> 1 -0.00707 -0.0128 <dbl [96]>
In the last example, we filtered our multiverse results to the terms involving our predictors iv* to see what the mean and median effect was (over all combinations of decisions) on our outcomes. Note that the pattern "iv" matches both the main effects and the iv:mod interaction terms, which is why 96 coefficients are summarized. However, we had three versions of our predictor and two outcomes, so combining dplyr::group_by() with condense() might be more informative:
multiverse_results |>
  unpack_model_parameters(.unpack_specs = "wide") |>
  filter(str_detect(parameter, "iv")) |>
  group_by(ivs, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 6 × 5
#> # Groups: ivs [3]
#> ivs dvs coefficient_mean coefficient_median coefficient_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 iv1 dv1 0.00117 0.00717 <dbl [16]>
#> 2 iv1 dv2 -0.0386 -0.0350 <dbl [16]>
#> 3 iv2 dv1 -0.0115 -0.0124 <dbl [16]>
#> 4 iv2 dv2 0.0173 0.0254 <dbl [16]>
#> 5 iv3 dv1 -0.0461 -0.0449 <dbl [16]>
#> 6 iv3 dv2 0.0353 0.0347 <dbl [16]>
If we were interested in all the terms of the model, we could leverage group_by() further:
multiverse_results |>
  unpack_model_parameters(.unpack_specs = "wide") |>
  group_by(parameter, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 16 × 5
#> # Groups: parameter [8]
#> parameter dvs coefficient_mean coefficient_median coefficient_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 (Intercept) dv1 -0.0695 -0.0698 <dbl [24]>
#> 2 (Intercept) dv2 -0.0144 -0.0174 <dbl [24]>
#> 3 iv1 dv1 0.0147 0.0103 <dbl [8]>
#> 4 iv1 dv2 -0.0580 -0.0546 <dbl [8]>
#> 5 iv1:mod dv1 -0.0123 -0.00995 <dbl [8]>
#> 6 iv1:mod dv2 -0.0191 -0.0185 <dbl [8]>
#> 7 iv2 dv1 -0.0117 -0.0124 <dbl [8]>
#> 8 iv2 dv2 -0.0240 -0.0285 <dbl [8]>
#> 9 iv2:mod dv1 -0.0114 -0.0108 <dbl [8]>
#> 10 iv2:mod dv2 0.0586 0.0592 <dbl [8]>
#> 11 iv3 dv1 -0.0423 -0.0430 <dbl [8]>
#> 12 iv3 dv2 0.0338 0.0347 <dbl [8]>
#> 13 iv3:mod dv1 -0.0499 -0.0481 <dbl [8]>
#> 14 iv3:mod dv2 0.0369 0.0349 <dbl [8]>
#> 15 mod dv1 0.0197 0.0184 <dbl [24]>
#> 16 mod dv2 -0.0114 -0.0123 <dbl [24]>
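For readers who want to see the shape of these grouped summaries without multitool, the sketch below builds a comparable table with dplyr::summarise(). It is a rough equivalent under made-up data, not condense() itself; params_sketch and summary_sketch are hypothetical names.

```r
library(dplyr)

# Toy stand-in for unpacked parameters (values are made up)
params_sketch <- data.frame(
  dvs         = c("dv1", "dv1", "dv2", "dv2"),
  coefficient = c(0.10, 0.30, -0.20, 0.00)
)

# One row per group: named summaries plus the raw values in a list column
summary_sketch <- params_sketch |>
  group_by(dvs) |>
  summarise(
    coefficient_mean   = mean(coefficient),
    coefficient_median = median(coefficient),
    coefficient_list   = list(coefficient),
    .groups = "drop"
  )

summary_sketch
```

Keeping the raw values in a list column means the full distribution behind each summary stays available, for example for plotting.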