An R package for sensitivity analysis (konfound)

2018/04/12

With Ran Xu and Ken Frank, I have worked on an interactive Shiny web application for sensitivity analysis as well as an R package for carrying out sensitivity analysis.

That R package is now available on CRAN! A link to the CRAN page for it is here and the website for the package is here.
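If you'd like to try it out, it can be installed from CRAN in the usual way:

install.packages("konfound")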

Here is the description:

Statistical methods that quantify the conditions necessary to alter inferences, also known as sensitivity analysis, are becoming increasingly important to a variety of quantitative sciences. A series of recent works, including Frank (2000) and Frank et al. (2013) extend previous sensitivity analyses by considering the characteristics of omitted variables or unobserved cases that would change an inference if such variables or cases were observed. These analyses generate statements such as “an omitted variable would have to be correlated at xx with the predictor of interest (e.g., treatment) and outcome to invalidate an inference of a treatment effect”. Or “one would have to replace pp percent of the observed data with null hypothesis cases to invalidate the inference”. We implement these recent developments of sensitivity analysis and provide modules to calculate these two robustness indices and generate such statements in R. In particular, the functions konfound(), pkonfound() and mkonfound() allow users to calculate the robustness of inferences for a user’s own model, a single published study and multiple studies respectively.
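As a quick illustration of the second of those, pkonfound() takes just the reported statistics from a single published study. The call below is only a sketch with made-up values, and assumes the arguments are (in order) the estimated effect, its standard error, the number of observations, and the number of covariates; mkonfound() works similarly on a data frame of estimates from many studies.

# Sensitivity analysis for a hypothetical published study reporting an
# estimate of 2 with a standard error of 0.4, n = 100, and 3 covariates
konfound::pkonfound(est_eff = 2, std_err = 0.4, n_obs = 100, n_covariates = 3)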

As a super short introduction, imagine that we carried out a regression predicting a car's fuel efficiency (miles per gallon) from its weight and rear axle ratio:

library(konfound)
#> Sensitivity analysis as described in Frank, Maroulis, Duong, and Kelcey (2013) and in Frank (2000).
#> For more information visit https://jmichaelrosenberg.shinyapps.io/shinykonfound/.

m1 <- lm(mpg ~ wt + drat, data = mtcars)
summary(m1)
#> 
#> Call:
#> lm(formula = mpg ~ wt + drat, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.4159 -2.0452  0.0136  1.7704  6.7466 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   30.290      7.318   4.139 0.000274 ***
#> wt            -4.783      0.797  -6.001 1.59e-06 ***
#> drat           1.442      1.459   0.989 0.330854    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.047 on 29 degrees of freedom
#> Multiple R-squared:  0.7609, Adjusted R-squared:  0.7444 
#> F-statistic: 46.14 on 2 and 29 DF,  p-value: 9.761e-10

We can carry out sensitivity analysis for the effect of weight, for example, using the konfound() function on the model output:

konfound(m1, wt)
#> Note that this output is calculated based on the correlation-based approach used in mkonfound()
#> Replacement of Cases Approach:
#> To invalidate an inference, 65.969% of the estimate would have to be due to bias. This is based on a threshold of -1.628 for statistical significance (alpha = 0.05).
#> To invalidate an inference, 21 observations would have to be replaced with cases for which the effect is 0.
#> 
#> Correlation-based Approach:
#> An omitted variable would have to be correlated at 0.781 with the outcome and at 0.781 with the predictor of interest (conditioning on observed covariates) to invalidate an inference based on a threshold of -0.36 for statistical significance (alpha = 0.05).
#> Correspondingly the impact of an omitted variable (as defined in Frank 2000) must be 0.781 X 0.781 = 0.61 to invalidate an inference.
#> For more detailed output, consider setting `to_return` to table
#> To consider other predictors of interest, consider setting `test_all` to TRUE.

This (very preliminary - and just as an illustration) suggests that nearly two-thirds of the estimated effect of a car's weight on its miles per gallon would need to be due to bias - in the model or the measures, for example - for the inference to be invalidated.
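As a rough check on that first statement, the bias percentage is (as I understand it) just one minus the ratio of the threshold estimate to the observed estimate, and the number of cases to replace applies that proportion to the sample size. Using the rounded values printed above:

# Values copied from the konfound() output above; because they are rounded,
# the results only approximately match the printed 65.969% and 21 cases
est       <- -4.783   # estimated effect of wt
threshold <- -1.628   # estimate that would just reach significance at alpha = .05
1 - threshold / est               # ~0.66: share of the estimate that must be due to bias
(1 - threshold / est) * 32        # ~21 of the 32 cars replaced with zero-effect cases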

Alternatively, the results of this sensitivity analysis can be interpreted in terms of how strongly an omitted, confounding variable (i.e., a covariate) would need to be correlated with both the predictor of interest (weight) and the outcome. This approach suggests that such a confounding variable would need to be correlated at about .80 with both weight and miles per gallon for the inference to be invalidated.

The konfound() function works on output from lm() as well as glm() (for generalized linear models) and lmer() (from the lme4 package, for mixed effects models). There are also a number of ways (besides text) to present the output; a sketch of both is below. Much more on the konfound() function (and package) can be found here.
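For example, here is a sketch (untested, and just following the pattern above) of running konfound() on a mixed effects model fit to lme4's sleepstudy data, and of requesting the tabular output that the message above suggests via to_return:

library(lme4)

# A standard lme4 example model (random intercepts for Subject)
m2 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
konfound(m2, Days)

# Tabular rather than printed output for the original model
konfound(m1, wt, to_return = "table")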