Modifying an R function to iterate (using purrr) and use non-standard evaluation (using rlang)


2017/12/17

Note that due to version-related issues, the code in this post is displayed but is not executed/run.

Background

Research in classrooms and schools can be complex because of all of the factors that matter. A question that often comes up when we say that we observed some pattern in data is: but did you control for X?

In the context of working on an approach to find out how impactful an omitted variable would need to be to invalidate an inference, we had to modify a function that worked for a single sensitivity analysis so that it worked for many analyses and was easier to use.

The initial version of the function

In the context of programming (and mathematics!), many functions take inputs and then transform them into output.

Imagine we have a function that takes the two values used to calculate a t-test, the t statistic (the coefficient divided by its standard error) and the degrees of freedom of the t distribution. It then returns a bunch of output about how sensitive an inference based on the t-test is to bias or to an omitted confounding variable.
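If it helps to see where those two inputs typically come from, here is a minimal sketch using a regression fit to R's built-in mtcars data (purely illustrative, not data from any actual study):

m <- lm(mpg ~ wt, data = mtcars)                    # an illustrative regression model
t_stat <- summary(m)$coefficients["wt", "t value"]  # the coefficient divided by its standard error
df_resid <- df.residual(m)                          # the degrees of freedom of the t distribution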

Here is a function that outputs values about the sensitivity of a t-test as a single row of a data frame (actually an R data.frame modified to be a bit easier to work with, an R tibble).

It’s a bit of a whopper:

core_sensitivity_mkonfound <- function(t, df, alpha = .05, tails = 2) {
    
    critical_t <- stats::qt(1 - (alpha / tails), df)
    critical_r <- critical_t / sqrt((critical_t ^ 2) + df)
    
    obs_r <- abs(t / sqrt(df + (t ^ 2)))
    
    # for replacement of cases framework
    if (abs(obs_r) > abs(critical_r)) {
        action <- "to_invalidate"
        inference <- "reject_null"
        pct_bias <- 100 * (1 - (critical_r / obs_r))
    } else if (abs(obs_r) < abs(critical_r)) {
        action <- "to_sustain"
        inference <- "fail_to_reject_null"
        pct_bias <- 100 * (1 - (obs_r / critical_r))
    } else if (obs_r == critical_r) {
        action <- NA
        inference <- NA
        pct_bias <- NA
    }
    
    if ((abs(obs_r) > abs(critical_r)) & ((obs_r * critical_r) > 0)) {
        mp <- -1
    } else {
        mp <- 1
    }
    
    # for correlation based framework
    itcv <- (obs_r - critical_r) / (1 + mp * abs(critical_r))
    r_con <- round(sqrt(abs(itcv)), 3)
    
    out <- dplyr::data_frame(t, df, action, inference, pct_bias, itcv, r_con)
    names(out) <- c("t", "df", "action", "inference", "pct_bias_to_change_inference", "itcv", "r_con")
    
    out$pct_bias_to_change_inference <- round(out$pct_bias_to_change_inference, 3)
    out$itcv <- round(out$itcv, 3)
    out$action <- as.character(out$action)
    out$inference <- as.character(out$inference)
    
    return(out)
}

Let’s test it out. Imagine we have a t statistic of 3 for a hypothesis test associated with 100 degrees of freedom.

core_sensitivity_mkonfound(3, 100)

Works.

It looks like, in order to invalidate the inference, around 32% of the effect would need to be due to bias; or, an omitted variable would need to be correlated with both the predictor of interest and the dependent variable at .339 in order to invalidate the inference. You can read more about sensitivity analysis and an in-development R package on the approach to sensitivity analysis with Ran Xu and Ken Frank here.
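To see where those numbers come from, here is the same arithmetic done by hand, using the formulas from inside the function (approximate values in the comments):

critical_t <- qt(1 - .05 / 2, 100)                   # about 1.98
critical_r <- critical_t / sqrt(critical_t^2 + 100)  # about 0.195
obs_r <- 3 / sqrt(100 + 3^2)                         # about 0.287
100 * (1 - critical_r / obs_r)                       # about 32% of the effect would need to be due to bias
sqrt((obs_r - critical_r) / (1 - critical_r))        # about 0.339; here mp = -1, so the denominator is 1 - critical_r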

The second version of the function

How could we write a function to provide output not only for one t and its associated df, but rather many values?

We can write a simple function to iterate through multiple values and to bind the results together. The key is the map() family of functions from the tidyverse package purrr (if you are familiar with R, these are similar to the apply() functions). Specifically, because we are iterating over two arguments (t and df) and want the results row-bound into a single data frame, we use map2_dfr().

Check out this helpful chapter of R for Data Science for more on iteration using approaches such as for and while loops as well as the useful apply() and map() families of functions.
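If map2_dfr() is new to you, here is a toy example, unrelated to sensitivity analysis, that shows how it iterates over two vectors in parallel and row-binds the data frames returned by each call:

library(purrr)
library(dplyr)

map2_dfr(.x = c(1, 2, 3),
         .y = c(10, 20, 30),
         .f = ~ data_frame(x = .x, y = .y, sum = .x + .y))  # one row per pair of inputs

The same pattern applies here: iterate over pairs of t and df values and row-bind the one-row data frames that core_sensitivity_mkonfound() returns.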

Here is what a function could look like:

library(purrr)

mkonfound <- function(t, df, alpha = .05, tails = 2) {
    # pass alpha and tails through so they are not silently ignored
    map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound, alpha = alpha, tails = tails)
}

Simple! But does it work? :)

Instead of passing a single t and df, as we did above with the core_sensitivity_mkonfound() function, we can pass vectors of t and df values:

mkonfound(t = c(3, 2, 2.5), 
          df = c(100, 200, 150))

We could also do something like binding t and df together into a small data.frame:

d <- data.frame(t = c(3, 2, 2.5), 
                df = c(100, 200, 150))
d

And then mkonfound() could work like this:

mkonfound(d$t, d$df)

The third version of the function

Still seems to work fine. Those of you familiar with the tidyverse may sense another possible improvement. Namely, the function could be written to both accept and return a data.frame, and to be a bit more intuitive to use via non-standard evaluation.

The goal is to add an additional argument for the data.frame (d) and then to use non-standard evaluation to capture the names of the t and df columns and later evaluate them in the context of the data.frame.
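If the quoting-and-unquoting pattern is unfamiliar, here it is in isolation first, with a toy helper that assumes the rlang and dplyr packages loaded just below (pull_column() is just for illustration and is not part of the package):

pull_column <- function(d, column) {
    column_enquo <- enquo(column)    # capture the unquoted column name as a quosure
    pull(select(d, !!column_enquo))  # unquote it and evaluate it within d
}

pull_column(mtcars, mpg)  # equivalent to mtcars$mpg

With that pattern in mind, here is the new version of mkonfound():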

library(rlang)
library(dplyr)

mkonfound <- function(d, t, df, alpha = .05, tails = 2) {
    
    t_enquo <- enquo(t)    # capture the unquoted name of the t column
    df_enquo <- enquo(df)  # capture the unquoted name of the df column
    
    t <- pull(select(d, !!t_enquo))    # unquote and evaluate within d to get the t values
    df <- pull(select(d, !!df_enquo))  # unquote and evaluate within d to get the df values
    
    map2_dfr(.x = t, .y = df, .f = core_sensitivity_mkonfound, alpha = alpha, tails = tails)
}

But does it work? Now, the first argument is the name of the data.frame, the second is the unquoted name of the column with the t statistics, and the third is the same as the second, but for the df associated with the t’s.

mkonfound(d, t, df)

If we have an entire spreadsheet, read into R as a data.frame using the read.csv() function (or read_csv() from the very useful readr package), then we can easily compute output for all of the statistics in the spreadsheet. Here is a spreadsheet from Ken's website:

spreadsheet_of_vals <- read.csv("https://msu.edu/~kenfrank/example%20dataset%20for%20mkonfound.csv")
head(spreadsheet_of_vals)
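If you prefer read_csv() from the readr package mentioned above, the equivalent would look something like this (same URL; read_csv() returns a tibble rather than a base data.frame):

library(readr)
spreadsheet_of_vals <- read_csv("https://msu.edu/~kenfrank/example%20dataset%20for%20mkonfound.csv")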

We would use it the same way as above but with d replaced with what we named the data.frame we read from the website, spreadsheet_of_vals:

mkonfound(spreadsheet_of_vals, t, df)

Since the output is in a data.frame, we can, for example, easily plot output:

results_df <- mkonfound(spreadsheet_of_vals, t, df)
library(ggplot2)

results_df$action <- dplyr::case_when(
    results_df$action == "to_invalidate" ~ "To Invalidate",
    results_df$action == "to_sustain" ~ "To Sustain"
)

ggplot(results_df, aes(x = pct_bias_to_change_inference, fill = action)) +
    geom_histogram() +
    scale_fill_manual("", values = c("#1F78B4", "#A6CEE3")) +
    theme_bw() +
    ggtitle("Histogram of Percent Bias") +
    facet_grid(~ action) +
    theme(legend.position = "none") +
    ylab("Count") +
    xlab("Percent Bias")

Like many functions in R, this could be written many different ways, and this post shows just one approach to writing a function.

In some cases, non-standard evaluation makes the function a bit harder to use, particularly when we are interested in the output from only a single study.

In that case, we would want to go back to the function we initially wrote (core_sensitivity_mkonfound()) or would have to write something a bit like:

single_study <- data.frame(t = 3, df = 100)
mkonfound(single_study, t, df)

So, this is one approach that has been useful in at least one case: the in-development package for sensitivity analysis, which has a number of functions, including a version of this mkonfound() function for meta-analyses that make use of the approach.

Oh, and if you are interested in sensitivity analysis, please check out the konfound package that this function is part of here.