Writing a recursive function in R - or, tweets on tweets

Background

Recursion has been a topic I’ve heard about, and even read about it, but have never used. Recently, I was analyzing data for the Next Generation Science Standards hashtag on Twitter, and was puzzled to find that some of the Tweets I had accessed were replies to Tweets which were themselves not in the data set.

Fortunately, the rtweet R package returns data on which Tweet a reply is to; using these, it is possible (using rtweet) to access the original Tweet.

But, this raises an issue: What if the Tweet a reply is to is itself a reply (to another Tweet)? This problem starts to become a little complex. The solution I had tried was to use a while loop; this worked basically fine, but I sensed there might be a better way. I talked this through with a student, and determined to see whether I could get it to work this way.

This post represents my attempt to write a recursive function - one that involves calling itself.

First, let’s access the data. I use rtweet to access 1,000 recent Tweets with the #rstats hashtag.

``````library(rtweet)

d <- search_tweets("#rstats", n = 1000)``````

Writing a recursive function

Here is my attempt at writing a recursive function; it takes one argument, `statuses`, which is the status ids for the Tweets a reply is to; the `reply_to_status_id` column in the data collected from rtweet.

``````get_replies_recursive <- function(statuses, existing_data = NULL) {
statuses <- statuses[!is.na(statuses)] # this line removes the NAs, which are very common because most Tweets are not replies
new_data <- lookup_statuses(statuses)

print(paste0("Accessed ", nrow(new_data), " new Tweets"))

if (is.null(existing_data)) {
out_data <- new_data
} else {
out_data <- rbind(new_data, existing_data)
}

if (length(new_statuses) > 0) {
# this is the key line where the function calls itself, but passing new statuses
get_replies_recursive(new_statuses, existing_data = out_data)
} else {
print(paste0("Accessed ", nrow(out_data), " new Tweets in total"))
return(out_data)
}
}``````

Using the function

Using it is simple; you just pass the variable for the status ids for the Tweets a reply is to.

``new_replies <- get_replies_recursive(d\$reply_to_status_id)``
``````## [1] "Accessed 17 new Tweets"
## [1] "Accessed 4 new Tweets"
## [1] "Accessed 4 new Tweets"
## [1] "Accessed 3 new Tweets"
## [1] "Accessed 1 new Tweets"
## [1] "Accessed 1 new Tweets"
## [1] "Accessed 1 new Tweets"
## [1] "Accessed 31 new Tweets in total"``````

fin

I’m unsure exactly how much more better (clear, efficient) using recursion is here, compared to the former approach I tried; I think this code is a bit simpler. I am also unsure how generally useful recursion is in real-life programming. Lastly, I’m not sure I did this quite right; I went down a bit of a recursive rabbit holem and the part that is most unusual to me is that the result is something like iteration, arrived at a different way. But, this seemed to work, and I welcome any feedback about it.