Background
Recursion has been a topic I’ve heard about, and even read about it, but have never used. Recently, I was analyzing data for the Next Generation Science Standards hashtag on Twitter, and was puzzled to find that some of the Tweets I had accessed were replies to Tweets which were themselves not in the data set.
Fortunately, the rtweet R package returns data on which Tweet a reply is to; using these, it is possible (using rtweet) to access the original Tweet.
But, this raises an issue: What if the Tweet a reply is to is itself a reply (to another Tweet)? This problem starts to become a little complex. The solution I had tried was to use a while loop; this worked basically fine, but I sensed there might be a better way. I talked this through with a student, and determined to see whether I could get it to work this way.
This post represents my attempt to write a recursive function - one that involves calling itself.
Twitter Data
First, let’s access the data. I use rtweet to access 1,000 recent Tweets with the #rstats hashtag.
library(rtweet)
d <- search_tweets("#rstats", n = 1000)
Writing a recursive function
Here is my attempt at writing a recursive function; it takes one argument, statuses
, which is the status ids for the Tweets a reply is to; the reply_to_status_id
column in the data collected from rtweet.
get_replies_recursive <- function(statuses, existing_data = NULL) {
statuses <- statuses[!is.na(statuses)] # this line removes the NAs, which are very common because most Tweets are not replies
new_data <- lookup_statuses(statuses)
print(paste0("Accessed ", nrow(new_data), " new Tweets"))
new_statuses <- new_data$reply_to_status_id[!is.na(new_data$reply_to_status_id)]
if (is.null(existing_data)) {
out_data <- new_data
} else {
out_data <- rbind(new_data, existing_data)
}
if (length(new_statuses) > 0) {
# this is the key line where the function calls itself, but passing new statuses
get_replies_recursive(new_statuses, existing_data = out_data)
} else {
print(paste0("Accessed ", nrow(out_data), " new Tweets in total"))
return(out_data)
}
}
Using the function
Using it is simple; you just pass the variable for the status ids for the Tweets a reply is to.
new_replies <- get_replies_recursive(d$reply_to_status_id)
## [1] "Accessed 17 new Tweets"
## [1] "Accessed 4 new Tweets"
## [1] "Accessed 4 new Tweets"
## [1] "Accessed 3 new Tweets"
## [1] "Accessed 1 new Tweets"
## [1] "Accessed 1 new Tweets"
## [1] "Accessed 1 new Tweets"
## [1] "Accessed 31 new Tweets in total"
fin
I’m unsure exactly how much more better (clear, efficient) using recursion is here, compared to the former approach I tried; I think this code is a bit simpler. I am also unsure how generally useful recursion is in real-life programming. Lastly, I’m not sure I did this quite right; I went down a bit of a recursive rabbit holem and the part that is most unusual to me is that the result is something like iteration, arrived at a different way. But, this seemed to work, and I welcome any feedback about it.