What are the best rail-trails in Michigan?

2017/07/24

Background

I was curious about which rail-trails in Michigan are the best, so to find an answer I turned to the TrailLink website, sponsored by the Rails-to-Trails Conservancy. I had just purchased a copy of their book Rail-Trails Michigan and Wisconsin and wanted to see whether I could learn more from the website.

To start, I checked whether the site offered an API for accessing its reviews. It didn’t, so I checked its robots.txt file at http://traillink.com/robots.txt. The file didn’t disallow access to the reviews for each state, so I was able to download all of the reviews for the 259 Michigan trails that had them.
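As a rough illustration of that robots.txt check, here is a minimal sketch in base R. The function name `is_path_allowed` and the example rules are hypothetical (they are not TrailLink’s actual file), and the logic is deliberately simplified: it only looks at plain `Disallow` prefixes in the `User-agent: *` group and ignores `Allow` rules and wildcards.

```r
# Simplified check: is `path` free of any Disallow prefix in the "User-agent: *" group?
# `robots_lines` is the robots.txt content as a character vector, one line per element.
is_path_allowed <- function(robots_lines, path) {
    lines <- trimws(sub("#.*$", "", robots_lines))  # strip comments
    lines <- lines[nzchar(lines)]                   # drop empty lines
    in_star_group <- FALSE
    prev_was_agent <- FALSE
    disallowed <- character(0)
    for (line in lines) {
        field <- tolower(trimws(sub(":.*$", "", line)))
        value <- trimws(sub("^[^:]*:", "", line))
        if (field == "user-agent") {
            # consecutive User-agent lines belong to the same group
            in_star_group <- (prev_was_agent && in_star_group) || value == "*"
            prev_was_agent <- TRUE
        } else {
            prev_was_agent <- FALSE
            if (field == "disallow" && in_star_group && nzchar(value)) {
                disallowed <- c(disallowed, value)
            }
        }
    }
    !any(startsWith(path, disallowed))
}

# Hypothetical rules, for illustration only
robots <- c("User-agent: *", "Disallow: /admin/", "Disallow: /search")

reviews_ok <- is_path_allowed(robots, "/trails/reviews")
admin_ok <- is_path_allowed(robots, "/admin/login")
```

With these made-up rules, `reviews_ok` is `TRUE` and `admin_ok` is `FALSE`. A real crawler should rely on a complete robots.txt parser rather than this sketch.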

library(tidyverse)
library(hrbrthemes)
library(viridis)
library(forcats)
library(stringr)
library(lme4)
library(broom)

f <- here::here("static", "data", "mi.rds")
df <- read_rds(f) # this is a file with the rail-trail data - you can get it from here: https://github.com/jrosen48/railtrail

df <- df %>% 
    unnest(raw_reviews) %>% 
    filter(!is.na(raw_reviews)) %>% 
    rename(raw_review = raw_reviews,
           trail_name = name) %>% 
    mutate(trail_name = str_sub(trail_name, end = -7L),
           distance = str_sub(distance, end = -6L),
           distance = as.numeric(distance),
           n_reviews = str_sub(n_reviews, end = -9L),
           n_reviews = as.numeric(n_reviews))
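
The negative `end` values above trim a fixed number of trailing characters from each scraped string before converting it to a number. A minimal sketch of how that works, using a made-up value rather than a row from the actual data (the " Reviews" suffix is an assumption about what the scraped text looks like):

```r
library(stringr)

# Hypothetical scraped value; the real strings in mi.rds may differ
scraped <- "78 Reviews"

# end = -9L keeps everything up to the 9th character from the end,
# which drops the final 8 characters (" Reviews")
trimmed <- str_sub(scraped, end = -9L)

n_reviews <- as.numeric(trimmed)
n_reviews
```

Because the cut is positional, it only works when every value ends in a suffix of the same length.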

What are the characteristics of the best trails?

On the site, each trail has “surfaces” (e.g., asphalt and gravel) and “categories” (e.g., rail-trail and paved pathway), so I tried to group these into a few broader categories.

df <- df %>% 
    mutate(category = as.factor(category),
           category = forcats::fct_recode(category, "Greenway/Non-RT" = "Canal"),
           mean_review = ifelse(mean_review == 0, NA, mean_review))

df <- mutate(df,
             surface_rc = case_when(
                 surface == "Asphalt" ~ "Paved",
                 surface == "Asphalt, Concrete" ~ "Paved",
                 surface == "Concrete" ~ "Paved",
                 surface == "Asphalt, Boardwalk" ~ "Paved",
                 str_detect(surface, "Stone") ~ "Crushed Stone",
                 str_detect(surface, "Ballast") ~ "Crushed Stone",
                 str_detect(surface, "Gravel") ~ "Crushed Stone",
                 TRUE ~ "Other"
             )
)

Then, I checked out their mean reviews, from one to five stars.

Some trails had a ton of reviews:

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews) %>% 
    distinct() %>% 
    arrange(desc(n_reviews)) %>% 
    head(5) %>% 
    knitr::kable()

| trail_name | surface_rc | category | distance | n_reviews |
|:--|:--|:--|--:|--:|
| Lakelands Trail State Park | Crushed Stone | Rail-Trail | 26.0 | 78 |
| Pere Marquette Rail-Trail | Paved | Rail-Trail | 30.0 | 75 |
| Fred Meijer White Pine Trail State Park | Crushed Stone | Rail-Trail | 92.6 | 66 |
| William Field Memorial Hart-Montague Trail State Park | Paved | Rail-Trail | 22.7 | 48 |
| Kal-Haven Trail Sesquicentennial State Park | Crushed Stone | Rail-Trail | 34.0 | 47 |

And some had very few reviews: 60 of the trails had only one review!

Some of the trails with a single review were rated highly (five stars):

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews, mean_review) %>% 
    distinct() %>% 
    filter(n_reviews == 1) %>% 
    arrange(desc(mean_review)) %>% 
    head(5) %>% 
    knitr::kable()

| trail_name | surface_rc | category | distance | n_reviews | mean_review |
|:--|:--|:--|--:|--:|--:|
| Big Rapids Riverwalk | Crushed Stone | Greenway/Non-RT | 3.8 | 1 | 5 |
| Boardman Lake Trail | Crushed Stone | Rail-Trail | 2.0 | 1 | 5 |
| Cannon Township Trail | Paved | Greenway/Non-RT | 4.0 | 1 | 5 |
| Chippewa Trail | Paved | Greenway/Non-RT | 4.1 | 1 | 5 |
| Grass River Natural Area Rail Trail | Crushed Stone | Rail-Trail | 2.2 | 1 | 5 |

Some of the trails with a single review were rated very low:

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews, mean_review) %>% 
    distinct() %>% 
    filter(n_reviews == 1) %>% 
    arrange(mean_review) %>% 
    head(5) %>% 
    knitr::kable()

| trail_name | surface_rc | category | distance | n_reviews | mean_review |
|:--|:--|:--|--:|--:|--:|
| Alpena to Hillman Trail | Crushed Stone | Rail-Trail | 22.0 | 1 | 1 |
| Felch Grade Trail | Crushed Stone | Rail-Trail | 38.0 | 1 | 1 |
| Interurban Trail (Kent County) | Paved | Rail-Trail | 2.0 | 1 | 2 |
| Linear Trail Park | Paved | Greenway/Non-RT | 16.9 | 1 | 2 |
| Albion River Trail | Paved | Rail-Trail | 1.6 | 1 | 3 |

Building a model

To figure out which trails had many good reviews, I used an approach that does not simply average all of the reviews for a trail. Instead, each trail’s rating draws on its individual reviews, how much those reviews differ from each other, and how much they differ from the “average” review across every trail.

What if, instead, we just looked at the top-reviewed trails and then sorted them by how many reviews they had? Because many trails’ average review was five, this does not help much.

These ratings (model_based_rating below) come from the mixed effects model specified here. The tables further down report estimated_mean, each trail’s estimated deviation from the overall mean review, which ranks the trails in the same order:

m1 <- lmer(raw_review ~ 1 + (1|trail_name), data = df)
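
To give a feel for why a random-intercept model like this one treats a lone five-star review differently from dozens of them, here is a small sketch of the standard shrinkage formula for such a model when the variance components are treated as known. The numbers (`overall_mean`, `tau2`, `sigma2`) are hypothetical values chosen for illustration, not estimates from the trail data, and `shrunken_mean` is a helper made up for this example.

```r
# Hypothetical values, NOT estimated from the trail reviews
overall_mean <- 4.2  # mean review across all trails (fixed intercept)
tau2 <- 0.5          # variance between trails
sigma2 <- 1.0        # variance of reviews within a trail

# Pull a trail's raw mean toward the overall mean;
# the fewer reviews (n), the stronger the pull
shrunken_mean <- function(raw_mean, n) {
    weight <- n * tau2 / (n * tau2 + sigma2)
    overall_mean + weight * (raw_mean - overall_mean)
}

one_review <- shrunken_mean(5, n = 1)     # about 4.47
many_reviews <- shrunken_mean(5, n = 30)  # about 4.95
```

Under these assumed values, a trail with a single five-star review ends up rated well below a trail whose thirty reviews average five stars, even though their raw means are identical.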

The model’s estimates have to be merged back into the data frame with the other characteristics of each trail:

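# Overall mean review across all trails (the model's fixed intercept).
# fixef() comes from lme4, so this does not depend on how broom labels mixed-model terms.
overall_mean <- fixef(m1)[["(Intercept)"]]

# Each trail's estimated deviation from the overall mean (the random intercepts)
estimated_trail_means <- ranef(m1)$trail_name %>%
    rownames_to_column() %>%
    as_tibble() %>%
    rename(trail_name = rowname, estimated_mean = `(Intercept)`) %>%
    mutate(model_based_rating = estimated_mean + overall_mean)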

df_ss <- df %>% 
    group_by(trail_name) %>% 
    summarize(raw_mean = mean(raw_review))

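# Join explicitly on trail_name rather than relying on the implicit natural join
df_out <- left_join(df_ss, estimated_trail_means, by = "trail_name")
df_out <- left_join(df_out, df, by = "trail_name")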
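
As a sanity check on adding the overall intercept to each random effect, the sketch below shows that lme4’s `coef()` returns the same sums directly. It uses `sleepstudy`, an example dataset that ships with lme4, as a stand-in for the trail data, so the model and variable names here are illustrative only.

```r
library(lme4)

# sleepstudy ships with lme4; Subject plays the role of trail_name
m <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

# Group-level estimates built by hand: random effect + fixed intercept
by_hand <- ranef(m)$Subject[["(Intercept)"]] + fixef(m)[["(Intercept)"]]

# The same quantities as returned by coef()
from_coef <- coef(m)$Subject[["(Intercept)"]]

all.equal(by_hand, from_coef)
```

Either route gives one model-based estimate per group; building it by hand just makes the two pieces visible.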

So, where are we riding next?

Here are the top-10 trails of any length:

df_out %>% 
    select(trail_name, surface_rc, distance, category, estimated_mean, raw_mean, n_reviews) %>% 
    distinct() %>% 
    arrange(desc(estimated_mean)) %>% 
    mutate_if(is.numeric, function(x) round(x, 3)) %>% 
    head(10) %>% 
    knitr::kable()

| trail_name | surface_rc | distance | category | estimated_mean | raw_mean | n_reviews |
|:--|:--|--:|:--|--:|--:|--:|
| Saginaw Valley Rail Trail | Paved | 11.0 | Rail-Trail | 0.886 | 4.941 | 36 |
| Clinton River Park Trail | Paved | 4.5 | Greenway/Non-RT | 0.875 | 4.933 | 17 |
| Leelanau Trail | Paved | 16.6 | Rail-Trail | 0.829 | 4.900 | 20 |
| Wayne County Metroparks Trail | Paved | 16.3 | Greenway/Non-RT | 0.815 | 4.889 | 9 |
| Southern Links Trailway | Other | 10.2 | Rail-Trail | 0.811 | 4.853 | 39 |
| Mackinac Island Loop (State Highway 185) | Paved | 8.3 | Greenway/Non-RT | 0.796 | 4.875 | 11 |
| Detroit RiverWalk | Paved | 3.5 | Greenway/Non-RT | 0.779 | 5.000 | 3 |
| Fred Meijer Pioneer Trail | Paved | 5.4 | Rail-Trail | 0.779 | 5.000 | 3 |
| Grand Haven Waterfront Trail | Paved | 2.5 | Rail-Trail | 0.779 | 5.000 | 4 |
| Granger Meadows Park Trail | Paved | 1.9 | Greenway/Non-RT | 0.779 | 5.000 | 2 |

What if we wanted to take a shorter trip, one of less than 10 miles?

df_out %>% 
    select(trail_name, surface_rc, distance, category, estimated_mean, raw_mean, n_reviews) %>% 
    distinct() %>% 
    filter(distance < 10) %>% 
    arrange(desc(estimated_mean), desc(n_reviews)) %>% 
    head(10) %>% 
    knitr::kable()

| trail_name | surface_rc | distance | category | estimated_mean | raw_mean | n_reviews |
|:--|:--|--:|:--|--:|--:|--:|
| Clinton River Park Trail | Paved | 4.5 | Greenway/Non-RT | 0.8747665 | 4.933333 | 17 |
| Mackinac Island Loop (State Highway 185) | Paved | 8.3 | Greenway/Non-RT | 0.7962137 | 4.875000 | 11 |
| Grand Haven Waterfront Trail | Paved | 2.5 | Rail-Trail | 0.7789488 | 5.000000 | 4 |
| Stony Creek Metropark Trail | Paved | 6.2 | Greenway/Non-RT | 0.7789488 | 5.000000 | 4 |
| Detroit RiverWalk | Paved | 3.5 | Greenway/Non-RT | 0.7789488 | 5.000000 | 3 |
| Fred Meijer Pioneer Trail | Paved | 5.4 | Rail-Trail | 0.7789488 | 5.000000 | 3 |
| Granger Meadows Park Trail | Paved | 1.9 | Greenway/Non-RT | 0.7789488 | 5.000000 | 2 |
| Western Gateway Trail | Paved | 6.0 | Rail-Trail | 0.7789488 | 5.000000 | 2 |
| Paint Creek Trail (MI) | Crushed Stone | 8.9 | Rail-Trail | 0.7301712 | 4.785714 | 26 |
| Dequindre Cut Greenway | Paved | 1.8 | Rail-Trail | 0.7091632 | 4.777778 | 12 |

Conclusion

This model-based approach is powerful because it lets us figure out which trails are rated higher (or lower) while taking into account how many reviews we have for each trail. The same approach is useful in research as well: grades for students in classrooms, for example, can be analyzed in the same way if we want to learn which students are consistently performing differently (for better or worse!).

The code to download the reviews is here. The code in this post can be used to do a similar analysis.