Background

I was curious about which rail-trails in Michigan are the best, so I turned to the TrailLink website, sponsored by the Rails-to-Trails Conservancy. I had just purchased a copy of their book Rail-Trails Michigan and Wisconsin and wanted to see whether I could learn more from the website.

To start, I checked whether the site offered an API for accessing its reviews. It didn’t, so I checked its robots.txt file at http://traillink.com/robots.txt. The file didn’t disallow access to the review pages for each state, so I was able to download all of the reviews for the 259 trails with reviews in Michigan.
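
The robots.txt check can also be done programmatically. The sketch below is a minimal base R illustration, not the code used for this post: the robots.txt text and the paths are made up, and the helper is_disallowed() is hypothetical. It only handles plain Disallow: prefixes and ignores wildcards, Allow: rules, and per-bot sections.

```r
# Hypothetical robots.txt content (NOT the real TrailLink file)
robots_txt <- c(
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /account/"
)

# Pull out the path prefixes listed after "Disallow:"
disallowed <- sub("^Disallow:\\s*", "", grep("^Disallow:", robots_txt, value = TRUE))
# An empty "Disallow:" line means "nothing is blocked", so drop empty prefixes
disallowed <- disallowed[nzchar(disallowed)]

# Hypothetical helper: TRUE if `path` starts with any disallowed prefix
is_disallowed <- function(path, prefixes) {
    any(startsWith(path, prefixes))
}

is_disallowed("/admin/login", disallowed)      # blocked by the made-up rules
is_disallowed("/state/mi-trails", disallowed)  # not blocked by the made-up rules
```

For real use, a dedicated parser such as the robotstxt package on CRAN is likely a safer choice than a hand-rolled prefix check.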

library(tidyverse)
library(hrbrthemes)
library(viridis)
library(forcats)
library(stringr)
library(lme4)
library(broom)

f <- here::here("static", "data", "mi.rds")
df <- read_rds(f) # this is a file with the rail-trail data - you can get it from here: https://github.com/jrosen48/railtrail

df <- df %>% 
    unnest(raw_reviews) %>% 
    filter(!is.na(raw_reviews)) %>% 
    rename(raw_review = raw_reviews,
           trail_name = name) %>% 
    mutate(trail_name = str_sub(trail_name, end = -7L),
           distance = str_sub(distance, end = -6L),
           distance = as.numeric(distance),
           n_reviews = str_sub(n_reviews, end = -9L),
           n_reviews = as.numeric(n_reviews))
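
The str_sub() calls above trim a fixed-length suffix from each scraped field before converting it to a number. A small sketch of how a negative end works, using made-up strings in the style the trimming suggests (the exact raw format is an assumption, not taken from the data):

```r
library(stringr)

# Made-up raw strings (assumed format, for illustration only)
raw_distance  <- "26.0 miles"
raw_n_reviews <- "78 Reviews"

# A negative `end` counts from the end of the string:
# end = -6L keeps everything up to the sixth-from-last character
as.numeric(str_sub(raw_distance, end = -6L))   # "26.0 " -> 26
as.numeric(str_sub(raw_n_reviews, end = -9L))  # "78"    -> 78
```

Because the cut is positional, a field with a different suffix (for example a singular "1 Review") would be trimmed incorrectly, so it is worth checking for NA values after the conversion.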

What are the characteristics of the best trails?

On the site, there are “surfaces” (e.g., asphalt and gravel) and “categories” (e.g., rail-trail and paved pathway), so I recoded each of them into a few broader groups.

df <- df %>% 
    mutate(category = as.factor(category),
           category = forcats::fct_recode(category, "Greenway/Non-RT" = "Canal"),
           mean_review = ifelse(mean_review == 0, NA, mean_review))

df <- mutate(df,
             surface_rc = case_when(
                 surface == "Asphalt" ~ "Paved",
                 surface == "Asphalt, Concrete" ~ "Paved",
                 surface == "Concrete" ~ "Paved",
                 surface == "Asphalt, Boardwalk" ~ "Paved",
                 str_detect(surface, "Stone") ~ "Crushed Stone",
                 str_detect(surface, "Ballast") ~ "Crushed Stone",
                 str_detect(surface, "Gravel") ~ "Crushed Stone",
                 TRUE ~ "Other"
             )
)
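
One thing to keep in mind with case_when() is that conditions are evaluated in order and the first match wins. The sketch below uses made-up surface labels (not the actual values in the data) to show that a mixed surface fails the exact-match rules and is picked up by the later str_detect() rule:

```r
library(dplyr)
library(stringr)

# Made-up surface labels for illustration
surfaces <- tibble(surface = c("Asphalt", "Asphalt, Crushed Stone", "Dirt, Grass"))

surfaces <- surfaces %>%
    mutate(surface_rc = case_when(
        surface == "Asphalt" ~ "Paved",                  # exact match only
        str_detect(surface, "Stone") ~ "Crushed Stone",  # any label containing "Stone"
        TRUE ~ "Other"                                   # everything else
    ))

surfaces$surface_rc
```

With a recoding like this, a trail that is partly asphalt and partly crushed stone is counted as "Crushed Stone".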

Then, I checked out their mean reviews, from one to five stars.

Some trails had a ton of reviews:

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews) %>% 
    distinct() %>% 
    arrange(desc(n_reviews)) %>% 
    head(5) %>% 
    knitr::kable()

trail_name	surface_rc	category	distance	n_reviews
Lakelands Trail State Park	Crushed Stone	Rail-Trail	26.0	78
Pere Marquette Rail-Trail	Paved	Rail-Trail	30.0	75
Fred Meijer White Pine Trail State Park	Crushed Stone	Rail-Trail	92.6	66
William Field Memorial Hart-Montague Trail State Park	Paved	Rail-Trail	22.7	48
Kal-Haven Trail Sesquicentennial State Park	Crushed Stone	Rail-Trail	34.0	47

And some had very few reviews: 60 of the trails had only one review!

Some of the trails with a single review were rated highly (five stars):
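
A count like this can be obtained by collapsing the review-level data to one row per trail and filtering. The sketch below uses a toy data frame with made-up values rather than the post's df, so the number it returns is not the 60 reported above:

```r
library(dplyr)

# Toy stand-in for the review-level data: one row per review (values are made up)
toy <- tibble(
    trail_name = c("A", "A", "B", "C", "C", "C", "D"),
    n_reviews  = c(2, 2, 1, 3, 3, 3, 1)
)

n_single <- toy %>%
    distinct(trail_name, n_reviews) %>%  # collapse to one row per trail
    filter(n_reviews == 1) %>%           # keep trails with a single review
    nrow()

n_single
```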

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews, mean_review) %>% 
    distinct() %>% 
    filter(n_reviews == 1) %>% 
    arrange(desc(mean_review)) %>% 
    head(5) %>% 
    knitr::kable()

trail_name	surface_rc	category	distance	n_reviews	mean_review
Big Rapids Riverwalk	Crushed Stone	Greenway/Non-RT	3.8	1	5
Boardman Lake Trail	Crushed Stone	Rail-Trail	2.0	1	5
Cannon Township Trail	Paved	Greenway/Non-RT	4.0	1	5
Chippewa Trail	Paved	Greenway/Non-RT	4.1	1	5
Grass River Natural Area Rail Trail	Crushed Stone	Rail-Trail	2.2	1	5

Other trails with a single review were rated very low:

df %>% 
    select(trail_name, surface_rc, category, distance, n_reviews, mean_review) %>% 
    distinct() %>% 
    filter(n_reviews == 1) %>% 
    arrange(mean_review) %>% 
    head(5) %>% 
    knitr::kable()

trail_name	surface_rc	category	distance	n_reviews	mean_review
Alpena to Hillman Trail	Crushed Stone	Rail-Trail	22.0	1	1
Felch Grade Trail	Crushed Stone	Rail-Trail	38.0	1	1
Interurban Trail (Kent County)	Paved	Rail-Trail	2.0	1	2
Linear Trail Park	Paved	Greenway/Non-RT	16.9	1	2
Albion River Trail	Paved	Rail-Trail	1.6	1	3

Building a model

To figure out which trails had many good reviews, I did not simply average the reviews for each trail. Instead, I used a rating that takes into account the individual reviews for a trail, how much they differ from each other, and how much they differ from the “average” review across every trail.
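
In symbols, a rating of this kind can be written as a random-intercept model. This is the standard textbook formulation, written here under the assumption of normally distributed trail effects and errors; the notation is mine and does not appear in the original analysis:

```latex
\begin{align}
y_{ij} &= \mu + u_j + \varepsilon_{ij},
  \qquad u_j \sim \mathcal{N}(0, \tau^2),
  \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
\hat{u}_j &= \frac{n_j \tau^2}{n_j \tau^2 + \sigma^2}
  \left( \bar{y}_j - \hat{\mu} \right)
\end{align}
```

Here y_ij is review i of trail j, mu is the overall mean review, u_j is the deviation of trail j from that mean, n_j is the number of reviews for trail j, and ȳ_j is its raw mean. The second line shows the key property: the fewer reviews a trail has, the more its estimated deviation is pulled toward zero, that is, toward the overall mean. In practice the variances are estimated from the data rather than known.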

What if, instead, we just looked at the top-reviewed trails and then sorted them by how many reviews they had? Because many trails’ average review was five, this does not help much.

These ratings (model_based_rating below) come from the mixed effects model specified here, which estimates an overall mean plus a trail-specific deviation from it (estimated_mean below):

m1 <- lmer(raw_review ~ 1 + (1|trail_name), data = df)
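
To get a feel for what a random-intercept model like this does to a trail's rating, the sketch below applies the usual shrinkage weight by hand in base R. The grand mean and the two variance components are made-up values for illustration, not estimates from the trail data, and shrunken_mean() is a hypothetical helper rather than part of lme4:

```r
# Partially pooled mean for one trail in a random-intercept model.
# grand_mean, tau2 (between-trail variance) and sigma2 (within-trail variance)
# are made-up values, NOT estimates from the trail data.
shrunken_mean <- function(raw_mean, n, grand_mean = 4, tau2 = 0.5, sigma2 = 1) {
    weight <- (n * tau2) / (n * tau2 + sigma2)  # approaches 1 as n grows
    grand_mean + weight * (raw_mean - grand_mean)
}

shrunken_mean(raw_mean = 5.0, n = 2)   # perfect score, only two reviews
shrunken_mean(raw_mean = 4.9, n = 36)  # slightly lower score, many reviews
```

With these made-up values, the trail with 36 reviews ends up with the higher rating even though its raw mean is lower, because the two-review trail is pulled much more strongly toward the grand mean.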

The estimates from the model have to be merged back into the data frame with the other characteristics of the trail:

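# Overall (fixed-effect) intercept. fixef() is used here because recent
# versions of broom no longer provide tidy() methods for lmer models.
grand_mean <- fixef(m1)[["(Intercept)"]]

# Trail-specific deviations from the overall mean, plus the resulting rating
estimated_trail_means <- ranef(m1)$trail_name %>% 
    rownames_to_column() %>% 
    as_tibble() %>% 
    rename(trail_name = rowname, estimated_mean = `(Intercept)`) %>% 
    mutate(model_based_rating = estimated_mean + grand_mean)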

df_ss <- df %>% 
    group_by(trail_name) %>% 
    summarize(raw_mean = mean(raw_review))

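# Join on the trail name explicitly rather than relying on the default key guess
df_out <- left_join(df_ss, estimated_trail_means, by = "trail_name")
df_out <- left_join(df_out, df, by = "trail_name")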

So, where are we riding next?

Here are the top-10 trails of any length:

df_out %>% 
    select(trail_name, surface_rc, distance, category, estimated_mean, raw_mean, n_reviews) %>% 
    distinct() %>% 
    arrange(desc(estimated_mean)) %>% 
    mutate_if(is.numeric, function(x) round(x, 3)) %>% 
    head(10) %>% 
    knitr::kable()

trail_name	surface_rc	distance	category	estimated_mean	raw_mean	n_reviews
Saginaw Valley Rail Trail	Paved	11.0	Rail-Trail	0.886	4.941	36
Clinton River Park Trail	Paved	4.5	Greenway/Non-RT	0.875	4.933	17
Leelanau Trail	Paved	16.6	Rail-Trail	0.829	4.900	20
Wayne County Metroparks Trail	Paved	16.3	Greenway/Non-RT	0.815	4.889	9
Southern Links Trailway	Other	10.2	Rail-Trail	0.811	4.853	39
Mackinac Island Loop (State Highway 185)	Paved	8.3	Greenway/Non-RT	0.796	4.875	11
Detroit RiverWalk	Paved	3.5	Greenway/Non-RT	0.779	5.000	3
Fred Meijer Pioneer Trail	Paved	5.4	Rail-Trail	0.779	5.000	3
Grand Haven Waterfront Trail	Paved	2.5	Rail-Trail	0.779	5.000	4
Granger Meadows Park Trail	Paved	1.9	Greenway/Non-RT	0.779	5.000	2

What if we wanted to take a shorter trip, one of less than 10 miles?

df_out %>% 
    select(trail_name, surface_rc, distance, category, estimated_mean, raw_mean, n_reviews) %>% 
    distinct() %>% 
    filter(distance < 10) %>% 
    arrange(desc(estimated_mean), desc(n_reviews)) %>% 
    head(10) %>% 
    knitr::kable()

trail_name	surface_rc	distance	category	estimated_mean	raw_mean	n_reviews
Clinton River Park Trail	Paved	4.5	Greenway/Non-RT	0.8747665	4.933333	17
Mackinac Island Loop (State Highway 185)	Paved	8.3	Greenway/Non-RT	0.7962137	4.875000	11
Grand Haven Waterfront Trail	Paved	2.5	Rail-Trail	0.7789488	5.000000	4
Stony Creek Metropark Trail	Paved	6.2	Greenway/Non-RT	0.7789488	5.000000	4
Detroit RiverWalk	Paved	3.5	Greenway/Non-RT	0.7789488	5.000000	3
Fred Meijer Pioneer Trail	Paved	5.4	Rail-Trail	0.7789488	5.000000	3
Granger Meadows Park Trail	Paved	1.9	Greenway/Non-RT	0.7789488	5.000000	2
Western Gateway Trail	Paved	6.0	Rail-Trail	0.7789488	5.000000	2
Paint Creek Trail (MI)	Crushed Stone	8.9	Rail-Trail	0.7301712	4.785714	26
Dequindre Cut Greenway	Paved	1.8	Rail-Trail	0.7091632	4.777778	12

Conclusion

This model-based approach is powerful because it lets us figure out which trails are rated higher (or lower) while taking into account how many reviews we have for each trail. The same approach is useful in research as well: grades for students in classrooms, for example, can be analyzed in the same way if we want to learn which students are consistently performing differently (for better or worse!).
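
As a rough sketch of the student-grades idea, the code below fits the same kind of model to simulated data. Everything here is hypothetical: the students, the grades, and the variable names are invented for illustration and do not come from a real dataset.

```r
library(lme4)

set.seed(1)

# Simulated grades: 20 hypothetical students with 5 assignments each
n_students <- 20
students <- data.frame(
    student = rep(paste0("student_", seq_len(n_students)), each = 5)
)
true_effect <- rnorm(n_students, mean = 0, sd = 5)
students$grade <- 80 + rep(true_effect, each = 5) + rnorm(nrow(students), sd = 8)

# Same structure as the trail model: an overall mean plus a student-level intercept
m_students <- lmer(grade ~ 1 + (1 | student), data = students)

# Model-based estimate per student: overall mean + student-specific deviation
student_effects <- ranef(m_students)$student
student_ratings <- data.frame(
    student = rownames(student_effects),
    model_based_grade = fixef(m_students)[["(Intercept)"]] + student_effects[["(Intercept)"]]
)

# Students with the highest model-based grades
head(student_ratings[order(-student_ratings$model_based_grade), ])
```

The student-level estimates play the same role here as the trail-level estimates do in the analysis above.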

The code to download the reviews is here. The code in this post can be used to do a similar analysis.

What are the best rail-trails in Michigan?

2017/07/24
