Basic {sf} functionality for analyzing and plotting geospatial data

Joshua Rosenberg

2020/06/08

This post has some information on using the {sf} (or simple features) R package for analyzing and plotting geospatial data. It is mostly for myself to reference in the future. A lot of this is from the Geocomputation With R book.

Loading packages, setting up

Load the {sf} and {tidyverse} packages:

library(sf)
library(tidyverse)

Creating test data

Create some test data. Here are five longitudes and latitudes. Two are special

The others are random (and are not within the United States).

points <- data.frame(lon = c(-65,-60, -55, -83.92, -84.55), lat = c(50, 45, 40, 35.96, 42.73))

The following function creates an sf object (telling R which columns within points are the longitude and latitude).

points <- st_as_sf(points, coords = c("lon", "lat"))

One thing I’m uncertain about is how {sf} knows that “lon” and “lat” have special meanings; what if, instead, I had named these something else? Would {sf} know how to prepare to plot them in the correct way? Let me know if you know the answer.

Using sf objects as data frames

Attributes can be added to the sf object. The book suggests thinking of sf objects as containing spatial (e.g. lon, lat) data, and attributes; in addition to/because of this, sf objects can be treated as data frames (in many/most cases).

Here, I add names to the five points, using indexing as if the object was a data frame.

points$name <- c("random_place_c", "random_place_b", "random_place_c", "knoxville", "lansing")

It’s possible to drop the spatial/geometric part (class) of the object, which can be useful in some cases; the book mentions that the spatial part is “sticky” and can be difficult to ignore, even when you use the object as a data frame.

st_drop_geometry(points)
##             name
## 1 random_place_c
## 2 random_place_b
## 3 random_place_c
## 4      knoxville
## 5        lansing

Here, it is easy to use a dplyr verb to remove the random places:

points %>% 
    filter(!str_detect(name, "random_place"))
## Simple feature collection with 2 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -84.55 ymin: 35.96 xmax: -83.92 ymax: 42.73
## CRS:            NA
##               geometry      name
## 1 POINT (-83.92 35.96) knoxville
## 2 POINT (-84.55 42.73)   lansing

Spatial operations

This step was the big leap for me, but it also illustrates why it can be useful to treat spatial data in a special way.

U.S. states can be represented with a shape representing each state’s boundaries.

There are a number of R packages that provide these shapes in sf format. I am using USAboundaries (mostly because I found it first).

library(USAboundaries) 

USA <- us_boundaries()

I’m not fully grasping the details, but coordinate reference systems (CRSs) link spatial coordinates (e.g., lon, lat) to the three-dimensional shape of the Earth. There are short codes for these; a common CRS is 4269 (I have more to learn about this). This is important for the next step, the spatial operation.

USA <- st_set_crs(USA, "4269")

st_within() provides information about relationships between pairs of sf objects. There are many spatial operations in addition to st_within(), including st_intersects() and st_overlaps().

points_within_USA <- st_within(points, USA)

points_within_USA
## Sparse geometry binary predicate list of length 5, where the predicate was `within'
##  1: (empty)
##  2: (empty)
##  3: (empty)
##  4: 23
##  5: 18

This can be turned into an integer vector (or another kind of vector, like a logical vector).

as.integer(points_within_USA)
## [1] NA NA NA 23 18

This can be used to index USA to see which state the points in dd are within:

USA[as.integer(points_within_USA), ] %>% 
    filter(!is.na(name))
## Simple feature collection with 2 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -90.41814 ymin: 34.98299 xmax: -81.6469 ymax: 48.21065
## CRS:            NA
##   statefp  statens    affgeoid geoid stusps      name lsad        aland
## 1      47 01325873 0400000US47    47     TN Tennessee   00 106797662267
## 2      26 01779789 0400000US26    26     MI  Michigan   00 146455251245
##         awater state_name state_abbr jurisdiction_type
## 1   2355188876  Tennessee         TN             state
## 2 104031574060   Michigan         MI             state
##                         geometry
## 1 MULTIPOLYGON (((-90.3007 35...
## 2 MULTIPOLYGON (((-84.61622 4...

It may be easier to specify the column with the name of the state, too

USA[as.integer(points_within_USA), "name"] %>% 
    filter(!is.na(name))
## Simple feature collection with 2 features and 1 field
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -90.41814 ymin: 34.98299 xmax: -81.6469 ymax: 48.21065
## CRS:            NA
##        name                       geometry
## 1 Tennessee MULTIPOLYGON (((-90.3007 35...
## 2  Michigan MULTIPOLYGON (((-84.61622 4...

Plotting

sf objects can be plotted with ggplot2 using the sf geom; here is a plot of the state boundaries and the five points.

I’ll plot only the continental U.S.

ggplot() +
    geom_sf(data = filter(USA, name != "Hawaii" & name != "Alaska")) +
    geom_sf(data = points)

The projection (and theme) of the map can certainly be improved; this is more to demonstrate that {sf} objects can be plotted with {ggplot2}.

Many packages make it easier to plot spatial data of interest. Here is a plot of all of the school districts in Tennessee.

library(leaidr)
library(mapproj)

# only needed the first time
# lea_get(path = "./leaid_sh")

# 47 is the FIPS code for TN
tn <- lea_prep(path = "./leaid_sh", fips = "47")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/jrosenb8/utk-homepage/content/post/leaid_sh/schooldistrict_sy1819_tl19.shp", layer: "schooldistrict_sy1819_tl19"
## with 13315 features
## It has 18 fields
tn_df <- ggplot2::fortify(tn)

tn_map <-
    ggplot() +
    geom_path(data = tn, 
              aes(x = long, y = lat, group = group),
              color = "gray", size = .2) +
    theme_void()

tn_map_projected <- 
    tn_map +
    coord_map()

print(tn_map_projected)

(h/t Isabella Velásquez for making this so seamless (even with fairly ginormous files) to do with the {leaidr} package).

fin

There is a lot more (for me) to learn; this is just a start.

Resources: