This post has some information on using the {sf} (or simple features) R package for analyzing and plotting geospatial data. It is mostly for myself to reference in the future. A lot of this is from the Geocomputation With R book.
Loading packages, setting up
Load the {sf} and {tidyverse} packages:
library(sf)
library(tidyverse)
Creating test data
Create some test data. Here are five longitudes and latitudes. Two are special
- the fourth pair correspond to Knoxville, TN, USA
- the fifth pair corresponds to Lansing, MI, USA
The others are random (and are not within the United States).
points <- data.frame(lon = c(-65,-60, -55, -83.92, -84.55), lat = c(50, 45, 40, 35.96, 42.73))
The following function creates an sf object (telling R which columns within
points
are the longitude and latitude).
points <- st_as_sf(points, coords = c("lon", "lat"))
One thing I’m uncertain about is how {sf} knows that “lon” and “lat” have special meanings; what if, instead, I had named these something else? Would {sf} know how to prepare to plot them in the correct way? Let me know if you know the answer.
Using sf objects as data frames
Attributes can be added to the sf object. The book suggests thinking of sf objects as containing spatial (e.g. lon, lat) data, and attributes; in addition to/because of this, sf objects can be treated as data frames (in many/most cases).
Here, I add names to the five points, using indexing as if the object was a data frame.
points$name <- c("random_place_c", "random_place_b", "random_place_c", "knoxville", "lansing")
It’s possible to drop the spatial/geometric part (class) of the object, which can be useful in some cases; the book mentions that the spatial part is “sticky” and can be difficult to ignore, even when you use the object as a data frame.
st_drop_geometry(points)
## name
## 1 random_place_c
## 2 random_place_b
## 3 random_place_c
## 4 knoxville
## 5 lansing
Here, it is easy to use a dplyr verb to remove the random places:
points %>%
filter(!str_detect(name, "random_place"))
## Simple feature collection with 2 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: -84.55 ymin: 35.96 xmax: -83.92 ymax: 42.73
## CRS: NA
## geometry name
## 1 POINT (-83.92 35.96) knoxville
## 2 POINT (-84.55 42.73) lansing
Spatial operations
This step was the big leap for me, but it also illustrates why it can be useful to treat spatial data in a special way.
U.S. states can be represented with a shape representing each state’s boundaries.
There are a number of R packages that provide these shapes in sf format. I am using USAboundaries (mostly because I found it first).
library(USAboundaries)
USA <- us_boundaries()
I’m not fully grasping the details, but coordinate reference systems (CRSs) link spatial coordinates (e.g., lon, lat) to the three-dimensional shape of the Earth. There are short codes for these; a common CRS is 4269 (I have more to learn about this). This is important for the next step, the spatial operation.
USA <- st_set_crs(USA, "4269")
st_within()
provides information about relationships between pairs of sf objects.
There are many spatial operations in addition to st_within()
, including st_intersects()
and st_overlaps()
.
points_within_USA <- st_within(points, USA)
points_within_USA
## Sparse geometry binary predicate list of length 5, where the predicate was `within'
## 1: (empty)
## 2: (empty)
## 3: (empty)
## 4: 23
## 5: 18
This can be turned into an integer vector (or another kind of vector, like a logical vector).
as.integer(points_within_USA)
## [1] NA NA NA 23 18
This can be used to index USA
to see which state the points in dd
are within:
USA[as.integer(points_within_USA), ] %>%
filter(!is.na(name))
## Simple feature collection with 2 features and 12 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -90.41814 ymin: 34.98299 xmax: -81.6469 ymax: 48.21065
## CRS: NA
## statefp statens affgeoid geoid stusps name lsad aland
## 1 47 01325873 0400000US47 47 TN Tennessee 00 106797662267
## 2 26 01779789 0400000US26 26 MI Michigan 00 146455251245
## awater state_name state_abbr jurisdiction_type
## 1 2355188876 Tennessee TN state
## 2 104031574060 Michigan MI state
## geometry
## 1 MULTIPOLYGON (((-90.3007 35...
## 2 MULTIPOLYGON (((-84.61622 4...
It may be easier to specify the column with the name of the state, too
USA[as.integer(points_within_USA), "name"] %>%
filter(!is.na(name))
## Simple feature collection with 2 features and 1 field
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -90.41814 ymin: 34.98299 xmax: -81.6469 ymax: 48.21065
## CRS: NA
## name geometry
## 1 Tennessee MULTIPOLYGON (((-90.3007 35...
## 2 Michigan MULTIPOLYGON (((-84.61622 4...
Plotting
sf objects can be plotted with ggplot2 using the sf geom; here is a plot of the state boundaries and the five points.
I’ll plot only the continental U.S.
ggplot() +
geom_sf(data = filter(USA, name != "Hawaii" & name != "Alaska")) +
geom_sf(data = points)
The projection (and theme) of the map can certainly be improved; this is more to demonstrate that {sf} objects can be plotted with {ggplot2}.
Many packages make it easier to plot spatial data of interest. Here is a plot of all of the school districts in Tennessee.
library(leaidr)
library(mapproj)
# only needed the first time
# lea_get(path = "./leaid_sh")
# 47 is the FIPS code for TN
tn <- lea_prep(path = "./leaid_sh", fips = "47")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/jrosenb8/utk-homepage/content/post/leaid_sh/schooldistrict_sy1819_tl19.shp", layer: "schooldistrict_sy1819_tl19"
## with 13315 features
## It has 18 fields
tn_df <- ggplot2::fortify(tn)
tn_map <-
ggplot() +
geom_path(data = tn,
aes(x = long, y = lat, group = group),
color = "gray", size = .2) +
theme_void()
tn_map_projected <-
tn_map +
coord_map()
print(tn_map_projected)
(h/t Isabella Velásquez for making this so seamless (even with fairly ginormous files) to do with the {leaidr} package).
fin
There is a lot more (for me) to learn; this is just a start.
Resources: