I would solve the problem as follows. First, load the packages. tmap
is used just for the map, you can easily ignore that
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(sfheaders)
library(tmap)
Download and read-in data
temp_zip <- tempfile(fileext = ".zip")
download.file(
"https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_500k.zip",
destfile = temp_zip
)
unzip(temp_zip, exdir = tempdir())
us <- st_read(file.path(tempdir(), "cb_2019_us_state_500k.shp"))
#> Reading layer `cb_2019_us_state_500k' from data source `C:UsersUtenteAppDataLocalTempRtmpCakscOcb_2019_us_state_500k.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 56 features and 9 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
#> geographic CRS: NAD83
Filter down to the lower 48 + DC.
us = us[us$STUSPS %in% setdiff(c(state.abb, "DC"), c("AK", "HI")),]
us = st_transform(us, crs = 2163)
Define increment, grid and sfc
object. The key part is to create and sfc_point
object for subsequent operations
increment = 1e5
g = expand.grid(
x = seq(st_bbox(us)$xmin, st_bbox(us)$xmax, by = increment),
y = seq(st_bbox(us)$ymin, st_bbox(us)$ymax, by = increment)
)
g_sfc <- sfc_point(as.matrix(g)) %>%
st_set_crs(2163)
Find the ID(s) of points included in the US
my_ids <- unlist(st_contains(us, g_sfc))
Visualise result
tm_shape(g_sfc) +
tm_dots(col = "grey", size = 0.05) +
tm_shape(g_sfc[my_ids]) +
tm_dots(col = "darkred", size = 0.05) +
tm_shape(us) +
tm_borders(lwd = 1.3)
Repeat for 1e3 (but I won't add any plot since that's almost 13 million points)
increment = 1e3
g = expand.grid(
x = seq(st_bbox(us)$xmin, st_bbox(us)$xmax, by = increment),
y = seq(st_bbox(us)$ymin, st_bbox(us)$ymax, by = increment)
)
It takes approximately 20 seconds to generate the data
system.time({
g_sfc <- sfc_point(as.matrix(g)) %>%
st_set_crs(2163)
})
#> user system elapsed
#> 16.29 0.92 17.27
and 80 seconds to find the IDs of the points within US.
system.time({
my_ids <- unlist(st_contains(us, g_sfc))
})
#> user system elapsed
#> 67.75 8.32 80.86
Created on 2021-01-13 by the reprex package (v0.3.0)
If you need something even more efficient, I suggest you polyclip
.