Practice with Spatial Data

Malaria, Ocean Currents, Baseball…

Maps and Spatial Fields are fun!
Portfolio
DataViz
Spatial
GGPlot
Assignment
Author

Barrie Robison

Published

April 18, 2024

OVERVIEW

In this assignment, we’ll consider some of the tools and techniques for visualizing spatial data. Spatial data comes in two broad categories: geographic data and spatial fields. Let’s practice a few visualizations to get a feel for how these things work!

GEOGRAPHIC MAPS!

In class I bet Ronald that he would end up creating some kind of map-based visualization before he graduated with his PhD. This is because he works on malaria, a terrible disease with a strong spatial component to its risk levels. Let’s get some malaria data and map it!

The data I obtained were from the Malaria Atlas. I downloaded a CSV with 10 years of data for all the countries they had on file.

Code
library(tidyverse)
library(readxl)
library(rnaturalearth)
library(rnaturalearthdata)
library(dplyr)

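# Read the national-level export downloaded from the Malaria Atlas
Malaria <- read.csv("National_Unit_data.csv")

# Keep only the infection prevalence rows
# (despite its name, Incidence holds prevalence values)
Incidence <- Malaria %>%
  filter(Metric == "Infection Prevalence") %>%
  mutate(Prevalence = Value, Year = as.factor(Year))
# Optionally trim to the columns needed for mapping:
#   %>% select(c(ISO3, Prevalence, Year))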
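
Before filtering on a single metric, it can help to see which metrics the file holds and what range of values each one covers. The sketch below uses a tiny made-up table with the same column names the code above relies on (ISO3, Year, Metric, Value). The second metric label and all of the numbers are placeholders, not Malaria Atlas data.

```r
library(dplyr)

# Hypothetical stand-in for the Malaria Atlas CSV: the values and the
# "Some Other Metric" label are invented for illustration only.
toy_malaria <- data.frame(
  ISO3   = c("NGA", "NGA", "KEN", "KEN"),
  Year   = c(2019, 2020, 2019, 2020),
  Metric = c("Infection Prevalence", "Infection Prevalence",
             "Infection Prevalence", "Some Other Metric"),
  Value  = c(0.25, 0.23, 0.08, 41.2)
)

# One row per metric: record count, year range, and value range.
metric_summary <- toy_malaria %>%
  group_by(Metric) %>%
  summarise(
    n_rows     = n(),
    first_year = min(Year),
    last_year  = max(Year),
    min_value  = min(Value),
    max_value  = max(Value),
    .groups    = "drop"
  )

print(metric_summary)
```

Running the same summary on the real Malaria data frame would be one way to see whether Value is stored as a proportion or a percentage for the metric being mapped.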

Now I’m going to use the rnaturalearth package to create country polygons. Then I’ll add the malaria data to that data frame.

Code
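# Country polygons as an sf data frame
world_map <- ne_countries(scale = "medium", returnclass = "sf")

# Attach the malaria rows by ISO3 code, then drop countries without data
map_data <- world_map %>%
  left_join(Incidence, by = c("iso_a3" = "ISO3")) %>%
  filter(!is.na(Prevalence))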
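
Joins on country codes can silently lose rows when a code on one side has no partner on the other. One quick check is dplyr's anti_join(), sketched here on two made-up tables. The codes and values are placeholders, not the real map or CSV.

```r
library(dplyr)

# Hypothetical country table and data table (invented for illustration).
toy_countries <- data.frame(
  iso_a3 = c("NGA", "KEN", "-99"),
  name   = c("Nigeria", "Kenya", "Polygon without a usable code")
)
toy_incidence <- data.frame(
  ISO3       = c("NGA", "KEN", "TZA"),
  Prevalence = c(0.25, 0.08, 0.12)
)

# Data rows with no matching polygon: these would never be drawn.
missing_polygons <- anti_join(toy_incidence, toy_countries,
                              by = c("ISO3" = "iso_a3"))

# Polygons with no matching data: left_join() gives them NA Prevalence.
missing_data <- anti_join(toy_countries, toy_incidence,
                          by = c("iso_a3" = "ISO3"))

print(missing_polygons)
print(missing_data)
```

The same two calls could be pointed at Incidence and world_map to list which countries fall out of the map and why.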

Now I will make a plot!

Code
library(gganimate)
library(transformr)
library(magick)
library(gifski)

# map_data keeps every year in the download, so a country with several
# years of data is drawn once per year, one polygon on top of another.
ggplot() +
  geom_sf(data = map_data %>%
            filter(continent == "Africa"),
          aes(fill = Prevalence)) +
  scale_fill_gradient(low = "white", high = "red", na.value = "gray", name = "Malaria Prevalence") +
  theme_minimal() +
  theme(axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank()) +
  labs(title = "Malaria Prevalence by Country")

Oh SNAP! We made a MAP!

Hey! That rhymes!

What is missing? Units? Is that actually prevalence? We sure left a lot of data on the table. Can we add some kind of time thing? Change the variable?
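
One possible answer to the "time thing" question is to draw one small map per year with facet_wrap(). This is only a sketch under assumptions: the three countries and their values are invented placeholders standing in for the joined data, and only the column names (ISO3, Year, Prevalence, iso_a3) come from the code above.

```r
library(dplyr)
library(ggplot2)
library(rnaturalearth)

# Hypothetical stand-in for the Incidence table: made-up values,
# not Malaria Atlas data.
toy <- expand.grid(ISO3 = c("NGA", "KEN", "TZA"),
                   Year = 2018:2020,
                   stringsAsFactors = FALSE)
toy$Prevalence <- seq(0.05, 0.45, length.out = nrow(toy))

# African country polygons, then one row per country per year.
africa  <- ne_countries(scale = "medium", continent = "Africa",
                        returnclass = "sf")
toy_map <- africa %>%
  inner_join(toy, by = c("iso_a3" = "ISO3"))

# The gray base layer has no Year column, so it repeats in every panel.
p <- ggplot() +
  geom_sf(data = africa, fill = "gray90", colour = "white") +
  geom_sf(data = toy_map, aes(fill = Prevalence)) +
  facet_wrap(~ Year) +
  scale_fill_gradient(low = "white", high = "red",
                      name = "Prevalence\n(made-up values)") +
  theme_minimal() +
  theme(axis.text = element_blank(), axis.ticks = element_blank()) +
  labs(title = "One panel per year (illustrative data)")

print(p)
```

Faceting keeps every year visible at once, which makes comparison easy. gganimate's transition_manual() is one route to an animated version of the same idea.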

SPATIAL FIELDS

Spatial field data refers to data that has a continuous spatial distribution and can be measured at any location within the defined area. Here are some interesting examples of spatial field data:

  1. Air temperature: Air temperature data collected from weather stations or remote sensing technologies can be used to create temperature maps or to study climate change, urban heat islands, and other environmental phenomena. TROPICAL CYCLONE!

  2. Precipitation: Rainfall, snowfall, or other forms of precipitation data collected from weather stations or satellites can be used to study the hydrological cycle, flood risk, droughts, or water resource management.

  3. Soil moisture: Soil moisture data collected from in situ sensors or remote sensing technologies can be used to study agricultural productivity, irrigation management, droughts, and land degradation.

  4. Elevation data (Digital Elevation Models, DEMs): Elevation data collected from satellite-based radar, LiDAR, or photogrammetry can be used to study topography, watershed delineation, flood risk, landslides, or geomorphology.

  5. Vegetation indices: Indices like the Normalized Difference Vegetation Index (NDVI) or Enhanced Vegetation Index (EVI) derived from satellite imagery can be used to study vegetation health, land cover change, deforestation, agricultural productivity, and carbon sequestration.

  6. Air quality: Data on air pollutants like PM2.5, PM10, NO2, SO2, O3, and CO collected from ground-based monitors or satellites can be used to study the impact of pollution on human health, urban planning, or environmental policy.

  7. Oceanographic data: Sea surface temperature, salinity, and chlorophyll-a concentration data collected from buoys, ships, or satellites can be used to study ocean currents, climate change, or marine ecosystems. OCEAN CURRENTS!

  8. Population density: Spatially explicit population density data can be used to study urbanization, migration patterns, infrastructure planning, or public health.

  9. Land use and land cover: Land use and land cover data collected from satellite imagery can be used to study urban growth, deforestation, habitat fragmentation, or landscape ecology.

  10. Seismic activity: Spatial distribution of earthquakes and their magnitudes can be used to study tectonics, fault zones, seismic hazards, or infrastructure resilience. DARK NIGHTS IN ANTAKYA

  11. SPORTS! Let’s check out a baseball example!

Code
#install.packages("baseballr")
library(remotes)
library(devtools)
#install_github("bayesball/CalledStrike")
library(CalledStrike)
library(baseballr)
#ShinyDemo()

This example is the Pitch_Locations example from bayesball’s CalledStrike GitHub repository.

Introduction

There are three functions for visualizing pitch locations.

  • The function location_compare() will graph the pitch location for a data frame or list of data frames.

  • The function location_count() will show the locations of pitches for a specific pitcher on a particular count.

  • The function location_count_compare() will graph the pitch locations for a specific pitcher for several values of the count.

Data

The package includes the dataset sc_pitchers_2019 that contains Statcast data for 20 pitchers for the 2019 season.

Pitch Locations for a List

Suppose we want to compare the locations of the fastballs thrown by Aaron Nola and Trevor Bauer.

I find the subset of data I need and then create a list dividing the data by pitcher.

Code
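# Four-seam fastballs ("FF") thrown by Nola (605400) and Bauer (545333)
d <- filter(sc_pitchers_2019, 
            pitcher %in% c(605400, 545333),
            pitch_type == "FF")
# split() orders the pieces by pitcher ID, so 545333 (Bauer) comes first
ds <- split(d, d$pitcher)
names(ds) <- c("Bauer", "Nola")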
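
Naming the list by position works here because split() sorts the groups by ID, but it is easy to get backwards with more pitchers. A slightly safer pattern, sketched on invented rows, is to look each ID up by name. The toy_d data frame and the id_to_name vector are illustrative helpers, not part of the package.

```r
# Toy rows standing in for the filtered Statcast data (invented values).
toy_d <- data.frame(
  pitcher = c(605400, 545333, 605400, 545333),
  plate_x = c(0.1, -0.4, 0.3, 0.2)
)

toy_ds <- split(toy_d, toy_d$pitcher)
print(names(toy_ds))   # the IDs as character strings, in sorted order

# Map each ID to a name explicitly instead of relying on list order.
id_to_name <- c("605400" = "Nola", "545333" = "Bauer")
names(toy_ds) <- unname(id_to_name[names(toy_ds)])
print(names(toy_ds))
```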

Now we can construct the graph.

Code
location_compare(ds)
Warning: Removed 22 rows containing non-finite outside the scale range
(`stat_density2d_filled()`).
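
The warning above mentions stat_density2d_filled(), which suggests the plot is a 2D kernel density estimate: a scatter of pitch locations smoothed into a continuous field. The sketch below shows that idea on simulated points. The plate_x and plate_z column names follow Statcast conventions, and the numbers are random draws, not real pitches.

```r
library(ggplot2)

# Simulated pitch locations in feet (random draws, not Statcast data).
set.seed(42)
toy_pitches <- data.frame(
  plate_x = rnorm(500, mean = 0,   sd = 0.8),
  plate_z = rnorm(500, mean = 2.5, sd = 0.8)
)

# The field itself: estimated density evaluated on a 50 x 50 grid.
field <- MASS::kde2d(toy_pitches$plate_x, toy_pitches$plate_z, n = 50)
print(dim(field$z))

# The same estimate drawn as filled contours, with the raw points on top.
p <- ggplot(toy_pitches, aes(plate_x, plate_z)) +
  geom_density_2d_filled(alpha = 0.8) +
  geom_point(size = 0.3, alpha = 0.3) +
  coord_fixed() +
  labs(x = "Horizontal location (ft)", y = "Height (ft)",
       title = "Simulated pitch locations as a density field")

print(p)
```

The points are the measurements; the filled contours are the spatial field estimated from them, which can be evaluated at any location in the plotting window.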

Pitch Locations for a Specific Count

Suppose we want to look at the locations of Aaron Nola’s pitches on a 0-0 count. I can find Nola’s MLBAM ID number using the chadwick dataset (also included in the package), which contains the ID numbers for all players.

Code
chadwick %>% 
  filter(name_last == "Nola", name_first == "Aaron")
  name_first name_last key_mlbam
1      Aaron      Nola    605400

To produce the graph, type

Code
location_count(sc_pitchers_2019, 
               605400, "Aaron Nola", "0-0")
Warning: Removed 4 rows containing non-finite outside the scale range
(`stat_density2d_filled()`).

Pitch Locations Across a Group of Counts

Suppose we want to compare Nola’s pitch locations across the counts “0-0”, “1-0”, “0-1”, and “0-2”.

Code
location_count_compare(sc_pitchers_2019, 
               605400, "Aaron Nola", 
               "R", "Offspeed", 
               c("0-0", "1-0", "0-1", "0-2"))
Warning: Removed 4 rows containing non-finite outside the scale range
(`stat_density2d_filled()`).

END