Portfolio
DataViz
Assignment
Author

Barrie Robison

Published

April 16, 2024

OVERVIEW

This assignment provides you the opportunity to synthesize all of the concepts we’ve covered in the course to date. The basic framework is that you will create a COMPLETE data visualization BLOG post that is suitable as a showcase component of your Data Science Portfolio. The point is to SHOW people your skills.

STRUCTURE

The basic formatting guidelines for this assignment are:

  1. Include code fold or code tools options (or both) that allow users to view and copy your code while maintaining overall readability of your post.
  2. Suppress all output and warnings that might distract from your visualizations and writing.
  3. Properly title your assignment. The main title should be “BCB 520 - Final Project”, and the subtitle should be a descriptive title related to your question or topic.
  4. Include author, date, categories, and a description in your YAML header.
  5. Write clear, complete sentences for a target audience with some scientific background but little training in your specific discipline.
  6. Include references if appropriate and use hyperlinks to external sources of data, inspiration, or examples.
  7. Use the header hierarchy and create a sensible document outline with white space. Format for readability! Use bold and italic fonts to emphasize things! Use color by customizing your .css file!
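Guidelines 1 and 2 can be handled chunk by chunk. The following is a minimal sketch, assuming the post is written in Quarto (an assumption based on the YAML header and code fold options mentioned above); the chunk label `setup` is a placeholder. Quarto reads the `#|` lines as chunk options, while plain R treats them as ordinary comments.

```r
#| label: setup
#| code-fold: true    # collapse the code behind a "Code" toggle
#| warning: false     # hide warnings such as "package was built under R version ..."
#| message: false     # hide package startup and masking messages

# suppressPackageStartupMessages() is a base R fallback that also works
# when the code is run outside Quarto.
suppressPackageStartupMessages({
  library(dplyr)
})
```

Setting the same options once under `execute:` in the YAML header should apply them to every chunk, which may be simpler for a long post.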

In addition to the above formatting guidelines, your portfolio post must contain the following sections:

Preamble

Write a brief paragraph describing the primary question or purpose of the post. Ideally, the concept should be challenging enough that it requires at least two visualizations that use different idioms (i.e., don’t just make two scatterplots with different variables). The concept should also be interesting enough to hold the reader’s attention (e.g., a plot showing that height and weight are correlated is trivial and not appropriate). The best approach is to explore a topic or question in which YOU are very interested.

Data

Write a summary of the data sources you will use. Include a Data Dictionary table that fully describes each individual data file used. You may use your own research data or publicly available data from any source you like (with attribution). There are no minimum or maximum data set size requirements: the data need only be big enough to be interesting and small enough that the hardware available to us can render your visualization.
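One possible way to start a Data Dictionary is to build it from the data frame itself and then write the descriptions by hand. This is only a sketch: `counties_demo`, its values, and the descriptions are hypothetical placeholders, not part of the assignment.

```r
# Hypothetical example data; replace with your own data frame.
counties_demo <- data.frame(
  GEOID = c("16001", "16055", "16057"),
  value = c(12.5, 48.1, 73.9)
)

# One row per variable: name, storage type, and a hand-written description.
data_dictionary <- data.frame(
  variable    = names(counties_demo),
  type        = unname(vapply(counties_demo, function(col) class(col)[1], character(1))),
  description = c(
    "Five-digit county FIPS code (state + county)",
    "Placeholder measurement, arbitrary units"
  )
)

print(data_dictionary)
```

Passing `data_dictionary` to `knitr::kable()` is one way to render it as a formatted table in the post.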

NEW REQUIREMENT: Your assignment must feature one of the two new data types we have considered since the midterm: NETWORK DATA or SPATIAL DATA.

Visualizations

Create your visualizations and include text that explains any steps or design choices. Be sure to include clearly labeled axes and a concise but complete figure caption for each visualization. Make deliberate choices for color palettes, point marks, line types, etc. Demonstrate that you understand the concepts we have covered!
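As a rough illustration of labeled axes, a caption, and a deliberate palette choice, the sketch below uses the `mtcars` data set that ships with R as a stand-in for your own data. The label wording and figure number are placeholders.

```r
library(ggplot2)

# mtcars ships with R; it stands in for your own data here.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 2.5, alpha = 0.8) +
  scale_color_viridis_d(end = 0.85) +  # discrete viridis palette, readable for most color vision types
  labs(
    x = "Weight (1000 lb)",
    y = "Fuel economy (miles per gallon)",
    color = "Cylinders",
    caption = "Figure 1. Heavier cars travel fewer miles per gallon; data from mtcars."
  ) +
  theme_minimal()

p
```

Every visible element here (axis titles with units, legend title, caption) is set explicitly rather than left at the variable name, which is the habit the rubric rewards.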

Conclusions or Summary

Answer your research question. Draw a conclusion or inference related to your topic. Summarize your results. What new questions have emerged as a result of your visualizations? What interesting next steps have emerged?

RUBRIC

I will evaluate the following for your portfolio post:

1. Clarity of writing (15%): Complete, clear sentences. Good grammar. Understandable to target audience. Logical flow of ideas.

2. Adherence to format (10%): Did you follow directions?

3. Topic suitability (15%): Is the topic interesting? Are the visualizations challenging and interesting enough to showcase your skills?

4. Viz Execution (40%): Are the visualizations effective? Do they adhere to the principles of effectiveness? Are choices for idiom, marks, channels, etc., made deliberately and well justified?

5. Creativity (20%): Did you push your boundaries and learn new techniques? Is the overall post compelling and interesting? Are the visualizations inspiring, creative, unique, and generally impressive? If I were recruiting a new data scientist (and I often am), would this portfolio post impress me, or would it damage your candidacy during review?

Code
library(ggplot2)
library(sf)
library(tigris)
library(dplyr)
us_counties <- tigris::counties(cb = TRUE, resolution = "20m", year = 2020, class = "sf")

set.seed(123)
data <- data.frame(
  GEOID = us_counties$GEOID,
  value = runif(length(us_counties$GEOID), 0, 100)
)


us_counties_data <- left_join(us_counties, data, by = "GEOID")

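# Choropleth of all counties; thin white borders separate neighboring polygons.
# Polygon outlines in geom_sf() are controlled by `linewidth` (ggplot2 >= 3.4.0), not `size`.
ggplot() +
  geom_sf(data = us_counties_data, aes(fill = value), color = "white", linewidth = 0.1) +
  scale_fill_viridis_c() +  # You can choose other color palettes
  theme_minimal() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())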

Code
us_counties_contiguous <- us_counties %>% 
  filter(
    !(STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))
  )

us_counties_data_contiguous <- left_join(us_counties_contiguous, data, by = "GEOID")


# Same map restricted to the contiguous United States.
ggplot() +
  geom_sf(data = us_counties_data_contiguous, aes(fill = value), color = "white", linewidth = 0.1) +
  scale_fill_viridis_c() +
  theme_minimal() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())
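Both maps depend on `left_join()` matching every county by `GEOID`. With real data, a quick check for unmatched keys can catch problems such as FIPS codes that lost their leading zero when read as numbers. The sketch below uses small hypothetical tables (`shapes`, `values`) rather than the real county geometries.

```r
library(dplyr)

# Hypothetical stand-ins for the county geometries and the values to map.
shapes <- data.frame(GEOID = c("01001", "16057", "53075"))
values <- data.frame(
  GEOID = c("1001", "16057", "53075"),  # "1001" lost its leading zero
  value = c(10, 20, 30)
)

# anti_join() keeps the rows of `shapes` that have no partner in `values`.
unmatched <- anti_join(shapes, values, by = "GEOID")
print(unmatched)

# Restore five-character codes, then confirm that every county matches.
values_fixed <- mutate(values, GEOID = sprintf("%05d", as.integer(GEOID)))
still_unmatched <- anti_join(shapes, values_fixed, by = "GEOID")
```

Counties left unmatched after a `left_join()` get `NA` for `value`, and the viridis fill scale draws them in grey by default, so an empty `still_unmatched` is worth confirming before plotting.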