Are we good at grants and stuff?
Assignment
DataViz
Tables
Project
Author

Barrie Robison

Published

March 7, 2024

OVERVIEW

This assignment provides you the opportunity to synthesize all of the concepts we’ve covered in the course to date. The basic framework is that you will create a COMPLETE data visualization BLOG post that is suitable as a showcase component of your Data Science Portfolio. The point is to SHOW people your skills.

STRUCTURE

The basic formatting guidelines for this assignment are:

  1. Include code fold or code tools options (or both) that allow users to view and copy your code while maintaining overall readability of your post.
  2. Suppress all output and warnings that might distract from your visualizations and writing.
  3. Properly title your assignment. The main title should be “BCB 520 - Midterm Portfolio Post”, and the subtitle should be a descriptive title related to the topic.
  4. Include author, date, categories, and a description in your YAML header.
  5. Write clear, complete sentences for a target audience with some scientific background but little training in formal data science. Imagine your report is intended for high level University leadership (presidents and vice presidents) and the State Board of Education.
  6. Use the header hierarchy and create a sensible document outline with white space. Format for readability! Use bold and italic fonts to emphasize things! Use color by customizing your .css file!

In addition to the above formatting guidelines, your portfolio post must contain the following sections:

Preamble

Write a brief paragraph describing the primary questions or purpose of the post. Use clear, concise language to justify the post to a naive reader. Why is the question or purpose important?

Data

Write a summary of the data sources you have used. Include a Data Dictionary table that fully describes each individual data source used. You may supplement the data supplied to you with additional sources if you like, but this is not required.

Visualizations

Create your visualizations in response to the questions and prompts. Answer the questions and prompts directly. Draw a conclusion or inference related to each. Identify limitations and the types of data you would need to mitigate those limitations. Also include text that explains any steps or design choices you considered while exploring the vizualization options [this normally wouldn’t appear in the report to the vice president, but I’d like a window into your process]. Be sure to include clearly labeled axes and a concise but complete figure caption for each visualization. Make deliberate choices for color palettes, point marks, line types, etc. Demonstrate that you understand the concepts we have covered!

Conclusions or Summary

Summarize your results. What new questions have emerged as a result of your visualizations? What interesting next steps have emerged?

RUBRIC

I will evaluate the following for your portfolio post:

1. Clarity of writing (15%): Complete, clear sentences. Good Grammar. Understandable to target audience. Logical flow of ideas.

2. Adherence to format (15%): Did you follow directions?

3. Viz Execution (40%): Are the visualizations effective? Do they adhere to the principles of effectiveness? Are choices for idiom, marks, channels, etc made deliberately and well justified?

5. Creativity (30%): Did you push your boundaries and learn new techniques? Is the overall post compelling and interesting? Are the visualizations inspiring, creative, unique, and generally impressive? If I were recruiting a new data scientist (and I often am), would this portfolio post impress me, or would it damage your candidacy during review?

TASK

I am assigning you a real task that has multiple paths to a high quality response. It is related to one of the projects within IIDS, in which we are helping the UI Office of Research and Economic Development leverage institutional data for strategic decisions.

In this case, we’ll work with award (grant) data from four federal sponsors:

  1. The National Science Foundation
  2. The National Institutes of Health
  3. The Department of Energy
  4. The US Department of Agriculture

THE DATA

Department of Agriculture (NIFA)

Their award database is here, and I’ve downloaded the awards to the University of Idaho and put the file (USDAtoUI) in the post directory.

Department of Energy

I’ve downloaded their whole award data set from here and put the file (DOEawards.xlsx) in the github directory for this post.

National Institutes of Health

We can grab data from the NIH API, which is well described here.

National Science Foundation

The NSF also has an API, and the following code will pull down awards to the University of Idaho into a data frame called NSFtoUI.

Code
library(httr)
library(jsonlite)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(readxl)


# Base URL for the API
base_url <- "https://www.research.gov/awardapi-service/v1/awards.json?awardeeName=%22regents+of+the+university+of+idaho%22"

printFields <- "rpp,offset,id,agency,awardeeCity,awardeeCountryCode,awardeeDistrictCode,awardeeName,awardeeStateCode,awardeeZipCode,cfdaNumber,coPDPI,date,startDate,expDate,estimatedTotalAmt,fundsObligatedAmt,ueiNumber,fundProgramName,parentUeiNumber,pdPIName,perfCity,perfCountryCode,perfDistrictCode,perfLocation,perfStateCode,perfZipCode,poName,primaryProgram,transType,title,awardee,poPhone,poEmail,awardeeAddress,perfAddress,publicationResearch,publicationConference,fundAgencyCode,awardAgencyCode,projectOutComesReport,abstractText,piFirstName,piMiddeInitial,piLastName,piEmail"

# Initialize an empty data frame to store results
NSFtoUI <- tibble()

# Number of results per page (as per API settings)
results_per_page <- 25

# Variable to keep track of the current page number
current_page <- 1

# Variable to control the loop
keep_going <- TRUE

while(keep_going) {
    # Calculate the offset for the current page
    offset <- (current_page - 1) * results_per_page + 1

    # Construct the full URL with offset
    url <- paste0(base_url, "&offset=", offset, "&printFields=", printFields)

    # Make the API call
    response <- GET(url)

    # Check if the call was successful
    if (status_code(response) == 200) {
        # Extract and parse the JSON data
        json_data <- content(response, type = "text", encoding = "UTF-8")
        parsed_data <- fromJSON(json_data, flatten = TRUE)

        # Extract the 'award' data and add to the all_awards data frame
        awards_data <- parsed_data$response$award
        NSFtoUI <- bind_rows(NSFtoUI, as_tibble(awards_data))

        # Debug: Print the current page number and number of awards fetched
        print(paste("Page:", current_page, "- Awards fetched:", length(awards_data$id)))

        # Check if the current page has less than results_per_page awards, then it's the last page
        if (length(awards_data$id) < results_per_page) {
            keep_going <- FALSE
        } else {
            current_page <- current_page + 1
        }
    } else {
        print(paste("Failed to fetch data: Status code", status_code(response)))
        keep_going <- FALSE
    }
}
[1] "Page: 1 - Awards fetched: 25"
[1] "Page: 2 - Awards fetched: 25"
[1] "Page: 3 - Awards fetched: 25"
[1] "Page: 4 - Awards fetched: 25"
[1] "Page: 5 - Awards fetched: 25"
[1] "Page: 6 - Awards fetched: 25"
[1] "Page: 7 - Awards fetched: 25"
[1] "Page: 8 - Awards fetched: 25"
[1] "Page: 9 - Awards fetched: 25"
[1] "Page: 10 - Awards fetched: 25"
[1] "Page: 11 - Awards fetched: 25"
[1] "Page: 12 - Awards fetched: 25"
[1] "Page: 13 - Awards fetched: 25"
[1] "Page: 14 - Awards fetched: 25"
[1] "Page: 15 - Awards fetched: 25"
[1] "Page: 16 - Awards fetched: 25"
[1] "Page: 17 - Awards fetched: 25"
[1] "Page: 18 - Awards fetched: 25"
[1] "Page: 19 - Awards fetched: 25"
[1] "Page: 20 - Awards fetched: 25"
[1] "Page: 21 - Awards fetched: 25"
[1] "Page: 22 - Awards fetched: 25"
[1] "Page: 23 - Awards fetched: 25"
[1] "Page: 24 - Awards fetched: 8"

QUESTION 1:

Provide a visualization that shows our active awards from each sponsor. I need to see their start date and end date, the amount of the award, and the name of the Principal Investigator. I’m really interested in seeing how far into the future our current portfolio will exist. Are there a bunch of awards about to expire? Are there a bunch that just got funded and will be active for a while? Does this vary across sponsors?

QUESTION 2:

What is the proportional representation of new awards to the UI from these various sources over the past 5 to 10 years? Are there any trends that are encouraging or discouraging?

QUESTION 3:

How is UI performing with these sponsors when compared to the following peer institutions?

  1. Boise State University
  2. Idaho State University
  3. Montana State University
  4. University of Montana
  5. Washington State University

Note that “performing” can mean a variety of different things. You must choose your metrics of performance and justify them.