ASSIGNMENT 1 - Fun with literate programming.

Source code hybridized with documentation adds to your data science spellbook.
Assignment
DataViz
Quarto
Author

Barrie Robison

Published

January 10, 2024

SUMMARY

The idea of Literate Programming is that source code that is executed as part of the program’s purpose is interspersed with documentation that describes the program’s logic. The concept of literate programming was first articulated by David Knuth in 1984. You know… back when music was good? Modern Data Science leans pretty heavily on literate programming, and to be honest, there aren’t very many good arguments as to why you WOULDN’T want to implement this approach in your own work. Bearing this in mind, we will adopt this framework for most of the activities, exercises, and assignments in this course. All of us will benefit by practicing these skills.

LITERATE PROGRAMMING PUBLISHING SYSTEMS

I’m trying to keep this course as technology agnostic as I can. The idea is that you should be practicing and building competencies in the languages and algorithms that are most useful to you. Who am I to tell you to use R instead of Python? If you have skills in a particular language I encourage you to keep using that during this course. That being said, I am going to work the examples using R and R Studio, and I will (mostly) use Quarto as the literate programming framework.

If all of this is new to you, no problem. Just follow along in R and Quarto and start your skill building journey with those languages.

If you are a Python person, great! Quarto can accommodate that language as well. If you have another preference for literate programming, such as sticking with R Markdown until the Quarto bugs are fixed, that is great. Find the framework and tools that work for you, and practice, practice, practice!

Quarto

An open source publishing system that allows you to create websites, documents, blogs, books, publications, presentations, and more while using R, Python, Julia, or Observable. Quarto is intended to be the more functional successor of R Markdown. I intend to use Quarto for most of my work in this course.

R Markdown

Another publishing system for creating all the things … websites, slides, manuscripts, dashboards, etc. While most people (including me!) instinctively think of R and Python within R Markdown, the list of supported language engines is pretty extensive.

names(knitr::knit_engines$get())
 [1] "awk"       "bash"      "coffee"    "gawk"      "groovy"    "haskell"  
 [7] "lein"      "mysql"     "node"      "octave"    "perl"      "php"      
[13] "psql"      "Rscript"   "ruby"      "sas"       "scala"     "sed"      
[19] "sh"        "stata"     "zsh"       "asis"      "asy"       "block"    
[25] "block2"    "bslib"     "c"         "cat"       "cc"        "comment"  
[31] "css"       "ditaa"     "dot"       "embed"     "eviews"    "exec"     
[37] "fortran"   "fortran95" "go"        "highlight" "js"        "julia"    
[43] "python"    "R"         "Rcpp"      "sass"      "scss"      "sql"      
[49] "stan"      "targets"   "tikz"      "verbatim"  "ojs"       "mermaid"  

LANGUAGES AND TOOLSETS

There are quite a few, but the five that seemed to keep coming up as I prepped this course are:

R

A very powerful open source framework for statistical computing and graphics. R has a lot of base functionality, and its capabilities are increased by 100 fold with packages created by R users. Packages are the core units of R code. I’m going to use R for the vast majority of demonstrations in this course.

Python

Python is an open source general purpose programming language. It wasn’t developed just for statistical computing or data science, and people use this language for tons of different applications. There is no denying it has become a very powerful language for data science and data visualization.

Tableau

Tableau is proprietary software that is very powerful for creating beautiful and functional data visualizations. It can integrate with all sorts of data sources and is used a lot for analytics, especially in the business world. The downsides (that occur to me at least) are that it costs money, it is not open source, and is more of a one-trick-pony than the programming languages on this list.

Javascript

Javascript has been around for about 25 years, and is (I think) the world’s most popular programming language. Along with HTML and CSS, Javascript drives pretty much the entire internet. I mention Javascript here because it has the D3 library, which can create super cool interactive data visualizaitons. In my experience, the learning curve with Javascript and D3 was pretty steep. I bought a book about it once, but just haven’t been able to allocate the amount of time necessary to really start using it. Check out the gallery of examples. Amazing!

Observable / D3

Observable is a set of extensions to Javascript that features something called reactive runtime. This means that the code blocks are executed and compiled as they are written, and changes are implemented instantaneously. Observable is pretty great for data exploration, and is well supported by Quarto. In addition, you can use the Observable JS libraries in Quarto to access D3. We’ll use some of these tools in this course, especially when we start considering interactivity.

ASSIGNMENT

After that long introduction, I suppose you are wondering what I want you to actually DO today.

Well, I want you to set up your publishing system and preferred language on your computer. Then I want you to recreate the classic figure from Anscombe’s Quartet.

Now, you might be asking…

“How am I supposed to do that? You haven’t taught me how to do anything yet!”

Here is the dirty little secret of modern education.

The Internet Exists.

While I could use up an entire 90 minute lecture telling you how to:

I’m not going to do that.

Instead, I want you to use the resources I point towards, or other resources that make more sense to you, to figure out how to do those things.

RESOURCES

Tidyverse and Anscombe’s Quartet

Handy cheat-sheets for many different R packages

Tutorial 1 - Literate Programming

Tutorial 2 - Literate Programming and Anscombe’s Quartet

Tutorial 3 - Python