2024-02-20
Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.
Semantics
Data Table
ITEM: Person
ATTRIBUTES: Name, Age, Shirt Size, Favorite Fruit
One ITEM per row
Each column is an ATTRIBUTE
A cell holds the VALUE for an item/attribute pair
A unique KEY can be used (implicitly or explicitly) to identify each item even if they share all measured attributes
Indexing based on multiple keys (e.g., genes, patients)
KEY: an independent attribute used as unique index to look up items.
Simple tables: 1 key
Multidimensional tables: multiple keys
VALUE: a dependent attribute; the value of a cell
We will classify visualization idioms by the number of keys used.
0, 1, 2, …
0 KEYS: Only values are available or salient to the task.
Express values (magnitudes) of quantitative attributes.
Data: 2 quantitative attributes
Mark: points
Channels: horizontal and vertical position
Tasks: find trends, outliers, distribution, correlation, clusters
Scalability: hundreds of items
Additional channels are viable with scatterplots since we are using point marks.
Examples:
Color: but pay attention to discriminability!
Size: map the quantitative attribute to 2D area, not radius. Encoding the value directly as radius would mislead, because area grows quadratically with radius; take the square root of the value to get the radius.
Shape:
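The square-root rule for size can be sketched in Python. `radius_for_value` and its `max_radius` parameter are hypothetical names chosen for illustration, assuming non-negative values where the largest value gets the largest circle.

```python
import math

def radius_for_value(value: float, max_value: float, max_radius: float) -> float:
    """Radius such that circle AREA (not radius) is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    # area = pi * r^2 is proportional to value  =>  r is proportional to sqrt(value)
    return max_radius * math.sqrt(value / max_value)

def area(radius: float) -> float:
    return math.pi * radius ** 2

# A value 4x larger gets a radius only 2x larger, so its area is 4x larger.
r_small = radius_for_value(25, 100, 10)
r_big = radius_for_value(100, 100, 10)
print(r_small, r_big)                             # 5.0 10.0
print(round(area(r_big) / area(r_small), 6))      # 4.0
```

Encoding the value directly as the radius instead would make the larger circle look 16 times bigger rather than 4.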
Correlation
Clusters/groups, and clusters vs classes
Best case!
Limitation: hard to know rank. What’s 4th? What’s 7th?
Limitation: hard to make comparisons with size (vs aligned position)
One key, one value
Data: 1 categorical attribute, 1 quantitative attribute
Mark: lines
Channels: Length to express quantitative value
Spatial regions: one per mark, separated horizontally and aligned vertically. Ordered either by label (alphabetical) or by the length attribute (data-driven)
Tasks: compare, lookup values
Scalability: dozens to hundreds of levels for key attribute (bars), hundreds for values.
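A minimal sketch of the two orderings for a bar chart's key attribute, using hypothetical counts of favorite fruits (echoing the Person table above). The function names are illustrative, not from any charting library.

```python
# Hypothetical data: one categorical key (fruit), one quantitative value (count).
favorite_fruit_counts = {"banana": 12, "apple": 30, "cherry": 7, "date": 19}

def order_by_label(data: dict[str, int]) -> list[str]:
    """Alphabetical ordering of the key attribute."""
    return sorted(data)

def order_by_value(data: dict[str, int], descending: bool = True) -> list[str]:
    """Data-driven ordering by the value encoded as bar length."""
    return sorted(data, key=lambda k: data[k], reverse=descending)

print(order_by_label(favorite_fruit_counts))  # ['apple', 'banana', 'cherry', 'date']
print(order_by_value(favorite_fruit_counts))  # ['apple', 'date', 'banana', 'cherry']
```

Alphabetical order supports lookup by name; data-driven order makes rank comparisons easy.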
Two keys, one value
Data: 2 categorical attributes, 1 quantitative attribute
Mark: vertical stack of line marks
Glyph: composite object, internal structure from multiple marks
Channels: length and color hue
Spatial regions: one per glyph
Aligned: full glyph, lowest bar component
Unaligned: other bar components
Task: part-to-whole relationship
Scalability: asymmetric. For the stacked key attribute, about 10-12 levels (segments); for the main key attribute, dozens to hundreds of levels (bars)
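The aligned/unaligned distinction follows from how segment positions are computed: each segment's bottom is the running total of the segments below it. A sketch with a hypothetical `stack_segments` helper:

```python
from itertools import accumulate

def stack_segments(values: list[float]) -> list[tuple[float, float]]:
    """(bottom, top) for each segment of one stacked bar, lowest segment first."""
    tops = list(accumulate(values))   # running totals
    bottoms = [0] + tops[:-1]         # each segment starts where the previous ended
    return list(zip(bottoms, tops))

# Hypothetical bar with three segments (levels of the stacked key attribute).
print(stack_segments([4, 3, 5]))  # [(0, 4), (4, 7), (7, 12)]
```

Only the lowest segment starts at 0, so only it (and the full bar, whose top is the total) sits on a common baseline across bars; the other segments start at varying heights and are unaligned.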
Streamgraph: a generalized stacked graph emphasizing horizontal continuity over vertical items
Data: 1 categorical key attribute (movies), 1 ordered key attribute (time), 1 quantitative value attribute (counts)
Derived data: geometry (layers, where height encodes counts) and 1 quantitative attribute (layer ordering)
Mark: vertical stack of line marks
Channels: length and color hue
Task: part-to-whole relationship
Scalability: hundreds of time keys and dozens to hundreds of movie keys; more than stacked bars, since most layers don’t extend across the whole chart
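The derived layer geometry can be sketched as stacking the counts at each time step. As an assumption for illustration, this sketch uses a symmetric "silhouette" baseline (centering the stack on zero at every time step), which is only one of several possible streamgraph baselines; the helper name is hypothetical.

```python
def silhouette_layers(series: list[list[float]]) -> list[list[tuple[float, float]]]:
    """
    series[i][t] = count for layer i (e.g. a movie) at time step t.
    Returns, per layer, the (bottom, top) of that layer at each time step.
    """
    n_steps = len(series[0]) if series else 0
    layers = [[] for _ in series]
    for t in range(n_steps):
        total = sum(layer[t] for layer in series)
        y = -total / 2  # assumed symmetric baseline: stack is centered on zero
        for i, layer in enumerate(series):
            layers[i].append((y, y + layer[t]))  # layer height encodes the count
            y += layer[t]
    return layers

# Hypothetical counts for three movies over three weeks; 0 means the movie is absent.
layers = silhouette_layers([[2, 4, 0], [0, 2, 6], [2, 2, 2]])
print(layers[0])  # [(-2.0, 0.0), (-4.0, 0.0), (-4.0, -4.0)]
```

Layers with a count of 0 collapse to zero height, which is why most layers need not span the whole chart.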
One key, one value
Data: 2 quantitative attributes
Mark: points AND line connection marks between them
Channels: aligned lengths express the quantitative value; marks are separated and ordered by the key attribute into horizontal regions
Task: find trends. Connection marks emphasize the ordering of items along the key axis by explicitly showing the relationship between one item and the next
Scalability: hundreds of key levels, hundreds of value levels
The choice between bar and line charts depends on the type of key attribute.
Use bar charts if the key is categorical, line charts if it is ordered. Do not use line charts for categorical key attributes: doing so violates the expressiveness principle. The implication of trend is so strong that it overrides semantics; a line connecting the heights for "female" and "male" reads as “The more male a person is, the taller he/she is.”
It is best practice to label axes, with few exceptions; individual small-multiple views can share an axis label.
Include 0 at bottom left or the slope misleads. There are some exceptions (arbitrary 0, small change matters).
Truncating the Y-Axis: Threat or Menace? Correll, Bertini, & Franconeri, CHI 2020.
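The effect of a non-zero baseline can be shown with simple arithmetic: drawn heights are measured from the baseline, so cropping the axis inflates visual ratios. The helper name below is hypothetical.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn heights of b vs a when the y-axis starts at `baseline`."""
    if not (a > baseline and b > baseline):
        raise ValueError("values must lie above the baseline")
    return (b - baseline) / (a - baseline)

true_ratio = apparent_ratio(50, 52)               # y-axis includes 0
cropped = apparent_ratio(50, 52, baseline=48)     # y-axis cropped at 48
print(true_ratio, cropped)  # 1.04 2.0
```

A 4% difference is drawn as a doubling once the axis starts at 48; this is acceptable only in the exceptions noted above, such as when a small change genuinely matters.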
Data: 2 quant attribs (1 key + 1 value)
Derived data: a new quantitative value attribute (an index); plot the index instead of the original value
Task: show change over time
Principle: normalized, not absolute
Scalability: same as standard line chart
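A minimal sketch of the derived index attribute, assuming each series is divided by its first value and scaled so it starts at 100 (a common convention, used here as an assumption). The data are hypothetical.

```python
def index_series(values: list[float], base: float = 100.0) -> list[float]:
    """Each value relative to the first one, scaled so the series starts at `base`."""
    if not values or values[0] == 0:
        raise ValueError("need a non-empty series with a non-zero first value")
    first = values[0]
    return [base * v / first for v in values]

# Hypothetical prices of two items with very different absolute magnitudes.
print(index_series([10, 12, 15]))     # [100.0, 120.0, 150.0]
print(index_series([200, 220, 300]))  # [100.0, 110.0, 150.0]
```

On an absolute scale the cheap item's change would be invisible next to the expensive one; after indexing, both end 50% above their starting point.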
One key, two (related) values
Data: 1 categ attrib, 2 quant attribs
Mark: line, whose length encodes duration
Channels: horizontal position encodes start time (end = start + duration)
Task: emphasize temporal overlaps and start/end dependencies between items
Scalability: dozens of key levels (bars), hundreds of value levels (durations)
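A sketch of the data behind this chart: each item's end is derived from start plus duration, and overlapping intervals are what the chart makes visible. The `Task` class and `overlaps` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str        # categorical key (one bar per task)
    start: float     # quantitative value: horizontal position
    duration: float  # quantitative value: line length

    @property
    def end(self) -> float:
        return self.start + self.duration

def overlaps(tasks: list[Task]) -> list[tuple[str, str]]:
    """Pairs of tasks whose [start, end) intervals overlap in time."""
    found = []
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            if a.start < b.end and b.start < a.end:
                found.append((a.name, b.name))
    return found

# Hypothetical project schedule.
plan = [Task("design", 0, 3), Task("build", 2, 5), Task("test", 7, 2)]
print([t.end for t in plan])  # [3, 7, 9]
print(overlaps(plan))         # [('design', 'build')]
```

"test" starts exactly when "build" ends, so with half-open intervals it counts as a dependency (end-to-start), not an overlap.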
Two values
Data: 2 quant value attribs (1 derived attrib: change magnitude)
Mark: point + line (line connection mark between points)
Channels: 2 vertical positions express the attribute values (optionally also line width/size and color)
Task: emphasize changes in rank/value
Scalability: hundreds of value levels, dozens of items
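A sketch of the derived attributes a slopegraph emphasizes: the change magnitude per item and its rank on each side. The function name and data are hypothetical.

```python
def slopegraph_rows(before: dict[str, float],
                    after: dict[str, float]) -> list[tuple[str, float, int, int]]:
    """(name, change, rank_before, rank_after) per item; rank 1 = largest value."""
    def ranks(values: dict[str, float]) -> dict[str, int]:
        ordered = sorted(values, key=lambda k: values[k], reverse=True)
        return {name: i + 1 for i, name in enumerate(ordered)}

    rank_before, rank_after = ranks(before), ranks(after)
    return [(name, after[name] - before[name], rank_before[name], rank_after[name])
            for name in before]

# Hypothetical values for three items at two time points.
for row in slopegraph_rows({"A": 10, "B": 8, "C": 5}, {"A": 9, "B": 12, "C": 6}):
    print(row)
# ('A', -1, 1, 2)
# ('B', 4, 2, 1)
# ('C', 1, 3, 3)
```

The change column could drive line width or color; lines that cross between the two axes correspond to swapped ranks (A and B here).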
Two keys, one value
Data: 2 categ attribs (gene, experimental condition), 1 quant attrib (expression levels)
Mark: point, separated and aligned in a 2D matrix indexed by the 2 categorical attributes
Channels: color by quant attrib (ordered diverging colormap)
Task: find clusters, outliers
Scalability: 1M items, 100s of categ levels, ~10 quant attrib levels
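Mapping values to about 10 ordered color classes of a diverging colormap can be sketched as binning. This assumes, for illustration, that the midpoint of the value range is the neutral value (e.g. no change in expression); the helper name is hypothetical and the actual colors are left out.

```python
def diverging_bin(value: float, lo: float, hi: float, n_bins: int = 10) -> int:
    """
    Index (0 .. n_bins-1) of the color class for `value` in [lo, hi].
    Low indices map to one hue, high indices to the other; the midpoint of
    [lo, hi] sits at the center of the colormap.
    """
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    clamped = min(max(value, lo), hi)        # out-of-range values use the end colors
    t = (clamped - lo) / (hi - lo)           # normalized to 0.0 .. 1.0
    return min(int(t * n_bins), n_bins - 1)  # t == 1.0 falls into the top bin

# Hypothetical log-fold-change values in [-2, 2].
print([diverging_bin(v, -2, 2) for v in (-2, -0.1, 0.1, 2, 5)])  # [0, 4, 5, 9, 9]
```

Small negative and positive values land in adjacent bins on either side of the center, so they get the two lightest shades of opposite hues.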
Cluster heatmap: use derived data, computing 2 cluster hierarchies (one for the rows, one for the columns), each represented as a dendrogram.
Dendrogram: parent-child relationships represented in a tree with connection line marks; leaves are aligned so interior branch heights are easy to compare.
Heatmap: marks are (re-)ordered by cluster hierarchy traversal.
Task: assess the quality of clusters found by automatic methods.
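Reordering rows by a cluster hierarchy can be sketched with a naive agglomerative clustering. Single linkage with Euclidean distance is an assumption chosen for brevity (real tools offer other linkages and much faster algorithms; this version is only suitable for small matrices). Running the same function on the transposed matrix would give the column order.

```python
import math

def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_leaf_order(rows: list[list[float]]) -> list[int]:
    """
    Naive agglomerative clustering (single linkage). Each cluster is stored as
    the left-to-right list of its leaves, so the final cluster's list is the
    dendrogram's leaf order, used to reorder the heatmap rows.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(euclidean(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]  # parent node: left child, then right child
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []

# Hypothetical expression matrix: rows = genes, columns = conditions.
genes = [[5.0, 5.1], [0.0, 0.2], [5.2, 4.9], [0.1, 0.0]]
order = cluster_leaf_order(genes)
print(order)                          # [1, 3, 0, 2]
reordered = [genes[i] for i in order]  # similar rows are now adjacent
```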
Star plot: line mark, radial axes meet at central point
Radial bar chart: line mark, radial axes meet at central ring
Channels: length, angle/orientation
Bar chart: rectilinear axes, aligned vertically
Accuracy: lengths are not aligned in radial layouts, so they are perceived less accurately than in rectilinear aligned layouts.
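A sketch of the star plot layout: one radial axis per attribute, all meeting at the center, with each value placed along its axis at a distance proportional to the value. The function name and the choice of the first axis along +x are assumptions for illustration.

```python
import math

def star_plot_vertices(values: list[float], max_value: float,
                       radius: float = 1.0) -> list[tuple[float, float]]:
    """Place each value on its own radial axis; all axes meet at the center (0, 0)."""
    n = len(values)
    points = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n   # axes evenly spaced, first axis along +x
        r = radius * v / max_value    # distance from center is proportional to value
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# Hypothetical item with four attributes; connecting the vertices in order
# (and closing back to the first) draws the star outline.
vertices = star_plot_vertices([2, 1, 2, 1], max_value=2)
```

The two values of 2 land on opposite axes (right and left), so comparing them means comparing lengths that share only the center point, not a common aligned baseline as in a bar chart.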
Radial line chart with point marks, radial layout, and connecting line marks. Avoid unless data are cyclic.
1 categorical key attribute, 1 quantitative value attribute
Pie chart: interlocking area marks; the angle channel causes variation in 2D area. Keys are separated and ordered radially with uniform height. Accuracy: area is perceived less accurately than rectilinear aligned line length.
Task: part-to-whole judgements.
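A sketch of how pie wedge angles follow from the data: each wedge's sweep is its share of the total times 360 degrees. The helper name is hypothetical.

```python
def pie_angles(values: list[float]) -> list[tuple[float, float]]:
    """(start_deg, end_deg) per wedge; sweep is proportional to share of the total."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("values must be non-negative with a positive total")
    wedges, start = [], 0.0
    for v in values:
        sweep = 360.0 * v / total
        wedges.append((start, start + sweep))
        start += sweep
    return wedges

print(pie_angles([1, 1, 2]))  # [(0.0, 90.0), (90.0, 180.0), (180.0, 360.0)]
```

The same shares (v / total) give the segment heights of a normalized stacked bar; only the channel differs (angle vs aligned length).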
Coxcomb chart: line marks with length channel, creating variation in only 1D length. Keys are separated & ordered radially with uniform width. These charts are a direct analog to radial bar charts.
Coxcomb / Nightingale rose / polar area chart: invented by Florence Nightingale.
Encode: 1D length
Decode/perceive: 2D area
The nonuniform relationship between line/sector width and mark length causes area variation to scale nonlinearly with line mark length!
A standard bar chart is safer: bars are uniform width, so area is linear with line mark length in both radial & rectilinear cases.
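The nonlinearity can be checked with the sector-area formula, area = (angle / 2) * r^2. With equal-angle wedges and the value encoded as wedge length (radius), doubling the length quadruples the area, while doubling a bar's length only doubles its area. The 12-wedge setup is a hypothetical example.

```python
import math

def sector_area(radius: float, angle_rad: float) -> float:
    """Area of a circular sector: (angle / 2) * r^2."""
    return 0.5 * angle_rad * radius ** 2

def bar_area(width: float, length: float) -> float:
    """Area of a rectangular bar of uniform width."""
    return width * length

wedge = 2 * math.pi / 12  # hypothetical: 12 equal-angle wedges (e.g. months)
print(sector_area(2.0, wedge) / sector_area(1.0, wedge))  # 4.0
print(bar_area(1.0, 2.0) / bar_area(1.0, 1.0))            # 2.0
```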
There is some empirical evidence that people respond to arc length.
Decode/perceive: arc length, maybe also areas, but not angles.
Donut charts are no worse than pie charts.
Not so bad for two (or few) levels and part-to-whole tasks. Dubious for several levels if details matter. Terrible for many levels.
Task: part-to-whole judgements
Normalized stacked bar chart: stacked bar chart, normalized to full vertical height. A single stacked bar is equivalent to a full pie chart.
High information density can be achieved with narrow rectangles. Pie chart information density is much more limited and requires a large circle.
Rectilinear glyphs are good for linear vs nonlinear trends. Radial glyphs are good for cyclic patterns and evaluating periodicity.
Scatterplot matrix (SPLOM): Rectilinear axes with point mark. All possible pairs of axes are visualized.
Scalability: about a dozen attributes and dozens to hundreds of items.
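The quadratic growth of a SPLOM explains its limited scalability: n attributes give n × n cells, of which n(n-1)/2 are unique scatterplots (the two triangles mirror each other, and diagonal cells pair an attribute with itself). The attribute names below are hypothetical.

```python
from itertools import combinations

def splom_cells(attributes: list[str]) -> list[tuple[str, str]]:
    """Every (row, column) cell of a scatterplot matrix: n * n cells."""
    return [(row, col) for row in attributes for col in attributes]

attrs = ["Age", "Height", "Weight", "Income"]  # hypothetical quantitative attributes
cells = splom_cells(attrs)
unique_pairs = list(combinations(attrs, 2))
print(len(cells), len(unique_pairs))  # 16 6
```

At a dozen attributes this is already 144 cells (66 unique plots), each of which must stay large enough to show hundreds of points.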
Scatterplot limitation: visual representation with orthogonal axes can show only two attributes with spatial position channel.
Alternative (parallel coordinates): line up axes in parallel to show many attributes with position. Each item is encoded as a polyline with n - 1 segments, where n is the number of attributes shown. Ordering is a major challenge.
Scalability: dozens of attributes and hundreds of items.
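A sketch of how one item becomes a polyline in parallel coordinates: the x position is the axis index, and the y position is the item's value normalized to that axis's range. Names and data are hypothetical; per-axis min/max normalization is an assumption (axes may also use fixed ranges).

```python
def parallel_polyline(item: dict[str, float], axes: list[str],
                      ranges: dict[str, tuple[float, float]]) -> list[tuple[float, float]]:
    """Vertices of one item's polyline: n axes give n vertices and n - 1 segments."""
    points = []
    for x, name in enumerate(axes):
        lo, hi = ranges[name]
        y = (item[name] - lo) / (hi - lo) if hi > lo else 0.5  # normalize to [0, 1]
        points.append((float(x), y))
    return points

# Hypothetical items (people) with three quantitative attributes.
people = [{"Age": 30, "Height": 180, "Weight": 80},
          {"Age": 50, "Height": 160, "Weight": 60}]
axes = ["Age", "Height", "Weight"]  # the order determines which pairs are neighbors
ranges = {a: (min(p[a] for p in people), max(p[a] for p in people)) for a in axes}
for p in people:
    print(parallel_polyline(p, axes, ranges))
# [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
# [(0.0, 1.0), (1.0, 0.0), (2.0, 0.0)]
```

The lines cross between Age and Height (a negative relationship in this toy data) and run parallel between Height and Weight (a positive one); Age vs Weight is not directly visible because those axes are not neighbors.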
Patterns only visible between neighboring axis pairs. How to pick axis order? The usual solution is reorderable axes and interactive exploration. The downside of interaction is human-powered search.
As usual, Observable really shines for these types of interactive plots!
Rectilinear: Scalability is limited with regard to the number of axes. (2 axes best, 3 problematic, 4+ impossible.)
Parallel: Unfamiliarity and training time.
Radial: perceptual limits include polar coordinate asymmetry, lower precision with angle channels compared to length channels, and nonuniform sector width/size depending on radial distance. While these limits are frequently problematic, they can sometimes be deliberately exploited (example: for 2 attribs of very unequal importance).
Uncovering Strengths and Weaknesses of Radial Visualizations - an Empirical Approach. Diehl, Beck and Burch. IEEE TVCG (Proc. InfoVis) 16(6):935–942, 2010.
Labelled axes are critical! Avoid cropping the y-axis (include 0 at bottom left) or the slope misleads the viewer.
Dual-axis charts: this approach is controversial. Dual axes are acceptable if they are commensurate, but beware, as they make it very easy to mislead!
Connected scatterplot: a scatterplot with line connection marks. These are popular in journalism.
Horiz + vert axes: value attributes
Line connection marks: temporal order
Alternative to dual-axis charts (which show time on the horizontal axis and two value attributes on vertical axes).
An empirical study suggests these are engaging, but correlations shown this way are unclear to readers.
The inverted y axis is evocative of blood dripping down on Poe.
What did we miss?