Overview

This document provides an overview of creating lineups in R with ggplot2 and nullabor. Hopefully, you can find a recipe that fits your class needs. I don’t have my intro students create their own lineups; instead, I include lineups on slides and in activity prompts. That said, I am also creating Shiny apps so that students can make their own lineups without my having to teach all of the commands or load specialty packages.

library(nullabor) # for lineup and null plot functions
library(ggplot2)  # for visualization 
library(dplyr)    # for data manipulation
library(ggthemes) # for colorblind-safe palette

Comparing groups

Categorical response

To illustrate how to create lineups that compare the distribution of a categorical response across groups, consider the fly data set found in the ggmosaic R package. In this example, let’s consider how to create a lineup to investigate whether responses to the question “In general, is it rude to bring a baby on a plane?” (RudeToBringBaby) vary across gender (Gender).

To begin, let’s select only the variables of interest and omit the missing values (NAs) to avoid transparent segments representing the missing values. (An alternative strategy is to use forcats::fct_explicit_na to make the missing values explicit levels of the variable.)
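If you would rather keep the missing values as their own category, the idea can be sketched in base R without any extra packages. The toy factor and the "(Missing)" label below are hypothetical stand-ins, not part of the fly data, and this is only one way to get an explicit level:

```r
# Toy factor with missing values (hypothetical data, not the fly data set)
response <- factor(c("Yes", "No", NA, "Yes", NA))

# addNA() appends NA as a level; renaming that level gives it a printable label
response_explicit <- addNA(response)
levels(response_explicit)[is.na(levels(response_explicit))] <- "(Missing)"

# The missing values are now counted as an ordinary category
table(response_explicit)
```

With an explicit level, the missing responses get their own fill color and legend entry instead of an unlabeled segment.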

data("fly", package = "ggmosaic")

fly_data <- fly %>%
  select(Gender, RudeToBringBaby) %>%
  na.omit()

glimpse(fly_data)
## Observations: 843
## Variables: 2
## $ Gender          <fct> Male, Male, Male, Male, Male, Male, Male, Male, …
## $ RudeToBringBaby <fct> "No, not at all rude", "Yes, somewhat rude", "Ye…

Notice that fly_data consists of two columns and 843 rows.

Next, we need to create a data set with one copy of the original data set and 19 null data sets generated under the null hypothesis of independence. To do this, we can use the nullabor::lineup() and nullabor::null_permute() functions.

lineup_data <- lineup(method = null_permute("RudeToBringBaby"), true = fly_data)
## decrypt("fEo5 4696 Yx UnLY9Ynx zZ")
glimpse(lineup_data)
## Observations: 16,860
## Variables: 3
## $ Gender          <fct> Male, Male, Male, Male, Male, Male, Male, Male, …
## $ RudeToBringBaby <fct> "No, not at all rude", "No, not at all rude", "Y…
## $ .sample         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …

The resulting lineup_data now has 16,860 (\(20 \times 843\)) rows and an additional column, .sample, indicating data set membership. The observed data are assigned a panel number uniformly at random. That position is printed as an encrypted message, which you can decrypt by running the printed decrypt(...) command in the console.
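To make the permutation idea concrete, here is a minimal base-R sketch of how a lineup data set of this shape could be assembled by hand. It is an illustration of the technique under stated assumptions, not nullabor's actual implementation: the toy data frame, the seed, and the names toy, true_pos, and toy_lineup are all hypothetical.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical toy data standing in for fly_data
toy <- data.frame(
  group    = rep(c("A", "B"), each = 6),
  response = rep(c("yes", "no", "no"), times = 4)
)

n_panels <- 20
true_pos <- sample(n_panels, 1)  # panel that will hold the observed data

# Build each panel: keep the observed data at true_pos, shuffle the response elsewhere
panels <- lapply(seq_len(n_panels), function(i) {
  panel <- toy
  if (i != true_pos) {
    # Permuting the response breaks any link between group and response
    panel$response <- sample(panel$response)
  }
  panel$.sample <- i
  panel
})
toy_lineup <- do.call(rbind, panels)

nrow(toy_lineup)                                # 20 * nrow(toy) rows
table(toy_lineup$.sample, toy_lineup$response)  # every panel keeps the same response counts
```

Because shuffling only reorders the response column, each null panel has the same marginal counts as the observed data. Only the association with the grouping variable is removed, which is what the null hypothesis of independence asserts.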

Once the data have been generated, the lineup is constructed via faceting:

lineup_data %>%
  ggplot() +
  geom_bar(mapping = aes(x = Gender, fill = RudeToBringBaby), position = "fill") +
  facet_wrap(~ .sample, ncol = 5) +
  scale_fill_colorblind()
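With position = "fill", each bar shows the proportion of each response within a group rather than the raw counts. As a rough sketch of the numbers behind one panel, the same within-group proportions can be computed in base R. The small data frame below is a hypothetical stand-in for a single panel, with shortened response labels:

```r
# Hypothetical toy data standing in for one panel of lineup_data
panel <- data.frame(
  Gender          = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  RudeToBringBaby = c("No", "Yes", "No", "Yes", "Yes", "No", "Yes")
)

# Counts of each response within each gender
counts <- table(panel$Gender, panel$RudeToBringBaby)

# Row-wise proportions: each row sums to 1, matching one filled bar
props <- prop.table(counts, margin = 1)
props
```

Comparing these proportions across groups is the numeric counterpart of comparing segment heights between the bars within a panel.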