Our illustrated penguins have reached the ggplot2 package! The photo backdrop is a snowy Antarctic wonderland featuring a Gentoo penguin with outstretched flippers.

ggplot2: info



ggplot2 is built on the “Grammar of Graphics”: it layers graphical components (data, aesthetic mappings, geometries, labels) together to help us create a plot.
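As a minimal sketch of what "layering" means in practice, the plot below is built one component at a time. It assumes the `penguins` data frame comes from the palmerpenguins package, which this section does not state explicitly; the object names (`base_plot`, `with_geom`, `labelled`) are illustrative only.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Step 1: data plus aesthetic mappings. No geometry yet, so only empty axes are drawn.
base_plot <- ggplot(data = penguins, aes(x = sex, y = body_mass_g))

# Step 2: add a geometry layer that draws the data.
with_geom <- base_plot + geom_boxplot()

# Step 3: add axis labels as a further component.
labelled <- with_geom + labs(x = "Sex", y = "Body mass (g)")

labelled
```

Each `+` returns a new plot object, so intermediate steps can be saved and reused with a different geometry.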

Let’s start by making a simple plot of our data!

R4DS book cover


R for Data Science: Ch 3 Data visualization

Package documentation: https://ggplot2.tidyverse.org

ggplot2: exercise

Get a full view of the dataset:

View(penguins)


Or catch a glimpse:

glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
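The calls above need the data and `glimpse()` to be available. A hedged setup sketch, assuming the data come from the palmerpenguins package and `glimpse()` from dplyr (neither is named in this section); `na_counts` is an illustrative name:

```r
library(palmerpenguins)  # assumption: provides the `penguins` data frame
library(dplyr)           # provides glimpse()

# Compact column-by-column overview, as shown above
glimpse(penguins)

# Count missing values per column; the NA entries visible in the
# glimpse output suggest some measurements were not recorded
na_counts <- colSums(is.na(penguins))
na_counts
```

Knowing which columns contain missing values helps explain the NA category and the "removed rows" warnings that appear in the plots that follow.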

Let’s see if body mass varies by penguin sex using geom_point()

# Scatterplot: one point per penguin, stacked in a column above each sex category
ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_point()

A scatterplot with categorical penguin sex along the x axis and continuous body mass along the y axis. The three sex categories are female, male, and NA. The body mass appears to range between 2400g and 6500g. Because this is a scatterplot, there are various points scattered along the y axis in a line above each sex category, which doesn't tell us much about these data. There are other types of plots better suited for visualizing the relationship between a continuous variable and a categorical variable.
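One reason the scatterplot is hard to read is overplotting: points with the same sex sit on a single vertical line. As an aside that is not part of the original exercise, a sketch using `geom_jitter()` spreads the points horizontally; the `width` and `alpha` values are arbitrary choices, and the palmerpenguins package is assumed as the data source.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

jitter_plot <- ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  # Jitter horizontally only (height = 0) so body mass values are not altered;
  # alpha makes overlapping points semi-transparent
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)

jitter_plot
```

This shows the spread within each category more clearly, though it still does not summarize the distribution.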

Let’s see if body mass varies by penguin sex, this time with geom_boxplot()

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot()

A boxplot with penguin sex along the x axis and body mass along the y axis. Again, the three sex categories are female, male, and NA, and the body mass appears to range between 2400g and 6500g. Because this is a boxplot, we can visualize the minimum value, first quartile, median, third quartile, and maximum value of penguin body mass, for each penguin sex category. Female penguins have a lower median body mass than male penguins, while the NA sex category is somewhere in between the two. There are no outliers.
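The quantities a boxplot draws can also be computed directly. A minimal sketch, assuming dplyr and the palmerpenguins data; `mass_by_sex` and its column names are illustrative:

```r
library(dplyr)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

mass_by_sex <- penguins %>%
  group_by(sex) %>%
  summarise(
    n = n(),
    # na.rm = TRUE drops penguins with no recorded body mass
    q1_mass_g     = quantile(body_mass_g, 0.25, na.rm = TRUE, names = FALSE),
    median_mass_g = median(body_mass_g, na.rm = TRUE),
    q3_mass_g     = quantile(body_mass_g, 0.75, na.rm = TRUE, names = FALSE)
  )

mass_by_sex
```

The result has one row per sex category, including NA, so the medians can be compared against what the boxplot shows.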

Let’s see if body mass varies by penguin sex, and now fill the boxplots according to penguin species

ggplot(data = penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot(aes(fill = species))

A boxplot with penguin sex along the x axis and body mass along the y axis. Again, the three sex categories are female, male, and NA, and the body mass appears to range between 2400g and 6500g. This time, instead of one boxplot per sex category, there is a boxplot for each species, per sex category, and these are filled with different colors. Gentoo boxplots are blue, Adélie boxplots are reddish, and Chinstrap boxplots are green. Male penguins have higher body mass across species, and Gentoo penguins stand out as having higher body mass than both Chinstrap and Adélie penguins. Low body mass outliers exist for female Chinstrap penguins and NA Gentoo penguins, and high body mass outliers exist for male Chinstrap penguins. There is no boxplot for Chinstrap penguins in the NA sex category.

The boxplot filled by species helps us see…

  • Gentoo penguins have higher body mass than Adélie and Chinstrap penguins
  • Male Gentoo penguins have higher body mass than female Gentoo penguins
  • Pattern not as discernible when comparing Adélie and Chinstrap penguins
  • No NAs among Chinstrap penguin data points! Sex was recorded for every Chinstrap observation

Same boxplot as the previous tab. Any additional data insights are listed in the text of this slide.
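The observation about missing sex values can be checked with a count table. A hedged sketch, assuming dplyr and the palmerpenguins data; `sex_by_species` and `missing_sex` are illustrative names:

```r
library(dplyr)
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Number of penguins in every species-by-sex combination (NA is its own group)
sex_by_species <- penguins %>%
  count(species, sex)

sex_by_species

# Only the rows where sex is missing, counted per species
missing_sex <- penguins %>%
  filter(is.na(sex)) %>%
  count(species)

missing_sex
```

A species that does not appear in `missing_sex` has sex recorded for every observation, which is why it has no boxplot in the NA category.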