Our illustrated penguins have reached the tidyr package! The photo backdrop is a snowy Antarctic wonderland featuring a Gentoo penguin with outstretched flippers

tidyr: info

tidyr helps us transform our dataset into a tidy format

There are three interrelated rules which make a dataset tidy:

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.

[Figure: schematic illustrating the three rules above]
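As a small illustration of the third rule, here is a minimal sketch using made-up toy data (the `untidy` data frame and its `size` column are hypothetical, not part of the penguins datasets). The `size` column packs two values into one cell, and tidyr's `separate()` splits them apart. Note that `separate()` is superseded in recent tidyr releases but is still available.

```r
library(tidyr)

# Hypothetical toy data: `size` holds flipper length and body mass
# in a single cell, which breaks the "each value has its own cell" rule.
untidy <- data.frame(
  penguin = c("a", "b"),
  size = c("181/3750", "186/3800")
)

# Split the packed column at "/" so each value gets its own cell;
# convert = TRUE turns the resulting strings into integers.
tidy <- separate(untidy, size,
                 into = c("flipper_length_mm", "body_mass_g"),
                 sep = "/", convert = TRUE)
tidy
```

After the split, each variable sits in its own column and each cell holds exactly one value.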

R for Data Science: Ch 12 Tidy data

Package documentation: https://tidyr.tidyverse.org

tidyr: exercise

Both penguin datasets are already tidy!

We can pretend that penguins wasn’t tidy and instead looked like untidy_penguins below, where body_mass_g is spread across separate columns for male, female, and NA-sex penguins.

# Spread body_mass_g into one column per value of sex (male, female, NA)
untidy_penguins <- penguins |>
  pivot_wider(names_from = sex, values_from = body_mass_g)
untidy_penguins
# A tibble: 344 × 9
   species island    bill_length_mm bill_dept…¹ flipp…²  year  male female  `NA`
   <fct>   <fct>              <dbl>       <dbl>   <int> <int> <int>  <int> <int>
 1 Adelie  Torgersen           39.1        18.7     181  2007  3750     NA    NA
 2 Adelie  Torgersen           39.5        17.4     186  2007    NA   3800    NA
 3 Adelie  Torgersen           40.3        18       195  2007    NA   3250    NA
 4 Adelie  Torgersen           NA          NA        NA  2007    NA     NA    NA
 5 Adelie  Torgersen           36.7        19.3     193  2007    NA   3450    NA
 6 Adelie  Torgersen           39.3        20.6     190  2007  3650     NA    NA
 7 Adelie  Torgersen           38.9        17.8     181  2007    NA   3625    NA
 8 Adelie  Torgersen           39.2        19.6     195  2007  4675     NA    NA
 9 Adelie  Torgersen           34.1        18.1     193  2007    NA     NA  3475
10 Adelie  Torgersen           42          20.2     190  2007    NA     NA  4250
# … with 334 more rows, and abbreviated variable names ¹​bill_depth_mm,
#   ²​flipper_length_mm

Now let’s make it tidy again!

We’ll use pivot_longer() to gather the three sex columns back into a single sex column and a single body_mass_g column.

untidy_penguins |>
  pivot_longer(cols = male:`NA`,            # columns to gather
               names_to = "sex",            # old column names become values of sex
               values_to = "body_mass_g")   # old cell values go into body_mass_g
# A tibble: 1,032 × 8
   species island    bill_length_mm bill_depth_mm flipper_…¹  year sex   body_…²
   <fct>   <fct>              <dbl>         <dbl>      <int> <int> <chr>   <int>
 1 Adelie  Torgersen           39.1          18.7        181  2007 male     3750
 2 Adelie  Torgersen           39.1          18.7        181  2007 fema…      NA
 3 Adelie  Torgersen           39.1          18.7        181  2007 NA         NA
 4 Adelie  Torgersen           39.5          17.4        186  2007 male       NA
 5 Adelie  Torgersen           39.5          17.4        186  2007 fema…    3800
 6 Adelie  Torgersen           39.5          17.4        186  2007 NA         NA
 7 Adelie  Torgersen           40.3          18          195  2007 male       NA
 8 Adelie  Torgersen           40.3          18          195  2007 fema…    3250
 9 Adelie  Torgersen           40.3          18          195  2007 NA         NA
10 Adelie  Torgersen           NA            NA           NA  2007 male       NA
# … with 1,022 more rows, and abbreviated variable names ¹​flipper_length_mm,
#   ²​body_mass_g
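The result above has 1,032 rows rather than the original 344, because every penguin now gets one row per sex column, most of them holding NA. The sex column is also a character column in which missing sex appears as the string "NA". One possible way to get closer to the original shape is sketched below on hypothetical toy data (the `toy` data frame and its `id` column are assumptions for illustration); it uses `values_drop_na = TRUE` and then restores the missing values by hand.

```r
library(tidyr)

# Hypothetical toy data standing in for penguins; one penguin has missing sex.
toy <- data.frame(
  id = 1:3,
  sex = factor(c("male", "female", NA)),
  body_mass_g = c(3750L, 3800L, 3475L)
)

# Untidy version: one body mass column per value of sex (male, female, NA)
wide <- pivot_wider(toy, names_from = sex, values_from = body_mass_g)

# Pivot back, dropping the rows whose body_mass_g is NA
long <- pivot_longer(wide,
                     cols = male:`NA`,
                     names_to = "sex",
                     values_to = "body_mass_g",
                     values_drop_na = TRUE)

# The former column name "NA" is now a string; turn it back into a real
# missing value and restore the factor type.
long$sex[long$sex == "NA"] <- NA
long$sex <- factor(long$sex)
long
```

One caveat with this approach: `values_drop_na = TRUE` drops every row with a missing value, so on the real data it would also remove penguins whose body mass was never recorded (such as row 4 above).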