Our illustrated penguins have reached the forcats package! The photo backdrop is a snowy Antarctic wonderland featuring a Gentoo penguin with outstretched flippers.

forcats: info



forcats helps us work with categorical variables, which R stores as factors

These are variables that have a fixed and known set of possible values, like species, island, and sex in our penguins dataset
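To make "a fixed and known set of possible values" concrete, here is a minimal base R sketch. The small species vector is made-up toy data rather than the penguins dataset, and "Emperor" is only there to illustrate a value outside the known set.

```r
# Toy data (hypothetical), not the penguins dataset
known_species <- c("Adelie", "Chinstrap", "Gentoo")

species <- factor(
  c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
  levels = known_species
)

# The levels are the fixed set of allowed values
levels(species)
# [1] "Adelie"    "Chinstrap" "Gentoo"

# A value outside the known set is turned into NA
unknown <- factor(c("Adelie", "Emperor"), levels = known_species)
is.na(unknown)
# [1] FALSE  TRUE
```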

R for Data Science: Ch 15 Factors

Package documentation: https://forcats.tidyverse.org

forcats: exercise

Currently the year variable in penguins is stored as an integer, with values from 2007 to 2009

With only three distinct years, we usually want to treat it as a categorical variable instead

The base R factor() function is perfect for this

penguins |> mutate(year_factor = factor(year, levels = unique(year)))
# A tibble: 344 × 9
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 3 more variables: sex <fct>, year <int>, year_factor <fct>

The result is a new variable year_factor with factor levels 2007, 2008, and 2009
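One thing worth knowing about levels = unique(year): the levels follow the order in which values first appear, so the chronological order here relies on the rows of penguins starting with 2007. The sketch below uses a made-up toy vector (not the penguins data) to show the difference, along with two alternatives that do not depend on row order.

```r
# Toy data (hypothetical): years deliberately out of chronological order
year <- c(2009L, 2007L, 2008L, 2007L)

# levels = unique(year) keeps the order of first appearance
by_appearance <- factor(year, levels = unique(year))
levels(by_appearance)
# [1] "2009" "2007" "2008"

# Leaving levels at its default sorts the unique values
by_default <- factor(year)
levels(by_default)
# [1] "2007" "2008" "2009"

# Sorting explicitly gives the same result and makes the intent visible
by_sorted <- factor(year, levels = sort(unique(year)))
levels(by_sorted)
# [1] "2007" "2008" "2009"
```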

penguins_new <- penguins |> mutate(year_factor = factor(year, levels = unique(year)))
penguins_new |> head()
# A tibble: 6 × 9
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 3 more variables: sex <fct>, year <int>, year_factor <fct>
class(penguins_new$year_factor)
[1] "factor"
levels(penguins_new$year_factor)
[1] "2007" "2008" "2009"
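Once a variable is a factor, forcats functions can work with its levels. As a small sketch, assuming the forcats package is installed, the example below counts observations per level with fct_count() and reverses the level order with fct_rev(). The year_factor vector is made-up toy data standing in for penguins_new$year_factor.

```r
library(forcats)

# Toy data (hypothetical), standing in for penguins_new$year_factor
year_factor <- factor(c(2007, 2007, 2008, 2009, 2009, 2009))

# fct_count() returns a tibble with one row per level (columns f and n)
year_counts <- fct_count(year_factor)
year_counts

# fct_rev() reverses the order of the levels; the data values are unchanged
year_rev <- fct_rev(year_factor)
levels(year_rev)
# [1] "2009" "2008" "2007"
```

The same calls should work on penguins_new$year_factor; the counts are not shown here because they depend on the data.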