Our illustrated penguins have reached the forcats package! The photo backdrop is a snowy Antarctic wonderland featuring a Gentoo penguin with outstretched flippers.

forcats: info



forcats helps us work with categorical variables, which R stores as factors.

These are variables that have a fixed and known set of possible values, like species, island, and sex in our penguins dataset.
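To make "a fixed and known set of possible values" concrete, here is a minimal base R sketch. The small species vector is hypothetical toy data that reuses the species names from penguins; it is not the dataset itself.

```r
# Hypothetical toy data: three observations, but three *allowed* species.
allowed <- c("Adelie", "Chinstrap", "Gentoo")
species <- factor(c("Adelie", "Gentoo", "Adelie"), levels = allowed)

# The level set is fixed up front, so Chinstrap is a level even though
# it never occurs in the data.
levels(species)

# table() counts every level, including the empty one.
table(species)

# A value outside the known set becomes NA instead of a new level.
unknown <- factor(c("Adelie", "Emperor"), levels = allowed)
is.na(unknown)
```

This is the main difference from a plain character vector, which would silently accept any string.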

R for Data Science: Ch 15 Factors

Package documentation: https://forcats.tidyverse.org

forcats: exercise

Currently the year variable in penguins is stored as an integer, with values from 2007 to 2009.

With only three distinct values, we usually want to treat year as a categorical variable instead.

The base R factor() function is perfect for this:

penguins |> mutate(year_factor = factor(year, levels = unique(year)))
# A tibble: 344 × 9
   species island    bill_length_mm bill_d…¹ flipp…² body_…³ sex    year year_…⁴
   <fct>   <fct>              <dbl>    <dbl>   <int>   <int> <fct> <int> <fct>  
 1 Adelie  Torgersen           39.1     18.7     181    3750 male   2007 2007   
 2 Adelie  Torgersen           39.5     17.4     186    3800 fema…  2007 2007   
 3 Adelie  Torgersen           40.3     18       195    3250 fema…  2007 2007   
 4 Adelie  Torgersen           NA       NA        NA      NA <NA>   2007 2007   
 5 Adelie  Torgersen           36.7     19.3     193    3450 fema…  2007 2007   
 6 Adelie  Torgersen           39.3     20.6     190    3650 male   2007 2007   
 7 Adelie  Torgersen           38.9     17.8     181    3625 fema…  2007 2007   
 8 Adelie  Torgersen           39.2     19.6     195    4675 male   2007 2007   
 9 Adelie  Torgersen           34.1     18.1     193    3475 <NA>   2007 2007   
10 Adelie  Torgersen           42       20.2     190    4250 <NA>   2007 2007   
# … with 334 more rows, and abbreviated variable names ¹​bill_depth_mm,
#   ²​flipper_length_mm, ³​body_mass_g, ⁴​year_factor

The result is a new variable, year_factor, with factor levels 2007, 2008, and 2009.
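One detail worth noting about levels = unique(year): unique() returns values in order of first appearance, so the levels come out as 2007, 2008, 2009 only because that is the order in which the years first appear in the data. A minimal sketch with a hypothetical, unsorted years vector (not taken from penguins) shows the difference:

```r
# Hypothetical unsorted years, for illustration only.
years <- c(2009L, 2007L, 2008L, 2007L)

# unique() keeps the order of first appearance.
by_appearance <- factor(years, levels = unique(years))
levels(by_appearance)

# Omitting levels (or passing sort(unique(years))) gives sorted levels.
sorted <- factor(years)
levels(sorted)
```

If the sorted order is what matters, leaving out the levels argument is probably the safer choice.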

penguins_new <- penguins |> mutate(year_factor = factor(year, levels = unique(year)))
penguins_new |> head()
# A tibble: 6 × 9
  species island    bill_length_mm bill_de…¹ flipp…² body_…³ sex    year year_…⁴
  <fct>   <fct>              <dbl>     <dbl>   <int>   <int> <fct> <int> <fct>  
1 Adelie  Torgersen           39.1      18.7     181    3750 male   2007 2007   
2 Adelie  Torgersen           39.5      17.4     186    3800 fema…  2007 2007   
3 Adelie  Torgersen           40.3      18       195    3250 fema…  2007 2007   
4 Adelie  Torgersen           NA        NA        NA      NA <NA>   2007 2007   
5 Adelie  Torgersen           36.7      19.3     193    3450 fema…  2007 2007   
6 Adelie  Torgersen           39.3      20.6     190    3650 male   2007 2007   
# … with abbreviated variable names ¹​bill_depth_mm, ²​flipper_length_mm,
#   ³​body_mass_g, ⁴​year_factor
class(penguins_new$year_factor)
[1] "factor"
levels(penguins_new$year_factor)
[1] "2007" "2008" "2009"
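The exercise so far uses only base R. As a hedged sketch of where forcats itself comes in, the snippet below applies two forcats functions, fct_count() and fct_rev(), to a hypothetical toy factor standing in for year_factor (the counts are made up and are not those of penguins). It assumes the forcats package is installed.

```r
library(forcats)

# Hypothetical stand-in for penguins_new$year_factor.
year_factor <- factor(c(2007, 2008, 2008, 2009, 2009, 2009))

# fct_count() returns a tibble with one row per level (columns f and n).
counts <- fct_count(year_factor)
counts

# fct_rev() reverses the level order, e.g. to flip an axis in a plot.
reversed <- fct_rev(year_factor)
levels(reversed)
```

The same calls should work on penguins_new$year_factor directly, since it is an ordinary factor.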