Currently, the year variable in penguins is numeric: it is stored as an integer with values from 2007 to 2009. Often this isn't what we want, and we would rather treat year as a categorical variable instead. The factor() function is perfect for this.
penguins |> mutate(year_factor = factor(year, levels = unique(year)))
# A tibble: 344 × 9
   species island    bill_length_mm bill_d…¹ flipp…² body_…³ sex    year year_…⁴
   <fct>   <fct>              <dbl>    <dbl>   <int>   <int> <fct> <int> <fct>
 1 Adelie  Torgersen           39.1     18.7     181    3750 male   2007 2007
 2 Adelie  Torgersen           39.5     17.4     186    3800 fema…  2007 2007
 3 Adelie  Torgersen           40.3     18       195    3250 fema…  2007 2007
 4 Adelie  Torgersen           NA       NA        NA      NA <NA>   2007 2007
 5 Adelie  Torgersen           36.7     19.3     193    3450 fema…  2007 2007
 6 Adelie  Torgersen           39.3     20.6     190    3650 male   2007 2007
 7 Adelie  Torgersen           38.9     17.8     181    3625 fema…  2007 2007
 8 Adelie  Torgersen           39.2     19.6     195    4675 male   2007 2007
 9 Adelie  Torgersen           34.1     18.1     193    3475 <NA>   2007 2007
10 Adelie  Torgersen           42       20.2     190    4250 <NA>   2007 2007
# … with 334 more rows, and abbreviated variable names ¹bill_depth_mm,
# ²flipper_length_mm, ³body_mass_g, ⁴year_factor
The result is a new variable, year_factor, with the factor levels 2007, 2008, and 2009.
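To see what factor() does in isolation, here is a minimal sketch on a small made-up vector rather than on the penguins data. The names years and years_factor are assumptions chosen for illustration; only base R is used.

```r
# A small, made-up integer vector standing in for the year column
years <- c(2007L, 2008L, 2009L, 2008L)

# Convert it to a categorical variable
years_factor <- factor(years)

# The values are now labels (levels) rather than numbers
class(years_factor)
levels(years_factor)

# Internally, a factor stores integer codes that index into the levels,
# so as.integer() returns the codes, not the original years
as.integer(years_factor)

# To recover the original numbers, go through character first
as.integer(as.character(years_factor))
```

The last two calls are worth comparing: the first returns the position of each value among the levels, while the second returns the years themselves.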
penguins_new <- penguins |> mutate(year_factor = factor(year, levels = unique(year)))
penguins_new |> head()
# A tibble: 6 × 9
  species island    bill_length_mm bill_de…¹ flipp…² body_…³ sex    year year_…⁴
  <fct>   <fct>              <dbl>     <dbl>   <int>   <int> <fct> <int> <fct>
1 Adelie  Torgersen           39.1      18.7     181    3750 male   2007 2007
2 Adelie  Torgersen           39.5      17.4     186    3800 fema…  2007 2007
3 Adelie  Torgersen           40.3      18       195    3250 fema…  2007 2007
4 Adelie  Torgersen           NA        NA        NA      NA <NA>   2007 2007
5 Adelie  Torgersen           36.7      19.3     193    3450 fema…  2007 2007
6 Adelie  Torgersen           39.3      20.6     190    3650 male   2007 2007
# … with abbreviated variable names ¹bill_depth_mm, ²flipper_length_mm,
# ³body_mass_g, ⁴year_factor
class(penguins_new$year_factor)
levels(penguins_new$year_factor)
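Note that levels = unique(year) orders the levels by first appearance in the data, not by value. That matches chronological order in penguins because the 2007 rows come first, but it would not for data stored in a different order. The sketch below illustrates this with a hypothetical, deliberately unsorted vector (unsorted_years is an assumed name, not part of the penguins data).

```r
# A made-up vector in which 2008 appears before 2007
unsorted_years <- c(2008L, 2007L, 2009L, 2007L)

# Levels follow the order of first appearance
by_appearance <- factor(unsorted_years, levels = unique(unsorted_years))
levels(by_appearance)

# Sorting the unique values first gives chronological levels
by_value <- factor(unsorted_years, levels = sort(unique(unsorted_years)))
levels(by_value)

# Leaving out levels altogether also sorts the unique values
by_default <- factor(unsorted_years)
levels(by_default)
```

If the level order matters later, for example in plot legends or model output, setting the levels explicitly with sort(unique(...)) or relying on the default is the safer choice.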