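The group_by() function from dplyr is usually combined with summarize(), which reduces each group to a single summary row. For example, here is the mean mpg for each cylinder count in the built-in mtcars data: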
library(tidyverse)
data <- mtcars
data %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))
Running it in the console prints:
> data %>% group_by(cyl) %>% summarize(avg_mpg = mean(mpg))
# A tibble: 3 × 2
    cyl avg_mpg
  <dbl>   <dbl>
1     4    26.7
2     6    19.7
3     8    15.1
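group_by() on its own does not change any values. It only records which rows belong together, and summarize() then collapses each group to one row. A quick sketch that checks this against base R's tapply(), which computes the same per-cylinder means; the object names are just illustrative:

```r
library(dplyr)

grouped <- mtcars %>% group_by(cyl)
nrow(grouped)      # still 32 rows: grouping alone drops nothing
n_groups(grouped)  # 3 groups: 4, 6 and 8 cylinders

via_dplyr <- grouped %>% summarize(avg_mpg = mean(mpg))
via_base  <- tapply(mtcars$mpg, mtcars$cyl, mean)  # named vector of group means

# Same numbers, in the same (ascending cyl) order
all.equal(via_dplyr$avg_mpg, as.numeric(via_base))
```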
Palmer Penguins
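The penguins data (loaded with library(palmerpenguins)) covers three islands. Suppose we want the mean bill length on each of the three islands, and that some entries are NA. Calling drop_na() with no arguments removes every row that has an NA in any column, so the means below are computed from complete rows only.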
> penguins %>%
+   group_by(island) %>%
+   drop_na() %>%
+   summarize(mean_bill_length_mm = mean(bill_length_mm))
# A tibble: 3 × 2
  island    mean_bill_length_mm
  <fct>                   <dbl>
1 Biscoe                   45.2
2 Dream                    44.2
3 Torgersen                39.0
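drop_na() with no arguments can discard more than you intend: a row with a missing sex but a recorded bill length is removed too. The sketch below uses a small made-up tibble (toy, not the real penguins data) to compare three ways of handling NA: drop_na() on all columns, drop_na(bill_length_mm) on just the column being averaged, and mean(..., na.rm = TRUE).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: row 2 is missing only sex, row 3 only bill length
toy <- tibble(
  island         = c("A", "A", "A", "B", "B"),
  bill_length_mm = c(40, 42, NA, 50, 52),
  sex            = c("male", NA, "female", "female", "male")
)

# Drops rows 2 and 3, so island A's mean uses only the value 40
all_cols <- toy %>%
  group_by(island) %>%
  drop_na() %>%
  summarize(mean_bl = mean(bill_length_mm))

# Drops only row 3, so island A's mean uses 40 and 42
bill_only <- toy %>%
  group_by(island) %>%
  drop_na(bill_length_mm) %>%
  summarize(mean_bl = mean(bill_length_mm))

# Keeps all rows and lets mean() skip the missing value
na_rm <- toy %>%
  group_by(island) %>%
  summarize(mean_bl = mean(bill_length_mm, na.rm = TRUE))
```

The last two approaches agree here, while the first changes island A's mean because it also reacted to the missing sex.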
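Grouping by two variables gives one row per species and island combination that actually occurs in the data. Here we compute both the mean and the maximum bill length: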
> penguins %>%
+   group_by(species, island) %>%
+   drop_na() %>%
+   summarize(mean_bl = mean(bill_length_mm), max_bl = max(bill_length_mm))
`summarise()` has grouped output by 'species'. You can override using the `.groups` argument.
# A tibble: 5 × 4
# Groups:   species [3]
  species   island    mean_bl max_bl
  <fct>     <fct>       <dbl>  <dbl>
1 Adelie    Biscoe       39.0   45.6
2 Adelie    Dream        38.5   44.1
3 Adelie    Torgersen    39.0   46  
4 Chinstrap Dream        48.8   58  
5 Gentoo    Biscoe       47.6   59.6
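The message printed with that result means the output is still grouped by species: by default summarize() peels off only the last grouping variable. A minimal sketch of the `.groups` argument, using mtcars instead of penguins so it runs without extra packages:

```r
library(dplyr)

# Default: summarize() drops only the last grouping variable (gear),
# so the result is still grouped by cyl
still_grouped <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(avg_mpg = mean(mpg))

# .groups = "drop" returns an ungrouped tibble and suppresses the message
ungrouped <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(avg_mpg = mean(mpg), .groups = "drop")

group_vars(still_grouped)  # "cyl"
group_vars(ungrouped)      # character(0)
```

Dropping the leftover grouping is useful when later steps such as mutate() or filter() should work on the whole table rather than within each species.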
Let's also recall how filter() works: it keeps only the rows that meet a condition, here the Adelie penguins.
penguins %>% filter(species == "Adelie")
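filter() also respects groups: on grouped data, a summary function inside the condition, such as mean(), is evaluated separately for each group. A minimal sketch with mtcars (the same idea carries over to penguins) that keeps the cars whose mpg beats the average for their own cylinder count:

```r
library(dplyr)

above_group_avg <- mtcars %>%
  group_by(cyl) %>%
  filter(mpg > mean(mpg)) %>%  # mean(mpg) is the mean within each cyl group
  ungroup()

# Without group_by(), the same condition compares against the overall mean
above_overall_avg <- mtcars %>%
  filter(mpg > mean(mpg))
```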