group_by Function in R


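The group_by function splits a data frame into groups based on the values of one or more columns. On its own it does not change the data, so it is usually combined with other functions such as summarize, which then compute one result per group.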

library(tidyverse)
data <- mtcars
data %>% group_by(cyl) %>% summarize(avg_mpg = mean(mpg))

In the console, this produces the following output.

> data %>% group_by(cyl) %>% summarize(avg_mpg = mean(mpg))
# A tibble: 3 × 2
    cyl avg_mpg
  <dbl>   <dbl>
1     4    26.7
2     6    19.7
3     8    15.1
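As a small sketch of combining group_by with more than one summary, the same mtcars grouping can also count the rows in each group with n(). The names cyl_summary, n_cars, and avg_mpg are only illustrative.

```r
library(dplyr)

# Group the built-in mtcars data by number of cylinders,
# then compute the group size and the mean mpg for each group.
cyl_summary <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    n_cars  = n(),        # number of rows in each group
    avg_mpg = mean(mpg)   # mean miles per gallon in each group
  )

print(cyl_summary)
```

Each row of the result corresponds to one value of cyl, so the output has as many rows as there are groups.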

Palmer Penguins

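The penguins data (from the palmerpenguins package, loaded with library(palmerpenguins)) records penguins observed on three islands. Suppose we want the mean (average) bill length for each island, and suppose that some entries are NA. Calling drop_na() removes the rows that contain missing values, so that mean() does not return NA.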

> penguins %>% group_by(island) %>% drop_na() %>% summarize(mean_bill_length_mm = mean(bill_length_mm))
# A tibble: 3 × 2
  island    mean_bill_length_mm
  <fct>                   <dbl>
1 Biscoe                   45.2
2 Dream                    44.2
3 Torgersen                39.0
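One point worth illustrating: drop_na() with no arguments removes a row if any column is NA, not only the column being averaged. The sketch below uses a small made-up tibble (toy, with hypothetical islands "A" and "B", not the real penguins data) to compare that with passing na.rm = TRUE to mean(), which only skips missing bill lengths.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one NA in bill_length_mm and one NA in sex.
toy <- tibble(
  island         = c("A", "A", "A", "B", "B"),
  bill_length_mm = c(40, 42, NA, 50, 52),
  sex            = c("male", NA, "female", "female", "male")
)

# drop_na() removes every row that has an NA in any column,
# so the row with bill length 42 (missing sex) is also dropped.
by_drop <- toy %>%
  drop_na() %>%
  group_by(island) %>%
  summarize(mean_bl = mean(bill_length_mm))

# na.rm = TRUE only ignores missing bill lengths,
# so the row with bill length 42 still counts.
by_narm <- toy %>%
  group_by(island) %>%
  summarize(mean_bl = mean(bill_length_mm, na.rm = TRUE))

print(by_drop)
print(by_narm)
```

In this toy data the two approaches give different means for island "A", so which one is appropriate depends on whether rows with unrelated missing values should be excluded.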

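Here’s another example, this time grouping by two columns and computing two summaries per group. The message from summarise() points out that the result is still grouped by species.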

> penguins %>% group_by(species, island) %>% drop_na() %>% summarize(mean_bl = mean(bill_length_mm), max_bl = max(bill_length_mm))
`summarise()` has grouped output by 'species'. You can override using the `.groups` argument.
# A tibble: 5 × 4
# Groups:   species [3]
  species   island    mean_bl max_bl
  <fct>     <fct>       <dbl>  <dbl>
1 Adelie    Biscoe       39.0   45.6
2 Adelie    Dream        38.5   44.1
3 Adelie    Torgersen    39.0   46  
4 Chinstrap Dream        48.8   58  
5 Gentoo    Biscoe       47.6   59.6
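The message about grouped output refers to the .groups argument of summarize. As a minimal sketch, using mtcars so that it runs without extra packages, the default keeps the first grouping column while .groups = "drop" returns an ungrouped tibble. The names default_out and dropped_out are only illustrative.

```r
library(dplyr)

# Default behaviour: after summarizing, the last grouping
# column (gear) is dropped and the result stays grouped by cyl.
default_out <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(avg_mpg = mean(mpg))

# With .groups = "drop" the result has no grouping at all,
# and the message is no longer shown.
dropped_out <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(avg_mpg = mean(mpg), .groups = "drop")

group_vars(default_out)      # grouping columns that remain
is_grouped_df(dropped_out)   # whether any grouping remains
```

Dropping the groups explicitly can avoid surprises when later steps such as mutate or filter would otherwise run once per remaining group.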

Let’s recall how to use filter, which keeps only the rows that satisfy a condition. Here it keeps the Adelie penguins.

penguins %>% filter(species == "Adelie")
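Since filter returns a data frame, it can be placed before group_by in the same pipeline. The sketch below assumes the built-in mtcars data rather than penguins, and the name manual_summary is only illustrative: it keeps the cars with a manual transmission (am == 1) and then summarizes by cylinder count.

```r
library(dplyr)

# Keep only manual-transmission cars, then summarize per cylinder group.
manual_summary <- mtcars %>%
  filter(am == 1) %>%
  group_by(cyl) %>%
  summarize(
    n_cars  = n(),        # cars remaining in each group after filtering
    avg_mpg = mean(mpg)   # mean mpg among those cars
  )

print(manual_summary)
```

Filtering first means the group summaries are computed only from the rows that passed the condition.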
