The aggregate Function in R


The aggregate function splits the data into subsets, computes a summary statistic for each subset, and returns the result as a data frame. Common statistics include the mean, minimum, maximum, and sum.
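To make the split-and-summarize idea concrete, here is a minimal sketch. It uses a small made-up data frame; the names sales, region, and amount are hypothetical and not part of the original example. The snippet first performs the split by hand with split() and sapply(), and then lets aggregate() do the same work in one call.

```r
# Hypothetical example data: sales amounts recorded in two regions
sales <- data.frame(
  region = c("North", "South", "North", "South", "North"),
  amount = c(10, 20, 30, 60, 50)
)

# Manual version: split amount into one vector per region, then average each piece
pieces <- split(sales$amount, sales$region)
manual_means <- sapply(pieces, mean)
print(manual_means)

# aggregate() performs the same split and summary and returns a data frame
agg_means <- aggregate(sales$amount, list(region = sales$region), FUN = mean)
print(agg_means)
```

Both approaches give 30 for North (10, 30, 50) and 40 for South (20, 60). The difference is that aggregate() returns a data frame with one row per group rather than a named vector.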

The syntax is as follows.

aggregate(DataFrame$aggregate_column, list(DataFrame$group_column), FUN) 
  • DataFrame is the input data frame.
  • aggregate_column is the column whose values are summarized.
  • group_column is the column whose values define the groups. It must be wrapped in list().
  • FUN is the function applied to each group, for example sum, mean, min, or max.
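A short sketch of this list-based syntax follows. The data frame df and its columns team and score are hypothetical. The snippet also shows how the columns of the result are named. With an unnamed list, the grouping column is called Group.1 and the summarized column is called x. Naming the list element carries that name into the result.

```r
# Hypothetical data frame with one grouping column and one numeric column
df <- data.frame(
  team  = c("A", "B", "A", "B"),
  score = c(3, 5, 7, 9)
)

# Unnamed list: the grouping column in the result is called Group.1
unnamed <- aggregate(df$score, list(df$team), FUN = sum)
print(unnamed)

# Named list: the grouping column keeps the name given in list()
named <- aggregate(df$score, list(team = df$team), FUN = sum)
print(named)
```

Team A's scores sum to 10 (3 + 7) and team B's sum to 14 (5 + 9). Naming the list element is usually worth the extra keystrokes, because it makes the result self-describing.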

A Very Simple Example

To compute a mean with aggregate, pass the numeric variable as the first argument, the categorical variable (wrapped in a list) as the second, and the function to apply (here, mean) as the third. Alternatively, you can supply a formula of the form numerical ~ categorical together with the data argument. Let’s first create a simple data frame.

# Build a small data frame of four friends (no extra packages are needed)
name <- c("Bob", "Sally", "Pierre", "Pat")
age <- c(40, 41, 42, 40)
gender <- c("M", "F", "M", "F")
friends <- data.frame(name, age, gender)

Now let’s compute the mean age for each gender using the formula interface of aggregate.

gender_mean <- aggregate(age ~ gender, data = friends, mean) 
gender_mean

Output:

  gender  age
1      F 40.5
2      M 41.0
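For comparison, here is a sketch of the list-based form described earlier applied to the same friends data. The snippet rebuilds the data frame so that it runs on its own. The variable names by_formula and by_list are illustrative only. Both calls produce the same group means. The only difference is the naming: the formula form reuses age as the column name, while the list form calls the value column x.

```r
# Recreate the friends data frame so this snippet is self-contained
name <- c("Bob", "Sally", "Pierre", "Pat")
age <- c(40, 41, 42, 40)
gender <- c("M", "F", "M", "F")
friends <- data.frame(name, age, gender)

# Formula interface: column names come from the formula
by_formula <- aggregate(age ~ gender, data = friends, FUN = mean)

# List interface: the summarized column is named x
by_list <- aggregate(friends$age, list(gender = friends$gender), FUN = mean)

print(by_formula)
print(by_list)
```

The formula interface is usually more readable when the columns live in one data frame. The list form is handy when the vectors to summarize are not stored in a data frame.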
