The aggregate function splits the data into subsets, computes a summary statistic for each subset, and returns the result in a convenient form. Common statistics include mean, min, max, and sum.
The syntax is as follows.
aggregate(DataFrame$aggregate_column, list(DataFrame$group_column), FUN)
- DataFrame is the input DataFrame.
- aggregate_column is the column to be aggregated in the DataFrame.
- group_column is the column whose values define the groups.
- FUN is the function applied to each group, such as sum, mean, min, or max.
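As a minimal sketch of the syntax above, the following uses a made-up data frame called sales (the names sales, region, and amount are assumptions for illustration, not part of the original example) and sums amount within each region.

```r
# Hypothetical data frame, used only to illustrate the list-based syntax
sales <- data.frame(
  region = c("East", "West", "East", "West"),
  amount = c(10, 20, 30, 40)
)

# Sum the amount column within each region
totals <- aggregate(sales$amount, list(sales$region), FUN = sum)
totals
```

With this form the result columns get the default names Group.1 (the groups) and x (the aggregated values). Passing a named list, for example list(region = sales$region), gives the grouping column a more readable name.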
A Very Simple Example
To compute a group mean with aggregate in R, pass the numerical variable as the first argument, the categorical variable (wrapped in a list) as the second, and the function to apply (in this case mean) as the third. An alternative is to specify a formula of the form numerical ~ categorical together with the data frame. Let’s first create a simple data frame.
library(skimr)
name <- c("Bob", "Sally", "Pierre", "Pat")
age <- c(40, 41, 42, 40)
gender <- c("M", "F", "M", "F")
friends <- data.frame(name, age, gender)
Let’s use aggregate.
gender_mean <- aggregate(age ~ gender, data = friends, mean)
gender_mean
Output:
  gender  age
1      F 40.5
2      M 41.0
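For comparison, here is a sketch of the same calculation using the list-based form instead of the formula. The data frame is rebuilt so the snippet runs on its own. The variable name gender_mean_list and the renaming step are additions for illustration.

```r
# Rebuild the example data frame so this snippet is self-contained
friends <- data.frame(
  name = c("Bob", "Sally", "Pierre", "Pat"),
  age = c(40, 41, 42, 40),
  gender = c("M", "F", "M", "F")
)

# List-based form: numeric column first, grouping list second, function third.
# Naming the list element keeps "gender" as the grouping column name.
gender_mean_list <- aggregate(friends$age, list(gender = friends$gender), FUN = mean)

# The aggregated column is called "x" by default; rename it to match the formula version
names(gender_mean_list)[2] <- "age"
gender_mean_list
```

The formula form is usually shorter and names the output columns for you, while the list-based form is convenient when the columns to aggregate are selected programmatically.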