The ChickWeight Dataset in R


The ChickWeight data frame comes from a simple longitudinal experiment on the effect of several diets on the early growth of chicks. It ships with R itself, in the built-in datasets package. That means it is available in RStudio or any other R session without installing anything. The data frame has 578 rows and four columns: weight, Time, Chick, and Diet. And yes, weight is not capitalized and the other three are. To get help with this dataset, type ?ChickWeight.

weight is a numeric vector giving the chick’s body weight in grams. Time is a numeric vector giving the number of days since birth when the measurement was made. Chick is an ordered factor that uniquely identifies each chick. Diet is a factor with levels 1 to 4 indicating which experimental diet the chick received.
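
You can confirm these column types by inspecting the data frame directly. The following is a small sketch that uses only base R. The sapply() call reports the first class of each column, so the ordered factor Chick shows up as "ordered".

```r
# ChickWeight ships with base R (package "datasets"), so no install is needed.
cw <- ChickWeight

dim(cw)                                   # number of rows and columns
names(cw)                                 # note the lower-case "weight"
sapply(cw, function(col) class(col)[1])   # first class of each column
levels(cw$Diet)                           # the four experimental diets
nlevels(cw$Chick)                         # number of distinct chicks
```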

The body weights of the chicks were measured at birth and every second day thereafter until day 20, and once more on day 21. There were four groups of chicks on different protein diets. Repeated measurement of the same chicks is what makes this a longitudinal study: researchers observe the same subjects several times over a period of time. A cross-sectional study, by contrast, looks at observations at a single point in time.
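
You can check the measurement schedule and the longitudinal versus cross-sectional distinction in the data itself. This sketch uses only base R. The object names chick1, day21, and chicks_per_diet are illustrative choices and are not part of the dataset.

```r
cw <- ChickWeight

# Measurement days: birth (day 0), every second day to day 20, then day 21.
sort(unique(cw$Time))

# Longitudinal view: repeated measurements of one subject (chick 1).
chick1 <- subset(cw, Chick == "1")
chick1[, c("Time", "weight")]

# Cross-sectional view: every chick measured on a single day (day 21).
day21 <- subset(cw, Time == 21)
nrow(day21)

# Number of distinct chicks in each diet group.
chicks_per_diet <- tapply(cw$Chick, cw$Diet, function(ids) length(unique(ids)))
chicks_per_diet
```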

cw <- ChickWeight   # copy the data frame to a short two-letter name to save some typing

If we type head(cw), we’ll get the first six rows.

> head(cw)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1

Statistics

Suppose we want some statistics. Type summary(cw).

> summary(cw)
     weight           Time           Chick     Diet   
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
 Median :103.0   Median :10.00   20     : 12   3:120  
 Mean   :121.8   Mean   :10.72   10     : 12   4:118  
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
 Max.   :373.0   Max.   :21.00   19     : 12          
                                 (Other):506         

The average weight of all the chicks is shown as 121.8. To see the mean on its own, use the mean() function. Type mean(cw$weight) and you get 121.8183, which is the same value shown to more decimal places.
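
summary() and mean() do not disagree here. summary() rounds only when it prints. The sketch below assumes R 3.4.0 or later, where summary.default() stores unrounded values and leaves the rounding to its print method.

```r
cw <- ChickWeight

s <- summary(cw$weight)   # summary of a single numeric column
s                         # printed to about four significant digits
s[["Mean"]]               # the stored value keeps full precision
mean(cw$weight)           # the same value, computed directly
print(s, digits = 7)      # ask the print method for more digits
```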

Aggregate

In R, you can use the aggregate() function to compute summary statistics for subsets of the data. Here we want the mean weight for each of the four diets. The formula weight ~ Diet tells aggregate() to split weight by Diet and apply mean to each group. Below is the code to get the mean chick weight for each diet.

> aggregate(weight~Diet,cw,mean)
  Diet   weight
1    1 102.6455
2    2 122.6167
3    3 142.9500
4    4 135.2627

Averaged over all measurements, chicks on diet 3 were the heaviest and chicks on diet 1 were the lightest.
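
tapply() produces the same per-diet means as a named vector, which makes it easy to pick out the extremes programmatically. These means pool every measurement day, including the early days when all chicks weigh about the same. The last two lines are one illustrative way to compare diets on the final day only. The names diet_means and final_means are illustrative.

```r
cw <- ChickWeight

# Same per-diet means as aggregate(weight ~ Diet, cw, mean), as a named vector.
diet_means <- tapply(cw$weight, cw$Diet, mean)
round(diet_means, 2)

names(which.max(diet_means))   # diet with the highest mean weight
names(which.min(diet_means))   # diet with the lowest mean weight

# Alternative comparison: mean weight per diet on day 21 only.
final_means <- with(subset(cw, Time == 21), tapply(weight, Diet, mean))
round(final_means, 2)
```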

In SQL, that would be SELECT Diet, AVG(weight) AS weight FROM cw GROUP BY Diet.
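
If you are coming from SQL, the sketch below shows how the AS alias and a COUNT query might map onto aggregate(). Renaming the output column stands in for AS. The names avg_weight and n are illustrative. By default, the formula interface drops rows with missing values, so FUN = length behaves like COUNT(weight) rather than COUNT(*). That makes no difference here, because ChickWeight has no missing weights.

```r
cw <- ChickWeight

# SELECT Diet, AVG(weight) AS avg_weight FROM cw GROUP BY Diet
avg_by_diet <- aggregate(weight ~ Diet, data = cw, FUN = mean)
names(avg_by_diet)[names(avg_by_diet) == "weight"] <- "avg_weight"
avg_by_diet

# SELECT Diet, COUNT(weight) AS n FROM cw GROUP BY Diet
n_by_diet <- aggregate(weight ~ Diet, data = cw, FUN = length)
names(n_by_diet)[names(n_by_diet) == "weight"] <- "n"
n_by_diet
```

The counts should match the Diet column of summary(cw).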

To group by both diet and time, add Time to the right-hand side of the formula. Only the first few rows of the result are shown below.

> aggregate(weight~Diet+Time,cw,mean)
   Diet Time    weight
1     1    0  41.40000
2     2    0  40.70000
3     3    0  40.80000
4     4    0  41.00000
5     1    2  47.25000
6     2    2  49.40000
7     3    2  50.40000
8     4    2  51.80000
9     1    4  56.47368
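
A long result like this can be hard to scan. One option is a wide table with one row per day and one column per diet, sketched here with base R's tapply(). The name diet_by_time is illustrative.

```r
cw <- ChickWeight

# Rows = measurement day, columns = diet, cells = mean weight in grams.
diet_by_time <- with(cw, tapply(weight, list(Time = Time, Diet = Diet), mean))
round(diet_by_time, 1)
```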

What if we wanted to plot this with time on the x-axis and weight on the y-axis, using a line graph with a different color for each of the four diets? We could use the ggplot2 package and its ggplot() function.
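
The sketch below shows one way to draw that chart. It assumes the ggplot2 package is installed (install.packages("ggplot2")). The code first averages weight by Diet and Time with aggregate(), then maps Diet to colour so that each diet gets its own line. The title, axis labels, and the object names diet_time and p are illustrative.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

cw <- ChickWeight

# One row per Diet x Time combination, holding the mean weight.
diet_time <- aggregate(weight ~ Diet + Time, data = cw, FUN = mean)

# Diet is a factor, so each diet gets its own colour and its own line.
p <- ggplot(diet_time, aes(x = Time, y = weight, colour = Diet)) +
  geom_line() +
  geom_point() +
  labs(
    title  = "Mean chick weight by diet",
    x      = "Days since birth",
    y      = "Mean body weight (grams)",
    colour = "Diet"
  )

print(p)
```

Because Diet is a factor, ggplot2 treats it as a discrete variable and groups the lines by it automatically. If Diet were numeric, you would need to wrap it in factor() to get four separate lines.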

Learn with YouTube

Here is a video called "How to Use the Aggregate Function in R."
