The ChickWeight data frame comes from a simple longitudinal experiment on the effect of several diets on the early growth of chicks. The dataset ships with R itself, in the built-in datasets package, so it is available in RStudio without installing anything. It has 578 rows and 4 columns: weight, Time, Chick, and Diet. Note that weight is lowercase while the other three names are capitalized, and R is case-sensitive. To get help with this dataset, type ?ChickWeight.
weight is a numeric vector giving the chick’s body weight in grams. Time is a numeric vector giving the number of days since birth when the measurement was made. Chick is an ordered factor that uniquely identifies each chick. Diet is a factor with levels 1 to 4 indicating which experimental diet the chick received.
There were four groups of chicks on different protein diets. The body weights of the chicks were measured at birth and every second day thereafter until day 20. They were also measured on day 21. This makes it a longitudinal study, in which researchers observe the same subjects repeatedly over a period of time. A cross-sectional study, by contrast, observes subjects at a single point in time.
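A quick way to check the measurement schedule and the longitudinal structure described above is to look at the data directly. This is a minimal sketch using only base R; the variable names (measurement_days, obs_per_chick, chicks_per_diet) are chosen here for illustration.

```r
# ChickWeight ships with base R's datasets package, so nothing needs installing.
cw <- ChickWeight

# Days on which weights were recorded
measurement_days <- sort(unique(cw$Time))
print(measurement_days)

# Number of rows (measurements) recorded for each chick,
# then a count of how many chicks have each number of measurements
obs_per_chick <- table(cw$Chick)
print(table(obs_per_chick))

# Number of distinct chicks assigned to each diet
chicks_per_diet <- tapply(cw$Chick, cw$Diet, function(x) length(unique(x)))
print(chicks_per_diet)
```

Because the data frame has 578 rows rather than a full 12 measurements for every chick, the second table should show that some chicks have fewer than 12 measurements.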
cw <- ChickWeight # copy the data frame to a short, two-letter name to save time and typing
If we type head(cw) we’ll get the following.
> head(cw)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1
Statistics
Suppose we want some statistics. Type summary(cw).
> summary(cw)
     weight           Time           Chick     Diet
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120
 Median :103.0   Median :10.00   20     : 12   3:120
 Mean   :121.8   Mean   :10.72   10     : 12   4:118
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12
 Max.   :373.0   Max.   :21.00   19     : 12
                                 (Other):506
The mean weight across all measurements is shown as 121.8, because summary() rounds its output for display. To get the mean on its own, use the mean() function: typing mean(cw$weight) returns 121.8183, the same value shown with more decimal places.
Aggregate
In R, you can use the aggregate() function to compute summary statistics for subsets of the data. Here the subsets are the four diets, and we want the mean weight within each one. In the formula weight ~ Diet, the variable on the left is the one being summarized and the variable on the right defines the groups. Below is the code to get the mean chick weight for each of the four diets.
> aggregate(weight~Diet,cw,mean)
  Diet   weight
1    1 102.6455
2    2 122.6167
3    3 142.9500
4    4 135.2627
Averaged over all measurement days, diet 3 has the highest mean chick weight and diet 1 has the lowest.
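The overall means mix measurements from newly hatched chicks with measurements from three-week-old chicks, so it can be useful to compare the diets on the last measurement day only. The following is a small sketch of that idea, not part of the original analysis; the names final_day and final_means are chosen here for illustration.

```r
cw <- ChickWeight

# Keep only the rows from the last measurement day (day 21 in this dataset)
final_day <- subset(cw, Time == max(Time))

# Mean weight per diet among chicks that were weighed on that day
final_means <- aggregate(weight ~ Diet, data = final_day, FUN = mean)
print(final_means)
```

Chicks with no measurement on the last day are simply absent from this comparison, which is worth keeping in mind when reading the result.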
In SQL, that would be SELECT Diet, AVG(weight) AS weight FROM cw GROUP BY Diet.
If we wanted to group by both diet and time, we would just add Time to the right-hand side of the formula. Below is just part of the results.
> aggregate(weight~Diet+Time,cw,mean)
  Diet Time   weight
1    1    0 41.40000
2    2    0 40.70000
3    3    0 40.80000
4    4    0 41.00000
5    1    2 47.25000
6    2    2 49.40000
7    3    2 50.40000
8    4    2 51.80000
9    1    4 56.47368
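The long output from grouping by diet and time can be easier to read as a table with one row per day and one column per diet. One way to get that layout in base R is tapply() with two grouping variables. This is an alternative sketch rather than the method used above, and mean_by_time_diet is a name chosen here for illustration.

```r
cw <- ChickWeight

# Rows are measurement days, columns are diets, cells are mean weights
mean_by_time_diet <- tapply(
  cw$weight,
  list(Time = cw$Time, Diet = cw$Diet),
  mean
)

# Round for display only; the stored values keep full precision
print(round(mean_by_time_diet, 1))
```

The cell values match the aggregate() results; for example, the entry for day 0 and diet 1 is the same 41.4 shown in the first row above.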
What if we wanted to plot this with time on the x-axis and weight on the y-axis, using a line graph with one line per diet and a different color for each diet? We could use the ggplot() function from the ggplot2 package.
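A minimal sketch of such a plot follows. It assumes the ggplot2 package is already installed (install.packages("ggplot2") if not), and it reuses the aggregate() call from above to compute the means first. The names diet_time_means and p, and the axis labels, are choices made here for illustration.

```r
library(ggplot2)

cw <- ChickWeight

# Mean weight for every combination of diet and measurement day
diet_time_means <- aggregate(weight ~ Diet + Time, data = cw, FUN = mean)

# One line per diet; Diet is a factor, so each level gets its own color
p <- ggplot(diet_time_means, aes(x = Time, y = weight, colour = Diet)) +
  geom_line() +
  labs(x = "Time (days)", y = "Mean weight (g)", colour = "Diet")

# Draw the plot
print(p)
```

Computing the means beforehand keeps the plotting code short and makes it easy to inspect the exact numbers being drawn.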
Learn with YouTube
Here is a video called How to Use the Aggregate Function in R.