Are you working with a data frame in R and wondering how to find missing values in a column? Suppose you have a numeric column called age, and when you take the mean() of that column you get NA as a result. You now suspect that age contains missing values that cause this NA result. Let’s use a very small example to check this out.
library(skimr)

name <- c("Bob", "Sally", "Pierre", "Pat")
age <- c(40, NA, 42, 40)
gender <- c("M", "F", "M", "F")
friends <- data.frame(name, age, gender)

mean(friends$age) # returns NA because one value is missing

num_NA <- sum(is.na(friends))
print(paste("The total number of NA in friends is", num_NA))
Below is a screenshot of RStudio. Notice that Sally’s age is not known.
By default, mean() returns NA whenever the column contains even one NA, so we can’t take the mean of age until we deal with the missing value (or values).
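To confirm that age is the column causing the problem, we can count the missing values in that column alone and then ask mean() to skip them. The block below is a minimal sketch that rebuilds the friends data frame from the example so it runs on its own; the names age_na_count, age_na_rows and age_mean are illustrative and not part of the original example.

```r
# Rebuild the small example so this block runs on its own
friends <- data.frame(
  name   = c("Bob", "Sally", "Pierre", "Pat"),
  age    = c(40, NA, 42, 40),
  gender = c("M", "F", "M", "F")
)

# Count the missing values in the age column only
age_na_count <- sum(is.na(friends$age))

# Row positions where age is missing
age_na_rows <- which(is.na(friends$age))

# na.rm = TRUE tells mean() to drop NA values before averaging
age_mean <- mean(friends$age, na.rm = TRUE)

print(paste("The number of NA in age is", age_na_count))
print(age_na_rows)
print(age_mean)
```

Using na.rm = TRUE leaves the data frame untouched, which is handy when we only need a summary and still want to keep Sally’s row.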
One option is to delete every row that has an NA in the age column. If we do that, we should probably create a new data frame rather than overwrite the original. Perhaps we can mark it as version two by appending v2 to the end of the data frame name.
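One way this could look in base R is sketched below. It assumes the new data frame is named friends_v2 (a name chosen here to follow the v2 idea) and rebuilds friends so the block is self-contained.

```r
# Rebuild the small example so this block runs on its own
friends <- data.frame(
  name   = c("Bob", "Sally", "Pierre", "Pat"),
  age    = c(40, NA, 42, 40),
  gender = c("M", "F", "M", "F")
)

# Keep only the rows where age is not missing; friends itself is unchanged
friends_v2 <- friends[!is.na(friends$age), ]

print(friends_v2)

# With the NA row gone, mean() no longer needs na.rm = TRUE
mean(friends_v2$age)
```

Filtering on is.na(friends$age) removes rows only when age is missing. Base R's na.omit(friends) is an alternative, but it drops a row when any column is NA, which may remove more than intended in a larger data frame.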