Inferential statistics uses sample data to draw conclusions or make predictions about a larger population.
Sampling is the process of drawing a subset of data from a population.
Use Cases
Here are some use cases for sampling. Suppose we want to know how many products coming off the assembly line we need to test to feel confident that all the products, or at least a certain percentage of them, are free from defects. Do we need to test them all? Or suppose we want to know the average height of males over 18 years of age in the United States. Measuring them all would be very expensive and time-consuming, if not impossible. We could, however, take a sample and measure that instead. Using a sample saves money and resources, and analyzing a sample is more practical than analyzing an entire population.
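To make the height example concrete, here is a minimal Python sketch of estimating a population mean from a simple random sample. The population is simulated with made-up parameters (mean 175 cm, standard deviation 7 cm), and the function name `estimate_mean` is hypothetical. The margin uses the usual normal approximation for a 95% confidence interval.

```python
import random
import statistics

def estimate_mean(population, sample_size, seed=0):
    """Draw a simple random sample (without replacement) and return
    the sample mean plus an approximate 95% margin of error."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the mean; 1.96 is the normal critical value
    # for roughly 95% confidence.
    sem = statistics.stdev(sample) / sample_size ** 0.5
    return mean, 1.96 * sem

# Hypothetical population: 100,000 simulated heights in cm.
# These parameters are illustrative assumptions, not real survey data.
rng = random.Random(42)
population = [rng.gauss(175.0, 7.0) for _ in range(100_000)]

mean, margin = estimate_mean(population, sample_size=500)
print(f"estimated mean: {mean:.1f} cm +/- {margin:.1f} cm")
```

Measuring 500 individuals instead of 100,000 gives an estimate with a quantified margin of error, which is the trade-off sampling offers.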
Suppose we want to test a possible change to a web page on our website. We could do an A/B test and compare the old page with the proposed new page. How large should our sample be?
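One common way to approach the sample size question is a power calculation for comparing two proportions. The sketch below uses the standard normal-approximation formula; the function name `ab_sample_size`, the conversion rates (10% versus 12%), the 5% significance level, and the 80% power are all illustrative assumptions, not values from a real test.

```python
import math
from statistics import NormalDist

def ab_sample_size(p_old, p_new, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a change
    from p_old to p_new with a two-sided test on two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_old * (1 - p_old) + p_new * (1 - p_new)
    n = (z_alpha + z_beta) ** 2 * variance / (p_old - p_new) ** 2
    return math.ceil(n)

# Hypothetical rates: old page converts at 10%, new page hoped to reach 12%.
n_per_group = ab_sample_size(0.10, 0.12)
print(f"visitors needed per variant: {n_per_group}")
```

The smaller the difference we want to detect, the larger the sample each variant needs, because the squared difference sits in the denominator.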
Representative
Your sample should be representative of your population. Recall that a representative sample accurately reflects the characteristics of a population. The quality of your sample helps determine the quality of the insights you share with stakeholders. A good model can’t overcome a bad sample.
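A simple check for representativeness is to compare the share of each group in the sample with its share in the population. The sketch below assumes a hypothetical population of website visitors (70% mobile, 30% desktop) and contrasts a random sample with a convenience sample that takes only the first records. The helper `group_shares` is illustrative.

```python
import random
from collections import Counter

def group_shares(records):
    """Return the fraction of records that fall in each group."""
    counts = Counter(records)
    total = len(records)
    return {group: count / total for group, count in counts.items()}

rng = random.Random(7)
# Hypothetical population: 7,000 mobile and 3,000 desktop visitors.
population = ["mobile"] * 7_000 + ["desktop"] * 3_000

random_sample = rng.sample(population, 500)  # every visitor equally likely
convenience_sample = population[:500]        # first 500 records only

print("population: ", group_shares(population))
print("random:     ", group_shares(random_sample))
print("convenience:", group_shares(convenience_sample))
```

The random sample's shares land close to the population's, while the convenience sample contains no desktop visitors at all, so any conclusion drawn from it would not generalize.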