Simple Linear Regression Assumptions


This entry is part 2 of 3 in the series Linear Regression

Model assumptions are statements about the data that must be true in order to justify the use of a particular modeling technique. To be confident in our results, we need to ensure that we’re using the right model for the data. Aim to examine your model assumptions closely before constructing the model. Some assumptions, however, can only be checked after the model has been built. Be sure to check those assumptions once the model is fitted to confirm whether it is valid. Data visualizations can be used to determine whether model assumptions are met.

Simple linear regression provides a model of the relationship between the magnitude of one variable and that of a second variable.

What are the four assumptions of simple linear regression?

  1. Linearity – The predictor variable (X) is linearly related to the outcome variable (Y).
  2. Normality – The errors are normally distributed.
  3. Independent observations – Each observation in the dataset is independent.
  4. Homoscedasticity – The variance of the errors is constant or similar across the model.

Residuals are the difference between the predicted and observed values. You can calculate residuals after you build a regression model by subtracting the predicted values from the observed values. Errors are the natural noise assumed to be in the model. Residuals are used to estimate errors when checking the normality and homoscedasticity assumptions of linear regression.

Linearity

In order to assess whether or not there is a linear relationship between the independent and dependent variables, it
is easiest to create a scatterplot of the dataset. The independent variable would be on the x-axis, and the dependent
variable would be on the y-axis.

Normality

The normality assumption focuses on the errors, which can be estimated by the residuals, or the difference between
the observed values in the data and the values predicted by the regression model. For that reason, the normality
assumption can only be checked after a model is built and predicted values are calculated. Once the model has
been built, you can create either a QQ-plot or a histogram of the residuals to check that they are approximately
normally distributed. On a QQ-plot, points that fall close to a straight line suggest normality; a histogram of the
residuals should look roughly bell-shaped.

Independent Observations

Assessing whether observations are independent depends on understanding your data. Ask questions such as: How was the data collected?
What does each data point represent? Based on the data collection process, is it likely that the value of one data point affects the value of another?

Homoscedasticity

Like the normality assumption, the homoscedasticity assumption concerns the errors, which are estimated by the residuals, so it can only be evaluated after a regression model has been constructed. A scatterplot of the residuals against the fitted values can help determine whether the assumption is violated: the points should form a band of roughly constant spread, whereas a funnel or cone shape suggests that the variance changes across the model.
