Sum of Squared Errors
- Sum of Squared Errors (SSE) Calculation in Linear Regression Analysis
- Introduction
- Calculation Steps
- Step 1: Calculate SS_XX (Sum of Squares for X)
- Step 2: Calculate SS_YY (Sum of Squares for Y)
- Step 3: Calculate SS_XY (Sum of Squares for X and Y)
- Step 4: Calculate Regression Coefficients
- Step 5: Calculate SS_Regression (Sum of Squares for Regression)
- Step 6: Calculate SSE (Sum of Squared Errors)
- Example Calculation
- Conclusion
- Alternatives to SSE in statistics
Sum of Squared Errors (SSE) Calculation in Linear Regression Analysis
Introduction
In linear regression analysis, the Sum of Squared Errors (SSE) measures the total deviation of the model's predictions from the observed values, that is, the sum of the squared differences between observed and predicted values. It can be computed as the difference between the total variability in the observed data and the variability explained by the regression model. An SSE of zero indicates that the model perfectly explains all the variability in the data.
Calculation Steps
To compute SSE, follow these steps:
Step 1: Calculate SS_XX (Sum of Squares for X)
$SS_{XX}$ represents the total variability in the predictor variable $X$. It is calculated using the formula:
$$SS_{XX} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
where $x_i$ are the values of the predictor variable, and $n$ is the number of observations.
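For concreteness, here is a minimal sketch in plain Python (the data values are illustrative, not from the article's example):

```python
# Compute SS_XX from a list of predictor values (illustrative data).
x = [1.0, 2.0, 4.0, 7.0]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
print(ss_xx)  # 21.0  (70 - 14**2 / 4)
```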
Step 2: Calculate SS_YY (Sum of Squares for Y)
$SS_{YY}$ represents the total variability in the response variable $Y$. It is calculated using:
$$SS_{YY} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
where $y_i$ are the values of the response variable, and $n$ is the number of observations.
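The same pattern applies to the response variable (again a sketch with illustrative values):

```python
# Compute SS_YY from a list of response values (illustrative data).
y = [2.0, 3.0, 5.0, 10.0]
n = len(y)
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
print(ss_yy)  # 38.0  (138 - 20**2 / 4)
```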
Step 3: Calculate SS_XY (Sum of Squares for X and Y)
$SS_{XY}$ measures how $X$ and $Y$ vary together; it is proportional to their sample covariance. It is calculated using:
$$SS_{XY} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
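In code, this pairs the two lists element by element (same illustrative data as above):

```python
# Compute SS_XY from paired predictor/response values (illustrative data).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 10.0]
n = len(x)
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
print(ss_xy)  # 28.0  (98 - 14 * 20 / 4)
```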
Step 4: Calculate Regression Coefficients
- Slope ($b_1$): $b_1 = \dfrac{SS_{XY}}{SS_{XX}}$
- Intercept ($b_0$): $b_0 = \bar{y} - b_1 \bar{x}$
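Putting the previous steps together, a self-contained sketch of the coefficients (still on the illustrative data):

```python
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 10.0]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                  # 21.0
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 28.0
b1 = ss_xy / ss_xx                  # slope: 28/21 ≈ 1.3333
b0 = sum(y) / n - b1 * sum(x) / n   # intercept: 5 - 1.3333 * 3.5 ≈ 0.3333
print(b1, b0)
```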
Step 5: Calculate SS_Regression (Sum of Squares for Regression)
SS_Regression represents the portion of the total variability in $Y$ that is explained by the regression model. It is given by:
$$SS_{\text{Regression}} = b_1 \cdot SS_{XY} = \frac{SS_{XY}^2}{SS_{XX}}$$
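Both forms of the formula give the same number, as this small sketch confirms (values carried over from the steps above):

```python
ss_xx = 21.0  # from Step 1 (illustrative data)
ss_xy = 28.0  # from Step 3
b1 = ss_xy / ss_xx
ss_reg = b1 * ss_xy                 # ≈ 37.3333
print(ss_reg, ss_xy ** 2 / ss_xx)   # the two forms agree
```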
Step 6: Calculate SSE (Sum of Squared Errors)
SSE measures the portion of the total variability in $Y$ that is not explained by the regression model. It is given by:
$$SSE = SS_{YY} - SS_{\text{Regression}}$$
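As a sanity check, the sketch below (illustrative data again) computes SSE both ways: via the shortcut $SS_{YY} - SS_{\text{Regression}}$ and directly as the sum of squared residuals $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$:

```python
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 10.0]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n
sse_shortcut = ss_yy - ss_xy ** 2 / ss_xx                            # SS_YY - SS_Regression
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
print(sse_shortcut, sse_direct)  # both ≈ 0.6667
```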
Example Calculation
Given the following data:
- X values: [1, 2, 3, 4, 5]
- Y values: [2, 4, 6, 8, 10]
We calculate:
- $SS_{XX}$: $\sum x_i^2 = 55$, so $SS_{XX} = 55 - \frac{15^2}{5} = 10$
- $SS_{YY}$: $\sum y_i^2 = 220$, so $SS_{YY} = 220 - \frac{30^2}{5} = 40$
- $SS_{XY}$: $\sum x_i y_i = 110$, so $SS_{XY} = 110 - \frac{15 \cdot 30}{5} = 20$
- Slope ($b_1$): $b_1 = \frac{20}{10} = 2$
- Intercept ($b_0$): $b_0 = 6 - 2 \cdot 3 = 0$
- SS_Regression: $SS_{\text{Regression}} = 2 \cdot 20 = 40$
- SSE: $SSE = 40 - 40 = 0$
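To double-check the arithmetic, here is a short plain-Python script (a sketch, not a library implementation) that reproduces each quantity from the data above:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                  # 10.0
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n                  # 40.0
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 20.0
b1 = ss_xy / ss_xx                  # 2.0
b0 = sum(y) / n - b1 * sum(x) / n   # 0.0
ss_reg = b1 * ss_xy                 # 40.0
sse = ss_yy - ss_reg                # 0.0
print(ss_xx, ss_yy, ss_xy, b1, b0, ss_reg, sse)
```

The SSE of 0 reflects the fact that the data lie exactly on the line $Y = 2X$.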
Conclusion
SSE quantifies the discrepancy between the observed data and the predictions made by the regression model. An SSE of 0, as in the example above, indicates that the model perfectly explains the variability in the response variable.
Alternatives to SSE in statistics
Mean Absolute Error (MAE)
Definition: MAE measures the average magnitude of errors in a set of predictions, without considering their direction. It is the mean of the absolute differences between predicted values and actual values.
Formula:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
Advantages:
- MAE is less sensitive to outliers compared to MSE (Mean Squared Error) because it does not square the errors.
- Provides a more intuitive measure of average error magnitude.
Disadvantages:
- Weights all errors linearly, so large errors are not penalized more heavily than small ones.
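A minimal sketch in plain Python (the `mae` helper and the sample values are illustrative, not from the article):

```python
def mae(actual, predicted):
    """Mean absolute error of paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(mae([2, 4, 6], [2.5, 3.5, 6.5]))  # 0.5
```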
Root Mean Squared Error (RMSE)
Definition: RMSE is the square root of the average of the squared differences between predicted and actual values. It penalizes large errors more severely than MAE due to the squaring of errors.
Formula:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
Advantages:
- RMSE provides a measure of error in the same units as the dependent variable, making it easier to interpret.
- Emphasizes larger errors more than MAE, which can be useful if large errors are particularly undesirable.
Disadvantages:
- Sensitive to outliers due to the squaring of errors.
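A matching sketch (again, the `rmse` helper and values are illustrative):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of paired observations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

print(rmse([2, 4, 6], [2.5, 3.5, 6.5]))  # 0.5 (errors are uniform, so RMSE equals MAE here)
```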
Mean Absolute Percentage Error (MAPE)
Definition: MAPE measures the size of the error in percentage terms. It is the mean of the absolute percentage errors between predicted and actual values.
Formula:
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
Advantages:
- Provides error metrics in percentage terms, which can be easier to interpret in some contexts.
- Useful for comparing forecast performance across different scales.
Disadvantages:
- Can be problematic if actual values are close to zero, leading to very high percentage errors.
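A final sketch (the `mape` helper and values are illustrative); note the zero-division hazard that mirrors the disadvantage above:

```python
def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

print(mape([2, 4, 6], [2.5, 3.5, 6.5]))  # ≈ 15.28 (percent)
```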