Sum of Squared Errors
- Sum of Squared Errors (SSE) Calculation in Linear Regression Analysis
- Introduction
- Calculation Steps
- Step 1: Calculate SS_XX (Sum of Squares for X)
- Step 2: Calculate SS_YY (Sum of Squares for Y)
- Step 3: Calculate SS_XY (Sum of Squares for X and Y)
- Step 4: Calculate Regression Coefficients
- Step 5: Calculate SS_Regression (Sum of Squares for Regression)
- Step 6: Calculate SSE (Sum of Squared Errors)
- Example Calculation
- Conclusion
- Alternatives to SSE in statistics
Sum of Squared Errors (SSE) Calculation in Linear Regression Analysis
Introduction
In linear regression analysis, the Sum of Squared Errors (SSE) measures the total deviation of the model's predictions from the observed values, that is, the sum of the squared differences between observed and predicted values. It can be computed as the difference between the total variability in the observed data and the variability explained by the regression model. An SSE of zero indicates that the model perfectly explains all the variability in the data.
Calculation Steps
To compute SSE, follow these steps:
Step 1: Calculate SS_XX (Sum of Squares for X)
$SS_{XX}$ represents the total variability in the predictor variable $X$. It is calculated using the formula:
$$SS_{XX} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
where $x_i$ are the values of the predictor variable, and $n$ is the number of observations.
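For concreteness, here is a minimal sketch in plain Python (the data values are illustrative, not from the article's example):

```python
# Compute SS_XX from a list of predictor values (illustrative data).
x = [1.0, 2.0, 4.0, 7.0]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
print(ss_xx)  # 21.0  (70 - 14**2 / 4)
```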
Step 2: Calculate SS_YY (Sum of Squares for Y)
$SS_{YY}$ represents the total variability in the response variable $Y$. It is calculated using:
$$SS_{YY} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
where $y_i$ are the values of the response variable, and $n$ is the number of observations.
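The same pattern applies to the response variable (again a sketch with illustrative values):

```python
# Compute SS_YY from a list of response values (illustrative data).
y = [2.0, 3.0, 5.0, 10.0]
n = len(y)
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
print(ss_yy)  # 38.0  (138 - 20**2 / 4)
```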
Step 3: Calculate SS_XY (Sum of Squares for X and Y)
$SS_{XY}$ measures how $X$ and $Y$ vary together; it is proportional to their sample covariance. It is calculated using:
$$SS_{XY} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
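In code, this pairs the two lists element by element (same illustrative data as above):

```python
# Compute SS_XY from paired predictor/response values (illustrative data).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 10.0]
n = len(x)
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
print(ss_xy)  # 28.0  (98 - 14 * 20 / 4)
```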
Step 4: Calculate Regression Coefficients
- Slope ($b_1$): $b_1 = \dfrac{SS_{XY}}{SS_{XX}}$
- Intercept ($b_0$): $b_0 = \bar{y} - b_1 \bar{x}$
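Putting the previous steps together, a self-contained sketch of the coefficients (still on the illustrative data):

```python
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 10.0]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                  # 21.0
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 28.0
b1 = ss_xy / ss_xx                  # slope: 28/21 ≈ 1.3333
b0 = sum(y) / n - b1 * sum(x) / n   # intercept: 5 - 1.3333 * 3.5 ≈ 0.3333
print(b1, b0)
```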
Step 5: Calculate SS_Regression (Sum of Squares for Regression)
SS_Regression represents the portion of the total variability in $Y$ that is explained by the regression model. It is given by:
$$SS_{\text{Regression}} = b_1 \cdot SS_{XY} = \frac{SS_{XY}^2}{SS_{XX}}$$
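Both forms of the formula give the same number, as this small sketch confirms (values carried over from the steps above):

```python
ss_xx = 21.0  # from Step 1 (illustrative data)
ss_xy = 28.0  # from Step 3
b1 = ss_xy / ss_xx
ss_reg = b1 * ss_xy                 # ≈ 37.3333
print(ss_reg, ss_xy ** 2 / ss_xx)   # the two forms agree
```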
Step 6: Calculate SSE (Sum of Squared Errors)
SSE measures the portion of the total variability in $Y$ that is not explained by the regression model. It is given by:
$$SSE = SS_{YY} - SS_{\text{Regression}}$$
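As a sanity check, the sketch below (illustrative data again) computes SSE both ways: via the shortcut $SS_{YY} - SS_{\text{Regression}}$ and directly as the sum of squared residuals $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$:

```python
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 5.0, 10.0]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n
sse_shortcut = ss_yy - ss_xy ** 2 / ss_xx                            # SS_YY - SS_Regression
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
print(sse_shortcut, sse_direct)  # both ≈ 0.6667
```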
Example Calculation
Given the following data:
- X values: [1, 2, 3, 4, 5]
- Y values: [2, 4, 6, 8, 10]
We calculate:
- $SS_{XX}$: $\sum x_i^2 = 55$, so $SS_{XX} = 55 - \frac{15^2}{5} = 10$
- $SS_{YY}$: $\sum y_i^2 = 220$, so $SS_{YY} = 220 - \frac{30^2}{5} = 40$
- $SS_{XY}$: $\sum x_i y_i = 110$, so $SS_{XY} = 110 - \frac{15 \cdot 30}{5} = 20$
- Slope ($b_1$): $b_1 = \frac{20}{10} = 2$
- Intercept ($b_0$): $b_0 = 6 - 2 \cdot 3 = 0$
- SS_Regression: $SS_{\text{Regression}} = 2 \cdot 20 = 40$
- SSE: $SSE = 40 - 40 = 0$
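To double-check the arithmetic, here is a short plain-Python script (a sketch, not a library implementation) that reproduces each quantity from the data above:

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
n = len(x)
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                  # 10.0
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n                  # 40.0
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 20.0
b1 = ss_xy / ss_xx                  # 2.0
b0 = sum(y) / n - b1 * sum(x) / n   # 0.0
ss_reg = b1 * ss_xy                 # 40.0
sse = ss_yy - ss_reg                # 0.0
print(ss_xx, ss_yy, ss_xy, b1, b0, ss_reg, sse)
```

The SSE of 0 reflects the fact that the data lie exactly on the line $Y = 2X$.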
Conclusion
SSE quantifies the discrepancy between the observed data and the predictions made by the regression model. An SSE of 0, as in the example above, indicates that the model perfectly explains the variability in the response variable.
Alternatives to SSE in statistics
Mean Absolute Error (MAE)
Definition: MAE measures the average magnitude of errors in a set of predictions, without considering their direction. It is the mean of the absolute differences between predicted values and actual values.
Formula:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
Advantages:
- MAE is less sensitive to outliers compared to MSE (Mean Squared Error) because it does not square the errors.
- Provides a more intuitive measure of average error magnitude.
Disadvantages:
- Weights all errors linearly, so large errors are not penalized more heavily than small ones.
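A minimal sketch in plain Python (the `mae` helper and the sample values are illustrative, not from the article):

```python
def mae(actual, predicted):
    """Mean absolute error of paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(mae([2, 4, 6], [2.5, 3.5, 6.5]))  # 0.5
```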
Root Mean Squared Error (RMSE)
Definition: RMSE is the square root of the average of the squared differences between predicted and actual values. It penalizes large errors more severely than MAE due to the squaring of errors.
Formula:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
Advantages:
- RMSE provides a measure of error in the same units as the dependent variable, making it easier to interpret.
- Emphasizes larger errors more than MAE, which can be useful if large errors are particularly undesirable.
Disadvantages:
- Sensitive to outliers due to the squaring of errors.
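A matching sketch (again, the `rmse` helper and values are illustrative):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of paired observations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

print(rmse([2, 4, 6], [2.5, 3.5, 6.5]))  # 0.5 (errors are uniform, so RMSE equals MAE here)
```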
Mean Absolute Percentage Error (MAPE)
Definition: MAPE measures the size of the error in percentage terms. It is the mean of the absolute percentage errors between predicted and actual values.
Formula:
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$
Advantages:
- Provides error metrics in percentage terms, which can be easier to interpret in some contexts.
- Useful for comparing forecast performance across different scales.
Disadvantages:
- Can be problematic if actual values are close to zero, leading to very high percentage errors.
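A final sketch (the `mape` helper and values are illustrative); note the zero-division hazard that mirrors the disadvantage above:

```python
def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

print(mape([2, 4, 6], [2.5, 3.5, 6.5]))  # ≈ 15.28 (percent)
```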