# Multiple linear regression calculator

## How to Use the Multiple Regression Calculator

Usage:

1. Enter the X (feature) values and the Y (target) values.

2. Click the **Calculate** button.

3. The results are generated automatically.

Contact: [email protected]

## Overview

This document provides an overview of the multiple linear regression calculation process, including the calculation of coefficients, the R-squared value, and the statistical significance of each feature.

### Theory and Equations

#### Multiple Linear Regression Model

In multiple linear regression, we aim to model the relationship between a target variable $Y$ and multiple predictor variables $X_1, X_2, \ldots, X_n$. The model can be represented as:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon$

where:

- $\beta_0$ is the intercept.
- $\beta_i$ (for $i = 1, 2, \ldots, n$) are the coefficients of the predictor variables.
- $\epsilon$ is the error term.

The coefficients $\beta_i$ are estimated such that the difference between the observed values and the predicted values is minimized. The Ordinary Least Squares (OLS) method is typically used to find these estimates.
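As a sketch, the fitted model reduces to a single dot product per observation. The coefficients and inputs below are made-up illustrative values, not from any fitted model:

```python
import numpy as np

# Illustrative coefficients (not fitted): beta[0] is the intercept
# beta_0, beta[1:] are the slopes beta_1..beta_n.
beta = np.array([2.0, 0.5, -1.0])

def predict(X, beta):
    """Return beta_0 + beta_1*X_1 + ... + beta_n*X_n for each row of X."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(predict(X, beta))  # [ 0.5 -0.5]
```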

#### R-squared ($R^2$)

The R-squared value is a measure of how well the regression model explains the variability of the target variable. It is given by:

$R^2 = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$

where:

- $Y_i$ are the observed values.
- $\hat{Y}_i$ are the predicted values from the model.
- $\bar{Y}$ is the mean of the observed values.
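This formula translates directly into code. A NumPy sketch (the sample values are arbitrary): a perfect prediction gives $R^2 = 1$, while always predicting the mean gives $R^2 = 0$:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS: fraction of the variance in y explained
    relative to the mean-only baseline."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - rss / tss

y = [18, 15, 18, 16, 17]
print(r_squared(y, y))           # 1.0  (perfect predictions)
print(r_squared(y, [16.8] * 5))  # 0.0  (predicting the mean)
```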

#### Statistical Significance (p-values)

To assess the statistical significance of each coefficient, we compute the p-values. The p-value for each feature indicates whether the corresponding coefficient is significantly different from zero. Lower p-values suggest that the feature has a significant impact on the target variable.

The p-value for a term is calculated using hypothesis testing. For each feature $X_i$, the null hypothesis is that $\beta_i = 0$. The p-value helps in deciding whether to reject this null hypothesis.
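One standard way to compute these p-values is the classical OLS t-test: each estimated coefficient is divided by its standard error and compared against a t-distribution with $n - k$ degrees of freedom (observations minus estimated parameters). A sketch assuming NumPy and SciPy; the synthetic data at the bottom is made up for illustration:

```python
import numpy as np
from scipy import stats

def coefficient_p_values(X1, y):
    """Two-sided p-values for H0: beta_i = 0 under the classical OLS
    t-test.  X1 must already contain the intercept column of ones."""
    X1, y = np.asarray(X1, float), np.asarray(y, float)
    n, k = X1.shape
    beta_hat = np.linalg.lstsq(X1, y, rcond=None)[0]
    residuals = y - X1 @ beta_hat
    sigma2 = residuals @ residuals / (n - k)  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    t_stats = beta_hat / se
    return 2 * stats.t.sf(np.abs(t_stats), df=n - k)

# Synthetic check: y depends on the first feature only, so its p-value
# should be tiny and the noise feature's p-value much larger.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 2))
y = 3.0 * A[:, 0] + rng.normal(scale=0.5, size=50)
X1 = np.column_stack([np.ones(50), A])
print(coefficient_p_values(X1, y).round(4))
```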

### Example

Let's walk through the steps to calculate the coefficients and p-values in a multiple linear regression model.

#### 1. Organize the Data

Suppose we have the following dataset with three independent variables (features) and one dependent variable (target):

$\begin{array}{cccc} \text{Feature 1} & \text{Feature 2} & \text{Feature 3} & y \\ \hline 3504 & 130 & 12 & 18 \\ 3693 & 165 & 11.5 & 15 \\ 3436 & 150 & 11 & 18 \\ 3433 & 150 & 12 & 16 \\ 3449 & 140 & 10.5 & 17 \\ \end{array}$

**Features Matrix** $X$:

$X = \begin{bmatrix} 3504 & 130 & 12 \\ 3693 & 165 & 11.5 \\ 3436 & 150 & 11 \\ 3433 & 150 & 12 \\ 3449 & 140 & 10.5 \end{bmatrix}$

**Target Vector** $y$:

$y = \begin{bmatrix} 18 \\ 15 \\ 18 \\ 16 \\ 17 \end{bmatrix}$

#### 2. Add the Intercept

To account for the intercept, add a column of ones to the features matrix $X$:

$X' = \begin{bmatrix} 1 & 3504 & 130 & 12 \\ 1 & 3693 & 165 & 11.5 \\ 1 & 3436 & 150 & 11 \\ 1 & 3433 & 150 & 12 \\ 1 & 3449 & 140 & 10.5 \end{bmatrix}$
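In NumPy, this amounts to stacking a column of ones onto the feature matrix. A sketch using the example data:

```python
import numpy as np

# The example dataset: three feature columns.
X = np.array([
    [3504, 130, 12.0],
    [3693, 165, 11.5],
    [3436, 150, 11.0],
    [3433, 150, 12.0],
    [3449, 140, 10.5],
])

# Prepend a column of ones so the first coefficient acts as the intercept.
X1 = np.column_stack([np.ones(len(X)), X])
print(X1.shape)  # (5, 4)
```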

#### 3. Fit the Model

Fit the multiple linear regression model using the least squares method to estimate the coefficients:

$\hat{\beta} = (X'^T X')^{-1} X'^T y$

Where:

- $\hat{\beta}$ are the estimated coefficients,
- $X'^T$ is the transpose of $X'$,
- $(X'^T X')^{-1}$ is the inverse of $X'^T X'$,
- $y$ is the target vector.
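A direct transcription of this formula for the example data, as a sketch (`np.linalg.lstsq` solves the same problem and is the numerically preferred route):

```python
import numpy as np

# Design matrix X' with the intercept column, and the target vector y.
X1 = np.array([
    [1, 3504, 130, 12.0],
    [1, 3693, 165, 11.5],
    [1, 3436, 150, 11.0],
    [1, 3433, 150, 12.0],
    [1, 3449, 140, 10.5],
])
y = np.array([18.0, 15.0, 18.0, 16.0, 17.0])

# beta_hat = (X'^T X')^{-1} X'^T y, transcribed literally.
beta_hat = np.linalg.inv(X1.T @ X1) @ X1.T @ y
print(beta_hat.round(4))  # ≈ [40.214, -0.003, -0.0639, -0.3167]
```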

#### 4. Calculate Coefficients and P-values

After fitting the model, you obtain:

- **Intercept** ($\beta_0$): 40.214
- **Coefficient for Feature 1** ($\beta_1$): -0.0029711
- **Coefficient for Feature 2** ($\beta_2$): -0.063915
- **Coefficient for Feature 3** ($\beta_3$): -0.31671

**P-values** for each coefficient test the null hypothesis that the coefficient is zero:

- **Intercept** p-value: 0.35605
- **Feature 1** p-value: 0.78453
- **Feature 2** p-value: 0.52916
- **Feature 3** p-value: 0.82945

#### 5. Calculate $R^2$

The coefficient of determination $R^2$ measures the proportion of variance in the target variable that is explained by the features:

$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$

Where:

- **RSS** is the residual sum of squares,
- **TSS** is the total sum of squares.

For this example, $R^2 = 0.69$, indicating that 69% of the variance in $y$ is explained by the model.
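This value can be reproduced by fitting the example data and applying the $R^2$ formula. A NumPy sketch:

```python
import numpy as np

# Design matrix with intercept column, and the target vector.
X1 = np.array([
    [1, 3504, 130, 12.0],
    [1, 3693, 165, 11.5],
    [1, 3436, 150, 11.0],
    [1, 3433, 150, 12.0],
    [1, 3449, 140, 10.5],
])
y = np.array([18.0, 15.0, 18.0, 16.0, 17.0])

beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss = np.sum((y - X1 @ beta_hat) ** 2)  # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
r2 = 1 - rss / tss
print(round(r2, 2))  # 0.69
```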

#### Conclusion

The model attains an $R^2$ of 0.69, explaining roughly 69% of the variance in $y$. However, every coefficient (including the intercept) has a high p-value, so none is statistically significant at conventional levels. This is unsurprising: with five observations and four estimated parameters there is only one residual degree of freedom, so even a moderately good in-sample fit cannot be attributed to any individual feature.