
R-Squared (R2) - Coefficient of Determination - Evaluation Metrics

Brief

The R-Squared evaluation metric is also known as the coefficient of determination. It tells us how well our regression line fits the data, i.e. how much of the variation in the output is explained by the independent variables. It typically gives a floating-point value between 0 and 1, where 1 is the ideal value (it can even go negative when a model fits worse than simply predicting the mean). The closer the value is to 1, the better your model is. Let's find out how it works using the following R-Squared formula:

Formula

\begin{equation} R^{2}=1 - \frac{ \sum \left ( \mathrm{y} _ {\mathrm{i}} - \hat {\mathrm{y}} _ {\mathrm{i}} \right)^{2} }{\sum \left ( \mathrm{y} _ {\mathrm{i}} - \bar {\mathrm{y}} \right)^{2}} \end{equation}

Explanation

Here, \( \sum \) = summation symbol (add up the values),
\( \mathrm{y} _ {\mathrm{i}} \) = actual value of y present in the dataset,
\( \hat {\mathrm{y}} _ {\mathrm{i}} \) = (y hat) predicted value of y from the model,
\( \bar {\mathrm{y}} \) = (y bar) mean/average value of y over the dataset.

• In the numerator of this formula, we take the sum of squared differences between the actual and predicted values of y (the residual sum of squares).
• In the denominator, we take the sum of squared differences between the actual values and the mean of y (the total sum of squares).
• After computing both sums, we divide the numerator by the denominator and subtract the result from 1.
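The steps above can be sketched in plain Python (the function and variable names here are illustrative, not from the original post):

```python
def r_squared(y_actual, y_predicted):
    """Coefficient of determination, computed directly from the formula."""
    mean_y = sum(y_actual) / len(y_actual)
    # Numerator: sum of squared differences between actual and predicted y
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_actual, y_predicted))
    # Denominator: sum of squared differences between actual y and its mean
    ss_tot = sum((y - mean_y) ** 2 for y in y_actual)
    # Divide and subtract from 1
    return 1 - ss_res / ss_tot
```

A perfect model (predictions equal to actuals) makes the numerator zero, giving exactly 1.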

Let’s solve one example:

Example

We have the following sample table, with the mean of Y = 60:

| Actual Y | Predicted Y | (Actual Y - Predicted Y)² | (Actual Y - Mean Y)² |
|----------|-------------|---------------------------|----------------------|
| 70 | 55 | (70 - 55)² = 225 | (70 - 60)² = 100 |
| 40 | 32 | (40 - 32)² = 64 | (40 - 60)² = 400 |
| 84 | 75 | (84 - 75)² = 81 | (84 - 60)² = 576 |
| 44 | 50 | (44 - 50)² = 36 | (44 - 60)² = 256 |
| 62 | 52 | (62 - 52)² = 100 | (62 - 60)² = 4 |
| Mean Y = 300/5 = 60 | | Σ = 506 | Σ = 1336 |

Now we will put this data into our formula:

\begin{equation} R^{2}=1 - \frac{ \sum \left ( \mathrm{y} _ {\mathrm{i}} - \hat {\mathrm{y}} _ {\mathrm{i}} \right)^{2} }{\sum \left ( \mathrm{y} _ {\mathrm{i}} - \bar {\mathrm{y}} \right)^{2}} \end{equation}

\begin{equation} R^{2}=1 - \frac{506}{1336} \end{equation}

\begin{equation} R^{2}=1 - 0.38 = 0.62 \end{equation}
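We can cross-check this worked example with scikit-learn's `r2_score` (an external library call, not part of the original post):

```python
# Verify the worked example using scikit-learn (assumes scikit-learn is installed).
from sklearn.metrics import r2_score

actual_y = [70, 40, 84, 44, 62]     # values from the table above
predicted_y = [55, 32, 75, 50, 52]

r2 = r2_score(actual_y, predicted_y)
print(round(r2, 2))  # 0.62
```

The library agrees with our hand calculation of 1 - 506/1336.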

Conclusion

Here, the R-Squared (coefficient of determination) of 0.62 means that our regression line explains 62% of the variation in Y; the remaining 38% of the variation is left unexplained by the model.

Limitation

R-Squared never decreases as variables are added to the model, because the formula does not account for the number of variables: the mean of Y (and hence the denominator) stays the same no matter how many variables we add, while the residuals in the numerator can only stay the same or shrink. Since more variables never hurt the score, the model appears to get better at predicting values even when the added variables are irrelevant. In that case, you might end up keeping variables that do not belong in your model, and the model would then generalize poorly. That is the limitation of R-Squared.
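A quick sketch of this limitation (assuming scikit-learn and NumPy, which the original post does not use): we fit a linear regression, then append a feature of pure noise and observe that the training R-Squared does not go down.

```python
# Demonstration: adding an irrelevant (random) feature never lowers
# the training R-Squared of an ordinary least-squares fit.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 3 * X[:, 0] + rng.normal(size=50)   # y depends only on the first feature

r2_one = LinearRegression().fit(X, y).score(X, y)

# Append a column of pure noise that has nothing to do with y.
X_more = np.hstack([X, rng.normal(size=(50, 1))])
r2_two = LinearRegression().fit(X_more, y).score(X_more, y)

print(r2_two >= r2_one)  # True: training R-Squared cannot decrease
```

The specific data here is synthetic; the non-decreasing behavior itself is a general property of least-squares fitting on training data.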

End Note

As you can see, R-Squared has a limitation, so what should we do?

Well, Adjusted R-Squared is used for dealing with the addition of variables. You can find Adjusted R-Squared here: Adjusted R-Squared.

This post is licensed under CC BY 4.0 by the author.