Brief
The R-Squared evaluation metric, also known as the coefficient of determination, tells us how well our regression line fits the data and how much of the variation in the output is explained by the independent variables. It is a floating-point value that typically ranges from 0 to 1, where 1 is the ideal value: the closer the value is to 1, the better your model is. Let’s see how it works using the following R-Squared formula:
Formula
\begin{equation} R^{2}=1 - \frac{ \sum \left (\mathrm{y} _ {\mathrm{i}} - \hat {\mathrm{y}} _ {\mathrm{i}} \right)^{2} }{\sum \left ( \mathrm{y} _ {\mathrm{i}} - \bar {\mathrm{y}} \right)^{2}} \end{equation}
Explanation
Here, \( \sum \) = the summation symbol (add up the values),
\( \mathrm{y} _ {\mathrm{i}} \) = actual value of y in the dataset,
\( \hat {\mathrm{y}} _ {\mathrm{i}} \) = (y hat) predicted value of y from the model,
\( \bar {\mathrm{y}} \) = (y bar) mean/average value of y in the dataset.
• In the numerator, we sum the squared differences between the actual and predicted values of y.
• In the denominator, we sum the squared differences between the actual values of y and their mean.
• Finally, we divide the two sums and subtract the result from 1.
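The steps above can be sketched as a small Python function (the name `r_squared` is just illustrative):

```python
def r_squared(actual, predicted):
    """Coefficient of determination for a list of actual and predicted y values."""
    # Mean of the actual values (y bar).
    mean_y = sum(actual) / len(actual)
    # Numerator: sum of squared differences between actual and predicted y.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Denominator: sum of squared differences between actual y and its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    # Divide the two sums and subtract from 1.
    return 1 - ss_res / ss_tot
```

A perfect model (predictions equal to the actual values) gives 1, while a model that always predicts the mean gives 0.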
Let’s solve one example:
Example
We have the following sample table, with the mean of Y = 60:
| Actual Y | Predicted Y | (Actual Y − Predicted Y)² | (Actual Y − Mean Y)² |
|---|---|---|---|
| 70 | 55 | (70 − 55)² = 225 | (70 − 60)² = 100 |
| 40 | 32 | (40 − 32)² = 64 | (40 − 60)² = 400 |
| 84 | 75 | (84 − 75)² = 81 | (84 − 60)² = 576 |
| 44 | 50 | (44 − 50)² = 36 | (44 − 60)² = 256 |
| 62 | 52 | (62 − 52)² = 100 | (62 − 60)² = 4 |
| Mean Y = 300/5 = 60 | | Σ = 506 | Σ = 1336 |
Now we will put this data into our formula:
\begin{equation} R^{2}=1 - \frac{ \sum \left (\mathrm{y} _ {\mathrm{i}} - \hat {\mathrm{y}} _ {\mathrm{i}} \right)^{2} }{\sum \left ( \mathrm{y} _ {\mathrm{i}} - \bar {\mathrm{y}} \right)^{2}} \end{equation}
\begin{equation} R^{2}=1 - \frac{506}{1336} \end{equation}
\begin{equation} R^{2}=1 - 0.38 = 0.62 \end{equation}
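The same calculation can be checked with a few lines of Python, using the values from the table above:

```python
actual = [70, 40, 84, 44, 62]
predicted = [55, 32, 75, 50, 52]

mean_y = sum(actual) / len(actual)  # 300 / 5 = 60
# Numerator: 225 + 64 + 81 + 36 + 100 = 506
ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
# Denominator: 100 + 400 + 576 + 256 + 4 = 1336
ss_tot = sum((y - mean_y) ** 2 for y in actual)

r2 = 1 - ss_res / ss_tot
print(round(r2, 2))  # 0.62
```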
Conclusion
Here, R-Squared (the coefficient of determination) is 0.62, which means that the regression line explains about 62% of the variation in the actual Y values.
Limitation
R-Squared never decreases as more variables are added, because the formula does not account for the number of variables: the mean of Y stays the same no matter how many variables we add, and a fitted model’s residuals can only shrink (or stay the same) when it is given extra inputs. Since adding variables can only push R-Squared up, you might end up adding variables that are not actually suitable for your model, and the model would then perform poorly on new data. That is the limitation of R-Squared.
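This effect is easy to demonstrate. The sketch below (a minimal illustration, assuming an ordinary least squares fit via NumPy) adds a purely random feature to a model and shows that the training R-Squared still does not go down:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 1))
y = 3 * x[:, 0] + rng.normal(size=n)      # y depends only on x
noise_feature = rng.normal(size=(n, 1))   # pure noise, unrelated to y


def fit_r2(features, y):
    """Fit ordinary least squares (with intercept) and return R-Squared."""
    X = np.column_stack([np.ones(len(y)), features])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    pred = X @ beta
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot


r2_small = fit_r2(x, y)
r2_big = fit_r2(np.hstack([x, noise_feature]), y)
print(r2_big >= r2_small)  # adding even a useless feature never lowers R-Squared
```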
End Note
As you can see, R-Squared has a limitation, so what do we do then?
Well, Adjusted R-Squared is used to deal with the addition of variables. You can find Adjusted R-Squared here: Adjusted R-Squared.