
Commit dd0dce0

authored
Update README.md
1 parent 02c2343 commit dd0dce0

File tree

1 file changed

+8
-0
lines changed

README.md

Lines changed: 8 additions & 0 deletions
@@ -24,3 +24,11 @@ need to calculate the Covid Cases on 1st Oct 2020 for every City in the test dat
output file 02 should contain only City and the respective Covid Cases on 1st October.

**Our approach here**
We used a Voting Regression model that combines three different models: Random Forest, Gradient Boosting Regression and Kernel Ridge Regression.
After some initial analysis and visualization of the data, we found no linear trend between the features and the target value, so we decided to try tree-based models first. We started with a basic decision tree, but its RMSE was not good. We then used Random Forest, and the RMSE improved drastically. Next, we tried a few other ensemble learning techniques: XGBoost, Gradient Boosting and AdaBoost regression. Among these, Gradient Boosting helped improve the RMSE. We also tried Kernel Ridge Regression, which improved the RMSE slightly as well. Finally, we combined these well-performing models in a voting regression model for the final prediction.
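The comparison described above can be sketched as follows. This is only an illustration, assuming scikit-learn is available: the data is synthetic, the hyperparameters are arbitrary, and names such as `candidates` and `rmse_by_model` are hypothetical rather than taken from our actual code. XGBoost and AdaBoost are left out to keep the sketch short.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, deliberately nonlinear data standing in for the real features (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 4))
y = np.sin(X[:, 0]) * X[:, 1] + X[:, 2] ** 2 + rng.normal(0, 0.1, size=400)

# Hold out a validation split so every model is scored on unseen rows.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate models; hyperparameters are illustrative, not tuned.
candidates = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "kernel_ridge": KernelRidge(kernel="rbf", alpha=1.0),
}

# Fit each candidate and record its validation RMSE.
rmse_by_model = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_val, pred)))

# Print the candidates from lowest to highest RMSE.
for name, rmse in sorted(rmse_by_model.items(), key=lambda kv: kv[1]):
    print(f"{name:>18}: RMSE = {rmse:.3f}")
```

The ranking on this toy data says nothing about the ranking on the competition data; it only shows how the RMSE values can be compared side by side.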
Random Forest: An ensemble of many decision trees, each trained on a random sample of the data; for regression, the trees' predictions are averaged.
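Gradient Boosting regression: Gradient Boosting builds a predictive model as an ensemble of weak predictive models (typically shallow decision trees), adding them one at a time so that each new model corrects the errors of the ensemble so far.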
Kernel Ridge regression: It combines ridge regression (linear least squares with L2 regularization) with the kernel trick, which lets it fit nonlinear relationships.
Voting regression: A voting regressor is an ensemble meta-estimator that fits several base regressors, each on the whole dataset, and then averages their individual predictions to form the final prediction.
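A minimal sketch of such an ensemble, assuming scikit-learn's `VotingRegressor` with the three base models named above. The data is synthetic and the hyperparameters are placeholders, so this is not our tuned configuration. The final check illustrates that, with default uniform weights, the ensemble prediction is the plain mean of the base models' predictions.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.kernel_ridge import KernelRidge

# Synthetic nonlinear data standing in for the real training set (assumption).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(0, 0.1, size=300)

# Each base regressor is fitted on the whole dataset by VotingRegressor.fit.
voter = VotingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
        ("krr", KernelRidge(kernel="rbf", alpha=1.0)),
    ]
)
voter.fit(X, y)

X_new = X[:5]
ensemble_pred = voter.predict(X_new)

# With uniform weights, the ensemble output equals the mean of the
# fitted base regressors' predictions.
manual_mean = np.mean([est.predict(X_new) for est in voter.estimators_], axis=0)
print(np.allclose(ensemble_pred, manual_mean))
```

Passing a `weights` list to `VotingRegressor` would turn the plain mean into a weighted average, which is one way to favor the stronger base models.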

For the second part, we were supposed to predict Covid cases on 1st October using a time series prediction model. After analyzing the trend of each time series in Excel using sparklines, we saw a clear upward trend in the data. So we tried Holt's linear trend model, as it is known for good forecasting on time series that have a trend. But because of the NaN values, we were not able to get a good forecast and there were many outliers. So we decided to go with Simple Exponential Smoothing, which is based on the principle that the most recent values are given higher weight than values from the distant past. With that we got a better forecast. We used an alpha value of 0.9 in the simple exponential smoothing formula so as to avoid forecasting large values which don’t follow the trend.
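To make the smoothing step concrete, here is a small pure-Python sketch of simple exponential smoothing. The function name, the toy case counts, and the choice to initialise the level with the first observation are assumptions for illustration; our actual pipeline and its NaN handling are not shown here.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast from simple exponential smoothing.

    Recursion: level_t = alpha * y_t + (1 - alpha) * level_{t-1}.
    The level is initialised with the first observation (one common choice,
    assumed here). The forecast for every future step is the final level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must contain at least one observation")

    level = series[0]
    for value in series[1:]:
        # A higher alpha puts more weight on the newest observation.
        level = alpha * value + (1 - alpha) * level
    return level


# Toy cumulative case counts for one city (made-up numbers).
cases = [100, 120, 150, 170, 200]
forecast = simple_exponential_smoothing(cases, alpha=0.9)
print(round(forecast, 3))  # 196.768
```

With alpha = 0.9 the forecast stays close to the latest observation (200 here), because older values are discounted by a factor of 0.1 at each step.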
