– What were the difficulties you encountered during the project?
The hardest part for the group was understanding the motivation behind this project. Frankly, at first we thought we would only have to read the data and analyze it, not actually modify it for purposes such as building machine learning models or applying linear regression to various feature sets. In addition, the date ranges of our data files did not match exactly: train.csv contained some extra dates, which created mismatches in our graphs.
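One way to handle the date mismatch is to keep only the dates that appear in both tables before plotting. The following is a minimal sketch, not the code we actually ran: the function name `align_on_common_dates`, the column names `Date`, `Weekly_Sales` and `CPI`, and the toy rows are all assumptions made for illustration.

```python
from datetime import date


def align_on_common_dates(train_rows, other_rows, key="Date"):
    """Keep only the rows whose date appears in both tables.

    Returns the two filtered tables plus the sorted list of dates
    that exist only in the first table (the "extra" dates).
    """
    train_dates = {row[key] for row in train_rows}
    other_dates = {row[key] for row in other_rows}
    common = train_dates & other_dates

    train_aligned = [row for row in train_rows if row[key] in common]
    other_aligned = [row for row in other_rows if row[key] in common]
    extra = sorted(train_dates - common)
    return train_aligned, other_aligned, extra


# Toy rows standing in for train.csv and a second table (hypothetical values).
train_rows = [
    {"Date": date(2010, 2, 5), "Weekly_Sales": 100.0},
    {"Date": date(2010, 2, 12), "Weekly_Sales": 120.0},
    {"Date": date(2010, 2, 19), "Weekly_Sales": 90.0},
]
other_rows = [
    {"Date": date(2010, 2, 5), "CPI": 211.1},
    {"Date": date(2010, 2, 12), "CPI": 211.2},
]

train_aligned, other_aligned, extra_dates = align_on_common_dates(
    train_rows, other_rows
)
```

Here `extra_dates` lists the dates that only the first table contains, so they can be inspected before being dropped from the graphs.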
– If you were given sufficient amount of resources, what additional datasets would you utilize?
We would have wanted a Consumer Price Index (CPI) attribute that is more strongly correlated with the other attributes, since our group discussions always led to the idea that CPI affects Fuel Price, Unemployment and Weekly Sales in our datasets. Moreover, we could add the inflation rate to our data to obtain more accurate correlations, because we believe that the inflation rate is negatively correlated with weekly sales, fuel prices and CPI.
– Compare the machine learning algorithms you used, in terms of performance and applicability to your dataset.
First of all, we used the decision tree and random forest algorithms to build machine learning models on our datasets. In our opinion, random forest was more applicable to our dataset, and since it is built on an ensemble of decision trees rather than a single tree, it is also better in terms of performance.
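A comparison of this kind can be sketched with scikit-learn by scoring both models with the same cross-validation splits. This is only an illustration under stated assumptions: it uses the regressor variants of both algorithms, synthetic data in place of our real features, and an arbitrary choice of 5 folds, 50 trees and the R² metric. None of these settings are taken from our actual experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 3 features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300)

models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean R^2 over 5 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Scoring both models on identical folds keeps the comparison fair; which model wins on a given dataset still has to be checked rather than assumed.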
– What improvements could have been done in your project?
Walmart is not the only supermarket chain in the United States, so we could have gathered data from more supermarket chains to obtain more accurate results. We might also have found better correlations between the attributes in our data.