Abstract
The rapid expansion of the new retail industry has given consumers more choices and more opportunities for comparison shopping. This has intensified competition among supermarket chains and steadily compressed profit margins. This research uses big data analytics to compare several machine learning (ML) approaches for forecasting grocery sales, with the aim of assessing how well different ML algorithms predict sales. Supermarket sales data serve as the dataset, and Python is used for data preprocessing. After the data are analyzed with data mining techniques to identify anomalies and trends, predictive models are developed using advanced ML algorithms, including decision tree (DT), XGBoost, gradient boosting (GB), and random forest (RF), to forecast sales volumes more accurately than traditional methods, as measured by mean absolute error (MAE) and R² score. In evaluating model performance, the Extra Trees (ET) model exhibits superior accuracy, with an R² of 0.94 and an MAE of 1.96, compared to the existing models. Addressing challenges such as data sparsity, variability, and adaptation to dynamic market conditions will be crucial. Future research could investigate novel methods for combining diverse forecasting approaches and refining models to better handle the complexities of real-world sales data.
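
For readers unfamiliar with the two evaluation metrics named above, the following minimal Python sketch shows how MAE and the R² score are conventionally computed. It is an illustration under stated assumptions, not the study's code: the function names and the toy sales values are hypothetical and are not drawn from the supermarket dataset.

```python
from statistics import mean

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical toy values, NOT taken from the paper's dataset.
actual_sales = [10.0, 12.0, 15.0, 11.0, 14.0]
predicted_sales = [11.0, 12.0, 14.0, 10.0, 15.0]

mae = mean_absolute_error(actual_sales, predicted_sales)
r2 = r2_score(actual_sales, predicted_sales)
print(f"MAE = {mae:.2f}, R2 = {r2:.2f}")
```

A lower MAE means smaller average forecast errors in the units of the target, while an R² closer to 1 means the model explains more of the variance in sales; the two are typically read together when ranking models.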

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.