Customer Segmentation Analysis and Churn Prediction in E-commerce Using K-Means Clustering and Random Forest: A Case Study of the Brazilian E-commerce Platform Olist
Keywords:
Customer Intelligence, Churn Prediction, K-Means Clustering, Random Forest, RFM
Abstract
The rapid growth of the e-commerce industry requires digital platforms to focus on customer retention to remain sustainable. This study integrates customer segmentation and churn-risk prediction into a single customer intelligence approach. It combines unsupervised learning with the K-Means algorithm and supervised learning with the Random Forest algorithm, both applied to the Olist Brazilian E-commerce dataset. Clustering on the Recency, Frequency, and Monetary (RFM) metrics produced the selected grouping, with a Silhouette Score of 0.36. The Random Forest model then predicted potential churn with an accuracy of 85.37%. Together, the two methods help identify high-risk customer segments, so that management can design targeted retention programs.
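As a rough illustration of the two quantities named in the abstract, the sketch below derives Recency, Frequency, and Monetary values from an order log and computes a mean silhouette coefficient for a given cluster assignment. It is not the authors' pipeline: the order records, the snapshot date, and the function names `rfm_table` and `mean_silhouette` are hypothetical, and the K-Means and Random Forest steps (typically taken from a library such as scikit-learn) are not shown.

```python
from collections import defaultdict
from datetime import date
from math import dist

# Hypothetical order log: (customer_id, order_date, order_value).
ORDERS = [
    ("c1", date(2018, 8, 1), 120.0),
    ("c1", date(2018, 8, 20), 80.0),
    ("c2", date(2018, 1, 5), 40.0),
    ("c3", date(2018, 8, 25), 300.0),
]
SNAPSHOT = date(2018, 9, 1)  # assumed reference date for recency


def rfm_table(orders, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, value in orders:
        last_order[customer] = max(day, last_order.get(customer, day))
        frequency[customer] += 1
        monetary[customer] += value
    return {
        c: ((snapshot - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }


def mean_silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two clusters."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    scores = []
    for i, own in enumerate(labels):
        mates = [j for j in clusters[own] if j != i]
        if not mates:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in mates) / len(mates)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


rfm = rfm_table(ORDERS, SNAPSHOT)
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
well_separated = mean_silhouette(toy_points, [0, 0, 1, 1])
poorly_separated = mean_silhouette(toy_points, [0, 1, 0, 1])
```

The score ranges from -1 to 1. In this toy example, the grouping that matches the two visible blobs scores close to 1, while the mismatched grouping scores below 0. In practice the RFM columns would usually be scaled before clustering, since monetary values and day counts sit on very different ranges.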
License
Copyright (c) 2026 Bayu Aditiya Nuryansyah, Raihanullah, Oriandika Rasyid (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.