Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol
Background: Breast Cancer (BC) is a recognized global health crisis. The World Health Organization reported 2.09 million new cases of BC and 627,000 deaths from BC worldwide in 2018. In developed countries, the traditional BC screening method is mammography, whilst developing countries employ breast self-examination and clinical breast examination. The prominent gold standard for BC detection is the triple assessment: i) clinical examination; ii) mammography and/or ultrasonography; and iii) Fine Needle Aspirate Cytology. However, the introduction of cheaper, more efficient and non-invasive methods of BC screening and detection would be beneficial.
Design and methods: We propose to build BC prediction models from the blood test results in the Breast Cancer Coimbra Dataset (BCCD), obtained from the University of California Irvine online database, using eight machine learning algorithms: i) Logistic Regression; ii) Support Vector Machine; iii) K-Nearest Neighbors; iv) Decision Tree; v) Random Forest; vi) Adaptive Boosting; vii) Gradient Boosting; and viii) eXtreme Gradient Boosting. To ensure the models' robustness, we will employ: i) Stratified k-fold Cross-Validation; ii) Correlation-based Feature Selection (CFS); and iii) parameter tuning. Because feature reduction can affect algorithm performance, the models will be validated on the validation and test sets of the BCCD using both the full and the reduced feature sets. Seven metrics, including accuracy, will be used for model evaluation.
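The stratified k-fold step can be sketched in plain Python without assuming any particular machine learning library (in practice a library class such as scikit-learn's `StratifiedKFold` would more likely be used). The helper names `stratified_k_fold` and `binary_metrics` are hypothetical and not taken from the protocol. The abstract names only accuracy among its seven metrics, so sensitivity, specificity, precision and F1 below are assumptions about what the remaining metrics might include.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions mirror `labels`.

    Hypothetical sketch: indices of each class are shuffled, then dealt
    round-robin into k folds, so every fold receives a near-equal share of
    each class. Folds may miss a class if k exceeds that class's size.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)  # next class continues where this one stopped
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for one fold (assumed metric set, not the protocol's)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

In use, each of the eight classifiers would be fitted on `train_idx` and scored with `binary_metrics` on `test_idx` in every fold, and the per-fold scores averaged. Dealing each class round-robin is what keeps the class ratio roughly constant across folds, which matters for a small dataset.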
Expected impact of the study for public health: CFS, together with the highest-performing model(s), can help identify the specific blood tests that are most indicative of BC, which may serve as important BC biomarkers. The highest-performing model(s) may eventually be used to create an Artificial Intelligence tool to assist clinicians in BC screening and detection.
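One way to see how CFS could single out a small set of informative blood tests is the CFS merit score, which rewards features that correlate with the class but not with each other: Merit(S) = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The sketch below rests on assumptions: it uses absolute Pearson correlation (CFS implementations for classification commonly discretize features and use symmetrical uncertainty instead), a simple greedy forward search, and hypothetical helper names (`pearson`, `cfs_merit`, `greedy_cfs`).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, target, subset):
    """CFS merit of `subset` (feature names); absolute Pearson is an assumed measure."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, target):
    """Forward search: add the feature that most raises merit; stop when none does."""
    selected, best = [], 0.0
    remaining = sorted(columns)  # sorted for deterministic tie-breaking
    while remaining:
        scored = [(cfs_merit(columns, target, selected + [f]), f) for f in remaining]
        score, cand = max(scored, key=lambda t: t[0])
        if score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best
```

With two perfectly correlated columns, adding the second one leaves the merit unchanged, so the search keeps only one of them; this is the mechanism by which CFS would discard redundant blood tests while retaining those that carry independent information about the class.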
Copyright (c) 2019 Zakia Salod, Yashik Singh
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.