Using Machine Learning on an Imbalanced Cancer Dataset

James Ernest Ekow Arthur, University of Texas at El Paso


With an estimated 1.4 million cancer diagnoses worldwide and a rising number of deaths among cancer patients, it is prudent to investigate methods, approaches, and smarter ways of predicting and diagnosing cancer, so that holistic techniques can be used to reduce false predictions, increase exact predictions, and provide meticulous prognostic information. Can a feasible technique be developed for the general problem of cancer prognosis and diagnosis? We show here that this problem can be tackled efficiently with the aid of machine learning techniques, and that the best, most feasible, and most efficient technique can be used to reduce the cancer menace. Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) methods. These techniques have therefore been used to model the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs), and Decision Trees (DTs), have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making. Even though it is evident that the use of ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed before these methods can be considered in everyday clinical practice.
In this work, we present a review of recent ML approaches employed in the modeling of cancer prognosis and diagnosis.

Subject Area

Mathematics|Artificial intelligence|Oncology

Recommended Citation

Arthur, James Ernest Ekow, "Using Machine Learning on an Imbalanced Cancer Dataset" (2020). ETD Collection for University of Texas, El Paso. AAI28092733.