Machine Learning in Prediction of Second Primary Cancer and Recurrence in Colorectal Cancer
Abstract
Background: Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide.
Recurrence of CRC (Re) and the onset of a second primary malignancy (SPM) are important indicators
in the treatment of CRC, but the onset of an SPM is often difficult to predict. We therefore used machine
learning to identify risk factors that affect Re and SPM.
Patients and Methods: CRC patients recorded in the cancer registry databases of three medical centers
were identified. Each patient was classified by recurrence status (Re or no recurrence, NRe) and by
SPM status (SPM or no SPM, NSPM). Two classifiers, A Library for Support Vector Machines (LIBSVM) and
Reduced Error Pruning Tree (REPTree), were used to construct optimized models relating clinical
features to the Re and/or SPM categories.
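When Re and SPM are evaluated in combination, the two binary outcomes define four categories (SPM+Re, NSPM+Re, SPM+NRe, NSPM+NRe). The sketch below only illustrates that labeling scheme; the helper name `combined_label` and the example patients are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def combined_label(recurrence: bool, spm: bool) -> str:
    """Hypothetical helper: map the two binary outcomes onto one combined label.

    recurrence -> "Re" / "NRe"; second primary malignancy -> "SPM" / "NSPM".
    """
    spm_part = "SPM" if spm else "NSPM"
    re_part = "Re" if recurrence else "NRe"
    return f"{spm_part}+{re_part}"


# Made-up (recurrence, spm) pairs for illustration only; not study data.
patients = [(True, True), (True, False), (False, True), (False, False), (False, False)]
labels = [combined_label(r, s) for r, s in patients]

print(labels)           # one combined label per patient
print(Counter(labels))  # class sizes, useful for spotting class imbalance
```

Collapsing the two outcomes into a single four-class target lets one classifier model Re and SPM jointly, at the cost of smaller per-class sample sizes.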
Results: When Re and SPM were evaluated separately, the accuracy was 0.878 for LIBSVM and 0.622 for
REPTree. When Re and SPM were evaluated in combination, the precision of the models for the
SPM+Re, NSPM+Re, SPM+NRe, and NSPM+NRe categories was 0.878, 0.662, 0.774, and 0.778, respectively.
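The Results report overall accuracy for the separate evaluation but per-category precision for the combined evaluation, and the two metrics answer different questions. The sketch below uses the standard textbook definitions (accuracy = correct predictions / all cases; precision for class c = true positives for c / cases predicted as c). The abstract does not state how the metrics were computed (for example, which validation scheme was used), so this is an illustration under that assumption, and the toy labels are invented.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of all cases whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_per_class(y_true, y_pred):
    """For each predicted class c: true positives for c / cases predicted as c."""
    predicted = Counter(y_pred)
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {c: true_pos[c] / n for c, n in predicted.items()}


# Invented toy labels using the abstract's four categories; not study results.
y_true = ["SPM+Re", "NSPM+Re", "NSPM+NRe", "NSPM+NRe", "SPM+NRe", "NSPM+NRe"]
y_pred = ["SPM+Re", "NSPM+NRe", "NSPM+NRe", "NSPM+NRe", "SPM+NRe", "NSPM+Re"]

print(accuracy(y_true, y_pred))             # 4 of 6 correct
print(precision_per_class(y_true, y_pred))  # one value per predicted class
```

A model can have high precision for one category and low precision for another while its overall accuracy sits in between, which is why per-category precision is informative in the four-class setting.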
Conclusions: Machine learning can be used to rank factors affecting tumor Re and SPM. In clinical
practice, routine checkups are necessary to ensure early detection of new tumors. The success of
prediction and early detection may be enhanced in the future by applying “big data” analysis methods
such as machine learning.