Early detection of ovarian cancer is crucial for a good prognosis. Various machine learning methods have already proven useful to that end, but using many features and samples often yields complex classifier structures.
This study investigates the effect of applying four different manifold learning methods before well-known classification algorithms to reduce the number of features, and compares the results with those obtained using the well-known principal component analysis method.
The NCI PBSII dataset, which consists of 253 samples with 15154 features, is used in this study. We tested nine distinct classifiers: k-nearest neighbors, decision tree, support vector machines, stochastic gradient descent, random forest, multi-layer perceptron, Naive Bayes, logistic regression, and AdaBoost.
Among these classifiers, logistic regression gives the maximum accuracy of 99.2% using the full feature set. The classifiers were then rerun on five distinct reduced feature sets obtained using principal component analysis, Multidimensional Scaling, Locally Linear Embedding, Isometric Feature Mapping, and t-Distributed Stochastic Neighbor Embedding. Among these feature reduction methods, Locally Linear Embedding yielded the maximum classifier performance for five of the nine classifiers, with an average of 15.4 components. Both the logistic regression classifier with 28 Multidimensional Scaling components and the stochastic gradient descent classifier with 30 Locally Linear Embedding components achieved the maximum accuracy of 99.8%. On the other hand, the commonly used principal component analysis resulted in a maximum accuracy of 99.7% using stochastic gradient descent with 30 principal components.
In conclusion, although principal component analysis is the most commonly used feature reduction method, Locally Linear Embedding (a manifold learning method) may give higher classifier performance with fewer components in the diagnosis of ovarian cancer.
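As a rough illustration of the workflow summarized above, the following Python sketch reduces a feature matrix with each of the five methods and then scores a logistic regression classifier on the reduced features. It is not the authors' code. It assumes scikit-learn, uses synthetic data from make_classification in place of the NCI PBSII dataset, fixes three components per method, and uses 5-fold cross-validation; the helper name evaluate_reducers is hypothetical. Because scikit-learn's MDS and t-SNE cannot embed unseen samples, each embedding is fitted on all samples before cross-validation, so the resulting scores may be optimistic.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import MDS, TSNE, Isomap, LocallyLinearEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data, not the NCI PBSII dataset.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10, random_state=0
)
# Standardize features (fitted on all samples, for simplicity).
X = StandardScaler().fit_transform(X)

# ASSUMPTION: three components for every method, chosen only to keep
# the sketch fast; the study explored larger component counts.
N_COMPONENTS = 3

reducers = {
    "PCA": PCA(n_components=N_COMPONENTS, random_state=0),
    "MDS": MDS(n_components=N_COMPONENTS, random_state=0),
    "LLE": LocallyLinearEmbedding(
        n_components=N_COMPONENTS, n_neighbors=10, random_state=0
    ),
    "Isomap": Isomap(n_components=N_COMPONENTS, n_neighbors=10),
    "t-SNE": TSNE(n_components=N_COMPONENTS, random_state=0),
}


def evaluate_reducers(X, y, reducers):
    """Embed X with each reducer, then cross-validate a classifier.

    Hypothetical helper: the embedding is computed on all samples
    (transductive), and only the classifier is cross-validated.
    """
    scores = {}
    for name, reducer in reducers.items():
        embedded = reducer.fit_transform(X)
        clf = LogisticRegression(max_iter=1000)
        scores[name] = cross_val_score(clf, embedded, y, cv=5).mean()
    return scores


scores = evaluate_reducers(X, y, reducers)
for name, accuracy in scores.items():
    print(f"{name}: mean CV accuracy = {accuracy:.3f}")
```

Swapping LogisticRegression for another estimator, or looping over several component counts, would extend the sketch toward the kind of comparison described in the abstract. The numbers it prints depend on the synthetic data and say nothing about the reported results.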
B. Yesilkaya, M. Perc, Y. Isler, Manifold learning methods for the diagnosis of ovarian cancer, Journal of Computational Science 63 (2022) 101775