COVID-19, Transcriptome data, Phenotype analysis, Machine learning models, Respiratory diseases, Dysregulated genes.
Purpose: Evolving technologies allow us to measure human molecular data in a wide reach. Those data are extensively used by researchers in many studies and help in advancements of medical field. Transcriptome, proteome, metabolome, and epigenome are few such molecular data. This study utilizes the transcriptome data of COVID-19 patients to uncover the dysregulated genes in the SARS-COV-2.
Method: Selected genes are used in machine learning models to predict various phenotypes of those patients. Ten different phenotypes are studied here such as time since onset, COVID-19 status, connection between age and COVID-19, hospitalization status and ICU status, using classification models. Further, this study compares molecular characterization of COVID-19 patients with other respiratory diseases.
Results: Gene ontology analysis on the selected features shows that they are highly related to viral infection. Features are selected using two methods and selected features are individually used in the classification of patients using six different machine learning algorithms. For each of the selected phenotype, results are compared to find the best prediction model.
Conclusion: Even though, there are not any significant differences between the feature selection methods, random forest and SVM performs very well throughout all the phenotype studies.