Prediction of Genomic Signature of Ngs Sequences and Comparative Drug-Likeness

Authors

  • Angela U Makolo, University of Ibadan Bioinformatics Group, Department of Computer Science, University of Ibadan, Ibadan 200132, Nigeria; Department of Computer Science, Federal School of Statistics Ibadan
  • Festus Segun Ajiboye, University of Ibadan Bioinformatics Group, Department of Computer Science, University of Ibadan, Ibadan 200132, Nigeria; Department of Computer Science, Federal School of Statistics Ibadan

Keywords:

SARS-CoV-1, SARS-CoV-2, compression complexity, Lempel-Ziv, Lipinski descriptor, Regression model

Abstract

Developing a drug or a targeted immunotherapy for a worldwide viral epidemic such as the current pandemic requires comprehensive evaluation and annotation of metagenomic datasets so that nucleotide sequences can be filtered quickly and efficiently. Traditional sequence alignment procedures are poorly suited to this task because they depend on the homology of the aligned sequences and carry high space and time complexity. This research therefore adopts an alignment-free approach to sequence comparison. We propose a compression-based distance measure for automatically identifying short nucleotide sequences from SARS coronavirus variants, with the aim of finding critical features in genetic markers and genomic structure. The method relies on a set of mathematical and computational tools to recognise patterns in the compressed data. We also show that, applied to extremely short regions of nucleotide sequence, the proposed technique can differentiate SARS-CoV-2 from SARS-CoV-1. The Lipinski descriptors (rule of five) were then used to predict drug-likeness for the target protein in SARS-CoV-2. A random forest regression model was built to validate the machine learning approach used in the computational analysis. Finally, the regressor was compared with other machine learning models using LazyPredict, with the goal of allowing scientists to identify and describe SARS coronavirus strains swiftly and accurately.
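
To make the compression-based, alignment-free comparison described in the abstract concrete, the following is a minimal sketch in Python. It assumes the Lempel-Ziv (1976) phrase count as the complexity measure and one commonly used normalised distance built from it; the abstract does not state the exact distance function, so this is an illustration and not necessarily the authors' formulation. The function names `lz76_complexity` and `lz76_distance` are hypothetical.

```python
def lz76_complexity(seq: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of seq."""
    i, n, phrases = 0, len(seq), 0
    while i < n:
        length = 1
        # Grow the current phrase while it can still be copied from a
        # substring that starts before position i (overlap is allowed).
        while i + length <= n and seq.find(seq[i:i + length], 0, i + length - 1) != -1:
            length += 1
        phrases += 1
        i += length
    return phrases


def lz76_distance(s: str, q: str) -> float:
    """Normalised LZ distance (an assumed form, not taken from the paper):
    max(c(sq) - c(s), c(qs) - c(q)) / max(c(s), c(q))."""
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    if max(c_s, c_q) == 0:
        raise ValueError("both sequences are empty")
    # How many new phrases each sequence needs once the other is known.
    extra = max(lz76_complexity(s + q) - c_s, lz76_complexity(q + s) - c_q)
    return extra / max(c_s, c_q)


if __name__ == "__main__":
    a = "ACGTACGTACGT"   # toy sequences, not real viral data
    b = "AAAAAAAAAAAA"
    print(lz76_complexity(a), lz76_distance(a, a), lz76_distance(a, b))
```

The intuition is that a sequence adds few new phrases when appended to one it resembles, so similar sequences receive a small distance without any alignment step. A pairwise matrix of such distances could then feed a clustering or tree-building method.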

Published

2023-01-02

How to Cite

Makolo, A. U., & Ajiboye, F. S. (2023). Prediction of Genomic Signature of Ngs Sequences and Comparative Drug-Likeness. American Scientific Research Journal for Engineering, Technology, and Sciences, 90(1), 573–589. Retrieved from https://www.asrjetsjournal.org/index.php/American_Scientific_Journal/article/view/8383

Section

Articles