journal.pone.0232391.pdf 4.22 MB
Weight Name Value
1000 Title
  • Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study
1000 Author
  1. Randhawa, Gurjit
  2. Soltysiak, Maximillian
  3. El Roz, Hadi
  4. de Souza, Camila P. E.
  5. Hill, Kathleen A.
  6. Kari, Lila
1000 Publication year 2020
1000 Publication type
  1. Article
1000 Published online
  • 2020-04-24
1000 Published in
1000 Citation
  • 15(4):e0232391
1000 Copyright year
  • 2020
1000 License
1000 Publisher version
  • https://doi.org/10.1371/journal.pone.0232391
1000 Supplementary material
  • https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0232391#sec006
1000 Publication status
1000 Peer review status
1000 Language of publication
1000 Abstract/Summary
  • The 2019 novel coronavirus (renamed SARS-CoV-2, and generally referred to as the COVID-19 virus) has spread to 184 countries with over 1.5 million confirmed cases. Such major viral outbreaks demand early elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. This paper identifies an intrinsic COVID-19 virus genomic signature and uses it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 virus genomes. The proposed method combines supervised machine learning with digital signal processing (MLDSP) for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman’s rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp, including the 29 COVID-19 virus sequences available on January 27, 2020. Our results support a hypothesis of a bat origin and classify the COVID-19 virus as Sarbecovirus, within Betacoronavirus. Our method achieves 100% accurate classification of the COVID-19 virus sequences, and discovers the most relevant relationships among over 5000 viral genomes within a few minutes, ab initio, using raw DNA sequence data alone, and without any specialized biological knowledge, training, gene or genome annotations. This suggests that, for novel viral and pathogen genome sequences, this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
1000 Subject indexing
gnd 1206347392 COVID-19
local Taxonomy
local Machine learning
local Viral taxonomy
local Coronaviruses
local Sequence alignment
local Comparative genomics
local Bats
local Viral genomics
1000 Subject classification (DDC)
1000 List of contributors
  1. https://orcid.org/0000-0003-1054-125X|https://orcid.org/0000-0001-7495-5203|https://orcid.org/0000-0002-4020-701X|https://frl.publisso.de/adhoc/uri/ZGUgU291emEsIENhbWlsYSBQLiBFLg==|https://frl.publisso.de/adhoc/uri/SGlsbCwgS2F0aGxlZW4gQS4=|https://frl.publisso.de/adhoc/uri/S2FyaSwgTGlsYQ==
1000 (Academic) Editor
1000 Label
1000 Funder
  1. Natural Sciences and Engineering Research Council of Canada
1000 Grant number
  1. R2824A01; R3511A12
1000 Funding program
  1. -
1000 Files
1000 Funding
  1. 1000 joinedFunding-child
    1000 Funder Natural Sciences and Engineering Research Council of Canada
    1000 Funding program -
    1000 Grant number R2824A01; R3511A12
1000 Object type article
1000 Described by
1000 @id frl:6420461.rdf
1000 Created on 2020-04-27T11:45:49.026+0200
1000 Created by 122
1000 describes frl:6420461
1000 Edited by 122
1000 Last modified Mon Apr 27 11:47:44 CEST 2020
1000 Object modified Mon Apr 27 11:47:29 CEST 2020
1000 Cf. frl:6420461
1000 OAI ID
  1. oai:frl.publisso.de:frl:6420461
1000 Metadata visibility public
1000 Data visibility public
1000 Subject of
