s40249-020-00649-8.pdf 846,12KB
Weight Name Value
1000 Title
  • Using the spike protein feature to predict infection risk and monitor the evolutionary dynamic of coronavirus
1000 Author
  1. Qiang, Xiao-Li
  2. Xu, Peng
  3. Fang, Gang
  4. Liu, Wen-Bin
  5. Kou, Zheng
1000 Publication year 2020
1000 Publication type
  1. Article
1000 Published online
  • 2020-03-25
1000 Published in
1000 Citation
  • 9(1):33
1000 Copyright year
  • 2020
1000 License
1000 Publisher version
  • https://doi.org/10.1186/s40249-020-00649-8
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7093988/
1000 Supplementary material
  • https://idpjournal.biomedcentral.com/articles/10.1186/s40249-020-00649-8#availability-of-data-and-materials
1000 Publication status
1000 Peer-review status
1000 Language of publication
1000 Abstract/Summary
  • BACKGROUND: Coronaviruses can cross the species barrier and cause severe respiratory syndromes in humans. SARS-CoV-2, which may have originated in bats, is still circulating in China. In this study, a prediction model is proposed to evaluate the infection risk of non-human-origin coronaviruses for early warning.
METHODS: The spike protein sequences of 2666 coronaviruses were collected from the 2019 Novel Coronavirus Resource (2019nCoVR) database of the China National Genomics Data Center on Jan 29, 2020. A total of 507 human-origin viruses were treated as positive samples and 2159 non-human-origin viruses as negative samples. To capture the key information of the spike protein, three feature encoding algorithms (amino acid composition, AAC; parallel correlation-based pseudo-amino-acid composition, PC-PseAAC; and G-gap dipeptide composition, GGAP) were used to train 41 random forest models. The optimal feature, i.e. the one with the best performance, was identified by the multidimensional scaling method, which was used to explore the pattern of human coronaviruses.
RESULTS: The 10-fold cross-validation results showed that good performance was achieved with the GGAP (g = 3) feature. The predictive model achieved a maximum accuracy (ACC) of 98.18% with a Matthews correlation coefficient (MCC) of 0.9638. Seven clusters of human coronaviruses (229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV, and SARS-CoV-2) were found. The cluster for SARS-CoV-2 was very close to that for SARS-CoV, which suggests that both viruses use the same human receptor (angiotensin-converting enzyme II). The large gap in the distance curve suggests that the origin of SARS-CoV-2 remains unclear and that surveillance in the field should continue. The smooth distance curve for SARS-CoV suggests that its close relatives still exist in nature and continue to pose a challenge to public health.
CONCLUSIONS: The optimal feature (GGAP, g = 3) performed well in predicting infection risk and could be used to explore evolutionary dynamics in a simple, fast and large-scale manner. The study may benefit field surveillance of genome mutations in coronaviruses.
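The G-gap dipeptide composition (GGAP) named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes the common convention that g counts the residues skipped between the two members of a pair (g = 0 means adjacent residues), and that pairs containing non-standard residues are ignored. The function name `ggap_composition` and the constants are hypothetical.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (hypothetical constants).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def ggap_composition(seq, g=3):
    """Return a 400-dimensional vector of g-gap dipeptide frequencies.

    Assumption: a pair is (seq[i], seq[i + g + 1]), so g residues lie
    between the two members. Pairs with non-standard residues are skipped,
    and frequencies are normalised by the number of pairs counted.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

A vector of this kind, computed per spike protein sequence, could serve as the input to a classifier such as a random forest; the choice of normalisation and the handling of ambiguous residues are assumptions of this sketch.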
1000 Subject indexing
gnd 1206347392 COVID-19
local Cross-species infection
local Machine learning
local Spike protein
local Coronavirus
1000 Subject classification (DDC)
1000 List of contributors
  1. https://frl.publisso.de/adhoc/uri/UWlhbmcsIFhpYW8tTGk=|https://frl.publisso.de/adhoc/uri/WHUsIFBlbmc=|https://frl.publisso.de/adhoc/uri/RmFuZywgR2FuZw==|https://frl.publisso.de/adhoc/uri/TGl1LCBXZW4tQmlu|https://frl.publisso.de/adhoc/uri/S291LCBaaGVuZw==
1000 Label
1000 Funder
  1. National Natural Science Foundation of China
  2. Natural Science Foundation of Guangdong Province
1000 Grant number
  1. 61972109; 61632002
  2. 2018A030313380
1000 Funding program
  1. -
  2. -
1000 Files
1000 Funding
  1. 1000 joinedFunding-child
    1000 Funder National Natural Science Foundation of China
    1000 Funding program -
    1000 Grant number 61972109; 61632002
  2. 1000 joinedFunding-child
    1000 Funder Natural Science Foundation of Guangdong Province
    1000 Funding program -
    1000 Grant number 2018A030313380
1000 Object type article
1000 Described by
1000 @id frl:6419762.rdf
1000 Created on 2020-04-06T14:38:07.926+0200
1000 Created by 122
1000 Describes frl:6419762
1000 Edited by 122
1000 Last modified 2020-04-06T14:47:23.490+0200
1000 Object modified Mon Apr 06 14:39:02 CEST 2020
1000 Cf. frl:6419762
1000 OAI ID
  1. oai:frl.publisso.de:frl:6419762
1000 Metadata visibility public
1000 Data visibility public
1000 Subject of
