File: s00439-021-02411-y.pdf (1.73 MB)
Title
  • Embeddings from protein language models predict conservation and variant effects
Author(s)
  1. Marquet, Céline
  2. Heinzinger, Michael
  3. Olenyi, Tobias
  4. Dallago, Christian
  5. Erckert, Kyra
  6. Bernhofer, Michael
  7. Nechaev, Dmitrii
  8. Rost, Burkhard
Year of publication: 2021
Publication type
  1. Article
Published online
  • 2021-12-30
Published in
Citation
  • 141(10):1629-1647
Copyright year
  • 2021
License
Publisher's version
  • https://doi.org/10.1007/s00439-021-02411-y
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8716573/
Publication status
Language of publication
Abstract/Summary
  • The emergence of SARS-CoV-2 variants underscored the demand for tools that interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, their results remain challenging to analyze. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation from single sequences almost as accurately as ConSeq did using MSAs (two-state Matthews Correlation Coefficient [MCC] of 0.596 ± 0.006 for ProtT5 embeddings vs. 0.608 ± 0.006 for ConSeq). Feeding the conservation prediction, BLOSUM62 substitution scores, and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments with those of other methods (incl. ESM-1v, DeepSequence, and GEMME) showed our approach to be competitive with the state-of-the-art (SOTA) methods that use MSA input. No method outperformed all others, either consistently or statistically significantly, regardless of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with MSA-based methods for SAV effect prediction at a fraction of the computing and energy costs.
Our method predicted SAV effects for the entire human proteome (~20k proteins) within 40 min on a single Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.
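The abstract describes VESPA as a logistic regression (LR) over three per-variant inputs. The plain-Python sketch below illustrates that kind of LR ensemble; it is not the authors' implementation (the real code lives at https://github.com/Rostlab/VESPA). Assumptions, all hypothetical: each SAV has already been reduced to three numbers (a predicted conservation score scaled to [0, 1], the BLOSUM62 score for the wild-type to mutant substitution, and a pLM log-odds value log p(mutant) - log p(wild type) at the masked position); the labels and feature values are invented toy data; and the model is fit by plain batch gradient descent, not necessarily the solver VESPA uses. The helper names `fit_logistic_regression` and `predict_effect` are illustrative only.

```python
import math
from typing import List, Sequence

# Hypothetical feature order per SAV (toy assumption, not VESPA's exact inputs):
#   [predicted conservation in [0, 1], BLOSUM62 score, pLM log-odds (mut vs. wt)]
Features = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_effect(w: Sequence[float], x: Features) -> float:
    """Probability that the SAV has an effect (label 1) under weights [bias, w_1, ..., w_d]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic_regression(X: Sequence[Features], y: Sequence[int],
                            lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit [bias, w_1, ..., w_d] by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_effect(w, xi) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Toy training data (invented for illustration): 1 = effect, 0 = neutral.
X_train = [
    [0.80, -2, -3.5], [0.95, -4, -5.0], [0.90, -3, -4.0], [0.70, -1, -2.5],
    [0.20, 1, -0.3], [0.30, 3, 0.5], [0.10, 2, 0.2], [0.15, 0, -0.5],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic_regression(X_train, y_train)
p_conserved = predict_effect(weights, [0.875, -3, -4.25])  # conserved site, unfavorable swap
p_tolerant = predict_effect(weights, [0.25, 2, 0.1])       # variable site, similar residue
print(f"{p_conserved:.2f} {p_tolerant:.2f}")
```

Because all three inputs come from a single-sequence pLM pass plus a fixed substitution matrix, a setup like this needs no MSA at prediction time; a linear model also keeps the combination cheap and its weights easy to inspect.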
Subject indexing
gnd 1206347392 COVID-19
local Algorithms [MeSH]
local Language [MeSH]
local Humans [MeSH]
local SARS-CoV-2/genetics [MeSH]
local Molecular Medicine
local Original Investigation
local COVID-19/genetics [MeSH]
local Metabolic Diseases
local Computational Interpretation of Human Genetic Variation
local Gene Function
local Amino Acids [MeSH]
local Proteome [MeSH]
local Human Genetics
List of contributors
  1. https://orcid.org/0000-0002-8691-5791
  2. https://frl.publisso.de/adhoc/uri/SGVpbnppbmdlciwgTWljaGFlbA==
  3. https://frl.publisso.de/adhoc/uri/T2xlbnlpLCBUb2JpYXM=
  4. https://frl.publisso.de/adhoc/uri/RGFsbGFnbywgQ2hyaXN0aWFu
  5. https://frl.publisso.de/adhoc/uri/RXJja2VydCwgS3lyYQ==
  6. https://frl.publisso.de/adhoc/uri/QmVybmhvZmVyLCBNaWNoYWVs
  7. https://frl.publisso.de/adhoc/uri/TmVjaGFldiwgRG1pdHJpaQ==
  8. https://frl.publisso.de/adhoc/uri/Um9zdCwgQnVya2hhcmQ=
Note
  • DeepGreen-ID: df1b7c7826be48d4bc55b660c1d3ce7a; metadata provided by: DeepGreen (https://www.oa-deepgreen.de/api/v1/), LIVIVO search scope life sciences (http://z3950.zbmed.de:6210/livivo), Crossref Unified Resource API (https://api.crossref.org/swagger-ui/index.html), to.science.api (https://frl.publisso.de/), ZDB JSON-API (beta) (https://zeitschriftendatenbank.de/api/), lobid - data infrastructure for libraries (https://lobid.org/resources/search)
Label
Files
Object type: article
Described by
@id: frl:6446822.rdf
Created on: 2023-04-28T14:16:37.248+0200
Created by: 322
Describes: frl:6446822
Last modified: Fri Oct 20 19:04:13 CEST 2023
Object last modified: Fri Oct 20 19:04:13 CEST 2023
See also: frl:6446822
OAI ID
  1. oai:frl.publisso.de:frl:6446822
Metadata visibility: public
Data visibility: public
Subject of