Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain

Dennstädt, Fabio

Zink, Johannes

Putora, Paul Martin

Hastings, Janna

Cihoric, Nikola

Download
13643_2024_Article_2575.pdf (1.61 MB)

Name

Value

Title

Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain

Author(s)

Dennstädt, Fabio; Zink, Johannes; Putora, Paul Martin; Hastings, Janna; Cihoric, Nikola

Publisher

BioMed Central

Year of publication

2024

Publication type

Article

Published online

2024-06-15

Published in

Systematic Reviews

Citation

13(1):158

Copyright year

2024

License

https://creativecommons.org/licenses/by/4.0/

Publisher's version

https://doi.org/10.1186/s13643-024-02575-4
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11180407/

Publication status

Postprint, publisher's version

Review status

Peer-reviewed

Publication language

English

Abstract/Summary

Background: Systematically screening published literature to determine the relevant publications to synthesize in a review is a time-consuming and difficult task. Large language models (LLMs) are an emerging technology with promising capabilities for the automation of language-related tasks that may be useful for such a purpose.

Methods: LLMs were used as part of an automated system to evaluate the relevance of publications to a certain topic based on defined criteria and based on the title and abstract of each publication. A Python script was created to generate structured prompts consisting of text strings for instruction, title, abstract, and relevant criteria to be provided to an LLM. The relevance of a publication was evaluated by the LLM on a Likert scale (low relevance to high relevance). By specifying a threshold, different classifiers for inclusion/exclusion of publications could then be defined. The approach was used with four different openly available LLMs on ten published data sets of biomedical literature reviews and on a newly human-created data set for a hypothetical new systematic literature review.

Results: The performance of the classifiers varied depending on the LLM being used and on the data set analyzed. Regarding sensitivity/specificity, the classifiers yielded 94.48%/31.78% for the FlanT5 model, 97.58%/19.12% for the OpenHermes-NeuralChat model, 81.93%/75.19% for the Mixtral model and 97.58%/38.34% for the Platypus 2 model on the ten published data sets. The same classifiers yielded 100% sensitivity at a specificity of 12.58%, 4.54%, 62.47%, and 24.74% on the newly created data set. Changing the standard settings of the approach (minor adaption of instruction prompt and/or changing the range of the Likert scale from 1–5 to 1–10) had a considerable impact on the performance.

Conclusions: LLMs can be used to evaluate the relevance of scientific publications to a certain review topic and classifiers based on such an approach show some promising results. To date, little is known about how well such systems would perform if used prospectively when conducting systematic literature reviews and what further implications this might have. However, it is likely that in the future researchers will increasingly use LLMs for evaluating and classifying scientific publications.
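The abstract describes three steps: assembling a structured prompt, turning a Likert relevance score into an include/exclude decision with a threshold, and measuring sensitivity and specificity. The following Python sketch illustrates those steps under stated assumptions and is not the authors' script. The function names, the prompt layout, the `>=` comparison, and the example scores are all hypothetical. No LLM is called: the Likert scores are supplied by hand.

```python
from typing import List, Tuple


def build_prompt(instruction: str, title: str, abstract: str, criteria: str) -> str:
    """Join the four text strings named in the abstract into one prompt.

    The exact layout is an assumption made for illustration.
    """
    return (
        f"{instruction}\n\n"
        f"Title: {title}\n\n"
        f"Abstract: {abstract}\n\n"
        f"Relevant criteria: {criteria}\n"
    )


def classify(likert_score: int, threshold: int) -> bool:
    """Include a publication when its score reaches the threshold.

    Using >= (rather than >) is an assumption of this sketch.
    """
    return likert_score >= threshold


def sensitivity_specificity(
    predicted: List[bool], actual: List[bool]
) -> Tuple[float, float]:
    """Compute sensitivity and specificity from paired include/exclude labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical Likert scores (1-5) and human reference labels for five records.
example_scores = [5, 4, 2, 1, 3]
example_labels = [True, True, False, False, False]

example_predictions = [classify(s, threshold=3) for s in example_scores]
example_sens, example_spec = sensitivity_specificity(example_predictions, example_labels)
print(f"sensitivity={example_sens:.2f} specificity={example_spec:.2f}")
```

Lowering the threshold in such a sketch includes more publications, which tends to raise sensitivity and lower specificity. That is consistent with the trade-off visible in the reported figures.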

Subject indexing

local	Large language models
local	Title and abstract screening
local	Systematic Reviews as Topic [MeSH]
local	Language [MeSH]
local	Research
local	Natural Language Processing [MeSH]
local	Natural language processing
local	Systematic literature review
local	Biomedical Research [MeSH]
local	Biomedicine

Subject classification (DDC)

Medicine and health

List of contributors

https://orcid.org/0000-0002-5374-8720
https://frl.publisso.de/adhoc/uri/WmluaywgSm9oYW5uZXM=
https://frl.publisso.de/adhoc/uri/UHV0b3JhLCBQYXVsIE1hcnRpbg==
https://frl.publisso.de/adhoc/uri/SGFzdGluZ3MsIEphbm5h
https://frl.publisso.de/adhoc/uri/Q2lob3JpYywgTmlrb2xh

Note

DeepGreen-ID: 11a1d0798fc844cf8dd084ddd0349fec; metadata provided by: DeepGreen (https://www.oa-deepgreen.de/api/v1/), LIVIVO search scope life sciences (http://z3950.zbmed.de:6210/livivo), Crossref Unified Resource API (https://api.crossref.org/swagger-ui/index.html), to.science.api (https://frl.publisso.de/), ZDB JSON-API (beta) (https://zeitschriftendatenbank.de/api/), lobid - data infrastructure for libraries (https://lobid.org/resources/search)

Label

frl:6518495

Files

Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain

Object type

article

Described by

1000	@id	frl:6518495.rdf
1000	Created on	2025-07-05T11:38:41.409+0200
1000	Created by	322
1000	describes	frl:6518495
1000	Last modified	2025-08-19T20:02:45.461+0200
1000	Object modified	Tue Aug 19 20:02:45 CEST 2025

See also

frl:6518495

OAI ID

oai:frl.publisso.de:frl:6518495

Metadata visibility

public

Data visibility

public

