Download
s13015-017-0117-9.pdf 1,64MB
WeightNameValue
1000 Titel
  • Generalized enhanced suffix array construction in external memory
1000 Autor/in
  1. Louza, Felipe |
  2. Telles, Guilherme |
  3. Hoffmann, Steve |
  4. Ciferri, Christina D. A. |
1000 Erscheinungsjahr 2017
1000 LeibnizOpen
1000 Art der Datei
1000 Publikationstyp
  1. Artikel |
1000 Online veröffentlicht
  • 2017-12-07
1000 Erschienen in
1000 Quellenangabe
  • 12:26
1000 FRL-Sammlung
1000 Copyrightjahr
  • 2017
1000 Lizenz
1000 Verlagsversion
  • https://doi.org/10.1186/s13015-017-0117-9 |
  • https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/29234460/ |
1000 Publikationsstatus
1000 Begutachtungsstatus
1000 Sprache der Publikation
1000 Abstract/Summary
  • BACKGROUND: Suffix arrays, augmented by additional data structures, allow solving efficiently many string processing problems. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory. RESULTS: In this article we present and analyze eGSA [introduced in CPM (External memory generalized suffix and LCP arrays construction. In: Proceedings of CPM. pp 201–10, 2013)], the first external memory algorithm to construct generalized suffix arrays augmented with the longest common prefix array for a string collection. Our algorithm relies on a combination of buffers, induced sorting and a heap to avoid direct string comparisons. We performed experiments that covered different aspects of our algorithm, including running time, efficiency, external memory access, internal phases and the influence of different optimization strategies. On real datasets of size up to 24 GB and using 2 GB of internal memory, eGSA showed a competitive performance when compared to eSAIS and SAscan, which are efficient algorithms for a single string according to the related literature. We also show the effect of disk caching managed by the operating system on our algorithm. CONCLUSIONS: The proposed algorithm was validated through performance tests using real datasets from different domains, in various combinations, and showed a competitive performance. Our algorithm can also construct the generalized Burrows-Wheeler transform of a string collection with no additional cost except by the output time.
1000 Sacherschließung
lokal Burrows–Wheeler transform
lokal String collections
lokal External memory algorithms
lokal LCP array
lokal Suffix array
1000 Fachgruppe
  1. Medizin |
  2. Biologie |
1000 Fächerklassifikation (DDC)
1000 Liste der Beteiligten
  1. http://orcid.org/0000-0003-2931-1470|http://orcid.org/0000-0003-2608-4807|https://frl.publisso.de/adhoc/creator/SG9mZm1hbm4sIFN0ZXZl|https://frl.publisso.de/adhoc/creator/Q2lmZXJyaSwgQ2hyaXN0aW5hIEQuIEEu
1000 Förderer
  1. São Paulo Research Foundation (FAPESP)
  2. Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
  3. Leipzig Research Center for Civilization Diseases (LIFE), Leipzig University
  4. European Union
  5. Free State of Saxony
  6. Coordenação de Aperfeicoamento de Pessoal de Nível Superior (CAPES)
  7. Financiadora de inovação e pesquisa (FINEP)
1000 Fördernummer
  1. 2011/15423-9; 2017/09105-0; 2011/23904-7
  2. 2011/23904-7
  3. -
  4. -
  5. -
  6. 2011/23904-7
1000 Förderprogramm
  1. -
  2. -
  3. -
  4. European Regional Development Fund (ERDF); European Social Fund (ESF)
  5. Excellence initiative
  6. -
  7. -
1000 Dateien
  1. Generalized enhanced suffix array construction in external memory
1000 Objektart article
1000 Beschrieben durch
1000 @id frl:6410897.rdf
1000 Erstellt am 2018-11-05T11:15:21.593+0100
1000 Erstellt von 285
1000 beschreibt frl:6410897
1000 Bearbeitet von 218
1000 Zuletzt bearbeitet 2018-12-18T12:31:14.651+0100
1000 Objekt bearb. Tue Dec 18 12:31:14 CET 2018
1000 Vgl. frl:6410897
1000 Oai Id
  1. oai:frl.publisso.de:frl:6410897 |
1000 Sichtbarkeit Metadaten public
1000 Sichtbarkeit Daten public
1000 Gegenstand von

View source