Comparison of high-throughput sequencing data compression tools

Research output: Contribution to journal › Article › Research

Authors

  • Ibrahim Numanagić
  • James K. Bonfield
  • Faraz Hach
  • Jan Voges
  • Jörn Ostermann
  • Claudio Alberti
  • Marco Mattavelli
  • S. Cenk Sahinalp

Research Organisations

External Research Organisations

  • Simon Fraser University
  • Wellcome Trust Sanger Institute
  • Vancouver Prostate Centre
  • École polytechnique fédérale de Lausanne (EPFL)

Details

Original language: English
Pages (from-to): 1005-1008
Number of pages: 4
Journal: Nature methods
Volume: 13
Publication status: Published - 24 Oct 2016

Abstract

High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

Cite this

Comparison of high-throughput sequencing data compression tools. / Numanagić, Ibrahim; Bonfield, James K.; Hach, Faraz et al.
In: Nature methods, Vol. 13, 24.10.2016, p. 1005-1008.

Numanagić, I, Bonfield, JK, Hach, F, Voges, J, Ostermann, J, Alberti, C, Mattavelli, M & Sahinalp, SC 2016, 'Comparison of high-throughput sequencing data compression tools', Nature methods, vol. 13, pp. 1005-1008. https://doi.org/10.1038/nmeth.4037
Numanagić, I., Bonfield, J. K., Hach, F., Voges, J., Ostermann, J., Alberti, C., Mattavelli, M., & Sahinalp, S. C. (2016). Comparison of high-throughput sequencing data compression tools. Nature methods, 13, 1005-1008. https://doi.org/10.1038/nmeth.4037
Numanagić I, Bonfield JK, Hach F, Voges J, Ostermann J, Alberti C et al. Comparison of high-throughput sequencing data compression tools. Nature methods. 2016 Oct 24;13:1005-1008. doi: 10.1038/nmeth.4037
Numanagić, Ibrahim ; Bonfield, James K. ; Hach, Faraz et al. / Comparison of high-throughput sequencing data compression tools. In: Nature methods. 2016 ; Vol. 13. pp. 1005-1008.
@article{2c6092d04bcf47c8b19fd8e5a14f3a0e,
title = "Comparison of high-throughput sequencing data compression tools",
abstract = "High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.",
author = "Ibrahim Numanagi{\'c} and Bonfield, {James K.} and Faraz Hach and Jan Voges and J{\"o}rn Ostermann and Claudio Alberti and Marco Mattavelli and Sahinalp, {S. Cenk}",
note = "Funding information: This research was supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Frontiers program 'Cancer Genome Collaboratory' project (S.C.S., F.H., I.N.); the Vanier Canada Graduate Scholarships program (I.N.); National Institutes of Health (NIH) (R01GM108348 to S.C.S.); National Science Foundation (NSF) (1619081 to S.C.S.); Indiana University Grand Challenges Program Precision Health Initiative (S.C.S.); Wellcome Trust (098051 to J.K.B.); Leibniz Universit{\"a}t Hannover eNIFE grant (J.V. and J.O.); Swiss Platform for Advanced Scientific Computing (PASC) PoSeNoGap project (C.A. and M.M.). We would also like to thank the authors of evaluated compression tools for providing support for their tools and replying to our bug reports.",
year = "2016",
month = oct,
day = "24",
doi = "10.1038/nmeth.4037",
language = "English",
volume = "13",
pages = "1005--1008",
journal = "Nature methods",
issn = "1548-7091",
publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Comparison of high-throughput sequencing data compression tools

AU - Numanagić, Ibrahim

AU - Bonfield, James K.

AU - Hach, Faraz

AU - Voges, Jan

AU - Ostermann, Jörn

AU - Alberti, Claudio

AU - Mattavelli, Marco

AU - Sahinalp, S. Cenk

N1 - Funding information: This research was supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Frontiers program 'Cancer Genome Collaboratory' project (S.C.S., F.H., I.N.); the Vanier Canada Graduate Scholarships program (I.N.); National Institutes of Health (NIH) (R01GM108348 to S.C.S.); National Science Foundation (NSF) (1619081 to S.C.S.); Indiana University Grand Challenges Program Precision Health Initiative (S.C.S.); Wellcome Trust (098051 to J.K.B.); Leibniz Universität Hannover eNIFE grant (J.V. and J.O.); Swiss Platform for Advanced Scientific Computing (PASC) PoSeNoGap project (C.A. and M.M.). We would also like to thank the authors of evaluated compression tools for providing support for their tools and replying to our bug reports.

PY - 2016/10/24

Y1 - 2016/10/24

N2 - High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

AB - High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

UR - http://www.scopus.com/inward/record.url?scp=84992390491&partnerID=8YFLogxK

U2 - 10.1038/nmeth.4037

DO - 10.1038/nmeth.4037

M3 - Article

C2 - 27776113

AN - SCOPUS:84992390491

VL - 13

SP - 1005

EP - 1008

JO - Nature methods

JF - Nature methods

SN - 1548-7091

ER -
