
Complexity analysis of algorithms: A case study on bioinformatics tools

Adonney Allan de Oliveira Veras

Abstract


The volume of data currently produced by the omic sciences has been driven by the adoption of next-generation sequencing platforms, commonly called NGS (Next Generation Sequencing). Analyses performed on these data include mapping, genome assembly, genome annotation, pangenomic analysis, quality control, and redundancy removal, among others. For redundancy removal in particular, several tools perform this task and their accuracy has been demonstrated in their scientific publications, but those publications do not evaluate the tools' algorithmic complexity. This work therefore performs an empirical analysis of the algorithmic complexity of computational tools that remove redundant raw reads produced by DNA sequencing. Sixteen raw-read datasets were processed with MarDRe, NGSReadsTreatment, ParDRe, FastUniq, and BioSeqZip, and the results were analyzed on the R statistical platform using the GuessCompx package. The results show that BioSeqZip and ParDRe exhibit the lowest complexity in this analysis.
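The general idea of estimating complexity empirically can be illustrated with a minimal Python sketch. It assumes, for illustration only, that runtimes have already been measured for increasing input sizes (for example, numbers of reads). It then fits each candidate growth function with a linear model t ≈ a·f(n) + b and keeps the class with the lowest leave-one-out error. This is a simplified stand-in loosely inspired by the approach behind GuessCompx. It is not that package's R API or its exact procedure, and the names `CANDIDATES`, `fit_linear`, `loo_mse`, and `guess_complexity`, as well as the candidate list itself, are hypothetical.

```python
import math

# Hypothetical set of candidate complexity classes: each maps an input size n
# to a growth term f(n). This list is an illustrative assumption.
CANDIDATES = {
    "O(1)": lambda n: 1.0,
    "O(log n)": lambda n: math.log(n),
    "O(n)": lambda n: float(n),
    "O(n log n)": lambda n: n * math.log(n),
    "O(n^2)": lambda n: float(n) ** 2,
    "O(n^3)": lambda n: float(n) ** 3,
}


def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant regressor (the O(1) case): the best fit is the mean runtime.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def loo_mse(xs, ys):
    """Leave-one-out mean squared prediction error of the linear fit."""
    errors = []
    for i in range(len(xs)):
        a, b = fit_linear(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((ys[i] - (a * xs[i] + b)) ** 2)
    return sum(errors) / len(errors)


def guess_complexity(sizes, times):
    """Return the best-scoring class and the score of every candidate."""
    scores = {
        name: loo_mse([f(n) for n in sizes], times)
        for name, f in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

For example, runtimes that grow in proportion to the number of reads should make `guess_complexity(sizes, times)` return `"O(n)"`. Real timings are noisy, so in practice one would repeat each measurement and inspect the scores of all candidates rather than relying only on the winner.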

Keywords


Time complexity; Computational tools; Empirical analysis.


References


Agenis-Nevers, M., N. D. Bokde, Z. M. Yaseen and M. K. Shende, 2021. An empirical estimation for time and memory algorithm complexities: Newly developed R package. Multimedia Tools and Applications, 80(2): 2997-3015.

Chain, P., D. Grafham, R. Fulton, M. Fitzgerald, J. Hostetler, D. Muzny, J. Ali, B. Birren, D. Bruce and C. Buhay, 2009. Genome project standards in a new era of sequencing. Science, 326(5950): 236-237.

Cormen, T., C. Leiserson and R. Rivest, 1992. Introduction to algorithms. MIT Press: Cambridge, MA.

Expósito, R. R., J. Veiga, J. González-Domínguez and J. Touriño, 2017. MarDRe: Efficient MapReduce-based removal of duplicate DNA reads in the cloud. Bioinformatics, 33(17): 2762-2764.

Gaia, A. S. C., P. H. C. G. de Sá, M. S. de Oliveira and A. A. de Oliveira Veras, 2019. NGSReadsTreatment: A cuckoo filter-based tool for removing duplicate reads in NGS data. Scientific Reports, 9(1): 1-6.

Goodrich, M. T., R. Tamassia and M. H. Goldwasser, 2014. Data structures and algorithms in Java. John Wiley & Sons.

Kremer, F. S., A. J. A. McBride and L. d. S. Pinto, 2017. Approaches for in silico finishing of microbial genome sequences. Genetics and Molecular Biology, 40: 553-576.

Levitin, A., 2012. Introduction to the design & analysis of algorithms, 3rd ed. Pearson.

Urgese, G., E. Parisi, O. Scicolone, S. Di Cataldo and E. Ficarra, 2020. BioSeqZip: A collapser of NGS redundant reads for the optimization of sequence analysis. Bioinformatics, 36(9): 2705-2711.

Xu, H., X. Luo, J. Qian, X. Pang, J. Song, G. Qian, J. Chen and S. Chen, 2012. FastUniq: A fast de novo duplicates removal tool for paired short reads. PLoS ONE, 7(12): e52249.




DOI: https://doi.org/10.33865/wjb.006.03.0445





Copyright (c) 2021 Adonney Allan de Oliveira Veras

This work is licensed under a Creative Commons Attribution 4.0 International License.

Print ISSN: 2522-6746 : Online ISSN: 2522-6754