Perseus: uma nova técnica para tratar árvores de sufixo persistentes / Perseus: a novel technique to handle persistent suffix trees
AUTHOR(S)
Caio Cesar Mori Carelo
PUBLICATION DATE
2009
ABSTRACT
Due to technological advances in molecular biology laboratories, biological databases are extremely voluminous and keep growing as data on the genomes of new organisms become available. This introduces the challenge of searching nucleotide sequences efficiently. The suffix tree is an access method used by several applications that search these data. However, building suffix trees is costly, since they are extremely large data structures and must fit in main memory to be constructed in linear time. In this master's thesis, we propose Perseus, a novel technique for handling persistent suffix trees. Perseus has the following distinctive properties. It is based on an approach that constructs persistent suffix trees whose sizes may exceed main memory capacity. Furthermore, it provides an algorithm that allows users to indicate which substrings of the input string should be indexed, according to the requirements of their applications. Moreover, it proposes an extended exact matching algorithm that searches for a query string in suffix trees that may be partitioned. Perseus was validated through performance tests using genomes of several organisms of different sizes. The results were compared with the Trellis+ technique, which represents the state of the art in this field. The tests showed that Perseus reduced the time spent on constructing suffix trees by 24%. Perseus also constructed more compact suffix trees, providing an average reduction in secondary memory storage of 27%. Furthermore, Perseus reduced the time spent on query processing of nucleotide sequences by up to 49%. As for the functionality of indexing substrings according to users' requirements, Perseus greatly improved query performance in comparison with Trellis+: the results showed that Perseus reduced the time spent on constructing suffix trees by 97% on average and the time spent on query processing of genes by 93% on average.
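As a rough illustration of two ideas mentioned in the abstract (indexing only user-selected substrings, and exact matching over an index split into partitions), the sketch below may help. It is not the Perseus algorithm: it uses naive, uncompressed suffix tries held in memory, and the function names, the `regions` parameter, and the prefix-based partitioning scheme are all assumptions made for this example.

```python
from collections import defaultdict

END = "$"  # sentinel key; assumes the text never contains "$" (e.g. an ACGT alphabet)


def build_partitioned_index(text, regions, prefix_len=1):
    """Index only the suffixes that start inside the selected regions.

    regions: list of half-open (start, end) intervals of `text` to index
             (hypothetical way of expressing "which substrings to index").
    Suffixes are grouped into partitions by their first `prefix_len` characters;
    each partition is a naive suffix trie made of nested dicts.
    """
    partitions = defaultdict(dict)
    for start, end in regions:
        for i in range(start, end):
            suffix = text[i:end]          # suffix is cut at the region boundary
            node = partitions[suffix[:prefix_len]]
            for ch in suffix:
                node = node.setdefault(ch, {})
            node.setdefault(END, []).append(i)  # record where this suffix starts
    return partitions


def find(partitions, query, prefix_len=1):
    """Return sorted start positions of exact matches of `query` in the indexed regions."""
    if not query:
        return []
    if len(query) >= prefix_len:
        keys = [query[:prefix_len]]       # only one partition can contain the match
    else:
        keys = [k for k in partitions if k.startswith(query)]
    hits = []
    for key in keys:
        node = partitions.get(key)
        if node is None:
            continue
        for ch in query:                  # walk the trie along the query
            node = node.get(ch)
            if node is None:
                break
        else:
            stack = [node]                # collect every suffix start below the match
            while stack:
                current = stack.pop()
                for label, child in current.items():
                    if label == END:
                        hits.extend(child)
                    else:
                        stack.append(child)
    return sorted(hits)


genome = "GATTACAGATTACA"
whole = build_partitioned_index(genome, [(0, len(genome))], prefix_len=2)
first_half = build_partitioned_index(genome, [(0, 7)], prefix_len=2)
print(find(whole, "TTA", prefix_len=2))
print(find(first_half, "TTA", prefix_len=2))
```

Restricting `regions` shrinks the index and the set of reported matches, which is the general intuition behind indexing only the substrings an application needs. A real persistent suffix tree would compress edges and keep partitions on disk; overlapping regions in this sketch would report duplicate positions.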
SUBJECT(S)
bioinformatics; nucleotide sequence; suffix tree