Searching for RNA genes using base-composition statistics
AUTHOR(S)
Schattner, Peter
SOURCE
Oxford University Press
ABSTRACT
The hypothesis that genomic regions rich in non-protein-coding RNAs (ncRNAs) can be identified using local variations in single-base and dinucleotide statistics has been investigated. (G+C)%, (G–C)% difference, (A–T)% difference and dinucleotide-frequency statistics were compared among seven classes of ncRNAs and three genomes. Significant variations were observed in (G+C)% and, in Methanococcus jannaschii, in the frequency of the dinucleotide ‘CG’. Screening programs based on these two base-composition statistics were developed. With (G+C)% screening alone, a 1% fraction of the M.jannaschii genome containing all 44 known transfer RNAs, ribosomal RNAs and signal recognition particle RNAs could be identified. When (G+C)% combined with CG dinucleotide-frequency screening was used, 43 of the 44 known M.jannaschii structural ncRNAs were again identified, while the number of presumably false hits overlapping a known or putative protein-coding gene was reduced from 15 to 6. In addition, 19 candidate ncRNAs were identified including one with significant homology to several known archaeal RNaseP RNAs.
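The screening idea in the abstract can be illustrated with a minimal sliding-window sketch. This is not the authors' program: the function names, the window and step sizes, and both thresholds are hypothetical, and the choice to flag windows with high (G+C)% and high CG frequency is an assumption made for illustration.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """Return the (G+C)% of a sequence, as a percentage from 0 to 100."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cg_dinucleotide_freq(seq: str) -> float:
    """Return the fraction of adjacent base pairs that are exactly 'CG'."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs <= 0:
        return 0.0
    # 'CG' cannot overlap itself, so str.count gives the exact number.
    return seq.count("CG") / n_pairs


def screen_windows(
    genome: str,
    window: int = 100,    # hypothetical window length
    step: int = 50,       # hypothetical step between windows
    min_gc: float = 50.0, # hypothetical (G+C)% threshold
    min_cg: float = 0.0,  # hypothetical CG-frequency threshold
) -> List[Tuple[int, int, float, float]]:
    """Return (start, end, gc_percent, cg_freq) for windows passing both cuts."""
    hits = []
    last_start = max(len(genome) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = genome[start:start + window]
        gc = gc_percent(chunk)
        cg = cg_dinucleotide_freq(chunk)
        if gc >= min_gc and cg >= min_cg:
            hits.append((start, start + len(chunk), gc, cg))
    return hits


if __name__ == "__main__":
    toy = "A" * 10 + "CG" * 5  # toy sequence, not real genomic data
    for hit in screen_windows(toy, window=10, step=10, min_gc=50.0, min_cg=0.2):
        print(hit)
```

Setting `min_cg` to 0.0 corresponds to screening on (G+C)% alone; raising it adds the second, dinucleotide-based filter. In practice both thresholds would need to be calibrated against known ncRNAs in the genome under study.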
ARTICLE ACCESS
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=113829
RELATED DOCUMENTS
- Statistical Analysis of the Base Composition of Genes Using Data on the Amino Acid Composition of Proteins
- Number of genes and base composition of mitochondrial tRNA from Saccharomyces cerevisiae
- Using RNA interference to identify genes required for RNA interference
- Genetics of osteoporosis: searching for candidate genes for bone fragility
- Helical Lévy walks: Adjusting searching statistics to resource availability in microzooplankton