J Mol Evol (2005) 61:524–530
DOI: 10.1007/s00239-004-0315-1

tRNA Creation by Hairpin Duplication

Jeremy Widmann,1 Massimo Di Giulio,2 Michael Yarus,3 Rob Knight1

1 Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309, USA
2 International Institute of Genetics and Biophysics, CNR, Via G. Marconi 10, Naples 80125, Italy
3 Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO 80309, USA

Received: 4 November 2004 / Accepted: 3 May 2005 [Reviewing Editor: Dr. Niles Lehman]

Correspondence to: Rob Knight; email: rob@spot.colorado.edu

Abstract. Many studies have suggested that the modern cloverleaf structure of tRNA may have arisen through duplication of a primordial hairpin, but the timing of this duplication event has been unclear. Here we measure the level of sequence identity between the two halves of each tRNA in a large sample and compare this level to that of chimeric tRNAs constructed either within or between groups defined by phylogeny and/or specificity. We find that actual tRNAs have significantly more matches between the two halves than do random sequences that can form the tRNA structure. However, the average level of matching between the two halves of an individual tRNA does not differ from the average level of matching between the two halves of the chimeric tRNAs in any of the sets we constructed. These results support the hypothesis that the modern tRNA cloverleaf arose from a single hairpin duplication before the divergence of modern tRNA specificities and the three domains of life.

Key words: tRNA, Hairpin duplication, Cloverleaf

Introduction

Many lines of evidence suggest that the two halves of tRNA may be evolutionarily distinct. For example, for certain amino acids the "operational code" that links amino acids to tRNAs depends only on the acceptor stem (Schimmel and Henderson 1994). Aminoacyl-tRNA synthetases can even charge minihelices that resemble only one half of the tRNA molecule (Tamura and Schimmel 2001). These charged minihelix structures have been shown to function in peptide synthesis and may have been part of the primordial protein synthesis machinery (Dick and Schamel 1995). It has also been suggested that the top half of modern tRNAs originated in replication, and that this half is recognized separately by RNase P, the CCA-adding enzyme, telomerase, and aminoacyl-tRNA synthetases (Weiner and Maizels 1987; Maizels and Weiner 1994). The 3′ half of modern tRNAs has been proposed to be older than the 5′ half because of its base composition and repetitive sequence patterns (Eigen and Winkler-Oswatitsch 1981). More recently, the archaeon Nanoarchaeum equitans has been shown to make functional tRNAs from 5′ and 3′ tRNA halves. These halves are encoded at different loci and trans-spliced to form the final product (Randau et al. 2005).

Similarities between nucleotides at comparable positions within the two halves of the tRNA molecule have often been taken as evidence that the modern cloverleaf structure arose through direct duplication of a hairpin (Jukes 1995; Di Giulio 1995 and references cited therein). If this duplication theory is correct, corresponding positions in the two halves of each modern tRNA molecule should match more often than chance predicts (i.e., they should have greater sequence identity). In addition, how closely the halves of different tRNA molecules match should depend on how many tRNA-creating duplication events occurred.
Specifically, tRNA halves that came from the same duplication event should match each other better than tRNA halves that came from different duplication events, even if these halves are not found in the same modern tRNA molecule. The number and timing of these duplication events could relate in several ways to the divergence of the different amino acid specificities and of the three domains of life (eukaryotes, bacteria, and archaea). All modern tRNAs descended from a particular duplication event should have about the same number of matches between the two tRNA halves, even if one half comes from one tRNA and the other half from another tRNA. The reason is that the duplication creates two identical halves. Without convergent evolution or recombination, all tRNAs are then expected to diverge equally in sequence. Conversely, two tRNAs that do not descend from the same duplication event should show fewer matches between the first half of one tRNA and the second half of the other. Each duplication would have copied changes that had accumulated in its own ancestral hairpin but not in the other.

We emphasize that the base pairs in the cloverleaf structure need not be the same as the base pairs in the original hairpins. For the cloverleaf to be more stable than the hairpins, one of two conditions must hold. Either different base pairs form in the cloverleaf, or the two hairpins were not identical and were originally partially mismatched. In the modern tRNA structure, for example, the identities between hairpin halves that sequence alignment reveals generally do not coincide with the base pairs of the cloverleaf. If A pairs with A′ and B pairs with B′ in the original hairpins, then A need not pair with B′, nor A′ with B, in the cloverleaf (Di Giulio 1995).
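The prediction above can be made concrete with a short Python sketch. According to that prediction, halves that share a duplication event should match about equally well whether or not they sit in the same molecule. The sketch counts matches at corresponding positions of the two halves. It builds chimeras by pairing each tRNA's first half with the second half of another tRNA from the same group. This is a sketch under stated assumptions, not the authors' code. The position mapping `identity_pairs` is a hypothetical stand-in for the alignment-based correspondences of Fig. 2. The record fields (`first`, `second`, `domain`, `aa`) are assumptions, and the toy sequences are invented.

```python
import random
from collections import defaultdict


def identity_pairs(length):
    """Hypothetical mapping: position k of the first half corresponds to
    position k of the second half. Real correspondences come from an alignment."""
    return [(k, k) for k in range(length)]


def count_matches(first_half, second_half, pairs):
    """Count corresponding positions (i in the first half, j in the second)
    that carry the same base. Gap characters ('-') never count as matches."""
    return sum(
        1
        for i, j in pairs
        if first_half[i] == second_half[j] and first_half[i] != "-"
    )


def chimeric_match_counts(trnas, pairs, group_key, rng):
    """Join the first half of each tRNA to the second half of another tRNA
    chosen at random from the same group, and count matches in each chimera.

    group_key maps a record to its group label (domain, specificity, both,
    or a constant for the unrestricted set). Singleton groups are skipped
    because they contain no other tRNA to pair with.
    """
    groups = defaultdict(list)
    for trna in trnas:
        groups[group_key(trna)].append(trna)

    counts = []
    for members in groups.values():
        if len(members) < 2:
            continue
        for trna in members:
            partner = rng.choice([m for m in members if m is not trna])
            counts.append(count_matches(trna["first"], partner["second"], pairs))
    return counts


if __name__ == "__main__":
    # Invented toy records, for illustration only.
    toy = [
        {"domain": "Bacteria", "aa": "Gly", "first": "GCGGAUU", "second": "GCGGAUA"},
        {"domain": "Bacteria", "aa": "Gly", "first": "GCGGCUU", "second": "GCGCAUA"},
        {"domain": "Archaea", "aa": "Ala", "first": "GGGCCAU", "second": "GGGCUAU"},
        {"domain": "Archaea", "aa": "Ala", "first": "GGGCUAU", "second": "GGACCAU"},
    ]
    pairs = identity_pairs(7)
    rng = random.Random(0)

    real = [count_matches(t["first"], t["second"], pairs) for t in toy]
    within_domain = chimeric_match_counts(toy, pairs, lambda t: t["domain"], rng)
    unrestricted = chimeric_match_counts(toy, pairs, lambda t: "all", rng)
    print("real:", real)                    # [6, 4, 6, 5]
    print("within domain:", within_domain)  # [5, 5, 6, 7]
    print("unrestricted:", unrestricted)
```

Consider a single duplication that precedes all divergence. In that case, the match distributions for real tRNAs, within-group chimeras, and unrestricted chimeras should look alike. If duplications were instead independent per group, within-group chimeras should score higher than unrestricted ones.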
Sequences that can support both the hairpin pairings and the cloverleaf pairings are relatively rare. Even so, they are expected at an appreciable frequency, about 1 in 30 million random sequences (Nagaswamy and Fox 2003), and are thus easily accessible to evolution.

We consider five distinct scenarios for the evolution of modern tRNAs through duplication and divergence (Fig. 1). First, the similarities between the halves of each molecule might be caused by modern selection for function in each tRNA, so that no duplication event is needed to explain them. In this scenario, the two halves of a given tRNA should match each other but would not be expected to match the corresponding halves of another arbitrarily chosen tRNA. Second, the cloverleaf might have arisen once, before either the domains or the amino acid specificities emerged, so that a single duplication event explains the similarities (Di Giulio 1992). In this scenario, matching between the two tRNA halves in modern sequences arises through common descent from an ancestral sequence in which the halves matched. Thus, the halves of any two tRNAs would match to about the same degree. Third, the cloverleaf might have arisen by duplication independently in each of the three domains and then diverged to produce different specificities, which requires three duplication events. In this scenario, the tRNAs in each domain arise from an independent duplication. The halves of two tRNAs from within a single domain should therefore match better than the halves of two tRNAs from different domains. This third scenario has been proposed as a way to explain incongruence in tRNA phylogenies (Di Giulio 1999). Fourth, each tRNA specificity might have arisen through an independent duplication before the three domains diverged, requiring at least 20 duplication events.
In this scenario, the tRNAs of each specificity arise from an independent duplication. The halves of two tRNAs from within a single specificity should therefore match better than the halves of two tRNAs from different specificities. Finally, each specificity in each domain might have arisen from an independent duplication, which requires the largest number of independent duplication events. In this scenario, the tRNAs of each combination of domain and specificity arise from an independent duplication. The halves of two tRNAs that share both domain and specificity should therefore match better than the halves of two tRNAs that differ in domain and/or specificity.

Here, we test these scenarios by counting the number of matches between the two halves of chimeric tRNA molecules. In these chimeras, the first and second halves are either restricted to come from tRNAs in the same group (by domain, specificity, or both) or drawn without restriction. We expected to identify duplication events by finding group restrictions under which random pairs of tRNAs within a single group have more matches between their halves on average than random pairs from different groups. For example, suppose tRNAs evolved from separate hairpin duplications for each specificity. We would then expect chimeric tRNAs made from halves of tRNAs with the same specificity to have more matches than chimeric tRNAs made from halves of tRNAs with different specificities.

Methods

To establish that the hairpin duplication scenario was plausible, we first tested whether the similarity between the two halves of real tRNAs was in fact greater than that of random sequences that could fold into the canonical cloverleaf structure. We obtained 5950 sequences from the Sprinzl Genomic tRNA database (Sprinzl and Vassilenko 2003). We started from previously published alignments and secondary structures of reconstructed ancestral tRNA sequences (Di Giulio 1995). From these, we defined the two halves of a tRNA molecule and the specific positions that should match between the two halves. We compared the distribution of matches for real tRNA sequences with that of chimeric tRNA sequences made of halves from within or between specific groups of tRNAs.

Fig. 1. Five scenarios for duplication of a hairpin to create the modern cloverleaf structure. a After an initial duplication, tRNAs diverged into specificities and then into domains. b After an initial duplication, tRNAs diverged into domains and then into specificities. c Hairpin pre-tRNAs diverged into domains, duplicated in each domain to form cloverleaves, and then diverged into specificities. d Hairpin pre-tRNAs diverged into specificities, duplicated in each specificity to form cloverleaves, and then diverged into domains. e Hairpin pre-tRNAs duplicated independently in each domain and specificity. See text for discussion.

tRNA Alignment

The published alignment of reconstructed ancestral sequences was generated with the ALIGN program, which uses the Needleman–Wunsch global alignment algorithm. The alignment of the consensus sequences was generated by inserting gaps at the same positions as those in the published alignment. Figure 2 shows the consensus alignment and structure. Unlike previous work (Di Giulio 1995), which used the ancestral sequences, it uses the consensus sequence of all the tRNAs in the Sprinzl database.

Fig. 2. Matches between the two halves of the modern cloverleaf structure, possibly produced by hairpin duplication. a Fusion of two hairpins to form the modern cloverleaf. Bases are numbered as in the Sprinzl database. The most frequent base is shown at each position. b Matches between the two halves of the consensus sequence from the Sprinzl database. c Matches between the two halves of the reconstructed ancestral tRNA sequence (Di Giulio 1995).

Generating Random tRNAs

Random tRNA sequences were constructed by randomizing the nucleotide sequences of the real tRNA sequences in the Sprinzl database. For each tRNA sequence in the database, we made one list containing each unpaired base in that tRNA and a second list containing each base pair. Each list was shuffled to randomize the order. The shuffling used the Yates–Fisher algorithm and the Mersenne Twister random number generator as implemented in the Python 2.3 random module. We reconstructed the sequence from these two lists, so the randomized sequence had the same structure and base composition as the original tRNA.

Comparing Matches for Real and Random tRNAs

For each tRNA sequence in the Sprinzl database, we counted the number of times that corresponding positions in the two halves of the tRNA (as defined in Fig. 2) matched. We repeated this procedure for the set of randomized tRNAs.

Comparing Matches for Real and Chimeric tRNAs

We organized the tRNA sequences in the Sprinzl database into four kinds of (overlapping) groups:
- all tRNAs regardless of domain and specificity;
- all tRNAs in the same domain;
- all tRNAs with the same amino acid specificity;
- all tRNAs with the same domain and the same specificity.

Within each group, we joined the first half of each tRNA to the second half of another, randomly chosen, tRNA. We then counted the number of matches between the first and second halves of the new, chimeric tRNAs. Finally, we compared the distribution of matches for each group of these chimeric tRNAs with that of the actual tRNAs, as described above.

Results

Real tRNAs have significantly more matches between their two halves than do random tRNAs (Fig. 3) (t = 15.8, df = 11898, p = 4.67 × 10⁻⁵⁵, paired two-sample t-test). For chimeric tRNAs built from any combination of domain and/or specificity, the distributions of matches between the two halves are essentially identical to those for real tRNA sequences (Fig. 3).

We also tested whether the specific positions within the tRNA that contributed most to matches were also highly conserved. Across all chimeric tRNAs, the consistent trend is that several highly conserved positions did contribute to matches (Fig. 4). However, there was no overall correlation between the amount of conservation at a position and the tendency of that position to match between tRNA halves (p > 0.05). For comparison, Fig. 5 shows the secondary structure of a tRNA molecule with the percentage conservation of the base at each position, calculated from the sequences in the Sprinzl database.

Fig. 3. Distribution of matches between tRNA halves in sequences generated by different models. Several sets are shown as thin lines and are statistically indistinguishable from one another: individual tRNAs, and chimeric tRNAs made by randomly selecting halves from within a domain, a specificity, a domain and specificity, or any two tRNAs. All of these have significantly more matches between the two halves than do random sequences generated so that they can form the base pairs of the canonical tRNA structure (thick line). The graph plots the number of matches between the two halves (x axis) against the number of tRNAs with that many matches (y axis).

Fig. 4. Frequency (y axis) of matches (solid lines) and conservation of the most frequent base (dashed lines) plotted against position within the tRNA sequence (x axis). Match frequency is the fraction of the time that the two corresponding positions in the first and second halves are identical. The thick line shows the distribution for randomly generated sequences. The (very similar) thin lines show the distributions for the actual tRNAs and for each of the chimeric sets of tRNAs. Conservation is the proportion of the most frequent base at each position. Long dashes show conservation in the first half of the sequence, and short dashes show conservation in the second half. The most frequent base at each position is printed below the position number at the bottom of the graph.

Fig. 5. Canonical cloverleaf structure of tRNA, highlighting conserved positions. Positions with greater than 90% conservation are shown in black, positions with conservation between 75% and 90% in dark gray, and positions with between 50% and 75% conservation in light gray. Each position shows the most frequent base across all tRNAs, the conventional position number as a black subscript, and the degree of conservation as a gray subscript. Positions shown as dashes are those where the most frequent state is a missing base (i.e., a gap).

Discussion and Conclusions

As shown in Figs. 3 and 4, we found no significant difference in the number of matches between the two tRNA halves for any of the chimeric sets we constructed. Every chimeric set also showed a highly significant excess of matches relative to the random sequences. Together, these observations support an ancient, monophyletic tRNA origin that predates the divergence of specificities and domains. Consequently, the data support options a and b of Fig. 1 equally.

Incongruent tRNA phylogenies have provided much of the evidence for a nonmonophyletic origin of tRNA by duplication. However, trees based on sequences of fewer than 500 nucleotides are often difficult to resolve (Nei et al. 1998). tRNA phylogenies are therefore hard to interpret, because the region conserved across specificities is only 74 nucleotides long. Switches in tRNA specificity have also been produced by as little as a single nucleotide change (Yaniv et al. 1974; Saks et al. 1998). Such switches might make individual specificities appear nonmonophyletic even if all tRNAs descended from a single common ancestor.

Our outcome is consistent with the idea that tRNAs arose from a single ancestral duplication event and less consistent with duplication on other schedules, but some caution is warranted. It is also possible that these data are entirely the result of convergent selection for function.
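The conclusion above depends on comparing real and chimeric tRNAs with a randomized baseline. The Methods describe that baseline: unpaired bases and base pairs are shuffled separately, so each random sequence keeps the structure and base composition of its source tRNA. A minimal Python sketch of that randomization follows, together with a two-sample t statistic, and it rests on several assumptions. Each tRNA is assumed to come with a dot-bracket secondary-structure string, a representation the text does not specify. The helper names are hypothetical. The pooled-variance Student t with n1 + n2 − 2 degrees of freedom is an assumed choice, not necessarily the exact test the authors ran. The p-value uses a normal approximation that is only reasonable for large samples.

```python
import math
import random
import statistics


def pairs_from_dot_bracket(structure):
    """Return the (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def shuffle_preserving_structure(sequence, structure, rng):
    """Shuffle unpaired bases among unpaired positions and whole base pairs
    among paired positions, keeping structure and base composition fixed."""
    pairs = pairs_from_dot_bracket(structure)
    paired = {pos for pair in pairs for pos in pair}
    unpaired_positions = [k for k in range(len(sequence)) if k not in paired]

    unpaired_bases = [sequence[k] for k in unpaired_positions]
    base_pairs = [(sequence[i], sequence[j]) for i, j in pairs]
    # random.Random is a Mersenne Twister; Random.shuffle is a Fisher-Yates shuffle.
    rng.shuffle(unpaired_bases)
    rng.shuffle(base_pairs)

    out = list(sequence)
    for k, base in zip(unpaired_positions, unpaired_bases):
        out[k] = base
    # Pair orientation is kept as found (assumption: pairs are not flipped).
    for (i, j), (left, right) in zip(pairs, base_pairs):
        out[i], out[j] = left, right
    return "".join(out)


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err, df


def two_sided_p_normal(t):
    """Two-sided p-value from the normal approximation; reasonable only for large df."""
    return math.erfc(abs(t) / math.sqrt(2))


if __name__ == "__main__":
    rng = random.Random(2005)
    hairpin, structure = "GCGAAAUGC", "(((...)))"
    print(shuffle_preserving_structure(hairpin, structure, rng))
    t, df = pooled_t([2, 4, 6], [1, 3, 5])
    print(round(t, 4), df, round(two_sided_p_normal(t), 4))
```

The sketch shuffles whole base pairs rather than single bases, so every stem in the shuffled sequence can still pair. That is why the randomized sequences can still fold into the same cloverleaf as the original tRNA.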
It also remains conceivable that differences that once discriminated among groups have been lost during the vast span of time since the modern tRNA repertoire appeared. In this context, modern sequences show far fewer matches between their halves than the inferred ancestral sequences of Di Giulio (1995). Modern tRNAs average about 9 matching positions between the two halves (Fig. 3), whereas the ancestral alignment shows 21 (Di Giulio 1995). The ancestral reconstruction technique may recover some of the information that neutral mutation has erased. Alternatively, long-branch attraction in the parsimony analysis (Felsenstein 1978) may make ancestral states difficult to assign.

If a cloverleaf produced by duplication of an ancient hairpin evolved into modern tRNAs, our results (Figs. 4 and 5) suggest that evolution preserved some primordial similarities between the halves. These similarities lie particularly in the modern conserved sequences of the D and TΨC stems and loops.

Acknowledgments. This work was supported by a seed grant from the W. M. Keck Foundation RNA Bioinformatics Initiative. We thank members of the Knight and Yarus labs for critical discussion of the manuscript.

References

Di Giulio M (1995) Was it an ancient gene codifying for a hairpin RNA that, by means of direct duplication, gave rise to the primitive tRNA molecule? J Theor Biol 177:95–101
Di Giulio M (1999) The non-monophyletic origin of the tRNA molecule. J Theor Biol 197:403–414
Dick T, Schamel W (1995) Molecular evolution of transfer RNA from two precursor hairpins: implications for the origin of protein synthesis. J Mol Evol 41:1–9
Eigen M, Winkler-Oswatitsch R (1981) Transfer-RNA, an early gene? Naturwissenschaften 68:282–292
Felsenstein J (1978) Cases in which parsimony and compatibility methods will be positively misleading. Syst Zool 27:401–410
Jukes TH (1995) A comparison of mitochondrial tRNAs in five vertebrates. J Mol Evol 40:537–540
Maizels N, Weiner A (1994) Phylogeny from function: Evidence from the molecular fossil record that tRNA originated in replication, not translation. Proc Natl Acad Sci USA 91:6729–6734
Nagaswamy U, Fox GE (2003) RNA ligation and the origin of tRNA. Orig Life Evol Biosph 33:199–209
Nei M, Kumar S, Takahashi K (1998) The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids used is small. Proc Natl Acad Sci USA 95:12390–12397
Randau L, Münch R, Hohn M, Jahn D, Söll D (2005) Nanoarchaeum equitans creates functional tRNAs from separate genes for their 5′- and 3′-halves. Nature 433:537–541
Saks ME, Sampson JR, Abelson J (1998) Evolution of a transfer RNA gene through a point mutation in the anticodon. Science 279:1665–1670
Schimmel P, Henderson B (1994) Possible role of aminoacyl-RNA complexes in noncoded peptide synthesis and origin of coded synthesis. Proc Natl Acad Sci USA 91:11283–11286
Sprinzl M, Vassilenko KS (2005) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res 33:D139–D140
Tamura K, Schimmel P (2001) Oligonucleotide-directed peptide synthesis in a ribosome- and ribozyme-free system. Proc Natl Acad Sci USA 98:1393–1397
Weiner A, Maizels N (1987) tRNA-like structures tag the 3′ ends of genomic RNA molecules for replication: Implications for the origin of protein synthesis. Proc Natl Acad Sci USA 84:7383–7387
Yaniv M, Folk WR, Berg P, Soll L (1974) A single mutational modification of a tryptophan-specific transfer RNA permits aminoacylation by glutamine and translation of the codon UAG. J Mol Biol 86:245–260