IS Families/IS3 family
Contents
- 1 Original Identification
- 2 Presence in Compound Transposons
- 3 Distribution
- 4 Organization
- 5 Formation of a strong transposase promoter
- 6 Regulation by Methylation?
- 7 Insertion specificity
- 8 Group II intron insertions
- 9 IS3 family subgroups
- 10 Family Exceptions
- 11 Mycoplasma and the non-universal genetic code
- 12 A clade with non-canonical IR
- 13 An additional subgroup
- 14 Mechanism
- 14.1 Transposition Proteins
- 14.2 OrfA
- 14.3 OrfB
- 14.4 OrfAB: a product of programmed ribosomal frameshifting (PRTF)
- 14.5 Artificial orfA-orfB fusion
- 14.6 Structural motifs
- 14.7 Co-translational DNA binding
- 14.8 Co-translational multimerisation
- 14.9 The IS911 transpososome
- 14.10 Excision synaptic complex SCA.
- 14.11 Insertion synaptic complex SCB
- 14.12 The Transposition Pathway
- 14.13 The Figure-eight form
- 14.14 The circular intermediate
- 14.15 Integration of the circular intermediate
- 14.16 Targeted Insertion
- 14.17 Mechanism in other family members
- 15 Bibliography
Original Identification
IS3 and another member of this family, IS2, were identified genetically as DNA segments causing insertional inactivation of the gal and lac operons and physically by electron microscopy[1] and in plasmid F as a segment called alpha-beta[2][3]. IS3 was subsequently wrongly identified as the insertion sequence flanking the tetracycline resistance transposon Tn10[4][5]. It has since been found as a component of a large number of plasmids, particularly in Gram-negative enterics.
Presence in Compound Transposons
Although IS3 family elements do participate in compound transposons, to our knowledge no systematic survey has been undertaken and very few IS3-associated compound transposons have been described to date. Known examples include: IS3411, which flanks genes for citrate utilisation in transposon Tn3411[6][7][8]; IS4521, which flanks a heat-stable enterotoxin gene in enterotoxigenic Escherichia coli; and IS1706, which flanks genes of the Clp protease/chaperone family.
Distribution
This is one of the most coherent, largest, most abundant and widely distributed IS families (see [9]). Nearly 600 distinct members of this family have been identified in more than 267 bacterial species distributed over 145 genera. However, their true distribution is clearly significantly greater than this.
For example, IS911 (isolated from a Shigella dysenteriae phage lysogen by spontaneous insertion into the phage cI repressor gene[10]) is present in multiple copies in the original host strain and in type strains of other Shigella species. Two vestigial copies, both interrupted by a copy of IS30, were also detected in the chromosome of E. coli K12[11] and could form transposition intermediates when supplied with IS911 transposase[12]. Entire or truncated IS911 copies have also been identified in several E. coli virulence plasmids (e.g. [13]), in pathogenicity islands of uropathogenic E. coli (e.g. [14]), in various other clinical isolates of E. coli and in a large number of well-known and less well-known enterobacteria such as Escherichia fergusonii, Cronobacter, Dickeya, Erwinia, Klebsiella, Pantoea, Shimwellia, and Yersinia.
Most IS3 family members have been identified in bacteria although at least one example, ISMco1, has also been identified in the archaeon Methanosaeta concilii[15]. Since this archaeon is widespread in nature[16], it is possible that this represents a case of recent horizontal transfer. The presence of 8 copies implies that ISMco1 is active in its archaeal host.
Organization
The family is quite homogeneous in organization (Fig. IS3.1) in spite of its wide distribution in bacteria exhibiting a large range of G+C contents (from 70% in the Mycobacterial examples to 25% in those isolated from Mycoplasma) and of the presence of members in hosts such as Mycoplasma with a non-universal genetic code (e.g. IS1138) or in bacteria which use stop codon read-through by insertion of the unusual amino acid selenocysteine (e.g. ISDvu3 from Desulfovibrio vulgaris). In the case of both copies of IS1138, which participates in high frequency rearrangements of the Mycoplasma pulmonis chromosome, the Tpase orf carries 11 UGA codons which are decoded as tryptophan[17].
Members are between 1200 and 1550 bp with relatively well conserved inverted terminal repeats in the range of 20-40 bp. (One exception previously attributed to this family, IS481, is 1045 bp long and has now been placed in a separate family; see IS481 family.) They generate 3 or 4 bp DR on insertion.
The majority of IR terminate with 5'-TG-----CA-3' and present an internal block of G/C residues of variable length (Fig. IS3.2).
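The terminal pattern described above can be checked with a few lines of Python. The sketch below is only illustrative: the function names and the toy sequence are hypothetical and are not taken from any IS3 family member. It tests for 5'-TG at the left end and CA-3' at the right end, and counts how many of the first n positions of the left end match the reverse complement of the right end.

```python
# Illustrative sketch; names and the toy sequence are assumptions, not real IS data.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def has_tg_ca_ends(seq: str) -> bool:
    """True if the sequence starts with 5'-TG and ends with CA-3'."""
    s = seq.upper()
    return s.startswith("TG") and s.endswith("CA")

def ir_matches(seq: str, n: int) -> int:
    """Count positions (out of n) where the left end matches the
    reverse complement of the right end, i.e. the inverted repeat."""
    s = seq.upper()
    left = s[:n]
    right = reverse_complement(s[-n:])
    return sum(a == b for a, b in zip(left, right))

# Hypothetical element: 9 bp perfect inverted repeats around a short spacer.
toy = "TGAACCGGG" + "ATATAT" + "CCCGGTTCA"
print(has_tg_ca_ends(toy), ir_matches(toy, 9))
```

Real terminal IRs are imperfect, so in practice a match count over 20-40 positions is more informative than a strict equality test.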
IS3-family members generally have two consecutive and partially overlapping reading frames, orfA and orfB, in relative translational reading phases 0 and -1, respectively (Fig. IS3.1A) under control of a weak promoter, pIRL, partially located in IRL (Fig. IS3.1A and Fig. IS3.3C). The 5' end of orfB overlaps the 3' end of orfA and occurs in reading phase -1 relative to orfA (Fig. IS3.1). It had been demonstrated in the 1990s that several family members (IS150[18], IS3[19], IS911[20], and IS2[21]) express two major proteins (Fig. IS3.1B): OrfA, the product of the upstream frame, and the transposase, OrfAB, a “fusion” or “transframe” protein generated from orfA and orfB by programmed -1 ribosomal frameshifting (PRF) (see 1.33A Programmed translational frameshifting[22]). Many other members of this family are also organized in this way[23][24]. The frameshifting frequency varies from element to element. It is approximately 50% in the case of IS150[25] and only 15% for IS911[26].
Complex internal inverted repeat sequences (Fig. IS3.1C) (for IS911, located between co-ordinates 19 and 73) include the -35 and -10 hexamers of pIRL, the transcription start site and the ribosome binding site for OrfA. The secondary structure they can form is thought to play a role at the mRNA level in preventing excess transposase expression resulting from external transcription. The full secondary structure would be present in transcripts initiated outside the IS, thus sequestering the translation initiation signals, but only the 3’ part would be present if transcription initiates at pIRL. In this case the translation initiation signals would be exposed. Initial studies (Prère and Fayet, pers. comm.) have shown that translation from the longer transcript is very low but that deletion of its 5’ end to “liberate” the ribosome binding site (Fig. IS3.1C) indeed results in a significant increase in translation. In the related IS2 element, a similar sequence appears to function as a DNA binding site for the OrfA protein which represses promoter activity but further studies are necessary to confirm this[27].
Formation of a strong transposase promoter
In common with many IS of other families (e.g. IS21[28], IS30[29], IS110[30][31]) the IS3 family IRR carries an outward-directed -35 promoter hexamer while IRL carries an inward-directed -10 promoter component (Fig. IS3.3B). In one of the key transposition intermediates, an excised transposon circle (see Transposition Pathway), these are assembled into a strong promoter, pJunc, which serves to express high levels of transposition proteins (Fig. IS3.3B; Fig. IS3.4). Transcription initiation from pJunc, like that from impinging transcription, would also produce an RNA which could sequester the translation initiation signals but in a shorter and less stable stem loop structure (Fig. IS3.3C).
Regulation by Methylation?
Several members carry GATC methylation sites within 50 bp of their ends, which have been shown in one case, IS3, to modulate transposition activity[32]. However, this is not a general characteristic of the family nor is it restricted to any particular subgroup.
Insertion specificity
There appears to be little sequence specificity for insertion of members of the family. IS2 exhibits a preference for a region of bacteriophage P1 but the basis of this preference is at present unknown[33]. Both IS911[34] and IS150 have been found next to sequences which resemble their IRs (see “Targeted Insertion”) and IS1397 is invariably located within intergenic repeated sequences in E. coli (bacterial interspersed mosaic elements or BIMEs)[35].
Group II intron insertions
Finally, an element isolated from the ECOR collection of E. coli and closely related to IS3411 carries a group II intron[36]. The effect of this on regulation of transposition of this element has not been investigated.
IS3 family subgroups
The IS3 family is divided into five subgroups (TABLE Characteristics of IS families; Fig. 1.4.2). This is supported by deep branching in the alignment of the various OrfA and OrfB sequences[37] (Fig. IS3.5). These are: the IS2 and IS407 subgroups (which appear closely related), and the IS3, IS51, and IS150 subgroups. Additional members of the family identified subsequently also tend to follow this pattern. One feature which lends biological credence to these subgroups is that they also clearly appear clustered (with some exceptions) in the results of the alignments with the upstream OrfA protein[38]. Moreover, there is some correlation between the members of each group and the number of base pairs of target DNA duplicated on insertion (DR): for those elements in the IS2 subgroup, insertion invariably leads to a 5 bp DR; for the IS407 subgroup a 4 bp DR is observed; while for the other groups a 3 bp DR is generated (TABLE Characteristics of IS families). In the latter cases some of the elements, e.g. IS911, have been shown to occasionally generate 4 bp repeats. This clustering is also exhibited to some extent in the nucleotide sequence of the terminal IRs (Fig. IS3.2) and is particularly marked in the IS2, IS51 and IS407 subgroups. It can also be observed in the primary sequence details of the putative leucine zipper[39].
Family Exceptions
Several family members exhibit an organisation which does not apparently conform to the generic IS3 member. In IS120, for example, the relationship between the reading phases of the upstream and downstream orfs appears to be +1 rather than -1 while in ISNg1 and ISYe1 the characteristic motifs of OrfB are distributed between reading phases. Other members, such as IS1076, IS1138, IS1221, and IS1141, exhibit only one long open reading frame. Although these may be true variants, it cannot at present be ruled out that the variations are simply due to errors in sequence determination.
Mycoplasma and the non-universal genetic code
Family members from Mycoplasma merit special attention. Not only does the host use a non-universal genetic code in which the opal termination codon TGA directs insertion of tryptophan (see [40]), but their genomes are among the smallest bacterial genomes known and extremely rich in A+T. To date, several different IS3 family members have been observed in Mycoplasma. Of these, only IS1138 (and IS1138b) has been demonstrated directly to undergo autonomous transposition[41]. All exhibit similarly high AT levels and this unusual base composition could lead to difficulties in sequence determination. It is remarkable that typical IS3 family characters have been maintained in such an "extreme" genetic environment. Nine individuals are closely related and form a group of iso-elements which have been called IS1221. As indicated above, one of these carries a single long reading frame (representing orfA + orfB) instead of two consecutive overlapping frames. The others each carry insertions or deletions which destroy either the equivalent of orfA, orfB, or both. Expression studies in E. coli indicate that a protein, equivalent to OrfAB, is indeed produced from the long open reading frame of IS1221. Interestingly, it appears that a second truncated protein, equivalent to OrfA, may be generated from the single orfAB frame by translational frameshifting, representing an "inverted" expression pattern to the majority of the family members[42]. Although this appears not to be a general rule for IS3 family members originating from Mycoplasma hosts, the presence of a similar single-frame arrangement in a second member, IS1138, indicates that it might not be rare. Because of the extremely high AT content of these elements, many potential frameshift windows of the A6G(/C) or A7 type are expected to occur. Only direct experiment will therefore be able to determine which, if any, of these sequences are used to generate the Tpase or, conversely, an OrfA-like protein.
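The consequence of TGA decoding for sequence analysis can be illustrated with a small Python sketch. It builds the standard codon table, derives a Mycoplasma-style table in which TGA is read as tryptophan, and translates the same reading frame with both. The toy sequence and the function names are hypothetical; no real IS1138 or IS1221 sequence is used.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Mycoplasma-style code: identical except that TGA encodes tryptophan (W).
MYCOPLASMA = dict(STANDARD, TGA="W")

def translate(dna: str, table: dict) -> str:
    """Translate frame 0 of a DNA (or RNA) sequence; '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def count_in_frame_tga(dna: str) -> int:
    """Count TGA codons in frame 0."""
    dna = dna.upper().replace("U", "T")
    return sum(dna[i:i + 3] == "TGA" for i in range(0, len(dna) - 2, 3))

toy = "ATGTGGAAATGAGCATAA"  # hypothetical frame with one internal TGA
print(translate(toy, STANDARD))    # internal TGA read as a stop
print(translate(toy, MYCOPLASMA))  # internal TGA read as tryptophan
print(count_in_frame_tga(toy))
```

With the standard table, a frame such as the IS1138 Tpase orf would appear to be interrupted at every in-frame TGA, which is one reason the host code has to be taken into account before concluding that a frame is disrupted.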
A clade with non-canonical IR
A clade carrying non-canonical ends has recently been identified. These IS include 7 supplementary base pairs on each end flanking canonical IS3 ends: a conserved stretch of 5 C residues is located 5’ to the left IR and a less conserved motif (CGG) is located 3’ to the right end. When these additional bases are taken into account every member of this clade exhibits a 4 bp DR characteristic of the IS3 family (TABLE Characteristics of IS families) (Gourbeyre, pers. comm.). This conclusion is supported by the presence of multiple IS copies (e.g. ISPsy31) and also by identification of “empty sites”. This clearly requires further experimental investigation.
An additional subgroup
Recently, an additional subgroup has been proposed which includes ISPpy1[43]. However, all members belong to the IS150 subgroup and their Tpases are not separated by our standard multiple alignment and MCL analysis. Although they do exhibit some variation in the sequence of their terminal dinucleotides, similar variations are found for IS2 and members of other IS3 subgroups.
Mechanism
Transposition Proteins
Extensive alignment studies of the predicted OrfA and OrfB amino acid sequences between themselves and with those of other transposable elements[44][45][46][47][48] provided insights into structure/function relationships of the proteins (Fig. IS3.1B).
OrfA
OrfA is small. For IS911 it has a predicted molecular weight of 11.5 kDa. The predicted primary amino acid sequences of most IS3 family members exhibit a similarly placed HTH signature (see for example [49][50]) which initially suggested that they might be involved in sequence-specific binding of the transposase, OrfAB, to the terminal IRs of their particular IS[51]. This was subsequently confirmed experimentally[52]. They also carry a C-terminal leucine zipper (LZ) motif, first identified in IS2, IS150 and IS3, which appears to be conserved in the majority of known members[53] and is involved in protein multimerization[54][55][56][57].
OrfB
The OrfB products carry a DD(35)E catalytic motif and share additional identities with retroviral integrases and various other Tpases[58][59][60][61][62][63][64]. These include two amino acids located 4 and 7 residues downstream from the glutamate residue.
IS911 OrfB is 299 residues long with a predicted molecular weight of 34.6 kDa. Its TAA termination codon lies just within IRR and may be significant in regulation. The OrfB initiation codon is AUU and consequently initiation occurs only at low levels[65][66] and is modulated by the level of initiation factor IF3[67].
OrfB has been observed for: IS3[68] (Prère & Fayet, unpublished), IS150[69], IS911[70][71][72] and IS3411/IS629[73][74] but not for IS2[75]. It is generally present at quite low levels although for IS3 approximately equal amounts of OrfB and OrfAB appear to be produced[76]. The IS150 OrfB initiation codon is out of phase with the rest of the gene and expression of full length OrfB would require a -1 frameshift after initiation.
Sequence analysis suggests that OrfB may in fact be synthesized by about 34% of IS3 family members through translational coupling: the stop codon of orfA overlaps with a potential orfB start codon (e.g. AUGA or GUGA) in 134 out of 399 ISs analyzed[77].
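The overlap described here is simple to test for computationally. In AUGA or GUGA, the start codon (AUG or GUG) begins one nucleotide upstream of the UGA stop codon, which places the downstream frame in phase -1. The Python sketch below is a hypothetical helper, not the method used in the cited analysis; it checks only the two motifs named in the text and also reproduces the quoted proportion.

```python
# Only the two overlap motifs named in the text; other motifs may exist.
COUPLING_MOTIFS = {"AUGA", "GUGA"}

def is_translationally_coupled(mrna: str, orfa_stop_index: int) -> bool:
    """Return True if a potential orfB start codon overlaps the orfA stop codon.

    orfa_stop_index is the 0-based position of the first nucleotide of the
    orfA stop codon. The start codon then begins one nucleotide upstream.
    """
    m = mrna.upper().replace("T", "U")
    if orfa_stop_index < 1:
        return False
    return m[orfa_stop_index - 1:orfa_stop_index + 3] in COUPLING_MOTIFS

# Proportion quoted in the text: 134 of 399 ISs analyzed.
percent = round(100 * 134 / 399, 1)

print(is_translationally_coupled("GCCAUGACC", 4))  # toy mRNA, UGA stop at index 4
print(percent)
```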
It is possible that the OrfB protein itself plays no direct role in transposition chemistry but that it is simply its translation signals which are important. Their recognition by the ribosome could modulate programmed translational frameshifting required to generate a single transposase protein, OrfAB, from the two reading frames orfA and orfB (see 1.33A Programmed translational frameshifting).
The OrfB amino acid sequence shares significant similarities with retroviral integrases, an observation which contributed to defining the highly conserved amino acid triad DDE common to all IS3 family members and to many enzymes of this type of phosphoryltransferase[78][79]. This constitutes part of the active site (for reviews see: [48,52,64–68]).
OrfB carries neither the HTH nor the LZ motif.
OrfAB: a product of programmed ribosomal frameshifting (PRTF)
OrfAB is assembled from orfA and orfB by a programmed –1 ribosomal frameshift occurring near the 3' end of orfA (1.33A Programmed translational frameshifting) first demonstrated for the related IS150[80].
The transframe protein combines the orfA HTH motif, an LZ motif and the orfB DD(35)E catalytic domain[81] (Fig. IS3.1B).
OrfAB of IS911 (382 amino acids) shares its 86 N-terminal amino acids with OrfA (100 amino acids) and its 296 C-terminal amino acids with OrfB (299 amino acids).
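These lengths are mutually consistent, as the following short Python check shows. The variable names are illustrative; the numbers are those given in the text.

```python
# Lengths in amino acids, as stated for IS911.
ORFA_LEN, ORFB_LEN, ORFAB_LEN = 100, 299, 382
SHARED_WITH_ORFA = 86   # N-terminal residues of OrfAB identical to OrfA
SHARED_WITH_ORFB = 296  # C-terminal residues of OrfAB identical to OrfB

# The two shared segments account for the whole transframe protein.
assert SHARED_WITH_ORFA + SHARED_WITH_ORFB == ORFAB_LEN

# Residues found only in the separate products.
orfa_only = ORFA_LEN - SHARED_WITH_ORFA  # C-terminal end of OrfA, after the shift site
orfb_only = ORFB_LEN - SHARED_WITH_ORFB  # N-terminal end of OrfB, before the shift site
print(orfa_only, orfb_only)
```

OrfA therefore has 14 C-terminal residues that are absent from OrfAB, and OrfB has 3 N-terminal residues that are absent from OrfAB.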
Ribosome rephasing to generate OrfAB occurs on a group of “slippery” lysine codons with a frequency of about 15% (measured using systems driven by two different promoters: T7p10 and ptac). OrfA is therefore normally expressed at significantly higher levels than OrfAB. Frameshifting permits the combination of different functional protein domains (Fig. IS3.1C).
IS3-family frameshifting is similar to that used in some retroviruses to generate the gag-pol "polyprotein"[82] and in the dnaX gene of E. coli to synthesize the γ subunit of DNA polymerase III[83].
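The effect of a -1 shift on the protein product can be illustrated with a deliberately simplified Python model. It assumes that the ribosome decodes the heptamer A AAA AAG in frame 0 and then resumes one nucleotide upstream, so that the last nucleotide of AAG is read again as the first base of the next codon. The toy mRNA and all names are hypothetical and are not IS911 sequence; the stimulatory RBS and stem-loop are not modelled.

```python
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

def translate_from(seq: str, start: int) -> str:
    """Translate from `start` until the first stop codon or the end of seq."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def translate_with_minus_one_shift(seq: str, slippery: str = "AAAAAAG") -> str:
    """Translate frame 0 through the slippery heptamer, then continue in frame -1."""
    site = seq.find(slippery)
    # The heptamer must be positioned as X XXY YYZ relative to frame 0.
    if site < 0 or (site + 1) % 3 != 0:
        raise ValueError("no slippery heptamer in the expected frame")
    shift_point = site + len(slippery)          # first nucleotide after the heptamer
    upstream = translate_from(seq[:shift_point], 0)
    downstream = translate_from(seq, shift_point - 1)  # step back one nucleotide
    return upstream + downstream

toy = "ATGGCAAAAAAGTAACTGGTTGA"  # hypothetical: frame 0 stops just after the heptamer
print(translate_from(toy, 0))               # OrfA-like product, no frameshift
print(translate_with_minus_one_shift(toy))  # OrfAB-like transframe product
```

In this toy case the unshifted product ends at the frame-0 stop codon, while the shifted product shares its N-terminal residues with the short product and continues into the -1 frame, mirroring the OrfA/OrfAB relationship.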
The relevant IS911 sequences involved in frameshifting are shown in Fig. IS3.1C. Examples of frameshifting sequences from other members of the family are shown in Fig. IS3.6. The group of slippery lysine codons is A AAA AAG and is directly preceded by the AUU OrfB initiation codon. Since E. coli does not encode a tRNALys with a 3’UUC5’ anticodon for AAG, both lysine codons are decoded by the same tRNALys with a 3’UUU5’ anticodon. Its pairing is weaker with a G at the wobble position[84] probably because modifications of U34 increase the rigidity of the anticodon[85]. The presence of an upstream RBS (GGAG sequence) and a downstream secondary structure (Y-shaped stem-loop) stimulates ribosome rephasing in the -1 direction. What drives frameshifting is probably the thermodynamically favorable re-pairing of the two tRNALys from codons AAA-AAG to codons AAA-AAA[86][87]. The stimulators likely have a mechanical effect, bringing the ribosome and the mRNA back into register after tRNA slippage. Different groups of codons have been observed to allow rephasing of the ribosome[88] and, although the most common motif is A6G, different members of the IS3 family carry a variety of these (e.g. A3G for IS3; see Atkins & Gesteland, Recoding: expansion of decoding rules enriches gene expression, Springer 2010).
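Candidate frameshift windows of this kind can be located with a simple scan. The Python sketch below is a hypothetical helper that reports every occurrence, including overlapping ones, of the A6G, A7 and A6C heptamers mentioned in this article. It does not check the reading frame or the presence of stimulatory elements, so hits are candidates only, and the toy sequences are invented.

```python
import re

def find_slippery_motifs(mrna: str,
                         motifs=("AAAAAAG", "AAAAAAA", "AAAAAAC")):
    """Return sorted (position, motif) pairs for every motif occurrence.

    Positions are 0-based. A lookahead is used so that overlapping
    occurrences (common in A-rich sequence) are all reported.
    """
    m = mrna.upper().replace("U", "T")
    hits = []
    for motif in motifs:
        for match in re.finditer(f"(?={motif})", m):
            hits.append((match.start(), motif))
    return sorted(hits)

print(find_slippery_motifs("GGAGCCAAAAAAGCUU"))  # toy mRNA with one A6G window
print(find_slippery_motifs("AAAAAAAAG"))         # A-rich toy sequence, several hits
```

The second call shows why A+T-rich elements yield many candidate windows, which is the reason direct experiment is needed to identify the site actually used.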
Two similarly located partially overlapping reading frames in IS3, IS150 and IS3411[89] also produce three proteins. The transposases, OrfAB, like that of IS911, are fusion products of the two orfs generated by a –1 translational frameshift.
For IS3, frameshifting is also stimulated by a presumed H-type pseudoknot structure similar to those generally involved in viral recoding[90]. In IS3411, -1 slippage on a U UUU motif requires a more convoluted form of pseudoknot structure formed by pairing of an apical loop and an internal loop belonging to two hairpins located 65 nucleotides apart on the mRNA[91]. Two similarly arranged orfs occur in IS2 and have been shown to encode OrfA and OrfAB equivalents only[92][93]. This organization is observed in most members of the IS3 family but, besides the cases mentioned above, frameshifting has been analyzed experimentally only in a few other, less well-characterized, elements (including IS51, IS222, IS600, IS1133, IS1222).
The frequency of frameshifting is quite variable from element to element: reported values are 15% for IS911, 50% for IS150, 6% for IS3 and 2% for IS3411[94]. These values may not reflect the in vivo situation since they were not established by direct measurement of the amount of the OrfA and OrfAB proteins synthesized from an intact IS, but after modification of expression signals of the IS genes or after cloning the frameshift signals in a reporter system[95][96][97].
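To give a feel for what these frequencies imply, the Python sketch below converts a frameshifting frequency f into the expected ratio of OrfA to OrfAB synthesized, (1 - f) / f. This is a back-of-the-envelope model resting on assumptions that are not established in the text: every ribosome initiating at orfA yields either OrfA or OrfAB, and differences in protein stability are ignored. Given the caveat above about how the values were measured, the output should be read as indicative only.

```python
# Reported frameshifting frequencies from the text (as fractions).
REPORTED = {"IS911": 0.15, "IS150": 0.50, "IS3": 0.06, "IS3411": 0.02}

def orfa_to_orfab_ratio(f: float) -> float:
    """Expected OrfA:OrfAB synthesis ratio for frameshifting frequency f.

    Assumes each translation event ends as OrfA (probability 1 - f)
    or OrfAB (probability f); protein turnover is not modelled.
    """
    if not 0 < f < 1:
        raise ValueError("f must lie strictly between 0 and 1")
    return (1 - f) / f

for name, f in REPORTED.items():
    print(f"{name}: about {orfa_to_orfab_ratio(f):.1f} OrfA per OrfAB")
```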
The level of formation of a circular IS911 transposition intermediate, in which abutted left and right ends generate an IRR-IRL junction (Transposition Pathway), measured by PCR, indeed depends on frameshifting frequency in vivo[98]. IS911 copies from several clinical isolates that contained variations in the frameshift region exhibited various reduced levels of frameshifting. When these variations were introduced into the model IS911 they resulted in comparable reductions in circle formation.
Frameshifting is likely modulated by the physiological state of the host cells and by the environment: for example, frameshifting decreases when temperature is raised or when ribosome density on the mRNA is increased (O. Fayet, pers. comm.).
Artificial orfA-orfB fusion
For experimental purposes, production of OrfAB without necessitating a translational frameshift is obtained by introduction of a single additional base pair within the frameshift region which artificially fuses the orfA and orfB frames and eliminates OrfA production[99]. It was initially difficult to construct this mutant in the context of an entire IS911 (i.e. with the two flanking IR) but more recently this has been accomplished using a longer artificial IS and resulted in an exceptionally high transposition frequency[100]. A similar mutant in IS3 results in a high frequency of adjacent deletions[101].
Structural motifs
Although no structural information is available from crystallography, the roles of the HTH and LZ motifs have been probed in vivo and in vitro.
The conserved N-terminal helix-turn-helix (HTH) motif is related to the LysR family of bacterial transcription factors and has a highly conserved tryptophan residue similar to that of certain homeodomain protein HTH motifs. This domain is important in directing transposase to bind IS911 IR[102] and is present in most IS3 family members (Fig. IS3.7A). The N-terminal helices of the related IS2 transposase are also involved in IR binding[103].
Many members carry a putative leucine zipper located at the end of OrfA (sometimes extending into the OrfB region of the OrfAB protein) (see [104][105][106]). Studies with IS911 and IS2 indicate that this is a multimerization domain of the proteins[107][108][109]. The LZ motif of IS911 is composed of four heptameric units (Fig. IS3.1B) with a predicted coiled coil structure including a potential buried inter-subunit hydrogen bond across the dimer interface (Fig. IS3.7B), to maintain the zipper in a dimeric state, and correctly placed residues with opposite charges potentially able to form characteristic inter-subunit salt-bridges to stabilize the dimeric structure[110]. Leucine zipper motifs are found in most IS3 family members (Fig. IS3.7C).
OrfAB and OrfA form both homomultimers and mixed OrfAB-OrfA multimers[111][112].
Mutation of specific critical residues in the OrfAB LZ, which are also involved in its multimerization, reduces the level of transposition intermediates in vivo and in vitro[113].
Co-translational DNA binding
IS911 OrfAB has a strong cis preference in vivo[114]. It has about a 200-fold higher activity on the IS copy from which it is expressed (in cis) than in trans. This prevents activation of transposition of one IS copy by OrfAB expressed from a second copy in the same cell. The strength of the cis effect depends on the distance of the transposase gene from the IS ends. Modification of the translational frameshifting pause signal also has a strong influence on cis preference, presumably by delaying translation and folding of the C-terminal domain and so increasing the chance that the folded N-terminal domain will recognize and bind its target IR.
In vitro analyses using ribosome display with an E. coli-derived coupled transcription-translation system, combined with size exclusion chromatography[115], demonstrated that an added IR bound nascent OrfAB derivatives while they were still attached to the ribosome. Ternary complexes containing mRNA, ribosome and a nascent peptide specifically bound added IR copies if only the N-terminal 149 amino acids extended from the ribosome whereas a full-length Tpase exiting the ribosome did not.
Direct evidence of coupled translational binding (Fig. IS3.8) was obtained using a staged coupled transcription/translation reaction: nascent OrfAB bound the IR before its synthesis was complete but not after. Thus OrfAB can efficiently bind the IR only prior to its complete translation.
Co-translational multimerisation
An intriguing question arising directly from these results is how OrfAB multimerizes, as is found in the transpososome, to bind both ends of the IS. Stable formation of the important synaptic complex containing both IS ends and the transposase requires a dimeric OrfAB (see The IS911 transpososome). It is therefore possible that dimerization is in some way directly associated with translation. Indeed, using luxA and luxB as a model system, it has been shown that LuxA/B subunit assembly initiates cotranslationally on nascent LuxB in vivo. Protein assembly appears to be directly coupled to translation and involves “spatially confined, actively chaperoned cotranslational subunit interactions”[116].
The IS911 transpososome
A crucial checkpoint in transposition is the assembly of the 'transpososome'. This step is a general prerequisite for initiating DNA cleavage and the subsequent chemical steps in transposition for most elements that use a DNA (rather than RNA) transposition intermediate. In this protein-DNA complex, both ends of the transposon are bridged by the transposase before it catalyzes the DNA strand cleavages and strand transfers necessary for transposon mobility[117][118][119]. The transpososome adopts very precise architectures to accomplish these steps, and undergoes defined changes throughout the transposition process.
The overall IS911 transposition pathway is a two-step process, involving replicative excision followed by insertion (Fig. IS3.9A and 9B). This implies consecutive assembly of two types of transpososome: one (synaptic complex A; SCA) is implicated in IS excision and includes both IS ends, while the other (synaptic complex B; SCB) involves the circle junction with its abutted IRs and ensures its integration into the target DNA.
Excision synaptic complex SCA.
Using a band shift assay and IR of different lengths (the so-called “long-short” experiment) it was shown that the truncated OrfAB[1-149] forms a complex with two IR copies, the paired-end complex (PEC)[120] equivalent to the SCA. An intact OrfAB[1-149] LZ is necessary for correct PEC/SCA formation[121][122]. At higher OrfAB[1-149] concentrations a probable single end complex (SEC) composed of one IR and OrfAB[1-149] appeared. Addition of OrfA disturbed both PEC/SCA and SEC and generated a fast migrating species whose composition remains to be determined but does not appear to contain OrfA itself[123].
DNaseI and Copper phenanthroline footprinting revealed that OrfAB[1-149] protects a sub-terminal (internal) IR region including two conserved sequence blocks in the left (IRL) and right (IRR) ends (Fig. IS3.1A). DNA binding assays in vitro and measurement of in vivo recombination activity of sequential IR deletion derivatives suggested a model in which the N-terminal region of OrfAB binds the conserved boxes in a sequence-specific manner and anchors the two IRs into the SCA. The external region of the inverted repeat was proposed to contact the C-terminal transposase domain carrying the catalytic site[124].
SCA is composed of a dimer of transposase bridging two IR[125], as judged by the use of a tagged and untagged truncated transposase derivative, OrfAB[1-149], and also of IR of different lengths. OrfAB[1-149] assembles two IRR copies in a parallel orientation (Fig. 4A)[126] as studied at the single molecule level by Atomic Force Microscopy (AFM) using asymmetric IRR-carrying DNA fragments.
SCA assembly was also studied using a second single molecule approach: tethered particle motion (TPM) (Fig. IS3.10)[127] in which a DNA molecule is tethered to a glass support and its effective length is measured by observing the Brownian motion of a bead attached to its free end (Fig. IS3.10 left). OrfAB[1-149] binding to a single IR provoked a small shortening of the DNA, consistent with a DNA bend introduced by protein binding to the IR; this was confirmed using EMSA. When two ends were present on the tethered DNA in their natural, inverted, configuration, OrfAB[1-149] not only provoked the short reduction in length but also generated species with greatly reduced effective length (Fig. IS3.10 middle and top right) consistent with DNA looping between the ends and thus SCA formation. SCA is very stable and kinetic analysis in real-time suggested that passage from the bound unlooped to the looped state could involve another unlooped species of intermediate length in which OrfAB[1-149] is bound to both IRs. DNA carrying directly repeated IR also gave rise to the looped species but the level of the intermediate species was significantly enhanced (Fig. IS3.10 middle and bottom right). Its accumulation could reflect a less favorable SCA formation with directly repeated IR copies than with inverted IR. This is compatible with a model in which OrfAB binds separately to and bends each IR and protein–protein interactions then lead to SCA formation (Fig. IS3.11A)[128]. Cleavage and strand transfer would then give rise to a species in which both IS ends are joined by a single strand bridge (a figure-eight on a circular plasmid; Fig. IS3.9C) (see The Transposition Pathway).
Insertion synaptic complex SCB
SCB has not been characterized as precisely as SCA. SCB is devoted to the insertion step of the transposition process. Two types of insertion, IR-targeted and non-targeted, have been observed (Fig. IS3.11b). It has been proposed that two different protein-DNA complexes are assembled during the two types of insertion reaction: SCBt and SCBnt (for targeted and non-targeted synaptic complex respectively)[129]. Nothing is known about the stoichiometry and the geometry of these complexes but, based on protein and DNA requirements for protein-DNA complex formation, as judged by band shift, and for transposition products, as judged by in vitro and in vivo transposition assays, it has been proposed that SCBt is composed of a transposase dimer bridging a DNA molecule carrying an IR and a DNA molecule carrying an IRR-IRL junction (IS911 circle), the product of the replicative IS911 excision. This IR-targeted insertion explains how the original isolate of IS911 might have occurred next to a sequence which strongly resembles an IR[130] and can also explain one-ended insertion[131]. In this regard IRR shows somewhat higher affinity than IRL. Note that if one of the two IR carried by the circle is omitted, SCBt resembles SCA (Fig. IS3.11).
SCBnt is thought to differ from both SCA and SCBt and to include the second IS911 protein, OrfA. This protein, which binds non-specifically to DNA and interacts with OrfAB[132][133], is proposed to direct an OrfAB-junction complex to a randomly chosen target DNA to form SCBnt[134][135]. This is based on the observation that integration of the transposon circle intermediate is greatly stimulated by preincubation of OrfAB and OrfA in an in vitro reaction[136].
The Transposition Pathway
The IS3 family is one of an increasing number of IS families known to transpose using a double strand circular DNA intermediate. Closely related pathways have been demonstrated for IS1[137], IS2[138], IS3[139], and IS150[140]. This represents a major transposition pathway which has yet to be widely recognized. As shown in Fig. IS3.9 and the animation below, IS3 family transposition proceeds through a copy-out-paste-in process.
[Animation: IS911 copy-out-paste-in transposition]
The Figure-eight form
The initial step is recognition of the IR by OrfAB (presumably during its translation) (IS911 movie) and assembly of SCA, which correctly positions the DNA ends and the transposase catalytic site for the subsequent chemical steps. Like all known DDE transposase-catalyzed reactions[141], IS911 transposition proceeds by cleavage of a single strand at the transposon end, generating a 3’-OH. This then attacks a target phosphodiester bond in a strand transfer reaction. The particularity of the copy-out-paste-in mechanism is that initial cleavage occurs at only one transposon end, either left or right (Fig. IS3.9). The single liberated 3’-OH directs strand transfer to the same strand, 3 bases 5’ to the other end of the element. This circularizes a single transposon strand, producing a single-strand bridge and hence a figure-eight structure on a circular plasmid donor molecule (Fig. IS3.12), which can be easily observed in vivo[142]. The IR are joined by the single-strand bridge and separated by three bases derived from the DNA flanking either the left or the right end. The 3 (or 4) bp direct repeats flanking the original insertion are not required for further transposition (as also shown for IS3[143]), and an IS911-based transposon engineered to have different flanks generates a mixed population of figure-eight molecules carrying one or the other flank sequence. Preventing cleavage of one or the other transposon end resulted in a homogeneous population carrying the 3 nt DNA flank associated with the mutant end, confirming that IRL can attack IRR and vice versa. The reaction can be viewed as a one-ended site-specific transposition event. These initial steps can be accomplished by OrfAB alone. However, in the presence of OrfA, no figure-eight or IS circles could be detected by a simple gel assay in vivo, although IS circles were found using a PCR approach[144].
This suggests that OrfA may play a role in negatively regulating initiation of transposition. A similar conclusion has been reached for OrfA of IS3[145]. Alternatively, OrfA may stimulate the disappearance of figure-eight and IS circles (see below), since no effect of OrfA was observed on figure-eight formation in vitro. Together with the fact that OrfAB is normally produced at low levels from a weak promoter[146], this suggests that initiation of transposition to form the figure-eight intermediate may be stochastic.
The circular intermediate
Kinetic data[147][148] indicate that the figure-eight gives rise to the circular transposon form, which can easily be detected in vivo[100] and in which the IR are abutted and separated by three base pairs of DNA flanking the original insertion (Fig. IS3.9 and Fig. IS3.12). As for figure-eight molecules, a transposon engineered to have different flanks generates a mixed population of transposon circles with one or the other 3 bp flank located at the junction[149].
Studies in vivo using a labelling protocol and a temperature-sensitive plasmid as transposon donor demonstrated that conversion from the figure-eight to the transposon circle occurs by semiconservative replication, in which the circular intermediate is “copied out” leaving a copy in the transposon donor molecule[150] (Fig. IS3.9). This is transposon-specific, requires OrfAB (presumably to generate the figure-eight and a 3’-OH on the IS911 DNA flank) and does not depend on replication from the donor plasmid origin of replication[151].
Donor plasmids in which one or the other IR is inactivated for cleavage would be expected to reveal whether one or the other 3’-OH is used in transposon replication. This was tested using the Tus/ter system[152][153][154][155] (which blocks passage of a replication fork in an orientation-specific fashion), with the ter site cloned into the transposon in one or the other orientation. In the presence of Tus protein, no transposon circles were observed if the orientation of the ter site was that expected to block replication from one or the other end[156].
At present, it is not known how OrfAB is removed or how this replication step is initiated and terminated to generate the final circles. It is possible that these processes involve host factors and mechanisms similar to those which operate in replicative transposition of bacteriophage Mu (see [157][106])[158][159].
RecG helicase is implicated in targeted insertion. This process involves a target IS911 end: strand transfer occurs between one cleaved end of the IS circle and the target IS end, creating an intermolecular single-strand bridge rather than the intramolecular bridge of the figure-eight intermediate (Fig. IS3.13). Resolution of this structure implicates branch migration and replication from the donor plasmid[160]. This reinforces the idea that host proteins, including components of the replication machinery, are loaded onto figure-eight intermediates.
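The flank bookkeeping described above for figure-eight and circle junctions can be illustrated with a small toy model. This is only a sketch under stated assumptions: the function name `junction_flanks`, its parameters and the flank sequences "AAA" and "CCC" are hypothetical and are not taken from the cited studies, and strand polarity is ignored. It encodes a single rule from the text: the cleaved end attacks 3 bases outside the other end, so the junction carries the flank adjacent to the uncleaved (target) end.

```python
# Toy model (illustrative only, not from the cited studies): which 3-nt flank
# is expected between the abutted IRs of IS911 figure-eight/circle molecules.

OTHER_END = {"IRL": "IRR", "IRR": "IRL"}


def junction_flanks(left_flank, right_flank, cleavable_ends=("IRL", "IRR")):
    """Return the set of flank sequences expected at the IR-IR junction.

    The cleaved end donates its 3'-OH and attacks 3 bases outside the other
    end, so the flank captured at the junction is the one next to the
    uncleaved (target) end.
    """
    flank_next_to = {"IRL": left_flank, "IRR": right_flank}
    return {flank_next_to[OTHER_END[end]] for end in cleavable_ends}


# Hypothetical flanks, chosen only to tell the two sides apart.
# Both ends cleavable: mixed population with either flank at the junction.
print(sorted(junction_flanks("AAA", "CCC")))
# IRL mutated so that only IRR can be cleaved: homogeneous population
# carrying the flank associated with the mutant (IRL) end.
print(sorted(junction_flanks("AAA", "CCC", cleavable_ends=("IRR",))))
```

With both ends active the model returns both flanks (the mixed population), whereas blocking cleavage of one end leaves only the flank next to that mutant end, as reported for the figure-eight molecules.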
Integration of the circular intermediate
The IR junction formed by IS circularization is very unstable in the presence of OrfAB and undergoes high levels of deletion and insertion in vivo[161] and in vitro[162]. Transposon circle insertion presumably requires further transposase synthesis.
A remarkable consequence of transposon circle formation is the assembly of a strong promoter, pjunc, from a –35 hexamer contributed by IRR and a –10 hexamer contributed by IRL (Fig. IS3.3B). The 3 (or more rarely 4) bp which separate IRL and IRR in the circle provide an ideal spacing between the –35 and –10 elements[163]. The junction promoter, pjunc, is 30- to 50-fold stronger than the indigenous promoter, pIRL[164] (Fig. IS3.4), and more than twofold stronger than lacUV5[165]. It is correctly placed to drive high levels of transposase synthesis and plays an active role in controlling IS911 transposition.
Inactivation of pjunc by mutagenesis strongly reduced IS911 transposition in vivo when transposase was expressed in its native configuration[166]. Moreover, the truncated OrfAB derivative, OrfAB1-149, which specifically binds IRR and IRL, reduced in vivo promoter activity 10-fold in a mutated junction resistant to cleavage. Full-length OrfAB, which binds the IR only weakly, and OrfA, which does not specifically bind the IR, had no effect[167]. Integration results in disassembly of pjunc, providing a powerful feedback mechanism which results in transient and controlled activation of integration only in the presence of the correct (circular) intermediate.
For the related IS2, this junction promoter is required for transposition[168].
Circle junction formation brings both transposon ends together in an inverted orientation. This active junction must then participate in a second type of synaptic complex which includes target DNA (Fig. IS3.9 and IS3.11B).
Two single strand cleavages, one at each abutted IR, would linearize the transposon circle permitting the two liberated 3'-OH groups to direct coordinated strand transfer (Fig. IS3.9 and IS3.11B). The final step requires OrfAB but is greatly stimulated by OrfA and is sensitive to the ratio of OrfAB/OrfA[169].
It is not known whether target capture occurs before or after cleavage of the circle junction. However, linear copies of IS911 are produced from transposon circles in vitro and, in the presence of high OrfAB levels, in vivo, and a pre-cleaved linear transposon was a robust substrate for integration in vitro[170].
Based on kinetics and on the formation of the strong pjunc promoter, we favor a model in which the IS circles represent a reservoir of transposition intermediates and in which linear forms are generated from the IS circles during the integration process.
This has also been proposed for IS3[171].
Targeted Insertion
As stated above, several IS including IS911 show a preference for integration next to sequences in the target similar to their IR. One way of understanding this is that the transposon circle is able to form a synaptic complex (SCBt; Fig. IS3.11B left) which is similar to SCA (Fig. IS3.11A) but which occurs “in trans” between an IR of the transposon circle and an IR in the target. In the case of IS911, this phenomenon occurs more frequently if OrfA is not present (Fig. IS3.13) and it was proposed that one role of OrfA is to promote dispersion of the IS[172][173].
This type of one-ended intermolecular recombination/integration has been analyzed in some detail[174][175][176].
IR-targeted insertion involves transfer of a single end of the junction to the target IR to generate a branched DNA structure. The single end transfer (SET) intermediate, but not the final insertion product, was detected in vitro. This implies that SET intermediates must be processed by the bacterial host to obtain the final insertion products. Sequence analysis of in vitro and in vivo IR-targeted insertion products revealed high levels of DNA sequence conversion in which mutations from one IR were transferred to another. These sequence changes could not be explained by the classic transposition pathway but could be understood in terms of a mechanism in which SET generates a four-way Holliday-like junction which is then processed by host-mediated branch migration, resolution, repair and replication. This pathway resembles those described for processing other branched DNA structures such as stalled replication forks. A version of this model is shown in Fig. IS3.14. Subsequent studies showed that the RecG helicase is implicated in vivo, as might be expected for strand migration[177].
Mechanism in other family members
Several other members of this family have also been analysed in some detail. These include IS2, IS3, and IS150. All three have been shown to generate circles when supplied with high levels of the fused-frame Tpase[178][179][180][181][182].
IS3 also generates adjacent deletions[183] but, unlike IS911, appears to undergo excision from the donor molecule as a linear form following a staggered double strand break at each end. These forms have a 3 base 5' overhang and may be an alternative type of transposition intermediate[184]. Such forms may be equivalent to the linear IS911 species derived from transposon circles. In addition, IS3-derivative transposons in which two abutted ends have been engineered undergo high levels of transposition[185].
Insertion of IS3 generally creates 3 bp, and sometimes 4 bp, direct target repeats. It is significant that plasmids in which the IRs are separated by 4 bp are more active than those in which they are separated by 8 bp. In these studies the authors were unable to engineer derivatives with two complete tandem IS3 elements. This may be the result of the formation of a strong hybrid promoter which, as described for IS911 and other ISs (see above), drives high levels of Tpase expression. This configuration of ends is equivalent to that found at the circle junction, and suggests that abutted ends of IS3 are also efficient substrates in transposition.
IS2 generates direct target duplications of 5 bp on insertion[186] although transposon circles generated with this element carry only a single base pair separating IRL and IRR[187].
While IS2 carries a conserved terminal 5'-CA-3' at its right end, the left end terminates with 5'-TG-3'. This atypical IRL does not act as a strand donor but uniquely as a target in the circularization reaction.
Functional studies indicate that the product of the upstream orfA may inhibit transposition[188]. It has been shown to bind specifically to IRL at a sequence which overlaps the -10 hexamer of the resident Tpase promoter and repress expression of OrfA.
It does not appear to bind IRR (note that in the original article the authors invert the standard definition of IRL and IRR)[189].
Several other elements also exhibit small inverted repeat sequences which flank the -10 hexamer of the putative resident Tpase promoter. IS2-derivative transposons in which two abutted ends have been engineered also undergo high levels of transposition[190][191] and, like IS911, the circle junction of IS2 also constitutes a strong promoter capable of driving Tpase expression. Several (but not all) IS3-family elements may also carry similarly located potential -35 and -10 sequences within their IRs.
Bibliography
- ↑ <pubmed>4567156</pubmed>
- ↑ <pubmed>1092667</pubmed>
- ↑ <pubmed>1092668</pubmed>
- ↑ <pubmed>1092669</pubmed>
- ↑ <pubmed>383689</pubmed>
- ↑ <pubmed>6277857</pubmed>
- ↑ <pubmed>2832386</pubmed>
- ↑ <pubmed>6094480</pubmed>
- ↑ <pubmed>26350305</pubmed>
- ↑ <pubmed>2163395</pubmed>
- ↑ <pubmed>9278503</pubmed>
- ↑ <pubmed>9302015</pubmed>
- ↑ <pubmed>10496929</pubmed>
- ↑ <pubmed>8751923</pubmed>
- ↑ <pubmed>17347521</pubmed>
- ↑ <pubmed>17320399</pubmed>
- ↑ <pubmed>8096321</pubmed>
- ↑ <pubmed>1653413</pubmed>
- ↑ <pubmed>8107082</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>9302014</pubmed>
- ↑ <pubmed>8384687</pubmed>
- ↑ <pubmed>21673094</pubmed>
- ↑ <pubmed>24875478</pubmed>
- ↑ <pubmed>1653413</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>8107136</pubmed>
- ↑ <pubmed>2540414</pubmed>
- ↑ <pubmed>3039299</pubmed>
- ↑ <pubmed>10438765</pubmed>
- ↑ <pubmed>11598022</pubmed>
- ↑ <pubmed>1645443</pubmed>
- ↑ <pubmed>3035338</pubmed>
- ↑ <pubmed>8106332</pubmed>
- ↑ <pubmed>9055066</pubmed>
- ↑ <pubmed>7994604</pubmed>
- ↑ <pubmed>9729608</pubmed>
- ↑ <pubmed>9729608</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>1579111</pubmed>
- ↑ <pubmed>8096321</pubmed>
- ↑ <pubmed>7476162</pubmed>
- ↑ <pubmed>23832000</pubmed>
- ↑ <pubmed>8302872</pubmed>
- ↑ <pubmed>1963920</pubmed>
- ↑ <pubmed>1647013</pubmed>
- ↑ <pubmed>1850126</pubmed>
- ↑ <pubmed>7934941</pubmed>
- ↑ <pubmed>2163395</pubmed>
- ↑ <pubmed>9435062</pubmed>
- ↑ <pubmed>2841644</pubmed>
- ↑ <pubmed>14981152</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>2163395</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>8302872</pubmed>
- ↑ <pubmed>1963920</pubmed>
- ↑ <pubmed>1647013</pubmed>
- ↑ <pubmed>1850126</pubmed>
- ↑ <pubmed>7934941</pubmed>
- ↑ <pubmed>10547692</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>10064703</pubmed>
- ↑ <pubmed>21478364</pubmed>
- ↑ <pubmed>8107082</pubmed>
- ↑ <pubmed>1653413</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>10064703</pubmed>
- ↑ <pubmed>21478364</pubmed>
- ↑ <pubmed>18474594</pubmed>
- ↑ <pubmed>16731525</pubmed>
- ↑ <pubmed>8824609</pubmed>
- ↑ <pubmed>8107082</pubmed>
- ↑ <pubmed>21673094</pubmed>
- ↑ <pubmed>1963920</pubmed>
- ↑ <pubmed>1314954</pubmed>
- ↑ <pubmed>1653413</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>7636469</pubmed>
- ↑ <pubmed>1547945</pubmed>
- ↑ <pubmed>3860833</pubmed>
- ↑ <pubmed>11027137</pubmed>
- ↑ <pubmed>1547945</pubmed>
- ↑ <pubmed>12970189</pubmed>
- ↑ <pubmed>24875478</pubmed>
- ↑ <pubmed>18474594</pubmed>
- ↑ <pubmed>18621088</pubmed>
- ↑ <pubmed>18474594</pubmed>
- ↑ <pubmed>8107136</pubmed>
- ↑ <pubmed>8824609</pubmed>
- ↑ <pubmed>18474594</pubmed>
- ↑ <pubmed>1653413</pubmed>
- ↑ <pubmed>8107082</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>12586397</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>22195971</pubmed>
- ↑ <pubmed>8107082</pubmed>
- ↑ <pubmed>14981152</pubmed>
- ↑ <pubmed>14981152</pubmed>
- ↑ <pubmed>7476162</pubmed>
- ↑ <pubmed>8520113</pubmed>
- ↑ <pubmed>7496528</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>9335268</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>9761671</pubmed> (Transposition Cycle) and reduced or prevented multimer (dimer) formation. OrfAB and OrfA share three of their four heptads (Fig. IS3.7B). The last of each differs in sequence due to the translational frameshift which occurs within the heptad in expression of OrfAB. This presumably results in different strengths of monomer-monomer interactions in the case of homo- and hetero-multimers and this may be involved in regulation of transposition. A poorly defined region, M, located between residues 109 and 135 (Fig. IS3.1B) and components in the catalytic domain of OrfAB
- ↑ <pubmed>22195971</pubmed>
- ↑ <pubmed>22195971</pubmed>
- ↑ <pubmed>26405228</pubmed>
- ↑ <pubmed>21439812</pubmed>
- ↑ <pubmed>23217365</pubmed>
- ↑ <pubmed>16181782</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>11352577</pubmed>
- ↑ <pubmed>20553579</pubmed>
- ↑ <pubmed>20553579</pubmed>
- ↑ <pubmed>15155821</pubmed>
- ↑ <pubmed>16923775</pubmed>
- ↑ <pubmed>17367389</pubmed>
- ↑ <pubmed>2163395</pubmed>
- ↑ <pubmed>8106332</pubmed>
- ↑ <pubmed>10677279</pubmed>
- ↑ <pubmed>9761671</pubmed>
- ↑ <pubmed>17367389</pubmed>
- ↑ <pubmed>18586933</pubmed>
- ↑ <pubmed>9463394</pubmed>
- ↑ <pubmed>7489730</pubmed>
- ↑ <pubmed>9302014</pubmed>
- ↑ <pubmed>15493331</pubmed>
- ↑ <pubmed>12374815</pubmed>
- ↑ <pubmed>26104718</pubmed>
- ↑ <pubmed>7590258</pubmed>
- ↑ <pubmed>10556026</pubmed>
- ↑ <pubmed>12586397</pubmed>
- ↑ <pubmed>9413996</pubmed>
- ↑ <pubmed>1660923</pubmed>
- ↑ <pubmed>22195971</pubmed>
- ↑ <pubmed>7590258</pubmed>
- ↑ <pubmed>1334464</pubmed>
- ↑ <pubmed>15359283</pubmed>
- ↑ <pubmed>15359283</pubmed>
- ↑ <pubmed>8021197</pubmed>
- ↑ <pubmed>2181438</pubmed>
- ↑ <pubmed>2510933</pubmed>
- ↑ <pubmed>16148308</pubmed>
- ↑ <pubmed>15359283</pubmed>
- ↑ <pubmed>26104374</pubmed>
- ↑ <pubmed>12770828</pubmed>
- ↑ <pubmed>11459960</pubmed>
- ↑ <pubmed>15306008</pubmed>
- ↑ <pubmed>9214651</pubmed>
- ↑ <pubmed>9463394</pubmed>
- ↑ <pubmed>9214651</pubmed>
- ↑ <pubmed>9214651</pubmed>
- ↑ <pubmed>11598022</pubmed>
- ↑ <pubmed>11598022</pubmed>
- ↑ <pubmed>11598022</pubmed>
- ↑ <pubmed>14729714</pubmed>
- ↑ <pubmed>9463394</pubmed>
- ↑ <pubmed>10320583</pubmed>
- ↑ <pubmed>10556026</pubmed>
- ↑ <pubmed>17367389</pubmed>
- ↑ <pubmed>12145217</pubmed>
- ↑ <pubmed>15306008</pubmed>
- ↑ <pubmed>12145217</pubmed>
- ↑ <pubmed>14756780</pubmed>
- ↑ <pubmed>15306008</pubmed>
- ↑ <pubmed>8107082</pubmed>
- ↑ <pubmed>9302014</pubmed>
- ↑ <pubmed>12374815</pubmed>
- ↑ <pubmed>10556026</pubmed>
- ↑ <pubmed>8550559</pubmed>
- ↑ <pubmed>8107082</pubmed>
- ↑ <pubmed>8550559</pubmed>
- ↑ <pubmed>1645443</pubmed>
- ↑ <pubmed>375194</pubmed>
- ↑ <pubmed>9302014</pubmed>
- ↑ <pubmed>8107136</pubmed>
- ↑ <pubmed>8107136</pubmed>
- ↑ <pubmed>9302014</pubmed>
- ↑ <pubmed>8676870</pubmed>