IS Families/IS607 family

From TnPedia
Revision as of 21:29, 4 July 2020 by TnCentral (talk | contribs)

General

The first member of what is now called the IS607 family was IS1535 which had been identified as one of a collection of IS in the genome of Mycobacterium tuberculosis H37Rv [1]. Indeed, this study identified six different but related IS elements, IS1535 to IS1539 and IS1602, which, at the time, were grouped into a new IS family called the IS1535 family.

IS607 itself (Fig. IS607.1) was first identified by subtractive hybridization of a collection of Helicobacter pylori strains from a number of geographic locations and characterized by the Berg lab [2]. Further tests showed that it was widespread and present in about 20% of H. pylori strains worldwide. It is 2027 bp long and has a similar organization to IS1535. A second member of this family, IS609 (ISHp609), is also present in many H. pylori strains [3].

Fig. IS607.1. IS607 organization: IS607 is shown as a yellow box and the overlapping orfs as purple horizontal arrows. The DNA sequence of the left (LE) and right (RE) ends is shown below with the repeated sequences indicated in red and the variations from the canonical sequence shown in black above. The histogram below shows the distribution of the lengths of the collection of IS607 family members in ISfinder.

Distribution

IS607 family members are widespread geographically in Helicobacter strains [2] and in Mycobacteria [1][4]. Full-length IS609 was widely distributed in many Helicobacter strains isolated from Africa, the Americas, Europe and India but was present in only 1% of East Asian strains [3]. IS607 family members have subsequently been found in a wide range of bacterial species, including cyanobacteria [5], and in archaea [6]. Related sequences have also been found in eukaryotic genomes and viruses [7][8], probably primarily as a result of horizontal DNA transfer events.

One major problem in identifying IS607 family members is that, like members of the IS200/IS605 family, there are IS607 family members which have lost either orfA or, like IS200 itself, orfB. In the absence of clear sequence signatures which define the IS ends or of empty sites, it is difficult to define such reduced IS607 derivatives.

Organization

Most full-length IS607 family members are between 1900 and 2150 bp long (Fig. IS607.1) and carry two overlapping orfs in which the stop codon of the upstream orf overlaps the start codon of the downstream orf, suggesting that expression of the downstream gene is translationally coupled to that of the upstream gene [1] (see [9]). IS609, however, was reported to encode two additional small orfs upstream of tnpA [2][3].
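The stop/start codon overlap described above can be searched for computationally. The following is a minimal sketch, assuming the usual bacterial start codons (ATG, GTG, TTG) and stop codons (TAA, TAG, TGA). The function name `find_stop_start_overlaps` and the example sequence are hypothetical and are not taken from any IS607 family member.

```python
import re

STOPS = ("TAA", "TAG", "TGA")
STARTS = ("ATG", "GTG", "TTG")  # assumed start codons; restrict to ATG if preferred


def find_stop_start_overlaps(seq):
    """Return sorted (position, motif) pairs where a stop and a start codon share bases.

    position is the 0-based index of the first nucleotide of the motif.
    """
    seq = seq.upper()
    motifs = set()
    for start in STARTS:
        for stop in STOPS:
            # 2-nt overlap, e.g. ATGA: the start codon ATG and the stop codon TGA share TG
            if start[1:] == stop[:2]:
                motifs.add(start + stop[2])
            # 1-nt overlap, e.g. TAATG: the last base of the stop is the first base of the start
            if stop[2] == start[0]:
                motifs.add(stop + start[1:])
    hits = []
    for motif in sorted(motifs):
        # A lookahead is used so that overlapping occurrences are all reported
        for match in re.finditer("(?=%s)" % motif, seq):
            hits.append((match.start(), motif))
    return sorted(hits)


# Hypothetical example sequence containing one ATGA and one TAATG motif
example_hits = find_stop_start_overlaps("CCCATGACCCTAATGCC")
```

Such motifs are short and occur frequently by chance, so a hit is only meaningful when it coincides with the end of one predicted orf and the beginning of the next.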

It was also noted that the product of the upstream orf, TnpA, shared similarity with serine site-specific recombinases (SR), while that of the downstream frame, TnpB, showed weak similarity with TnpB of other IS such as IS1136 and IS891 [1][10][11]. These IS do not include an upstream frame similar to orfA and, due to the characteristic potential secondary structures at their ends, have been placed in the IS200/IS605 family.

The terminal DNA sequences of family members are not related, although they carry several short directly repeated sequences at each end which appear to be helically phased [12][13] and may contain a longer imperfect inverted repeat sequence near the ends but at different distances [12][13] (Fig. IS607.1 and Fig. IS607.3). Fig. IS607.2 shows the collection of ends described in reference [12] in green and in reference [13] in black bold. It was noted that many, but not all, IS607 family members in ISfinder carry a trinucleotide repeated at each end (shown in reference [12], Fig. IS607.2) which may be involved in the transposition reaction [12]. The absence of these in some examples may simply be due to an incorrect definition of the IS ends.
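Whether a set of direct repeats is helically phased can be checked by comparing the spacing between successive repeats with whole numbers of helical turns. The sketch below assumes a helical repeat of about 10.5 bp for B-form DNA. The function names, the mismatch tolerance and the 9 bp motif used in the example are hypothetical and are not the repeats reported in references [12] or [13].

```python
def motif_positions(seq, motif, max_mismatches=1):
    """0-based start positions where motif matches seq with at most max_mismatches substitutions."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(1 for a, b in zip(seq[i:i + k], motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def phasing_offsets(positions, helical_repeat=10.5):
    """For each pair of successive positions, the distance in bp between their
    spacing and the nearest whole number of helical turns (0 means perfectly phased)."""
    offsets = []
    for a, b in zip(positions, positions[1:]):
        spacing = b - a
        turns = round(spacing / helical_repeat)
        offsets.append(abs(spacing - turns * helical_repeat))
    return offsets


# Hypothetical end: three copies of a 9 bp motif spaced 21 bp apart (two helical turns)
motif = "GATTACAGT"
end = motif + "C" * 12 + motif + "C" * 12 + motif
positions = motif_positions(end, motif)
offsets = phasing_offsets(positions)
```

Offsets close to 0 bp place successive repeats on the same face of the helix, whereas offsets close to half a turn (about 5 bp) place them on opposite faces.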


Fig. IS607.2. IS607 family ends: The sequences at the ends of several IS607 family members. IS607 and IS1904 are from Boocock and Rice 2013, and IS607, IS1926 and IS1535 are from Chen et al., 2018. The repeated sequences identified by Boocock and Rice are shown in green and those identified by Chen et al. are in black. Sequences above the line are from Chen et al. and also show the variations from the canonical sequence motif. In addition, the regions protected against DNase digestion on the top and bottom strands are shown as blue horizontal lines (Chen et al., 2018).


Fig. IS607.3. IS607 family TnpA alignment. All examples in the ISfinder database (June 2020) were aligned using Clustal Omega with default settings. The three domains are shown by horizontal brackets. These can be distinguished by trypsin digestion (Chen et al., 2018). Binding to LE DNA is abrogated by deletion of the N-terminal DNA binding domain; the long central catalytic domain includes the catalytic serine, indicated by a vertical arrow. The two amino acids in the catalytic domain which were substituted by cysteine are shown as *. Oxidation of these mutant proteins, which generates covalent dimers, had no effect on their activity in PEC formation, indicating that the catalytic domain does not undergo conformational changes when forming PEC. The C-terminal HTH domain must move to render the active site accessible to the DNA substrate. The amino acid residue in this domain which was substituted by cysteine is indicated by #. Oxidation of this mutant eliminates its capacity for PEC formation.


Mechanism

IS607 was observed to transpose in Escherichia coli, and the sequence of the IS-target junctions showed that IS607 inserted into a GG dinucleotide (Fig. IS607.1; see also Fig. IS607.2). Insertion sometimes generated a 2 bp target repeat but, in other cases, no target repeat was observed. Mutational studies demonstrated that TnpA was required for transposition [2] but, as is the case for members of the IS200/IS605 family, TnpB is not. IS609 was found to insert in many chromosomal sites and, in contrast to IS607, to have a preference for insertion with its left end next to a trinucleotide, TAT [3].
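Because the only target requirement described above is a GG dinucleotide, potential IS607 insertion sites in a sequence can be listed very simply. The following sketch scans both strands; the function name `gg_target_sites` and the example sequence are hypothetical, and the sketch makes no attempt to model any additional preference that may exist in vivo.

```python
def gg_target_sites(seq):
    """Return (position, strand) for each GG dinucleotide on either strand.

    position is the 0-based index, on the top strand, of the first base of the
    dinucleotide. A CC on the top strand corresponds to a GG on the bottom strand.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair == "GG":
            sites.append((i, "+"))
        elif pair == "CC":
            sites.append((i, "-"))
    return sites


# Hypothetical target: one GG on the top strand and one on the bottom strand
sites = gg_target_sites("ATGGCACCT")
```

A run of three G residues yields two overlapping candidate sites, which is why positions are reported individually rather than merged.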

As pointed out by Boocock and Rice [12], the IS607 transposase is unusual for a serine site-specific recombinase because most such enzymes require extensive sequence-specificity for all recombining partners [14][15]. As might be expected for a transposase whose function is to optimise the number of potential target sites, IS607 TnpA does not [3][16]. Integration does not require a large conserved target sequence.

Serine recombinases are generally of two types: small SR (smSR), which orchestrate highly regulated recombination reactions such as resolution in the Tn3 transposon family or inversion of invertible DNA segments by enzymes such as Hin and Gin [17][18][19], and large SR, often involved in phage integration [20]. Both carry their DNA binding domains at the C-terminus, in contrast to TnpA of IS607 family members, which carries its DNA binding domain at the N-terminus.

The reaction pathway of well characterized smSR has been studied in exquisite detail. It involves a recombinase tetramer and both DNA recombination partners, with multiple recombinase binding sites forming a precise architectural structure. However, the N-terminal DNA binding domain in IS607 family transposases would be expected to be incompatible with tetramer formation as described in the present structural models [12][13][18][19][21].

DNA at the recombining sites is cleaved by attack of the conserved catalytic serine nucleophile to generate a 3′ hydroxyl and a covalent 5′ phosphoserine protein-DNA intermediate. Following cleavage, the two recombinase subunits of the tetramer attached to the 5′ DNA ends (the “cutting” dimer [12]) are rotated 180° by the other two (the “rotating” dimer [12]) [22][23][24][25][26][27][28]. The “opposing” 3′ hydroxyl groups then attack the phosphoserine linkages of the rotated partners to reform a phosphodiester bond and complete the strand transfer reaction. The GG dinucleotide at the transposon termini (Fig. IS607.2, blue underlined) and the invariant GG at the insertion target sites of these IS might represent the 2 bp “overlap” sequence observed at the recombination crossover site for other serine recombinases [12][13].
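At the level of DNA sequence, the outcome of a conservative exchange at a shared 2 bp overlap can be illustrated with simple string bookkeeping. The sketch below is not the model of references [12] or [13]: it assumes that a circular DNA carrying a GG at its junction recombines with a GG in a linear target, and it ignores strand polarity, subunit rotation and protein binding. The function name `integrate_circle`, its parameters and the example sequences are hypothetical.

```python
def integrate_circle(target, circle, target_site, junction_site, overlap="GG"):
    """Sequence-level product of a conservative crossover at a shared overlap.

    target_site and junction_site are the 0-based indices of the overlap in the
    linear target and in the circle. The circle is written as a linear string
    whose last base is joined to its first; the overlap must not span that join.
    """
    n = len(overlap)
    if target[target_site:target_site + n] != overlap:
        raise ValueError("no overlap sequence at target_site")
    if circle[junction_site:junction_site + n] != overlap:
        raise ValueError("no overlap sequence at junction_site")
    # Open the circle at the overlap: everything after it, then everything before it
    body = circle[junction_site + n:] + circle[:junction_site]
    # The product carries one copy of the overlap at each new junction
    return target[:target_site] + overlap + body + overlap + target[target_site + n:]


# Hypothetical 6 bp circle integrating into a hypothetical 8 bp target
product = integrate_circle("AAAGGTTT", "CCGGAT", target_site=3, junction_site=2)
```

The product is exactly as long as the two substrates combined, which reflects the conservative (no synthesis, no loss) nature of the exchange, and it is equivalent to opening both molecules between the two G residues and joining them.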

Like many IS, IS607 is thought to transpose using a double-strand closed circular intermediate in which the IS is excised from its donor site with both ends joined and the donor DNA is resealed [12]. This is based on the presence of a LE-RE junction identified by PCR in an E. coli transposition system (Grindley, personal communication, cited in ref [12]). This model is reasonable in view of the type of transposase involved but remains to be formally validated.

The Johnson laboratory [13] has investigated the transposases of three related IS607 family members: IS607 itself, IS1535, and ISC1926 from Sulfolobus islandicus, a hyperthermophilic archaeon [29]. Using a lambda hop assay [30], they confirmed that IS607 indeed inserts between the G residues in a GG dinucleotide target and that transposition requires an intact TnpA: replacement of the active site serine (vertical arrow in Fig. IS607.3) by glycine eliminated activity. While TnpA alone promoted transposition in this assay, a construct supplying both TnpA and TnpB did not, and it was concluded that TnpB inhibits transposition. This is similar to the observed effect of the related TnpB on transposition of the IS200/IS605 family member, ISDra2 [31].

Purified TnpA from all three IS (IS607 itself, IS1535 and ISC1926) was able to bind to the cognate IS ends in gel mobility shift assays (EMSA) [13]. More extensive studies using EMSA, DNase footprinting and exonuclease digestion focused on TnpA of IS1535 (TnpAIS1535) because of its more robust binding activity. They indicated that: it binds cooperatively to multiple sites (the directly repeated sequences) in LE and can bind a second LE to form a paired end complex (PEC); TnpA “nucleation” occurs over four of the helically-phased 9 bp direct repeat LE motifs (Fig. IS607.2); LE motifs (a) through part of (d) are required for PEC formation, but efficiency is improved by including “non-specific” DNA both within LE and at the IS-host DNA junction; RE is a poor substrate for TnpA binding, possibly because of its lower number of repeat motifs; TnpA covered only the two repeated RE motifs and supported only a low level of RE-RE or RE-LE PEC formation; and, although RE was protected from exonuclease digestion, no clear footprint could be detected on it, indicating a lower binding affinity [13]. Unpublished data from these authors on binding studies with TnpAIS607 and TnpAIS1926, however, suggested that this is not a general property of all members of the IS607 family, since neither exhibited such large differences in the capacity to bind the LE and RE ends of their cognate elements.

Further functional analysis of TnpAIS1535 revealed that removal of the 50 N-terminal amino acids eliminated binding activity [13], confirming the role of this region in TnpA function [12]. When well conserved residues in the catalytic domain (* in Fig. IS607.3) were substituted by cysteine, the capacity of the mutant proteins to generate PECs under oxidizing (cross-linked) or reducing conditions was identical. This was interpreted as showing that there are no large-scale conformational changes in the catalytic domain on binding. However, a similar substitution in the C-terminal HTH domain (# in Fig. IS607.3) eliminated PEC formation when oxidized, as did deletion of this region, suggesting that binding involves conformational changes in this domain.

Fig. IS607.4. X-ray structure of IS607 family TnpA: The data are taken from Chen et al., 2018. They show the X-ray crystal structures of IS1535 and IS1926 TnpA dimers lacking the N-terminal DNA binding domain, compared to the previously published structure of the resolvase of the Tn3 family transposon Tn1000 (gamma delta). The catalytic domain of the two IS607 family members is rotated with respect to that of resolvase. The IS607 family active site serines are too far apart for cleavage of a single DNA molecule, leading to the suggestion that they each cleave a different DNA molecule of the recombining partners. They are masked by the C-terminal HTH, which prevents access to substrate DNA. Each TnpA monomer is shown in green or blue. Alpha helical and beta-ribbon structures are labeled.


Structural analysis also underlined the difference between IS607 family TnpA and other serine recombinases. Structures were solved for the combined catalytic and C-terminal HTH domains of TnpAIS1535 and of TnpAIS1926 [13] (Fig. IS607.3) and for TnpAISC1904 from Sulfolobus solfataricus P2, with similar results [12]. The structures were found to contain either one (TnpAIS1535) or two (TnpAIS1926) dimers. The topology of the catalytic domains was identical to that of the smSR catalytic core. However, whereas the dimer interface of smSR occurs between the long helices at the C-terminal end of the catalytic domain, that of both TnpAIS1535 and TnpAIS1926 is located between entirely different helices in the catalytic domain (Fig. IS607.4). It was proposed that the C-terminal HTH domain must be displaced to permit DNA to enter the active site. Moreover, since the DNA binding domain is located on the opposite side of the dimer to the active site, it seemed probable that cleavage occurs in trans (where a molecule bound to one recombining partner cleaves the other partner) [13], as suggested in the model proposed by Boocock and Rice [12]. Cleavage in trans is a characteristic of many transposable elements (see [32][33][34] for reviews).

The unusual dimer structure originally noted for TnpAISC1904 led to an elegant, detailed mechanistic model [12]. In the Boocock-Rice model, which addresses both integration and excision of the proposed circular IS intermediate, a TnpA tetramer forms a complex with both recombining DNA partners. For integration, the tetramer assembles on the IS circle LE-RE junction (Fig. IS607.4), while in excision it is proposed to assemble from dimers bound at each end during synapsis (Fig. IS607.5 and Fig. IS607.6). It is proposed that binding of a TnpA dimer (B) to LE (or RE) in the circle junction occurs through its DNA binding domain. For catalysis to occur, the active site must be “demasked” (Fig. IS607.4 top). The TnpAISC1904 structure indicated that the C-terminal HTH domain must be able to move to render the catalytic center accessible to DNA [12]. This was confirmed by crosslinking experiments, which indicated that the C-terminal HTH domain must be able to move for PEC formation, whereas crosslinking of the catalytic domain did not affect PEC formation. A second dimer is then proposed to bind to the RE (or LE) side of the junction, leading to binding of the insertion site and its engagement in the catalytic site. The model proposes that the free DNA binding domain of a single dimer is not capable of binding DNA non-specifically on its own, but that two such domains in the tetramer are proficient in non-specific DNA binding, so that the tetramer bound to the IS junction can accommodate the relatively non-specific target DNA.

To understand the mechanism of IS607 family transposition in detail (for example at which stage in the pathway the HTH configuration change occurs, the stoichiometry of the transpososome, how the different repeated sequence elements bind TnpA) will require further structural studies using DNA-protein cocrystals.

It should be noted that there is as yet no experimental evidence describing cleavage or strand transfer in vitro.


Fig. IS607.5. Model for TnpA binding to its DNA recombination partners for insertion: The TnpA dimer has been modified from Boocock and Rice 2013. The colors of the individual monomers in the dimer are as for the crystal structures in Fig. IS607.4. The cartoon of the dimer is shown at the top with the N-terminal, catalytic and C-terminal domains indicated. As in the crystal structure, the C-terminal domain is shown masking the active site (the serine is indicated by a red circle). The model, below, proposes that a single dimer can bind to the LE (or RE) of an IS circular intermediate junction (LE-RE). The dimer is shown with the HTH domain of monomer A after a conformational change to reveal the active site. The HTH of the second monomer, B, is then shown to undergo a conformational change, unmasking its active site. Acquisition of a second dimer binding to RE (or LE) creates an active tetramer with two active sites engaged on the LE-RE junction and two free DNA binding domains for engaging the target sequence (it is postulated that a single binding site on its own does not have the affinity to retain a non-specific target site while two, in concert, do). Once engaged, both DNA partners can be cleaved. Each partner uses the active sites of both monomers in a given dimer (trans cleavage), each monomer cleaving a single strand of each partner. This circumvents the constraint that the two serine residues in each dimer are too far apart to cleave both strands of a single DNA molecule (cis cleavage). Strand exchange then occurs to join RE and LE to host sequences. In this modification of the Boocock-Rice model, no changes in catalytic site configuration are shown, consistent with the biochemical cross-linking data.


Fig. IS607.6. Model for TnpA catalyzed excision: The model is adapted from Boocock and Rice 2013. It shows the assembly of an active tetramer from the separate LE and RE, each bound by a single TnpA dimer. The symbols are the same as those for Fig. IS607.5.

Bibliography

  1. Gordon SV, Heym B, Parkhill J, Barrell B, Cole ST. New insertion sequences and a novel repeated sequence in the genome of Mycobacterium tuberculosis H37Rv. Microbiology (Reading): 1999 Apr, 145 (Pt 4);881-892 [PubMed:10220167]
  2.
  3.
  4. [PubMed:9634230]
  5. [PubMed:21576885]
  6. [PMC1847376]
  7. [PMC3673617]
  8. [PubMed:17109990]
  9. [PubMed:2183416]
  10. [PubMed:8386127]
  11. [PMC210459]
  12. Boocock MR, Rice PA. A proposed mechanism for IS607-family serine transposases. Mob DNA: 2013 Nov 6, 4(1);24 [PubMed:24195768]
  13. Chen et al., 2018.
  14. [PubMed:16756503]
  15. [PubMed:20298189]
  16. [PMC134827]
  17. Grindley NDF. The Movement of Tn3-Like Elements: Transposition and Cointegrate Resolution. In: Craig NL, Lambowitz AM, Craigie R, Gellert M, editors. Mobile DNA II. American Society of Microbiology; 2002. p. 272–302.
  18.
  19.
  20. [PubMed:26350324]
  21. [PubMed:26104451]
  22. [PMC397197]
  23. [PubMed:1885004]
  24. [PubMed:8757793]
  25. [PMC555234]
  26. [PubMed:2548736]
  27. [PubMed:2990045]
  28. [PubMed:15994378]
  29. [PubMed:15612937]
  30. [PMC1213899]
  31. [PubMed:23461641]
  32. [PMC7292550]
  33. [PMC3107681]
  34. [PubMed:26104718]