General Information/IS Identification

From TnPedia

What’s in a name?

A suitable naming system for biological objects such as transposable elements (TE) is clearly important. It should provide a framework for understanding what they are (e.g. IS: Insertion Sequence; Tn: Transposon; In: integron), where they were initially identified (e.g. ISEc: Escherichia coli; ISSfl: Shigella flexneri) and the particular type (ISEc1, 2, 3, 4), which allows them to be distinguished when multiple copies occur in a large population of different TE in a particular host, strain, species, or genus. Grouping them into families, even though these may change with time and further knowledge (e.g. IS4 [1]), also provides information on their relationships with one another. A robust classification system is important because specific properties of one IS are inferred to be shared by other closely related IS in the same family. This facilitates understanding of transposition mechanisms and of the way in which closely related IS from different genomes might impact their host genomes. In cases where, for technical reasons, certain experiments cannot be undertaken with one member of a family, its behavior in shaping its host chromosome or plasmid can often be inferred from what is known about other members of the family.

TE nomenclature rules defined in 1979

A committee assembled during the meeting on DNA Insertions at Cold Spring Harbor in 1976 proposed a set of rules to be used for the nomenclature of Transposable Elements [2]. The first attempt to create a concise nomenclature system for Transposable Elements was started at Stanford University by Campbell and colleagues in 1979 [3]. They classified prokaryotic elements into simple IS elements, more complex Tn transposons, and self-replicating episomes. In addition, definitions and nomenclature rules for these three classes of prokaryotic TEs were specified [3] (Table 1). ISs and transposons were named separately, with IS and Tn as the respective prefixes, followed by a sequential number in italics, such as IS1, IS2 and Tn1, Tn2, etc. [4].

Table 1. TE nomenclature rules defined by Campbell et al., 1979
Classes | Definition | Naming rule
IS elements | IS elements contain no known genes unrelated to insertion functions and are shorter than 2 kb. | IS1, IS2, IS3, IS4, IS5, etc., with the numbers italicized
Tn elements | Tn are more complex, often containing two copies of an IS element. They are generally larger than 2 kb, and contain additional genes unrelated to insertion functions. | Tn1, Tn2, Tn3, etc., with the numbers italicized
Episomes | Episomes are complex, self-replicating elements, often containing IS and Tn elements. Examples include the phage λ and plasmid F of E. coli. | -

Revised Nomenclature for Insertion Sequences in 2000

In the late 1990s, several different nomenclature systems were in use. The system centralized by Dr Esther Lederberg (Stanford University Medical School, CA, USA [5]) allocated simple numbers to IS and Tn (e.g. IS1, 2, 3, 4 ... Tn1, 2, 3, etc.), and registry lists of these allocations were subsequently published [6]. These allocations stopped with the retirement of Dr Lederberg, and a variety of ways of naming TE gradually appeared in the literature [5]. These included IS names which indicated the bacterial source (e.g. ISRm1 for Rhizobium meliloti) and other, less clear methods (e.g. RSalpha-9). ISfinder finally assumed the role of administering the nomenclature and began to systematically use a system [7] similar to that used for the numerous restriction/modification enzymes (see [2]). A list of names proposed for use for different bacterial and archaeal genera and species, together with the attributions, is included in ISfinder.

Revised Nomenclature for Transposable Genetic Elements proposed in 2008

Over this period, new types of transposable elements, such as the mobilizable and conjugative transposons, were being discovered [8][9][10]. Additionally, interactions between different elements including transposition and/or recombination events led to novel chimeric transposons. These exacerbated the nomenclature problem [5]. Subsequent nomenclature systems have become complicated, with different systems being adopted for related elements by different research groups [5]. Therefore, Roberts and colleagues in 2008 proposed a new version of the early nomenclature system, but not including non-autonomous elements (such as integron cassettes and MITEs) [5] (Table 2).

Table 2. Revised Nomenclature for Transposable Genetic Elements proposed by Roberts et al., 2008.
Type of transposable element (a) | Definition
Composite transposons | Flanked by IS elements. The transposase of the IS element is responsible for the catalysis of insertion and excision.
Unit transposons | Typical unit elements encode an enzyme involved in excision and integration (DD(35)E or tyrosine), often a site-specific recombinase or resolvase, and one or several accessory (e.g. resistance) genes in one genetic unit.
Conjugative transposons (CTns) / Integrative conjugative elements (ICEs) | The conjugative transposons (CTns), also known as integrative conjugative elements (ICEs), carry genes for excision, conjugative transfer, and for integration within the new host genome. They carry a wide range of accessory genes, including antibiotic resistance.
Mobilisable transposons (MTns) / Integrative mobilisable elements (IMEs) | The mobilizable transposons (MTns), also known as integrative mobilizable elements (IMEs), can be mobilized between bacterial cells by other “helper” elements that encode proteins involved in the formation of the conjugation pore or mating bridge. The MTns exploit these conjugation pores and generally provide their own DNA processing functions for intercellular transfer and subsequent transposition.
Mobile genomic islands | Some chromosomally integrated genomic islands encode tyrosine or serine site-specific recombinases that catalyze their own excision and integration but do not harbor genes involved in the transfer. They carry genes encoding a range of phenotypes. The name of a genomic island reflects the phenotype it confers, e.g. pathogenicity islands encode virulence determinants (toxins, adhesins, etc.).
Integrated or transposable prophage | An integrated or transposable prophage is a phage genome inserted as part of the linear structure of the chromosome of a bacterium which is able to excise and insert from and into the genome.
Integrated satellite prophage | Bacteriophage genome inserted into that of the host which requires gene products from “helper” phages to complete its replication cycle.
Group I intron | Small post-transcriptionally splicing (splicing occurs in the pre-mRNA), endonuclease-encoding element. Will home to the allelic site.
Group II intron | Small post-transcriptionally splicing (splicing occurs in the pre-mRNA), restriction endonuclease-encoding element.
IStron | Chimeric ribozyme consisting of a group I intron linked to an IS605-like transposase.
Intein | Small post-translational splicing (splicing occurs in the polypeptide), endonuclease-encoding element. Will home to the allelic site.

(a) Not all reported elements have been shown to be mobile.

IS classification system proposed in 2008

Likewise, a number of different types of IS derivatives also began to be identified, and the ISfinder nomenclature scheme was extended to these distinct forms [11]. It now includes terms and definitions for: MICs (Mobile Insertion Cassettes), composed of passenger genes but no transposase; MITEs (Miniature Inverted repeat Transposable Elements), which are short IS-related elements with no internal open reading frame but which, like most ISs, transposons and MICs, include IS-like extremities; and tIS, which are IS into which passenger genes have been incorporated. It is important to note that a single IS might in principle be represented in all four forms (Table 3).

Table 3. IS classification system proposed by Siguier et al., 2008
Form | Extremities | Transposase | Passenger genes
MITE | yes | no | no
IS | yes | yes | no
tIS | yes | yes | yes
MIC | yes | no | yes
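
Table 3 amounts to a small decision table. As a minimal sketch (the function name `classify_is_form` and its boolean flags are illustrative assumptions and not part of ISfinder), it could be encoded as follows:

```python
def classify_is_form(has_extremities: bool,
                     has_transposase: bool,
                     has_passenger_genes: bool) -> str:
    """Return the Table 3 label for an element, or 'unclassified'."""
    # All four forms listed in Table 3 carry IS-like extremities.
    if not has_extremities:
        return "unclassified"
    # Key: (transposase present, passenger genes present)
    table = {
        (False, False): "MITE",
        (True, False): "IS",
        (True, True): "tIS",
        (False, True): "MIC",
    }
    return table[(has_transposase, has_passenger_genes)]
```

In practice the three flags would come from manual or semi-automatic annotation; the sketch only makes the logic of the table explicit.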

IS identification

Fig.5.1. Results of Markov Cluster (MCL) Analysis. Each circle represents an individual IS transposase amino acid sequence. a) Inflation factor of 1.2, score >30 (links with scores of less than 30 were removed). IS1 family: blue circles. IS1595 family: green circles. b) The inflation factor of 2 increases stringency and separates each major group into groups. The IS1 family generated the ISMhu11 group (magenta). The IS1595 family separated into four groups: IS1595 (green), IS1016 (yellow), ISPna2 (blue-green), and ISH4 (light blue). c) Gradually reducing the weakest links between groups (Inflation factor 2, score >140) further divided the IS1595 family into 4 additional groups (IS1595, ISSod1, ISNwi1, and ISNha5).

The families in ISfinder are defined using an initial manual BLAST analysis, often followed by reiterative BLAST analyses in which the primary transposase sequence of representative elements is used as a query in a BLASTP[12] search of microbial genomes. Potential full-length Tpases are retained, and the one with the lowest score is then used as the query in a second BLASTP search. This is continued until no new potential candidates are detected. The ClustalW multiple alignment algorithm[13] is then used and the results are displayed using the Jalview alignment editor[14] for assessment.
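
The reiterative search described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `search` is a hypothetical caller-supplied stand-in for a BLASTP run that returns the full-length hits and their scores, and choosing the lowest-scoring hit among the newly found candidates is one reading of the procedure, not a description of the ISfinder implementation.

```python
from typing import Callable, Dict, Set


def iterative_search(seed: str,
                     search: Callable[[str], Dict[str, float]],
                     max_rounds: int = 50) -> Set[str]:
    """Collect candidates by re-querying with the lowest-scoring new hit.

    `search(query)` is a hypothetical stand-in for a BLASTP run: it must
    return {hit_id: score} for the full-length hits to `query`.
    """
    found = {seed}
    query = seed
    for _ in range(max_rounds):
        hits = search(query)
        # Keep only candidates not seen in earlier rounds.
        new_hits = {h: s for h, s in hits.items() if h not in found}
        if not new_hits:
            break  # no new potential candidates: stop iterating
        found.update(new_hits)
        # The most distant (lowest-scoring) new hit becomes the next query.
        query = min(new_hits, key=new_hits.get)
    return found
```

The `max_rounds` cap is a safety assumption added for the sketch; the text itself only specifies stopping when no new candidates appear.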

The corresponding DNA together with 1000 base pairs up- and down-stream is then extracted and examined manually for the IRs or other typical features such as secondary structures and flanking Direct Repeats (DRs). This, together with a comparison of the DNA extremities of various elements, allows identification of both ends of the collected elements. In cases where more than a single IS copy is identified, BLASTN can be used to define the IS ends. Where only a single copy is found, the ends can often be defined by identifying and comparing it with empty sites.
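
Parts of this manual inspection can be approximated in code. The sketch below is a simplified illustration, not the ISfinder procedure: the function names, the 0-based end-exclusive coordinates, and the default length limits are all assumptions. It extracts an element with its flanks, measures a terminal inverted repeat by comparing the left end with the reverse complement of the right end, and looks for a flanking direct repeat.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(genome: str, start: int, end: int,
                        flank: int = 1000) -> str:
    """Return genome[start:end] plus up to `flank` bp on each side."""
    return genome[max(0, start - flank):min(len(genome), end + flank)]


def terminal_inverted_repeat(element: str, max_len: int = 50,
                             max_mismatches: int = 0) -> int:
    """Length of the terminal IR (left end vs. reverse-complemented right end)."""
    left = element.upper()
    right = reverse_complement(element).upper()
    best = 0
    mismatches = 0
    for i in range(min(max_len, len(element) // 2)):
        if left[i] != right[i]:
            mismatches += 1
            if mismatches > max_mismatches:
                break
        else:
            best = i + 1  # IR extends at least to this matching position
    return best


def flanking_direct_repeat(genome: str, start: int, end: int,
                           max_len: int = 15) -> str:
    """Longest sequence found both just before `start` and just after `end`."""
    for k in range(min(max_len, start, len(genome) - end), 0, -1):
        if genome[start - k:start].upper() == genome[end:end + k].upper():
            return genome[start - k:start]
    return ""
```

Real IS ends are often imperfect, which is why `max_mismatches` is exposed; secondary structures and comparison with empty sites are not covered by this sketch.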

In a second step, we use the Markov Cluster Algorithm (MCL) (http://micans.org/mcl/)[15][16] to weigh the relationships between clusters of ISs and to validate prior ISfinder classification of ISs into families and subgroups. This is explained in detail in [17] and is based on the parameters used in the MCL (Fig.5.1), in addition to characteristics such as the specificity of target site duplications, the detailed sequence of the ends, and genetic organization.
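
To make the role of the inflation factor and the score cut-off concrete, here is a minimal pure-Python sketch of the MCL idea (alternating expansion and inflation of a column-stochastic matrix). It is not the `mcl` program referenced above; the function name, the self-loop weighting, the convergence tolerance, and the cluster read-out threshold are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def mcl(scores: Dict[Tuple[str, str], float], inflation: float = 2.0,
        min_score: float = 30.0, iterations: int = 100,
        tol: float = 1e-9) -> List[Set[str]]:
    """Cluster nodes linked by pairwise similarity scores."""
    nodes = sorted({n for pair in scores for n in pair})
    if not nodes:
        return []
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), s in scores.items():
        if a != b and s >= min_score:  # drop links below the cut-off
            m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = s
    for j in range(n):  # self-loops (assumed weight: column maximum)
        col_max = max(m[i][j] for i in range(n))
        m[j][j] = col_max if col_max > 0 else 1.0
    _normalize_columns(m)
    for _ in range(iterations):
        # Expansion: matrix squaring spreads flow along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power strengthens strong flows.
        inflated = [[v ** inflation for v in row] for row in expanded]
        _normalize_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row is an attractor; its non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return [set(c) for c in clusters]


def _normalize_columns(m: List[List[float]]) -> None:
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
```

Raising `inflation` or `min_score` generally yields finer clusters, which is the effect illustrated in panels a) to c) of Fig.5.1. For real data sets the published `mcl` implementation should be preferred.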

It should be understood that the distinction between families and subgroups can evolve as the number of ISs in the database increases. Several semi-automatic IS annotation pipelines are now available. The interested reader is directed to three of these: ISsaga[18], which is now integrated into the ISfinder platform[19], IScan[20] and OASIS[21]. At present, de novo prediction of ISs is not efficient and these pipelines all rely on the ISfinder database. While all three pipelines permit identification of IS fragments as well as full-length ISs, a certain level of manual assessment is essential.

Insertion Sequences nomenclature and naming attribution

The IS nomenclature and naming attribution are now controlled by the ISfinder database team [22], which adopted a nomenclature similar to that of restriction enzymes [22]. Although this is not perfect, since some ISs are found in different species or even in different genera, the system is viable and has the advantage of indicating the host species, rather than confronting the reader with a long series of numbers as in the original nomenclature system [23][22]. The original Campbell classification system assigned blocks of IS numbers to individual scientists, groups or institutions[23]. The ISfinder database includes a listing of these original IS numbers together with the last known address of the attribution [22] (See ISfinder page). Moreover, ISfinder includes an online form for registering (requesting a name for) new ISs [22] (See ISfinder form). Several journals now suggest that authors register their new IS elements with ISfinder before publication[22]. Since a unique name can be attributed, this avoids some of the confusion in the literature[22].
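
As a rough illustration of how such names decompose, the sketch below splits a name into a host code and a number. The regular expression is a simplifying assumption inferred from examples on this page (IS1, ISEc1, ISSfl, ISMhu11, ISH4); it is not an official ISfinder rule, and real names can deviate from it.

```python
import re
from typing import NamedTuple, Optional


class ISName(NamedTuple):
    host_code: Optional[str]  # e.g. "Ec"; None for numeric names such as IS1
    number: int


# Assumed pattern: "IS", an optional host code (one capital letter followed
# by up to three lowercase letters), then a running number.
_IS_NAME = re.compile(r"^IS([A-Z][a-z]{0,3})?(\d+)$")


def parse_is_name(name: str) -> ISName:
    match = _IS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised IS name: {name!r}")
    return ISName(match.group(1), int(match.group(2)))
```

For example, `parse_is_name("ISEc1")` yields the host code `"Ec"` and the number `1`, while an original-style name such as `IS1595` has no host code.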

Tn nomenclature and naming attribution

The Tn nomenclature and naming attribution are controlled by "The Transposon Registry" database [24]. The Transposon Registry is a nomenclature system for the assignment of Tn numbers to bacterial and archaeal autonomous TEs, including unit transposons, composite transposons, conjugative transposons (CTns)/integrative conjugative elements (ICEs), mobilizable transposons (MTns)/integrative mobilizable elements (IMEs) and mobile genomic islands[24]. It excludes ISs, which are managed by the ISfinder database; other TEs such as introns and inteins, for which other databases already exist; and non-autonomous TEs such as integron cassettes and MITEs[24].

Bioinformatics approaches for TE identification and annotation in prokaryotic genomes

Most genome annotation pipelines developed to date are dedicated to the prediction and characterization of coding regions and their putative products, signal peptides, pseudogenes, and noncoding RNAs. Surprisingly, annotation of ubiquitous genomic features such as Mobile Genetic Elements (MGEs) is generally not fully addressed and/or is neglected in most of these pipelines, and thus MGEs are seldom well-characterized and annotated [25].

To begin an analysis of the MGE content of a given genome, it is first necessary to identify and annotate ordinary genome features such as coding sequences (CDSs), rRNAs, and tRNAs [25]. MGE detection and annotation then require a variety of approaches, including additional specialized pipelines and software which are normally not implemented in a regular annotation pipeline [25]. It is therefore not a simple task, and the appropriate approach depends on the level of analysis (reviewed in [25]).

Eukaryotic TE nomenclature and naming attribution

TEs have been found in virtually all eukaryotic species investigated so far, displaying an extreme diversity, revealed by thousands or even tens of thousands of different TE families[26]. In 1989, Finnegan and colleagues proposed the first TE classification system, which distinguished two classes by their transposition intermediate: RNA (Class I or retrotransposons) or DNA (Class II or DNA transposons)[26][27]. The transposition mechanism of Class I is commonly called ‘copy-and-paste’, and that of Class II, ‘cut-and-paste’[27]. Later, in 2007, Wicker and colleagues proposed a common TE classification system that can be easily handled by non-specialists during genome and TE annotation procedures[28]. This system provided a consensus between the various conflicting classification and naming systems then in use[26]. A key component of this system is a naming convention: a three-letter code with each letter respectively denoting class, order and superfamily; the family (or subfamily) name; the sequence (database accession number) on which the element was found; and the ‘running number’, which defines the individual insertion in the accession. The unified system is also intended to facilitate comparative and evolutionary studies on TEs from different species [28].
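
The naming convention can be illustrated with a small parser. This is a sketch under assumptions: the delimiters (underscores between fields, a hyphen before the running number), the example name, and the function name are chosen for illustration, and only the first letter of the code (R for Class I, D for Class II) is decoded. The order and superfamily letters are returned as-is.

```python
from typing import NamedTuple

# Only the class letter is decoded in this sketch.
WICKER_CLASS = {
    "R": "Class I (retrotransposon)",
    "D": "Class II (DNA transposon)",
}


class WickerName(NamedTuple):
    code: str            # three letters: class, order, superfamily
    te_class: str
    family: str
    accession: str
    running_number: int


def parse_wicker_name(name: str) -> WickerName:
    """Parse a name of the assumed form CODE_Family_Accession-Number."""
    code, family, locus = name.split("_", 2)
    accession, _, number = locus.rpartition("-")
    if len(code) != 3 or code[0] not in WICKER_CLASS:
        raise ValueError(f"unexpected three-letter code: {code!r}")
    if not accession or not number.isdigit():
        raise ValueError(f"unexpected accession/running number: {locus!r}")
    return WickerName(code, WICKER_CLASS[code[0]], family,
                      accession, int(number))
```

With the hypothetical name `RLC_Angela_AA123456-1`, the parser returns the code `RLC`, the family `Angela`, the accession `AA123456` and the running number `1`. Family names that themselves contain underscores would need a more careful split.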

The Wicker TE classification system is now well recognized by the scientific community and is used by eukaryotic TE annotation pipelines such as RepeatModeler2[29], REPET[30][31], PASTEC[32], RepeatExplorer[33], EDTA[34] and others.

Bibliography

  1. De Palmenaer D, Siguier P, Mahillon J. IS4 family goes genomic. BMC Evol Biol. 2008 Jan 23;8:18. doi: 10.1186/1471-2148-8-18. PMID: 18215304; PMCID: PMC2266710.
  2. Roberts AP, Chandler M, Courvalin P, Guédon G, Mullany P, Pembroke T, Rood JI, Smith CJ, Summers AO, Tsuda M, Berg DE. Revised nomenclature for transposable genetic elements. - Plasmid: 2008 Nov, 60(3);167-73 [PubMed:18778731]
  3. Campbell A, Berg DE, Botstein D, Lederberg EM, Novick RP, Starlinger P, Szybalski W. Nomenclature of transposable elements in prokaryotes. - Gene: 1979 Mar, 5(3);197-206 [PubMed:467979]
  4. Tansirichaiya S, Rahman MA, Roberts AP. The Transposon Registry. - Mob DNA: 2019, 10;40 [PubMed:31624505]
  5. Roberts AP, Chandler M, Courvalin P, Guédon G, Mullany P, Pembroke T, Rood JI, Smith CJ, Summers AO, Tsuda M, Berg DE. Revised nomenclature for transposable genetic elements. - Plasmid: 2008 Nov, 60(3);167-73 [PubMed:18778731]
  6. Lederberg EM. Plasmid Reference Center Registry of transposon (Tn) and insertion sequence (IS) allocations through December 1986. - Gene: 1987, 51(2-3);115-8 [PubMed:3036649]
  7. Chandler M, Mahillon J. Insertion sequence nomenclature. In: ASM News, Vol. 66, no. 6, p. 324 (2000)
  8. Carraro N, Burrus V. Biology of Three ICE Families: SXT/R391, ICEBs1, and ICESt1/ICESt3. Microbiol Spectr. 2014;2(6):10.1128/microbiolspec.MDNA3-0008-2014. doi:10.1128/microbiolspec.MDNA3-0008-2014
  9. Wood MM, Gardner JF. The Integration and Excision of CTnDOT. Microbiol Spectr. 2015;3(2):MDNA3-2014. doi:10.1128/microbiolspec.MDNA3-0020-2014
  10. Hacker J, Kaper JB. Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol. 2000;54:641-679. doi:10.1146/annurev.micro.54.1.641
  11. Chandler M, Siguier P, Mahillon J. Nomenclature for insertion sequences and other forms. Microbe. 10 (45) 2008.
  12. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. - J Mol Biol: 1990 Oct 5, 215(3);403-10 [PubMed:2231712]
  13. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. - Nucleic Acids Res: 1994 Nov 11, 22(22);4673-80 [PubMed:7984417]
  14. Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java alignment editor. - Bioinformatics: 2004 Feb 12, 20(3);426-7 [PubMed:14960472]
  15. Van Dongen S. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam. 2000.
  16. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. - Nucleic Acids Res: 2002 Apr 1, 30(7);1575-84 [PubMed:11917018]
  17. Siguier P, Gagnevin L, Chandler M. The new IS1595 family, its relation to IS1 and the frontier between insertion sequences and transposons. - Res Microbiol: 2009 Apr, 160(3);232-41 [PubMed:19286454]
  18. Varani AM, Siguier P, Gourbeyre E, Charneau V, Chandler M. ISsaga is an ensemble of web-based methods for high throughput identification and semi-automatic annotation of insertion sequences in prokaryotic genomes. - Genome Biol: 2011, 12(3);R30 [PubMed:21443786]
  19. Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M. ISfinder: the reference centre for bacterial insertion sequences. - Nucleic Acids Res: 2006 Jan 1, 34(Database issue);D32-6 [PubMed:16381877]
  20. Wagner A, Lewis C, Bichsel M. A survey of bacterial insertion sequences using IScan. - Nucleic Acids Res: 2007, 35(16);5284-93 [PubMed:17686783]
  21. Robinson DG, Lee MC, Marx CJ. OASIS: an automated program for global investigation of bacterial and archaeal insertion sequences. - Nucleic Acids Res: 2012 Dec, 40(22);e174 [PubMed:22904081]
  22. Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M. ISfinder: the reference centre for bacterial insertion sequences. - Nucleic Acids Res: 2006 Jan 1, 34(Database issue);D32-6 [PubMed:16381877]
  23. Campbell A, Berg DE, Botstein D, Lederberg EM, Novick RP, Starlinger P, Szybalski W. Nomenclature of transposable elements in prokaryotes. - Gene: 1979 Mar, 5(3);197-206 [PubMed:467979]
  24. Tansirichaiya S, Rahman MA, Roberts AP. The Transposon Registry. Mob DNA. 2019;10:40. Published 2019 Oct 9. doi:10.1186/s13100-019-0182-3
  25. Oliveira Alvarenga D, Moreira LM, Chandler M, Varani AM. A Practical Guide for Comparative Genomics of Mobile Genetic Elements in Prokaryotic Genomes. - Methods Mol Biol: 2018, 1704;213-242 [PubMed:29277867]
  26. Finnegan DJ. Eukaryotic transposable elements and genome evolution. - Trends Genet: 1989 Apr, 5(4);103-7 [PubMed:2543105]
  27. Finnegan DJ. Eukaryotic transposable elements and genome evolution. - Trends Genet: 1989 Apr, 5(4);103-7 [PubMed:2543105]
  28. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH. A unified classification system for eukaryotic transposable elements. - Nat Rev Genet: 2007 Dec, 8(12);973-82 [PubMed:17984973]
  29. Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. RepeatModeler2 for automated genomic discovery of transposable element families. - Proc Natl Acad Sci U S A: 2020 Apr 28, 117(17);9451-9457 [PubMed:32300014]
  30. Flutre T, Duprat E, Feuillet C, Quesneville H. Considering transposable element diversification in de novo annotation approaches. - PLoS One: 2011 Jan 31, 6(1);e16526 [PubMed:21304975]
  31. Quesneville H, Bergman CM, Andrieu O, Autard D, Nouaud D, Ashburner M, Anxolabehere D. Combined evidence annotation of transposable elements in genome sequences. - PLoS Comput Biol: 2005 Jul, 1(2);166-75 [PubMed:16110336]
  32. Hoede C, Arnoux S, Moisset M, Chaumier T, Inizan O, Jamilloux V, Quesneville H. PASTEC: an automatic transposable element classification tool. - PLoS One: 2014, 9(5);e91929 [PubMed:24786468]
  33. Novák P, Neumann P, Pech J, Steinhaisl J, Macas J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. - Bioinformatics: 2013 Mar 15, 29(6);792-3 [PubMed:23376349]
  34. Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, Jiang N, Hirsch CN, Hufford MB. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. - Genome Biol: 2019 Dec 16, 20(1);275 [PubMed:31843001]