General Information/ISfinder and the Growing Number of IS

From TnPedia
Revision as of 19:50, 4 May 2020 by TnCentral (talk | contribs)

IS classification is needed to cope with the high numbers and diversity of ISs. It also permits identification of the many IS fragments present in numerous genomes, contributes to understanding their effects on their host genomes, and can provide insights into their regulation and transposition mechanism. This role has been assumed by ISfinder[1] following the closure of the Stanford repository[2]. Several criteria are used to classify IS. These include: genetic organization, the similarity of transposase amino acid primary sequence, length and sequence of terminal inverted repeats, target site preferences, length of target repeats and the chemistry of transposon DNA strand cleavage and transfer into the target DNA (Fig.1.4.1).

Fig.1.4.1. Main characteristics used to define the IS groups and families. Terminal inverted repeats (IRL and IRR) are shown as two-colored boxes (a and b) with functions for transposase binding (a) and recognition for cleavage and strand transfer (b). A single (left) or double (right) open reading frame is shown underneath the IS (blue arrow). The transposase of the IS on the right is produced by programmed -1 translational frameshifting. The reading frames are indicated within the IS. The product of the upstream frame generally acts as a regulatory protein. The indigenous Tpase promoter is shown located (by convention) in IRL. XXX and YYYY represent the short direct target repeat sequences, which are generally duplicated during the insertion event.

Since 1998, IS have been centralized in the ISfinder database to provide a basic framework for nomenclature and IS classification into related groups or families, often divided into subgroups (Fig.1.4.2)[3]. Initially, each IS was assigned a simple number[4]. However, to provide information about their provenance, IS nomenclature rules were changed and now resemble those used for restriction enzymes: the first letter of the genus is followed by the first two letters of the species and a number (e.g., ISBce1 for Bacillus cereus). In 1977 only 5 IS (IS1, IS2, IS3, IS4 and IS5) had been identified[5].
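The naming rule described above can be illustrated with a minimal Python sketch. The function names `is_name` and `parse_is_name` and the regular expression are hypothetical helpers written for this illustration; they are not part of ISfinder, and names actually assigned in the database may deviate from this simple pattern.

```python
import re

# Assumed pattern for names that follow the rule in the text:
# "IS" + genus initial (upper case) + first two species letters (lower case) + number.
NAME_PATTERN = re.compile(r"^IS([A-Z][a-z]{2})(\d+)$")


def is_name(genus: str, species: str, number: int) -> str:
    """Build an IS name from a genus, a species and a serial number."""
    if not genus or len(species) < 2 or number < 1:
        raise ValueError("need a genus, a species of at least 2 letters, and a number >= 1")
    return f"IS{genus[0].upper()}{species[:2].lower()}{number}"


def parse_is_name(name: str):
    """Return (host prefix, number) for a name matching the pattern, else None.

    Early numbered names such as IS1 carry no host prefix and return None.
    """
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(is_name("Bacillus", "cereus", 1))
print(parse_is_name("ISBce1"))
print(parse_is_name("IS1"))
```

The three-letter prefix records only the host from which the IS was first described, so it says nothing about the IS family; family assignment relies on the criteria listed earlier.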

Fig.1.4.2. Abundance of IS groups and families in ISfinder: distribution of IS families in the ISfinder database. The histogram shows the number of IS of a given family, as defined in the text, in the ISfinder database (June 2013). The horizontal boxes indicate the number and relative size of different subgroups (see Table 1 for the subgroup names) within the family. They are grouped by colour to indicate the type of Tpase used: DDE, blue; undetermined, purple; DEDD, green; HUH, red; Serine, orange.

At the time of publication of the first edition of Mobile DNA I (Berg & Howe, 1989), the number of identified IS had risen to 50 (Galas & Chandler, 1989 pp. 109–162); at the time of the second, Mobile DNA II (Craig, et al., 2002), there were more than 700; and at present, ISfinder includes more than 4600 examples distributed into 29 families, some of which can be conveniently divided into subgroups (Fig.1.4.3)[6][7]. This classification evolves continuously with the accumulation of additional ISs. The IS in the ISfinder repository represent only a fraction of the IS present in the public databases. Not only has the number of IS identified increased dramatically with the advent of high throughput genome sequencing, but examination of the public databases has also shown that genes annotated as transposases (Tpases), the enzymes which catalyze TE movement (or proteins with related functions), are by far the most abundant functional class[8] (Fig.1.2.5).

Fig.1.4.3. The growing number of IS deposited in the ISfinder Database. Diagram showing the increase in the number of IS in the ISfinder database as a function of time. At present (May, 2020) there are over 5000 entries.


  1. <pubmed>16381877</pubmed>
  2. <pubmed>6282704</pubmed>
  3. <pubmed>16381877</pubmed>
  4. <pubmed>467979</pubmed>
  5. <pubmed>339095</pubmed>
  6. <pubmed>26104715</pubmed>
  7. <pubmed>24499397</pubmed>
  8. <pubmed>PMC2910039</pubmed>