ISfinder and the Growing Number of IS
IS classification is needed to cope with the large number and diversity of ISs. It also allows identification of the many IS fragments present in numerous genomes, contributes to understanding their effects on their host genomes, and can provide insights into their regulation and transposition mechanisms. This role has been assumed by ISfinder following the closure of the Stanford repository. Several criteria are used to classify IS. These include genetic organization, similarity of the transposase primary amino acid sequence, length and sequence of the terminal inverted repeats, target site preferences, length of the target repeats, and the chemistry of transposon DNA strand cleavage and of strand transfer into the target DNA (Fig.1.4.1).
Since 1998, IS have been centralized in the ISfinder database, which provides a basic framework for nomenclature and for classification of IS into related groups or families, often divided into subgroups (Fig.1.4.2). Initially, each IS was simply assigned a number. To convey information about their provenance, the nomenclature rules were later changed and now resemble those used for restriction enzymes: the name consists of "IS" followed by the first letter of the genus, the first two letters of the species and a number (e.g., ISBce1 for Bacillus cereus).
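As an illustration of the naming rule just described, the short Python sketch below assembles an IS name from a genus, a species and a serial number. The function name is_name is hypothetical and not part of any ISfinder tool; the sketch applies only the basic rule as stated here and ignores any exceptions in the real registry, as well as how ISfinder actually allocates the number.

```python
def is_name(genus: str, species: str, number: int) -> str:
    """Hypothetical helper: build an IS name following the basic rule above.

    Name = "IS" + first letter of the genus (upper case)
                + first two letters of the species (lower case)
                + a serial number.
    Exceptions used in the real ISfinder registry are not modelled.
    """
    genus, species = genus.strip(), species.strip()
    if not genus or len(species) < 2:
        raise ValueError("need a genus and a species name of at least two letters")
    if number < 1:
        raise ValueError("serial numbers start at 1")
    return f"IS{genus[0].upper()}{species[:2].lower()}{number}"


# Reproduces the example given in the text.
print(is_name("Bacillus", "cereus", 1))  # ISBce1
```

Case is normalized explicitly so that inputs such as "bacillus" or "CEREUS" still yield the conventional capitalization.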
In 1977 only 5 IS (IS1, IS2, IS3, IS4 and IS5) had been identified (Nevers & Saedler, 1977). By the publication of the first edition, Mobile DNA I (Berg & Howe, 1989), this number had risen to 50 (Galas & Chandler, 1989 pp. 109–162); by the second, Mobile DNA II (Craig, et al., 2002), there were more than 700. At present, ISfinder includes more than 4600 examples distributed into 29 families, some of which can be conveniently divided into subgroups (Fig.1.4.3). This classification evolves continuously as additional ISs accumulate. The IS in the ISfinder repository represent only a fraction of those present in the public databases. Not only has the number of identified IS increased dramatically with the advent of high-throughput genome sequencing, but examination of the public databases has also shown that genes annotated as transposases (Tpases), the enzymes that catalyze TE movement (or proteins with related functions), are by far the most abundant functional class (Fig.1.2.5).