CHAINER
Software for Comparative Genomics

Mohamed Ibrahim Abouelhoda

October, 2004

This is the web-site of CHAINER, software tool for versatile comparative genomics tasks. In the user manual and tutorial, there are many examples and a tutorial on how to use CHAINER for efficient genome comparison. Here we outline the most important features of CHAINER.

Features of CHAINER

CHAINER is a software tool for processing fragments, which are short similar regions of two or more genomes, for performing the following comparative genomics tasks:
  1. Global alignment for two or multiple whole genomes.
  2. Finding regions of high similarity (candidate regions of conserved synteny) among two or multiple genomes.
  3. Comparison of a draft genome (a draft genome is not a single string but is a set of strings called contigs) to a finished or to another draft genome.
  4. cDNA/EST mapping.
CHAINER is basically developed to handle the second phase of the anchor-based strategy that is applied for comparing two or more genomes. The anchor-based strategy is composed of three phases:
  1. Computation of fragments.
  2. Computation of one highest-scoring global chain or a set of significant local chains of colinear non-overlapping fragments.
  3. Alignment of the regions between the fragments of a chain (the anchors) by applying the same method recursively with less stringent parameters or by using standard dynamic programming.
This strategy can be used for solving the aforementioned tasks. For example, if genomes of closely-related organisms are compared, where there are no (or few) genome rearrangements, then in the second phase CHAINER can be used to compute an optimal global chain of colinear non-overlapping fragments. If genomes of distantly-related organisms are compared, where rearrangement events are very likely to take place, then CHAINER can be used for computing a set of significant local chains. Each local chain represents a region of high similarity among the genomes in comparison. For more details, see the user manual and tutorial.

Efficient algorithms and data structures

CHAINER is based on algorithms that solve the chaining problem using technqiues from computational geometry; see [1-3]. The underlying technqiues and data structures are optimized to handle large number of fragments efficiently. The standard version of CHAINER is compiled in 32-bit mode. For large server class machines (e.g., SUN-Sparc/Solaris) CHAINER can be compiled in 64-bit mode.

Formats and Usage

Please see the user manual and tutorial for details.

Test data

Here, you can download the test data (size is about 20 Mb) needed to run the examples of the tutorial.

Developer

CHAINER was developed since September 2003 by Mohamed Ibrahim Abouelhoda, Dept. of Bioinformatics, Faculty of Computer Science, University of Ulm, Germany. Improvements and additional options are continuousely integrated. Please, send your comments and suggestions to the author.

Availability

CHAINER is available in executable format for the following platforms: If you need CHAINER for an additional platform, then please contact the author. CHAINER is free for academic research, educational and demonstration purposes. Please, send an email to the author in order to obtain the software. This email should contain acknowledgement of the licence agreement. For commercial license, please directly contact the author.

Important Documents

The CHAINER-manual

The README file

The CHAINER-license agreement form

Fragment generation tools

CHAINER can use any kind of fragments as long as they are given in the correct input format. In the examples given in the user manual and tutorial we used the programs Vmatch and Multimat. Vmatch is a program to compute various kinds of matches between two genomes. Multimat is a program to compute multiple exact matches between two or more genomic sequences. Both programs are based on efficient data structures called the Enhanced Suffix Array [4] Keep in mind that CHAINER is a software developed to optimize the second phase of the anchor based strategy and the fragments have to be generated in the first phase. Later, the three phases will be incorporated within a toolbox with a graphical user interface. It is interesting to see that the usage of Multimat and the global chaining option of CHAINER yields the same set of anchors obtained by MGA.

Acknowledgement

CHAINER is part of the DFG-Projekt: Entwicklung eines Software-Systems zum multiplen Genomvergleich supported by the DFG-grant Oh 54/4-1. My thanks to Enno Ohlebusch, Stefan Kurtz , Janina Reeder, Kathrin Hockel, and Jan Stallkamp for their help and useful suggestions.

Bibliography

  1. Mohamed Ibrahim Abouelhoda, Enno Ohlebusch
    CHAINER: Software for Comparing Genomes.
    In 12th International Conference on Intelligent Systems for Molecular Biology/3rd European Conference on Computational Biology.

  2. Mohamed Ibrahim Abouelhoda, Enno Ohlebusch
    Chaining Algorithms for Multiple Genome Comparison
    Journal of Discrete Algorithms, to appear.

  3. Mohamed Ibrahim Abouelhoda, Enno Ohlebusch
    A Local Chaining Algorithm and its Applications in Comparative Genomics
    Proceedings of the 3rd Workshop on Algorithms in Bioinformatics, pages 1-16, LNBI 2812 , 2003. © Springer-Verlag

  4. Mohamed Ibrahim Abouelhoda, Stefan Kurtz, Enno Ohlebusch
    Replacing Suffix Trees with Enhanced Suffix Arrays
    Journal of Discrete Algorithms, 2(1):53-86, 2004.