CHAINER
Software for Comparative Genomics
Mohamed Ibrahim Abouelhoda
October, 2004
This is the web-site of
CHAINER,
software tool for versatile comparative genomics tasks. In the user
manual and tutorial,
there are many examples and a tutorial on
how to use CHAINER for efficient
genome comparison.
Here we outline the most important features
of CHAINER.
CHAINER is a software tool for processing fragments,
which are short similar regions of two or more genomes,
for performing the following
comparative genomics tasks:
- Global alignment for two or multiple whole genomes.
- Finding regions of high similarity (candidate regions of conserved
synteny) among two or multiple genomes.
- Comparison of a draft genome (a draft genome is not a single string but
is a set of strings called contigs) to a finished or to another draft
genome.
- cDNA/EST mapping.
CHAINER is basically developed to handle
the second phase of the anchor-based strategy that is applied for
comparing two or more genomes.
The anchor-based strategy is composed of three
phases:
- Computation of fragments.
- Computation of one highest-scoring global chain or a set of
significant local chains of colinear non-overlapping fragments.
- Alignment of the regions between the fragments of a chain (the
anchors) by applying the same method recursively with less stringent
parameters or by using standard dynamic programming.
This strategy can be used for solving the aforementioned tasks.
For example, if genomes of closely-related organisms are compared,
where there are no (or few) genome rearrangements, then in the second
phase CHAINER can be used to
compute an optimal global chain of colinear non-overlapping
fragments.
If genomes of distantly-related organisms are compared,
where rearrangement events are very likely to take place,
then CHAINER can be used for computing
a set of significant local chains.
Each local chain represents a region
of high similarity among the genomes in comparison.
For more details, see the user manual and tutorial.
CHAINER is based on algorithms that solve
the chaining problem using technqiues from computational geometry; see [1-3].
The underlying technqiues and data structures
are optimized to handle large number
of fragments efficiently.
The standard version of CHAINER is compiled in 32-bit mode.
For large server class machines (e.g., SUN-Sparc/Solaris)
CHAINER can be compiled in 64-bit mode.
Please see the user manual and tutorial for details.
Here, you can download
the test data (size is about 20 Mb)
needed to run the
examples of the tutorial.
CHAINER was developed since September 2003 by
Mohamed Ibrahim Abouelhoda,
Dept. of Bioinformatics,
Faculty of Computer Science, University of Ulm,
Germany. Improvements and additional options are continuousely integrated.
Please, send your comments and suggestions to the author.
CHAINER is available in executable format for the following platforms:
- Linux (Redhat, SuSe, Debian) for Intel and AMD architectures
- Solaris for the SUN/Sparc architecture
If you need CHAINER for an additional platform,
then please contact the author.
CHAINER is free for
academic research, educational and demonstration purposes.
Please,
send an email to the author
in order to obtain the software. This email should contain
acknowledgement of the licence agreement.
For commercial license, please directly contact the author.
The CHAINER-manual
The README file
The CHAINER-license agreement form
CHAINER can use any kind of fragments as long as they
are given in the correct input format. In the examples given in
the user manual and tutorial
we used the programs Vmatch
and Multimat.
Vmatch is a program to compute various kinds of matches between two genomes.
Multimat is a program to compute multiple exact matches between
two or more genomic sequences. Both programs are based on
efficient data structures called the Enhanced Suffix Array [4]
Keep in mind that
CHAINER is a software developed to optimize the second phase
of the anchor based strategy and the fragments have to be
generated in the first phase.
Later, the three phases will be incorporated
within a toolbox with a graphical user interface.
It is interesting to see that the usage of Multimat and
the global chaining option of CHAINER
yields the same set of anchors obtained by MGA.
CHAINER is part of the
DFG-Projekt:
Entwicklung eines Software-Systems zum multiplen
Genomvergleich supported by the DFG-grant Oh 54/4-1.
My thanks to Enno Ohlebusch, Stefan Kurtz , Janina Reeder,
Kathrin Hockel, and Jan Stallkamp for their help and useful
suggestions.
- Mohamed Ibrahim Abouelhoda, Enno Ohlebusch
CHAINER: Software for Comparing Genomes.
In 12th International Conference on Intelligent Systems for Molecular Biology/3rd European Conference on Computational Biology.
- Mohamed Ibrahim Abouelhoda, Enno Ohlebusch
Chaining Algorithms for Multiple Genome Comparison
Journal of Discrete Algorithms, to appear.
- Mohamed Ibrahim Abouelhoda, Enno Ohlebusch
A Local Chaining Algorithm and its Applications in Comparative
Genomics
Proceedings of the 3rd Workshop on Algorithms in Bioinformatics,
pages 1-16, LNBI 2812 , 2003.
© Springer-Verlag
- Mohamed Ibrahim Abouelhoda, Stefan Kurtz,
Enno Ohlebusch
Replacing Suffix Trees with Enhanced Suffix Arrays
Journal of Discrete Algorithms, 2(1):53-86, 2004.