The alignment algorithms used in the software from CLC bio A/S have some unique features, including the option of adjusting the cost of gaps at the ends of the alignment to suit the sequences being aligned.
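To illustrate what an adjustable end-gap cost means in practice, here is a minimal sketch. It is not CLC bio's implementation, which this page does not describe; it is a textbook Needleman-Wunsch global alignment score with linear gap costs, and the function name `align_score` and all parameters (`match`, `mismatch`, `gap`, `end_gap`) are hypothetical choices made for the example.

```python
def align_score(a, b, match=1, mismatch=-1, gap=2, end_gap=2):
    """Global alignment score with a separate cost for end gaps.

    Illustrative sketch only (assumed scoring scheme, linear gap costs).
    `gap` is charged for gaps inside the alignment, `end_gap` for gaps
    placed before the first or after the last residue of a sequence.
    """
    n, m = len(a), len(b)
    # Row 0: b[:j] aligned against leading gaps in a (end gaps).
    prev = [-end_gap * j for j in range(m + 1)]
    for i in range(1, n + 1):
        # Column 0: a[:i] aligned against leading gaps in b (end gaps).
        cur = [-end_gap * i] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Gap in b: it is a trailing end gap once all of b is consumed.
            up = prev[j] - (end_gap if j == m else gap)
            # Gap in a: it is a trailing end gap once all of a is consumed.
            left = cur[j - 1] - (end_gap if i == n else gap)
            cur[j] = max(diag, up, left)
        prev = cur
    return prev[m]


short, long_ = "ACGT", "GGACGTGG"
penalised = align_score(short, long_, end_gap=2)  # overhangs cost as much as internal gaps
free_ends = align_score(short, long_, end_gap=0)  # overhangs cost nothing
```

Setting `end_gap` to zero lets a short sequence sit anywhere inside a longer one without being penalised for the overhang, which is typically what is wanted when aligning a fragment against a full-length sequence. Keeping `end_gap` equal to `gap` treats the overhang like any other gap, which suits sequences expected to match along their full length.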
We offer two alignment algorithms: a standard algorithm that is 10 times faster than our previous alignment algorithm in most scenarios, and an additional algorithm that is even faster but less accurate than the standard algorithm.
The White Paper below forms the basis for these five conclusions:
- On large data sets of sequences that are not too divergent, our alignment is significantly faster than the standard CLUSTAL W alignment, and around the same speed as the fast CLUSTAL W alignment.
- When aligning 28 HIV genomes, our fast alignment is more than 10 times (55 minutes) faster than the standard CLUSTAL W alignment.
- We have benchmarked our new algorithms on the BAliBASE 3.0 database of accurate protein alignments (Thompson et al., 2005). The benchmark shows that our alignment algorithm is about 1% more accurate than the latest version of the standard CLUSTAL W on protein alignments.
- We have benchmarked our new algorithms on the BRaliBase II database of structurally aligned RNA. Here, our new algorithm is about 3.5% more accurate than the standard CLUSTAL W.
- Our standard algorithm is still a little slower than the standard CLUSTAL W on the fairly divergent alignments in BAliBASE and BRaliBase. Our fast alignment is as precise and as fast as the standard CLUSTAL W on these data sets.
Download White Paper