Nucleotide sequence alignment tool

11/10/2023

Pairwise alignment of short DNA sequences with affine-gap scoring is a common processing step performed in a range of bioinformatics analyses. Dynamic programming (e.g. the Smith-Waterman algorithm) is widely used for this purpose. Despite the use of data-level parallelisation, pairwise alignment still consumes much time. Faster alignment algorithms exist, but they suffer from a lack of accuracy. In this paper, we present MEM-Align, a fast semi-global alignment algorithm for short DNA sequences that allows for affine-gap scoring and exploits sequence similarity. Traditional alignment methods (such as Smith-Waterman) align individual symbols. MEM-Align instead extracts Maximal Exact Matches (MEMs) using a bit-level parallel method. It then uses a novel dynamic programming method to look for the subset of MEMs that forms the alignment. MEM-Align tries to mimic the alignment produced by Smith-Waterman. As a result, for 99.9% of input sequence pairs, the computed alignment score is identical to the alignment score computed by Smith-Waterman. Yet MEM-Align is up to 14.5 times faster than the Smith-Waterman algorithm. This fast run-time is achieved by (a) using a bit-level parallel method to extract MEMs, (b) processing MEMs rather than individual symbols, and (c) applying heuristics. MEM-Align is a potential candidate to replace other pairwise alignment algorithms used in processes such as DNA read-mapping and Variant-Calling.

Biological sequence alignment is about finding similarities and differences between sequences. The term alignment covers a broad range of different processes. The seed-and-extend method is a popular technique for aligning reads to the reference-genome, and it is used in DNA read-mappers such as BWA and Bowtie. In the seed-and-extend technique, small subsequences of a read (called seeds) are searched for in the reference-genome to find candidate regions. Once a rough alignment is identified (seeding-step), the read is typically aligned to all candidate regions using a dynamic programming algorithm (extending-step). The seed-and-extend method is also used in similarity search tools such as BLAST, BLAT and MUMmer, as well as PatternHunter and PatternHunter II.

To search for seeds in the reference-genome, some tools such as BWA and Bowtie use an index based on the Burrows-Wheeler transform called the FM-Index. Others, such as BLAST and SNAP, use a hash-table index of fixed-size k-mers (subsequences of length k). The seeding-step varies from program to program. For example, BWA-MEM looks for Maximal Exact Matches (MEMs) and MUMmer looks for Maximal Unique Matches (MUMs). GEM limits the seed size so that a read yields at least n+1 non-overlapping seeds, which lets it find all alignments with up to n errors: n errors can touch at most n of the seeds, so at least one seed must match exactly. Another technique is spaced seeding, which is used in PatternHunter and PatternHunter II. The focus of this paper is the extending-step, not the seeding-step.

Typically, dynamic programming is used in the extending-step. Dynamic programming alignment can produce a global, local or semi-global alignment. The extending-step can also implement different scoring systems, such as edit-distance scoring or affine-gap scoring. Most DNA read-mappers use a derivative of the Smith-Waterman algorithm that uses affine-gap scoring to find a semi-global alignment. Several such implementations of the Smith-Waterman algorithm have been published. They exploit the data-level parallel (SIMD) instructions of Intel processors (SSE) to further speed up the alignment process. The Smith-Waterman algorithm has also been accelerated on Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs).

Not all DNA read-mappers use this extending-step. For example, GEM and SNAP use modified versions of the Gene Myers and Ukkonen algorithms, respectively, for their extending-step. Both are edit-distance-based dynamic programming alignment algorithms. There are also alignment algorithms that do not use the conventional dynamic programming technique. For example, PatternHunter uses its own custom-made extending-step, and a greedy approach for aligning DNA sequences has also been proposed.

In this paper, we present a dynamic programming alignment algorithm called DP-MEM, to be used in the extending-step of DNA read-mapper tools. The scope of this paper is its use on conventional computing platforms (CPUs), though it can be extended for use on FPGAs.
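The seeding-step described above can be illustrated with a hash-table index of fixed-size k-mers, the kind of index the text attributes to BLAST and SNAP. This is a minimal Python sketch under our own assumptions, not the indexing scheme of any of these tools. The function names `build_kmer_index` and `candidate_regions` are hypothetical. The read is cut into non-overlapping k-mers, echoing GEM's non-overlapping seeds. Each hit then votes for the reference position at which the read would start.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_regions(read, index, k):
    """Hypothetical seeding step: look up non-overlapping k-mers of the read.

    Each hit votes for the reference position where the whole read would start
    (hit position minus the seed's offset in the read). Candidates are returned
    with the most-voted start first; ties are broken by position.
    """
    votes = defaultdict(int)
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    return sorted(votes, key=lambda start: (-votes[start], start))
```

For example, `candidate_regions("ACGTACGG", build_kmer_index("TTACGTACGGACGT", 4), 4)` returns `[2, 10]`. Both seeds vote for position 2. The repeated k-mer `ACGT` also yields a weaker candidate at 10, which the extending-step would then have to rule out.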
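The idea of extracting MEMs with bit-level operations can be sketched as follows. This is not MEM-Align's actual method, whose details are not given here. It is a hypothetical Python illustration that packs each base (A, C, G, T only) into 2 bits, so that a whole diagonal of the read/reference comparison is checked with a single XOR on Python's arbitrary-precision integers. Runs of matching positions on a diagonal are exactly the MEMs lying on that diagonal, and their starts and ends are then found with shifts and masks. The names `encode`, `find_mems` and `min_len` are our own.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a DNA string into an int, 2 bits per base; base k occupies bits 2k..2k+1."""
    value = 0
    for k, base in enumerate(seq):
        value |= CODE[base] << (2 * k)
    return value


def set_bits(x):
    """Yield the indices of the set bits of x in ascending order."""
    while x:
        low = x & -x
        yield low.bit_length() - 1
        x ^= low


def find_mems(read, ref, min_len=3):
    """Return (read_pos, ref_pos, length) for every MEM of at least min_len bases."""
    r, g = encode(read), encode(ref)
    mems = []
    for d in range(-(len(read) - 1), len(ref)):       # diagonal d = ref_pos - read_pos
        i0 = max(0, -d)                                # first read position on diagonal
        n = min(len(read) - i0, len(ref) - (i0 + d))   # number of overlapping bases
        full = (1 << (2 * n)) - 1
        even = full // 3                               # bits 0, 2, ..., 2n-2 set
        x = ((r >> (2 * i0)) & full) ^ ((g >> (2 * (i0 + d))) & full)
        mismatch = (x | (x >> 1)) & even               # bit 2k set iff base k differs
        match = ~mismatch & even                       # bit 2k set iff base k matches
        starts = match & ~(match << 2)                 # match not preceded by a match
        ends = match & ~(match >> 2)                   # match not followed by a match
        for s, e in zip(set_bits(starts), set_bits(ends)):
            length = (e - s) // 2 + 1
            if length >= min_len:
                i = i0 + s // 2
                mems.append((i, i + d, length))
    return mems
```

For read `ACGTAC` and reference `TTACGTACGG`, `find_mems` with `min_len=3` returns `[(3, 1, 3), (0, 2, 6), (0, 6, 3)]`. The full-length MEM `(0, 2, 6)` is the one an alignment would naturally be built around. The shorter MEMs are the kind of candidates a MEM-selection step has to discard.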
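Most read-mappers use a Smith-Waterman derivative with affine-gap scoring to find a semi-global alignment in the extending-step. This can be sketched with the classic three-matrix (Gotoh) recurrence. The example is an assumed, unoptimised Python illustration rather than the implementation used by any particular tool. Every base of the read must be aligned, while reference bases before and after it are free. The scoring values (match 1, mismatch -4, gap open -6, gap extend -1) are hypothetical defaults, not taken from the text. A gap of length L costs `gap_open + L * gap_extend`.

```python
NEG = float("-inf")


def semi_global_affine(read, ref, match=1, mismatch=-4, gap_open=-6, gap_extend=-1):
    """Best score for aligning all of `read` to some substring of `ref`.

    M: read[i-1] aligned to ref[j-1]; X: read base against a gap;
    Y: ref base against a gap. Direct X <-> Y transitions are omitted
    in this simplified sketch. Scoring defaults are assumptions.
    """
    n, m = len(read), len(ref)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        M[0][j] = 0  # free leading reference bases: the read may start anywhere
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend  # read prefix aligned against nothing
        for j in range(1, m + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend, X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend, Y[i][j - 1] + gap_extend)
    # Free trailing reference bases: the read may end at any reference position.
    return max(max(M[n][j], X[n][j]) for j in range(m + 1))
```

For example, `semi_global_affine("ACGTACGT", "ACGTTACGT")` returns 1: eight matches (+8) plus one gap of length 1 (-7). Separate gap matrices are what make affine scoring possible, because opening a gap and extending one are charged differently. Production aligners add SIMD vectorisation, banding and traceback on top of this basic recurrence.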