Distributed Indexing Dispatched Alignment* (DIDA*)

DIDA* performs large-scale alignment tasks by distributing the indexing and alignment stages into smaller subtasks over a cluster of compute nodes.

Performance increase of up to 77 percent¹

Distributed Indexing Dispatched Alignment* (DIDA*) is a distributed, parallel indexing and alignment framework that performs the indexing and alignment task in five major steps: distribute, index, dispatch, align, and merge. The index and dispatch steps run in parallel. DIDA first partitions the target sequences into smaller parts using a heuristic balanced cut, then creates an index for each partition. The reads are then “flowed” through a Bloom filter, which dispatches each alignment task to the appropriate node or nodes. Finally, the reads are aligned against all partitions in parallel, and the partial results are merged into the final output.
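The five steps can be illustrated with a toy, single-process sketch. This is not DIDA's implementation: the k-mer length, the greedy longest-first partitioning, the SHA-256-based Bloom filter, the k-mer membership test used for dispatch, and the exact-substring "aligner" are all assumptions chosen to keep the example small, and every function name here is hypothetical.

```python
import hashlib
from collections import defaultdict

K = 4  # toy k-mer length (assumption, chosen for a tiny example)

def kmers(seq, k=K):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

class BloomFilter:
    """Minimal Bloom filter: false positives possible, no false negatives."""
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit, for simplicity

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

def distribute(targets, num_parts):
    """Greedy balanced cut (assumed heuristic): longest target to lightest part."""
    parts = [dict() for _ in range(num_parts)]
    loads = [0] * num_parts
    for name, seq in sorted(targets.items(), key=lambda kv: -len(kv[1])):
        i = loads.index(min(loads))
        parts[i][name] = seq
        loads[i] += len(seq)
    return parts

def build_filter(part):
    """Stand-in for the index step: record every k-mer of the partition."""
    bf = BloomFilter()
    for seq in part.values():
        for km in kmers(seq):
            bf.add(km)
    return bf

def dispatch(reads, filters):
    """Send each read to every partition whose filter reports a shared k-mer."""
    buckets = defaultdict(list)
    for rname, rseq in reads.items():
        for i, bf in enumerate(filters):
            if any(km in bf for km in kmers(rseq)):
                buckets[i].append(rname)
    return buckets

def align(part, reads, read_names):
    """Stand-in aligner: exact substring search, 0-based positions."""
    hits = []
    for rname in read_names:
        for tname, tseq in part.items():
            pos = tseq.find(reads[rname])
            if pos >= 0:
                hits.append((rname, tname, pos))
    return hits

def merge(partial_results):
    """Combine per-partition hits into one sorted result list."""
    return sorted(hit for hits in partial_results for hit in hits)

targets = {"t1": "ACGTACGTTTGA", "t2": "GGGCCCAATT", "t3": "TTTTAAAACC"}
reads = {"r1": "CGTTTG", "r2": "CCCAAT", "r3": "AAAACC"}

parts = distribute(targets, num_parts=2)
filters = [build_filter(p) for p in parts]
buckets = dispatch(reads, filters)
results = merge(align(parts[i], reads, names) for i, names in buckets.items())
print(results)
```

Because a Bloom filter can report false positives but never false negatives, a read may occasionally be sent to a partition where it does not align, but it is never withheld from a partition where it does. In a real deployment each partition would be handled by a separate node rather than by a loop in one process.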

DIDA is written in C++ and parallelized with OpenMP for multithreaded computing on a single compute node. For distributed computing, DIDA uses the Message Passing Interface (MPI) for inter-process communication. As input, it takes the set of target sequences and the set of queries in FASTA or FASTQ format; the default output format is SAM.
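To make the input and output formats concrete, the following minimal sketch parses FASTA text and formats one alignment as a SAM record with the 11 mandatory SAM fields. It is an illustration only, not DIDA code: the function names are hypothetical, FASTQ and SAM header lines are not handled, and the flag, mapping quality, and CIGAR values are simplifying assumptions for an ungapped forward-strand match.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(chunks) for n, chunks in records.items()}

def sam_record(qname, rname, pos0, seq):
    """Format an ungapped forward-strand hit as one tab-separated SAM line.

    pos0 is 0-based; SAM positions are 1-based, so 1 is added.
    MAPQ 255 means "mapping quality not available".
    """
    fields = [
        qname,            # QNAME: query name
        "0",              # FLAG: forward strand, no special bits (assumption)
        rname,            # RNAME: target (reference) name
        str(pos0 + 1),    # POS: 1-based leftmost position
        "255",            # MAPQ: unavailable
        f"{len(seq)}M",   # CIGAR: all bases aligned, no gaps (assumption)
        "*",              # RNEXT: no mate
        "0",              # PNEXT
        "0",              # TLEN
        seq,              # SEQ
        "*",              # QUAL: not stored
    ]
    return "\t".join(fields)

targets = parse_fasta(">t1 example target\nACGTACGT\nTTGA\n")
line = sam_record("r1", "t1", targets["t1"].find("CGTTTG"), "CGTTTG")
print(line)
```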

Performance Results

DIDA was evaluated with four popular aligners, Burrows-Wheeler Aligner* (BWA*), Bowtie2, Novoalign, and ABySS-map, on the C. elegans genome, a human draft genome, the human reference genome, and the P. glauca genome. For a draft human genome assembly on 12 nodes, running BWA, Bowtie2, Novoalign, and ABySS-map through the DIDA framework reduced memory use relative to each aligner's baseline by 91 percent, 90 percent, 87 percent, and 91 percent, respectively, and made execution 55 percent, 74 percent, 77 percent, and 67 percent faster, respectively.¹
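A short calculation helps put these percentages in context. The sketch below assumes that each "faster" figure is a reduction in run time relative to the baseline (the text does not state this explicitly) and compares the memory figures with the reduction that an ideal, perfectly even split of the targets across 12 nodes would give. The function names are hypothetical.

```python
NODES = 12

# (memory reduction %, execution-time reduction %) as reported in the text
reported = {
    "BWA": (91, 55),
    "Bowtie2": (90, 74),
    "Novoalign": (87, 77),
    "ABySS-map": (91, 67),
}

def remaining_fraction(reduction_percent):
    """Fraction of the baseline that remains after a given percent reduction."""
    return 1.0 - reduction_percent / 100.0

def speedup_factor(time_reduction_percent):
    """Speedup implied by a run-time reduction (assumed interpretation)."""
    return 1.0 / remaining_fraction(time_reduction_percent)

# With 12 equal partitions, each node would ideally hold 1/12 of the index.
ideal_memory_reduction = 100.0 * (1.0 - 1.0 / NODES)
print(f"Ideal memory reduction on {NODES} nodes: {ideal_memory_reduction:.1f} percent")

for name, (mem, time) in reported.items():
    print(
        f"{name}: memory at {remaining_fraction(mem):.0%} of baseline, "
        f"about {speedup_factor(time):.1f}x faster"
    )
```

Under these assumptions, the reported memory reductions of 87 to 91 percent sit close to the roughly 91.7 percent that an even 12-way split would allow, while the run-time reductions correspond to speedups well below 12x.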

Download the code ›

Reproduce these results with this optimization recipe ›

Related Codes

Assembly By Short Sequences* (ABySS*) ›

Publications

Hamid Mohamadi, Benjamin P. Vandervalk, Anthony Raymond, Shaun D. Jackman, Justin Chu, Clay P. Breshears, and Inanc Birol. "DIDA: Distributed Indexing Dispatched Alignment." PLoS ONE 10, no. 4 (2015). doi: 10.1371/journal.pone.0126409.

Configuration Table

System Overview

Nodes: Twelve HPC nodes interconnected by 40 Gbps InfiniBand

Processor: Two Intel® Xeon® X5650 processors (2.67 GHz) per node

RAM: 48 GB per node

Operating System and Software: CentOS 5.4; Intel® Cluster Studio 2013; DIDA v1.0.1; ABySS-map v1.5.2; BWA v0.7.10; Bowtie2 v2.1.0; Novoalign v3.01.02

Product and Performance Information

1

Benchmark results were obtained before the most recent software patches and firmware updates intended to address the "Spectre" and "Meltdown" exploits were applied. Once these updates are implemented, the results shown may no longer apply to your device or system.

The software and workloads used in the performance tests have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of these factors may cause the results to vary. You should consult other information sources and performance tests to help you fully evaluate your contemplated purchases, including the performance of a given product when combined with other products. For more information, visit http://www.intel.es/benchmarks.