Genaev M.A.   Afonnikov D.A.   Gunbin K.V.  

BioinfoWF: a workflow management system for bioinformatics analysis

Presenter: Genaev M.A.


Key words: bioinformatics, workflow, grid processing, XML, web interface

Motivation and Aim: Analysis of biological data in bioinformatics usually consists of several steps performed sequentially by different programs. The output of one calculation module serves as the input of the next, so the overall procedure can be organized as a workflow. For example, calculating a phylogenetic tree for a protein family requires extraction of protein sequences from databases, multiple sequence alignment, and phylogeny estimation. Most individual steps can be performed by more than one program: a sequence alignment, for instance, can be obtained with ClustalW, MAFFT, MUSCLE or T-Coffee. The user's choice of program often depends on the data under analysis and the aim of the task.

Methods and Algorithms: To support workflow-based data processing in bioinformatics, we developed the BioinfoWF system. It is written in Perl and consists of two parts. The first is an XML description of the program options, input data and output data for a single workflow step. The second describes the workflow scheme and sets the data files and the execution status of each step. BioinfoWF runs from the command line on UNIX-like systems or as a web service. A workflow, or part of one, can also be executed on multiprocessor cluster systems under Sun Grid Engine.
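
The two-part design described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in Perl, and it is not the BioinfoWF implementation: the XML element and attribute names, the `parse_step` and `plan_workflow` helpers, the status labels and the `hypothetical_tree_builder` program are all assumptions made for illustration. The sketch reads per-step XML descriptions and then assigns an execution status to each step of a linear workflow, without running any program.

```python
import xml.etree.ElementTree as ET

# Hypothetical step descriptions: the element and attribute names are
# assumptions made for this sketch, not the real BioinfoWF schema.
ALIGN_STEP = """
<step name="align" program="mafft">
  <option flag="--maxiterate" value="1000"/>
  <input file="family.fasta"/>
  <output file="family.aln"/>
</step>
"""

TREE_STEP = """
<step name="tree" program="hypothetical_tree_builder">
  <input file="family.aln"/>
  <output file="family.nwk"/>
</step>
"""


def parse_step(xml_text):
    """Read one step description into a plain dictionary."""
    step = ET.fromstring(xml_text)
    # Only the program name and its options go into the command here;
    # how input and output files are passed is program-specific.
    command = [step.get("program")]
    for option in step.findall("option"):
        command.append(option.get("flag"))
        if option.get("value") is not None:
            command.append(option.get("value"))
    return {
        "name": step.get("name"),
        "command": command,
        "inputs": [node.get("file") for node in step.findall("input")],
        "outputs": [node.get("file") for node in step.findall("output")],
    }


def plan_workflow(step_xmls, initial_files):
    """Assign a status to each step of a linear workflow.

    A step is "pending" when every input is either an initial file or an
    output of an earlier pending step; otherwise it is "missing_input".
    """
    available = set(initial_files)
    plan = []
    for xml_text in step_xmls:
        step = parse_step(xml_text)
        if set(step["inputs"]) <= available:
            step["status"] = "pending"
            available.update(step["outputs"])
        else:
            step["status"] = "missing_input"
        plan.append(step)
    return plan


if __name__ == "__main__":
    for step in plan_workflow([ALIGN_STEP, TREE_STEP], {"family.fasta"}):
        print(step["name"], step["status"], " ".join(step["command"]))
```

Keeping the step descriptions separate from the workflow plan means that one alignment program could be swapped for another by changing a single step description, while the plan and its status tracking stay the same.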

Results: We used BioinfoWF to develop the Computer System for Analysis of Molecular Evolution Modes of Protein Families (SAMEM) and for the detection of functionally important SNPs in the regulatory regions of eukaryotic genes.

Conclusion: BioinfoWF can be used to organize workflow management for various bioinformatics tasks.

Availability: BioinfoWF is available at

