Bio.Mambo Project Overview

Bio.Mambo is a web portal solution based on the open source Mambo
project. We developed components that extend Mambo from a content
management system into a powerful portal for web-based bioinformatics
services and cluster management.

The Bio.Mambo project was designed to build on the knowledge in
existing open source projects. Because we build on Mambo’s web
application framework, we can focus on only the functions we need,
which significantly reduces our development time. Together with
components from third-party developers, we are able to provide a
complete portal solution for bioinformatics clusters.

The components we developed for the Bio.Mambo project include:

  • A Generic Web Engine that
    • provides interfaces for more than 350 bioinformatics tools.
    • provides an interface for designing pipelines for various analyses.
    • manages pipeline jobs without a persistent pipeline manager.
    • implements a DB-centric job management system.

  • Bio-Data Auto Updater
  • Parallel Local Data Distributor

  • Batch System Monitor (supports LSF and SGE)

We implemented a generic web engine that uses the XML definition
files from the PISE project to dynamically generate theme-able web
interfaces for command-line bioinformatics tools. For various reasons,
we do not use the Perl code from PISE; instead, we developed our own
engine in PHP from scratch. We put considerable effort into adding new
functions while keeping full compatibility with existing open source
resources.
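
To illustrate the idea of generating a web form from an XML tool definition, here is a minimal sketch in Python. It is not the Bio.Mambo engine (which is written in PHP), and the XML layout below is a simplified, hypothetical schema, not the actual PISE format. The element names, attribute names, the `render_form` function and the `/run` action URL are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified tool definition. This is NOT the real PISE schema.
TOOL_XML = """
<tool name="blastall" title="BLAST search">
  <param name="program" type="select" label="Program">
    <option>blastn</option>
    <option>blastp</option>
  </param>
  <param name="evalue" type="text" label="E-value" default="10"/>
  <param name="query" type="file" label="Query sequence"/>
</tool>
"""


def render_form(xml_text, action="/run"):
    """Build an HTML form with one field per <param> element."""
    tool = ET.fromstring(xml_text)
    rows = []
    for param in tool.findall("param"):
        raw_name = param.get("name", "")
        name = escape(raw_name, quote=True)
        label = escape(param.get("label", raw_name))
        kind = param.get("type", "text")
        if kind == "select":
            options = "".join(
                f"<option>{escape(opt.text or '')}</option>"
                for opt in param.findall("option")
            )
            field = f'<select name="{name}">{options}</select>'
        elif kind == "file":
            field = f'<input type="file" name="{name}">'
        else:
            default = escape(param.get("default", ""), quote=True)
            field = f'<input type="text" name="{name}" value="{default}">'
        rows.append(f"<label>{label} {field}</label>")
    title = escape(tool.get("title", tool.get("name", "")))
    return (
        f'<form method="post" action="{escape(action, quote=True)}" '
        'enctype="multipart/form-data">\n'
        f"<h2>{title}</h2>\n"
        + "\n".join(rows)
        + '\n<input type="submit" value="Run">\n</form>'
    )


if __name__ == "__main__":
    print(render_form(TOOL_XML))
```

Because the form is derived entirely from the definition file, adding a tool in this scheme means adding an XML file rather than writing new page code, and the markup can be restyled by a theme.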

Our web engine also manages bioinformatics pipelines without a
persistent pipeline manager, a component found in most traditional
pipeline systems. In our implementation, each job in the pipeline is
self-managed: when it finishes, it queries the database and initiates
the jobs for the next steps if their prerequisites are met. Unlike
GPIPE (a PISE pipeline extension), we store the step configurations in
a MySQL database, which also serves as the backbone of our DB-centric
job management system. User-friendly web interfaces were also
implemented to facilitate pipeline design and tool management.

Data management is another important aspect of a bioinformatics
cluster. The Bio.Mambo project provides an in-house tool that pushes
data across the cluster in parallel. The distribution is efficient:
the total distribution time grows as O(log₂ N) times the time needed
to transfer the data to a single node, where N is the number of nodes
in the cluster. We also developed a web interface for the Citrina
toolkit, which updates the biological data automatically.
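
The text does not say which scheme the distribution tool uses. One common way to reach logarithmic scaling is a doubling scheme, in which every node that already holds the data forwards it to one node that does not, so the number of holders doubles each round. The Python sketch below only computes such a schedule; it assumes node 0 is the source and is counted among the N nodes, and the function name `doubling_schedule` is made up for this example.

```python
def doubling_schedule(n_nodes):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Nodes are numbered 0..n_nodes-1 and node 0 initially holds the data.
    All transfers within one round can run in parallel.
    """
    have = 1  # nodes 0..have-1 currently hold the data
    rounds = []
    while have < n_nodes:
        # Each holder sends to one new node, limited by how many still need it.
        senders = range(min(have, n_nodes - have))
        rounds.append([(s, s + have) for s in senders])
        have += len(senders)
    return rounds


if __name__ == "__main__":
    for i, pairs in enumerate(doubling_schedule(8), start=1):
        print(f"round {i}: {pairs}")
```

Under these assumptions the number of rounds is the ceiling of log₂ N, and each round takes roughly the time of one node-to-node transfer, which matches the scaling described above.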

Cluster management functions include monitors for the cluster hardware
and the batch system (e.g. LSF or SGE). One component was developed
specifically for Apple’s Xserve G4 and G5 hardware. In addition, we
wrote a thin wrapper for the phpMyAdmin project so that administrators
can manage the MySQL database without leaving the Bio.Mambo web
environment.

