High-performance and Distributed Biocomputing (HDBC) Lab


We are investigators at the Center for Computation and Technology (CCT), LSU. Our research interest is to develop high-performance and distributed computing strategies for large-scale scientific applications. In particular, these strategies target biological applications such as next-generation sequencing analytics, molecular simulations, and advanced sampling. HPC has played a key role in supporting large-scale applications, and distributed computing strategies are becoming increasingly critical, particularly for data-intensive applications and BigData challenges. Currently, we focus on the specific research topics described below. You can contact us by e-mail: jhkim@cct.lsu.edu, nykim@cct.lsu.edu

Science Gateway for Life Science BigData Research

Science gateways are an effective means for collaborative research, giving researchers the capability to utilize large-scale high-performance computing (HPC) resources and databases located in many different places. Gateways significantly lower the barriers created by the complexity of accessing such resources and data sets, and ultimately facilitate scientific discovery by individual researchers. While most successful gateways address large-scale, compute-intensive scientific applications, a gateway is also an attractive approach for BigData projects. We have been developing a gateway architecture with collaborators, and the schematic is shown below. Our approach provides an effective way to develop a new science gateway capable of seamlessly integrating distributed resources and databases, with a relatively short development cycle thanks to our design strategy of building interoperable, distributed, adaptive, and user-friendly research cyberinfrastructure.

Schematic of Our Science Gateway Architecture

image01

Related publications

Sharath Maddineni, Joohyun Kim, Yaakoub El-Khamra, and Shantenu Jha, “Dynamic Application Runtime Environment (DARE): A Standards-based Framework For Building Science Gateways”, J. Grid Computing, (2012), 10(4):647

Our research interest is to develop novel strategies for large-scale BigData genomics projects. The major focus areas are listed below.

  • Integrative Epigenetics Environment for Whole Genome Shotgun Bisulfite-Seq Data Analytics and Chromatin Modeling
  • Big Data Cyberinfrastructure effectively supporting population sequencing-based Genome-wide association studies
  • Large scale de novo assembly using distributed parallel computing
  • Genome-wide structural bioinformatics for coding proteins and non-coding RNAs

We pursue large-scale simulations of biomolecules, in particular RNAs. Our strategy is to utilize High-performance Computing (HPC) machines as well as to develop efficient sampling methods such as Replica Exchange Statistical Temperature Molecular Dynamics (RESTMD). Some of our recent contributions, along with collaborators, are as follows:

  • SAM-I riboswitch branch migration simulation using Anton (Collaboration with Dr. Fareed and Dr. Huang)
  • RESTMD using Hadoop (Collaboration with Dr. Keyes, Boston University)
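As a rough illustration of the replica exchange idea behind such sampling methods, the sketch below implements the Metropolis swap test of conventional temperature replica exchange (parallel tempering). It is not RESTMD, which replaces the fixed temperatures with statistical-temperature sampling, and it is not our production code. The function names, the energy unit (kcal/mol), and the alternating neighbor-pair scheme are assumptions made for this example only.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
K_B = 0.0019872041


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance probability for exchanging two replicas.

    energy_i is the potential energy of the configuration currently held
    at temperature temp_i, and likewise for j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    # Log of the ratio of Boltzmann weights after and before the swap.
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temps, rng, offset=0):
    """Attempt swaps between neighboring temperatures.

    energies[k] is the energy of the configuration at temps[k]. Pairs
    (offset, offset+1), (offset+2, offset+3), ... are tested; callers
    typically alternate offset between 0 and 1 on successive rounds.
    Returns the energies reordered according to the accepted swaps.
    """
    energies = list(energies)
    for k in range(offset, len(temps) - 1, 2):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


if __name__ == "__main__":
    rng = random.Random(0)
    temps = [300.0, 320.0, 340.0, 360.0]
    energies = [-105.0, -101.0, -99.0, -90.0]
    print(attempt_swaps(energies, temps, rng, offset=0))
```

In a real simulation each replica would run molecular dynamics between swap rounds, and the exchange would move full configurations (or temperatures) rather than just energies. Because the swap test only needs one energy per replica, the replicas can run independently between rounds, which is what makes the approach a natural fit for distributed execution.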





SAM-I Riboswitch RNA structures modelled and simulated with Anton over microseconds

Next-generation Sequencing Infrastructure

We currently serve as the bioinformatics support team for the genelab NGS sequencing core, BIOMMED, School of Veterinary Medicine, LSU. For more information, visit this link. The overall infrastructure behind the sequencing core and bioinformatics support is illustrated below.

image01

NGS Data Analytics

Currently, the following NGS data analytics are offered:
  • Data analyses for major sequencing protocols, including RNA-Seq, ChIP-Seq, and miRNA-Seq, with downstream analyses such as KEGG pathway and GO
  • Alignment and variant calling using various open source tools and user-requested tools
  • In-house RNA-Seq pipeline - primarily utilizing Cufflinks and other open source tools (currently, the full-fledged analysis, including extensive downstream analyses, is available for prokaryotes)
  • In-house ChIP-Seq pipeline using alignment tools and MACS

Workflow of RNA-Seq data analytics

image01

Related publications

Joohyun Kim, Sharath Maddineni, and Shantenu Jha, “Advancing Next-Generation Sequencing Data Analytics with Scalable Distributed Infrastructure”, (published online with Concurrency and Computation: Practice and Experience).

Felix Francis, Joohyun Kim, Thiru Ramaraj, Andrew Farmer, Milton C. Rush, and Jong Hyun Ham, “Comparative genomic analysis of two Burkholderia glumae strains from different geographic origins reveals a high degree of plasticity in genome structure associated with genomic islands”, Mol. Genet. Genomics, (2013), 288:195–203.