2010 MCBIOS Graduate Student Awardees
GPN was a sponsor for the 2010 MCBIOS (MidSouth Computational Biology and BioInformatics Society) Graduate Presentation Awards at the Annual Meeting held at Arkansas State University in Jonesboro, Arkansas. There were three awards in each of three categories: Oral Presentation; Poster Presentation, Computational Merit; and Poster Presentation, Biological Merit. The winners and the titles of their papers are listed below. Congratulations to all the presenters at the conference!
First place: Lineage Specific Activity from Novel PiggyBac Elements and Evidence of Horizontal Transfer in Mouse Lemurs (Microcebus)
- Heidi Pagan, Mississippi State University (MSU)
- All of the model mammalian genomes examined thus far (human, mouse, opossum, etc.) have demonstrated a generalized shutdown of Class II transposable elements (TEs) at roughly the same time (~40 mya). The first notable exception was the little brown bat, Myotis lucifugus, quickly followed by the discovery that the hAT superfamily had recently been particularly active through horizontal transfer events among several different mammalian species. As part of our efforts to determine the overall validity of a mammalian-wide shutdown of Class II TEs, we have examined the mouse lemur (Microcebus murinus) genome.
Second place: Assembling a Novel Fungal Genome from Short Read Sequencing Data
- Juliet Tang, Mississippi State University (MSU)
- We used Illumina GAII paired-end (PE) sequencing to produce a de novo assembly of the genome of Antrodia radiculosa, a copper-tolerant brown rot fungus that is capable of aggressive wood decay. Our run produced 117 million PE reads, each 76 nt long, the majority of which were high quality. Bases with poor scores (<= D) occurred at progressively higher frequencies the further into the fragment the sequence was read. To evaluate how these poor scores affected the assembly, datasets were progressively cleaned and tested to see whether they produced an assembly in Velvet. Neither the original nor the semi-clean datasets assembled. An assembly was produced when all reads with poor scores were removed, at which point there were still 50,000 N's left, primarily singlets. The assembly had an N50 of 22 Kbp and gave an approximate genome size of 31 Mbp, which is close to the size of other related fungal genomes. Roughly half the genome was covered by 452 contigs > 20 Kbp, and the largest contig was 110 Kbp. Relatively few contigs had a GC content < 30%, indicating little contamination from mitochondrial DNA. Preliminary blastp results showed that many of the translation products of genes predicted from our assembly gave E-values < 1e-60 with about 70% homology to proteins from other fungi. These results demonstrate that de novo assembly of a small eukaryotic genome using short read sequencing can provide enough high quality sequence for genome annotation.
Third place: Promoter Prediction in Halothiobacillus neapolitanus c2 Based on Stress-Induced DNA Duplex Destabilization
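The N50 statistic quoted in the abstract above can be illustrated with a minimal sketch. It assumes the common definition (the contig length at which the longest contigs, added up in descending order, first cover at least half of the total assembly length). The function name `n50` and the example contig lengths are hypothetical and are not taken from the authors' data or from Velvet's output.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Assumed definition: sort contigs from longest to shortest and add up
    their lengths; the N50 is the length of the contig at which the
    running total first reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length


# Hypothetical contig lengths in Kbp, for illustration only.
example_contigs = [110, 80, 60, 40, 30, 20, 10]
print(n50(example_contigs))
```

Because the N50 depends only on the distribution of contig lengths, it describes how contiguous an assembly is rather than how correct it is.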
- Aleksandra Markovets, Mississippi Valley State University (MVSU)
- In the post-genomic era, when scientists can sequence the genomes of many organisms, one of the biggest challenges is the correct identification of promoter regions, which is essential for understanding gene regulation. Since wet-lab promoter identification techniques are time consuming, in silico methods have been used to facilitate the process. Most traditional computational methods are based on motif searching, but promoter motifs are insufficiently conserved to support highly accurate prediction. To compensate for this shortcoming, DNA structural properties, such as curvature, stacking energy and stress-induced DNA duplex destabilization (SIDD), have been used. This study focuses on the prediction of promoters in the proteobacterium Halothiobacillus neapolitanus c2. We have implemented a method that predicts promoter sequences in this organism by using the DNA SIDD of the genome. Recent studies have shown that SIDD is a distinctive structural attribute of promoter regions. The SIDD-predicted promoter-containing sites from this research can be used as targets for experimental verification or further bioinformatics investigation.
First place: Recognition, Categorization, and Characterization of Transposable Elements in a Non-muroid Rodent: Spermophilus tridecemlineatus
- Neal Platt, Mississippi State University
- Transposable elements are key drivers of genome evolution. Understanding the dynamics of their mobilization will lead to a more complete understanding of speciation mechanisms, morphological evolution, karyotypic megaevolution, and population genetics. The release of genomic scaffolds from the Spermophilus tridecemlineatus sequencing project presents the opportunity to study the history of transposable element mobilization in a highly diverse rodent lineage. Transposable elements were recovered and characterized using various computational approaches (RepeatScout, Piler, RepClass, Censor). In total, the Spermophilus genome contains >80 transposable elements not found in other species at the sub-family level. Herein, 44 uncharacterized elements are presented. Class I elements, the retrotransposons, dominate the genomic landscape, with Spermophilus-specific SINEs, LINEs, and LTRs occupying 17.1%, 8.9%, and 4.4% of the genome, respectively. Class II elements, the DNA transposons, do not appear to be actively amplifying and occupy less than 0.2% of the genome. Though in an unassembled form, data gleaned from Spermophilus is important in a phylogenetic context. Representing a basal rodent lineage, comparisons between Rattus and Mus are important to gain a more complete understanding of transposable element dynamics in rodents. More work remains to fully elucidate the dynamics of transposable element mobilization in the genome of Spermophilus; however, the data presented herein represent a crucial starting point.
Second place: West Nile Virus Infection in Humans: Trends from 2003-2008 in Mississippi and its Neighboring States
- Gabrielle Cooper, Jackson State University
- West Nile Virus (WNV) is a single-stranded RNA flavivirus carried by birds and transmitted to humans by Culex mosquitoes. In North America, this zoonotic disease was first discovered in New York in 1999 and is now the dominant vector-borne disease on the continent. WNV infection is a seasonal epidemic that occurs mostly in the summer months and continues through the fall. In our previous study, we noticed that after Hurricane Katrina, the incidence of West Nile Neuroinvasive Disease (WNND) sharply increased in the hurricane-affected regions of Louisiana and Mississippi. Our objective was to continue to analyze the trends in the number of human infections in Mississippi and its neighboring states of Alabama, Arkansas, Louisiana and Tennessee. As before, we obtained data from the Centers for Disease Control and Prevention on the number of cases that were reported for each state for the year 2008. During 2008, Mississippi and its surrounding states experienced approximately a 60% decrease (422 to 160) in the total cases of human WNV infections and reported a 75% (28 to 4) decrease in fatalities since 2006. In the Southeastern region, Mississippi has continued to report the highest number of infection cases for the past three years. Though Mississippi has reported a significant decrease in the number of human infections with West Nile Virus, it continues to rank among the top 5 states in overall incidence in the United States.
Third place: Computational Analysis of Bovine Viral Diarrhea Virus Infected Monocytes: Identification of Cytopathic and Non-Cytopathic Strain Differences
- Mais Ammari, Mississippi State University
- Computational tools for high throughput biological data set analyses are designed to accelerate knowledge discovery in a rapid, accurate and efficient manner. However, biologists need to evaluate and apply appropriate tools for data analyses. Here we describe a combinatorial computational workflow that applies Gene Ontology and pathway analysis to proteomic datasets from a non-model organism. Pathogenesis of the disease caused by Bovine Viral Diarrhea Viruses (BVDV) in cattle is complex and involves persistent latent infection and immune suppression with a non-cytopathic (ncp) strain during early gestation, followed by an acute infection by a cytopathic (cp) strain. The molecular mechanisms that underlie the immune suppression in cattle caused by BVDV are not well understood. Using comparative proteomics, we evaluated the effect of cp and ncp BVDV in bovine monocytes to determine their role in viral immune suppression and uncontrolled inflammation. Proteins were isolated by differential detergent fractionation and identified by 2D-LC ESI MS/MS. We carried out GO-based modeling using AgBase computational tools. Pathway analysis was carried out using Ingenuity Pathways Analysis (IPA). GO-based modeling allowed us to capture species-specific information, while IPA facilitated ortholog-based transfer of biological information. This combinatorial approach identified strain-related differences in significant biological functions and pathways that could explain the observed biological differences. We will discuss the identified molecular mechanisms as well as the advantages and limitations of the applied computational methods.
First place: Transcriptional time lagged information approach to improving the accuracy of gene regulatory network reconstruction
- Vijender Chaitankar, University of Southern Mississippi
- Traditional computational methods for inferring gene regulatory networks from time series cDNA microarray data do not generally consider transcriptional time lag between genes. Zou et al. showed that time lag plays an important role in inference accuracy, as evident in their Dynamic Bayesian Network based approach. In this paper, we aim to develop a gene regulatory network (GRN) inference scheme that implements information theoretic approaches and considers the time lag between genes. In particular, our scheme implements the predictive minimum description length (PMDL) approach based on the mutual information (MI) and conditional mutual information (CMI) estimates from the microarray data. However, we considered a default time lag of 1 while calculating the MI and CMI quantities, whereas time series microarray data is generally not captured at regular time intervals. To consider the effects of time lags on inference accuracy, we have introduced two new parameters in our GRN inference scheme: time-lagged MI and time-lagged CMI. The time lags between the genes were computed, and these new information theoretic metrics allowed us to devise a novel GRN inference scheme that provides higher inference accuracy based on precision and recall quantities. We will also show the performance of the algorithm across different time points. Such analysis will show how the algorithm's accuracy depends on different sizes of time-series microarray data. Our approach involves a user-defined threshold, for which we will present a sensitivity analysis to give the user an idea of what range of threshold should be used.
Second place: GORIF: A Tool for GeneRIFs to Gene Ontology
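The idea of a time-lagged mutual information in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' PMDL-based scheme: it assumes expression values have already been discretized into levels, uses a simple plug-in (frequency count) estimate of MI, and pairs the regulator at time t with the target at time t + lag. The function names and the toy series are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def time_lagged_mi(regulator, target, lag):
    """MI between regulator[t] and target[t + lag] (hypothetical helper)."""
    if lag < 0 or lag >= len(regulator):
        raise ValueError("lag must satisfy 0 <= lag < series length")
    if lag == 0:
        return mutual_information(regulator, target)
    return mutual_information(regulator[:-lag], target[lag:])


# Toy discretized series: the target copies the regulator one step later.
regulator = [0, 1, 0, 0, 1, 1, 0, 1]
target = [0] + regulator[:-1]
for lag in (0, 1, 2):
    print(lag, round(time_lagged_mi(regulator, target, lag), 3))
```

In this toy case the estimate is largest at a lag of 1, the delay built into the data. With series as short as typical microarray time courses, plug-in estimates of this kind are noisy, so the sketch shows the bookkeeping rather than a reliable estimator.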
- Lakshmi Pillai, Mississippi State University
- Functional annotation of gene products is done by reviewing published experimental literature or is inferred computationally based on sequence and structure. The Gene Ontology (GO) is the de facto standard for functional annotation and relies on both manually reviewed and computationally derived annotations. Manual GO biocuration provides the highest quality annotations, but the process is protracted and requires highly trained experts. Computationally derived annotations rapidly provide broad coverage of GO but are not species specific or as detailed as literature biocuration. Our aim was to create a simplified master or 'at-a-glance' table for GO annotation that was more rapidly generated than manual biocuration but of higher quality than electronic annotation. We wrote a Perl script (GORif) to mine statements from the NCBI Gene Reference Into Function file and tested it using an example chicken immune gene dataset. GeneRIFs are statements added by domain experts based upon published papers and serve as a source of semi-reviewed biological knowledge for a gene set. Because natural language processing is extremely complex, not all GeneRIFs could be mined computationally and some were still biocurated. In addition, advanced computational stemming techniques improved results. The resulting annotations, in chart form where genes are scored based on their participation in specific immune processes in a positive (pro/+1), negative (anti/-1) or neutral (0) fashion, can then be used for rapid quantitative modeling of functional genomics data. GORif helps fill the void between high quality biocuration and low quality electronic annotation for rapid computational systems biology modeling.
Third place: Design of a DNA-Based Shift Register
- Christy Gearheart, University of Louisville
- DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. It concentrates on the manipulation of data, demonstrating how information can be parsed through a digital circuit composed of DNA-based logic gates. A novel logic gate design based on chemical reactions is presented in which observance of double-stranded sequences indicates a truth evaluation. Circuits are obfuscated by removing physical sequence connections, allowing client-specific representative strands for input sequences, altering the input sequence strands over time, and varying the input sequence length. Shifting along the input stream to parse individual inputs is accomplished through alternative splicing of DNA sequences stored in plasmid vectors.