Machine Learning and Artificial Intelligence

Knowledge Discovery from Semantically Heterogeneous Data

The goal of this project is to design algorithms for learning classifiers from semantically heterogeneous data sources, with rigorous performance guarantees relative to their centralized or batch counterparts. Our recent work has focused on theoretically sound approaches to this problem under fairly general assumptions. Several sets of experiments evaluating the robustness of this approach in real-world scenarios are underway.
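To illustrate what a guarantee "relative to the centralized counterpart" can mean, here is a minimal sketch. It assumes a count-based learner (such as Naive Bayes) and a hand-written mapping from each source's local vocabulary to a shared one. The source names, mappings, and data are hypothetical, and this is not necessarily the algorithm used in the project. Each source reports only its counts, and the merged counts match those computed from the pooled data.

```python
from collections import Counter

# Hypothetical mappings from each source's local vocabulary to shared terms.
MAPPINGS = {
    "site_a": {"T": "high", "F": "low"},
    "site_b": {"1": "high", "0": "low"},
}

# Hypothetical toy data: (local feature value, class label) rows per source.
SOURCES = {
    "site_a": [("T", "pos"), ("F", "neg"), ("T", "pos")],
    "site_b": [("1", "pos"), ("0", "neg"), ("1", "neg")],
}


def local_counts(rows, mapping):
    """Count (shared feature value, label) pairs at a single source."""
    return Counter((mapping[value], label) for value, label in rows)


def merged_counts(sources):
    """Sum the per-source counts; the raw rows never leave their source."""
    total = Counter()
    for name, rows in sources.items():
        total += local_counts(rows, MAPPINGS[name])
    return total


def centralized_counts(sources):
    """Reference computation: pool all rows first, then count."""
    pooled = [
        (MAPPINGS[name][value], label)
        for name, rows in sources.items()
        for value, label in rows
    ]
    return Counter(pooled)


distributed = merged_counts(SOURCES)
centralized = centralized_counts(SOURCES)
print(distributed == centralized)
```

Because a count-based classifier depends on the data only through these counts, a model built from the merged counts would be identical to one trained on the pooled data in this simplified setting.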

For more information, please visit INDUS.

Funding:

  • Research Grant, Collaborative Research: Learning Classifiers from Autonomous, Semantically Heterogeneous, Distributed Data, National Science Foundation (2007-2010). Vasant Honavar (PI-ISU) and Doina Caragea (PI-KSU).

Machine Learning and Bioinformatics

EST Data Analysis

Pipeline for EST data analysis: EST data analysis is an essential first step for all EST projects. Several EST data analysis pipelines are available as web servers, e.g., ESTpass, EGassembler, and ESTexplorer. However, they all have limitations, for example in the amount of data that can be uploaded at once or in the type and format of the annotations and statistics they provide. Given the increasing number of EST projects at KSU and the limitations of the publicly available pipelines, we have developed a local ArthropodEST pipeline for EST analysis, using existing open source software tools. A browsable ArthropodEST database will be created in the near future. Other annotation tools will be added to the pipeline, and customized statistics will be made available.

Identifying Specialized Salivary Proteins in Aphids: The centrality of aphid saliva to the interaction of aphid and plant has been known for several decades and is unquestioned at a physiological level. However, at the level of individual salivary proteins and enzymes, our knowledge is rudimentary and incomplete at best. The goal of the project is to use bioinformatics and molecular genetics approaches, based on salivary-gland cDNA libraries in the pea aphid, to identify and evaluate the importance of individual proteins of aphid saliva.

Alternative Splicing and Gene Prediction

Alternative splicing provides a mechanism for generating different gene transcripts (isoforms) from the same genomic sequence. Although alternative splicing has been extensively explored in some organisms (e.g., humans and yeast), it has received little attention in insects. In this project, we therefore focus on tools for identifying alternative splicing of mRNA in insects.

Gene Regulatory Networks

We are exploring machine learning approaches (e.g., Dynamic Bayesian Networks) for constructing gene regulatory networks. In this context, we are also interested in identifying transcription factor binding sites and, subsequently, the transcription factors involved in regulating a gene. Our goal is to use the resulting networks to predict how variation in genes affects the overall pathways and, consequently, responses to different conditions or environments.

Funding:

  • Research Grant, Computational Methods to Characterize Regulatory Networks Involved in Plant Response to Abiotic Stresses. KSU Ecological Genomics Seed Grant (2008-2009). Haiyan Wang (PI), Doina Caragea (co-PI) and Susan J. Brown (co-PI), $35,589.
  • Advanced Genomics at K-State: Ultra-High Throughput DNA Sequencing. KSU Targeted Excellence Program (2008-2011). Eduard Akhunov, Bikram Gill, Frank White, Karen Garrett, James Nelson, Susan Brown, Loretta Johnson, Michael Herman, Jianming Yu, Sanjeev Narayanan, Ludek Zurek, and Doina Caragea (co-principal investigators), $850,000.
projects.txt · Last modified: 2009/02/06 21:13 by dcaragea