INtelligent Data Understanding System

Project Summary

Advances in networks, sensors, storage, computing, and high-throughput data acquisition have led to a proliferation of autonomous, distributed data sources in many areas of human activity. New discoveries in the biological, physical, and social sciences and in engineering are being driven by our ability to discover, share, integrate, and analyze disparate types of data. Statistically based machine learning algorithms offer some of the most cost-effective approaches to the discovery of experimentally testable predictive models and hypotheses from data. However, the large size, distributed nature, and autonomy of the data sources (and the attendant differences in access, queries allowed, processing capabilities, structure, organization, and underlying data models and data semantics) present hurdles to the effective use of machine learning. This research aims to overcome these hurdles by developing efficient, resource-aware distributed algorithms and software services to support collaborative, integrative knowledge acquisition in such a setting. The research team will implement, deploy, and evaluate the resulting algorithms using benchmark data sets, associated data models and ontologies, and user-specified inter-ontology mappings on a distributed test-bed of networked databases and services at Iowa State University and Kansas State University. The resulting open-source software can potentially transform collaborative e-science in the same way that the Web has transformed information sharing. Broader impacts of this research include enhanced opportunities for research-based training of graduate and undergraduate students, interdisciplinary collaborations, participation of under-represented groups, and development of increasingly sophisticated software to support collaborative, integrative e-science.
The ISU project web site (http://www.cild.iastate.edu/projects/indus.html) together with the KSU web site (http://people.cis.ksu.edu/~dcaragea/mlb/doku.php?id=indus) provide access to information about the project, benchmark data, publications, software, and documentation.

Project Funding

Research Grant #0711356 - Collaborative Research: Learning Classifiers from Autonomous, Semantically Heterogeneous, Distributed Data, National Science Foundation (2007-2010). Vasant Honavar (PI-ISU) and Doina Caragea (PI-KSU).

Project Publications

  • Honavar, V. and Caragea, D. (2008). Collaborative Knowledge Acquisition from Semantically Disparate, Distributed Data Sources. Springer UK. To appear.
  • Honavar, V. and Caragea, D. (2008). Towards Semantics-Enabled Infrastructure for Knowledge Acquisition from Distributed Data. Invited chapter. In: Next Generation of Data Mining. Kargupta, H., Han, J., Yu, P., Motwani, R., and Kumar, V. (Eds.). CRC Press.
  • Caragea, D. and Honavar, V. (2008). Learning Classifiers from Distributed Data Sources. In: Encyclopedia of Database Technologies and Applications, 2nd Ed. Ferraggine, V.E., Doorn, J.H., and Rivero, L.C. (Eds.). Idea Group Publishers. To appear.
  • Caragea, D. and Honavar, V. (2008). Learning Classifiers from Semantically Heterogeneous Data. In: Encyclopedia of Data Warehousing and Mining, 2nd Ed. Wang, J. (Ed.). Idea Group Publishers. To appear.
  • Paradesi, M.S.R., Caragea, D., and Hsu, W.H. (2007). Structural Prediction of Protein-Protein Interactions in Saccharomyces cerevisiae. In: Proceedings of the 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering (BIBE'07). Boston, MA.

Acknowledgments

This project is supported by the National Science Foundation under Grant No. 0711356. Any opinions, findings, and conclusions or recommendations expressed on this website are those of the authors and do not necessarily reflect the views of the National Science Foundation.

indus.txt · Last modified: 2009/03/07 23:22 by dcaragea