The Wisdom of Crowds for Gene Network Inference and Disease
- Thu, January 17, 2013 @ 10:00 AM
- School of Medicine
- Biomedical informatics
- HSEB 2110
- 26 S 2000 E
- City, State, Zip
- Salt Lake City, UT 84112
- Jo Ann Thompson
Gustavo Stolovitzky, PhD
- Event Audience
- Open to Public
The opportunities opened up by the diverse and rich biological datasets currently being generated are enormous. To enhance the impact that these data can have on our understanding of biological systems, we created an initiative called DREAM (Dialogue for Reverse Engineering Assessment and Methods). DREAM's mission is to engage the computational biology community in contributing to the solution of important problems in biomedical research by posing double-blind prediction challenges to the public. The blind nature of the challenges allows us to evaluate predictions in a rigorous framework, thus avoiding the traps of self-assessment. The many predictions submitted by the community allow us to tap into the "wisdom of crowds", by which we mean the phenomenon that the aggregate prediction is often better than the best individual prediction.
In this talk I will discuss the DREAM5 Gene Network Inference challenge, which addressed the important problem of reconstructing gene-gene interactions from high-throughput data. We performed a comprehensive blind assessment of over 30 network inference methods on E. coli, S. aureus, S. cerevisiae and in silico microarray data. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets.
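As a rough illustration of what integrating predictions from multiple inference methods can look like, the sketch below averages the rank each method assigns to a candidate gene-gene edge and orders the edges by that average. This is only a minimal example under stated assumptions: the function names, the gene labels, and the confidence scores are all hypothetical, and mean-rank voting is one simple aggregation scheme, not necessarily the exact procedure used in the challenge.

```python
from statistics import mean


def rank_edges(scores):
    """Map each edge to its rank (1 = most confident) under one method."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {edge: position + 1 for position, edge in enumerate(ordered)}


def aggregate_by_mean_rank(method_scores):
    """Order edges by their average rank across methods (a Borda-style vote).

    An edge that a method did not score is given a rank one worse than
    the total number of distinct edges (an assumption of this sketch).
    """
    ranks = [rank_edges(scores) for scores in method_scores]
    edges = set().union(*ranks)
    worst = len(edges) + 1
    average = {e: mean(r.get(e, worst) for r in ranks) for e in edges}
    # Sort by average rank; break ties by edge name for a stable order.
    return sorted(edges, key=lambda e: (average[e], e))


# Hypothetical confidence scores from three made-up inference methods.
method_a = {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.6}
method_b = {("g1", "g2"): 0.5, ("g1", "g3"): 0.1, ("g2", "g3"): 0.8}
method_c = {("g1", "g2"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): 0.3}

consensus = aggregate_by_mean_rank([method_a, method_b, method_c])
print(consensus)
```

In this toy example no single method's ordering is taken as authoritative; an edge ends up near the top of the consensus list only if it is ranked well by most methods.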
Blind challenges can also be used to enable some degree of risk assessment in an industrial research context, a concept that we call IMPROVER (Industrial Methodology for Process Verification in Research). This methodology will be exemplified with the recent "IMPROVER Diagnostic Signature Challenge". Fifty-four teams developed predictive models in four disease areas (Multiple Sclerosis, Lung Cancer, Psoriasis, and Chronic Obstructive Pulmonary Disease), which were tested on blinded, newly generated data. While some methods performed very well for all the tasks, we found that no method performed best in all disease areas, indicating that data and method have to be carefully matched. The wisdom of crowds was used to determine that the difficulty in predicting disease phenotype depends mostly on the endpoint and not on the ingenuity of the predictive model. We conclude that while Psoriasis and Lung Cancer can be accurately classified with microarray data, COPD and Multiple Sclerosis need additional data to allow for a molecular prediction.