HPC Tools for Image Analysis
Management, Query, and Processing of Large Scale Confocal Microscopy Images
Biomedical imaging plays a crucial role in basic research and clinical studies. In recent years, imaging technologies such as magnetic resonance imaging, digital high-power light microscopy, and digital confocal microscopy have advanced rapidly, and researchers can now collect detailed measurements of biological entities quickly. Information synthesized from these measurements can be used to test specific scientific hypotheses, diagnose complex diseases, assess the effectiveness of treatment regimens, and study biological processes in vivo. Digital microscopy scanners can capture multi-gigapixel images from a single slide, producing images of up to several tens of gigabytes each. A research study may involve hundreds of slides from a specimen, pushing the size of an imaging dataset to multiple terabytes. The sheer size of the images and the complexity of the image processing operations are major obstacles to integrating large-scale imaging data effectively into research.
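To make the scale concrete, the arithmetic behind these figures can be sketched as a back-of-the-envelope estimate. It assumes uncompressed 8-bit RGB pixels (3 bytes each); the slide count and the gigapixel figure are illustrative values, not measurements from this project, and confocal data with multiple channels or focal planes would be larger still.

```python
# Back-of-the-envelope storage estimate for uncompressed microscopy images.
# Assumption (illustrative only): 8-bit RGB, i.e. 3 bytes per pixel, no compression.

def image_bytes(gigapixels: float, bytes_per_pixel: int = 3) -> float:
    """Uncompressed size in bytes of one image with the given pixel count."""
    return gigapixels * 1e9 * bytes_per_pixel

def dataset_terabytes(num_slides: int, gigapixels: float, bytes_per_pixel: int = 3) -> float:
    """Total uncompressed size in terabytes (10**12 bytes) for a whole study."""
    return num_slides * image_bytes(gigapixels, bytes_per_pixel) / 1e12

# Hypothetical study: 200 slides, each scanned at 10 gigapixels.
per_slide_gb = image_bytes(10) / 1e9      # 30.0 GB per slide
total_tb = dataset_terabytes(200, 10)     # 6.0 TB for the study
print(f"{per_slide_gb:.1f} GB per slide, {total_tb:.1f} TB in total")
```

Under these assumptions a single 10-gigapixel slide already falls in the "several tens of gigabytes" range, and a few hundred such slides reach the multi-terabyte scale.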
In this joint work between systems researchers in our department and researchers from the Biomedical Informatics Research Network consortium, we are developing techniques and component-based, Grid-enabled middleware systems for storing, querying, and processing extremely large images acquired with digital confocal microscopy devices. We have developed parallel, out-of-core techniques for several classes of data processing operations commonly applied to confocal microscopy images. These techniques are combined into data preprocessing and analysis pipelines using a component-based middleware system, which allows the applications to run on a parallel storage or compute cluster. Parallel computing techniques are embedded in our distributed framework to optimize the execution of each operation within a cluster. The implementation also supports efficient execution of the pipeline across heterogeneous collections of clusters: it combines task and data parallelism, so that portions of the pipeline can run on different clusters while each portion still executes in parallel within its own cluster. Executing the pipeline across multiple clusters allows collaborating research groups to take advantage of the processing and storage capacity of the high-performance Grid nodes located at each collaborating institution.
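The out-of-core, data-parallel idea can be illustrated with a minimal sketch. It is not the project's middleware or its API: it assumes a hypothetical raw file of 8-bit grayscale pixels stored row by row, uses simple thresholding as a stand-in for a real processing operation, and introduces hypothetical helpers `threshold_band` and `process_out_of_core`. The image is split into bands of rows; each band is read from disk, processed, and written back independently, so only a few bands are in memory at any time, and the bands are distributed across a pool of workers.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def threshold_band(band: bytes, cutoff: int) -> bytes:
    """Map each 8-bit pixel to 255 if it is >= cutoff, else to 0 (stand-in operation)."""
    table = bytes(255 if v >= cutoff else 0 for v in range(256))
    return band.translate(table)

def process_out_of_core(src: str, dst: str, width: int, height: int,
                        band_rows: int, cutoff: int, workers: int = 4) -> int:
    """Threshold a raw 8-bit image band by band; return the foreground pixel count."""
    band_bytes = width * band_rows
    offsets = range(0, width * height, band_bytes)

    # Pre-size the output file so every worker can write its band in place.
    with open(dst, "wb") as out:
        out.truncate(width * height)

    def work(offset: int) -> int:
        # Read one band from disk; the full image is never loaded into memory.
        with open(src, "rb") as f:
            f.seek(offset)
            band = f.read(band_bytes)
        result = threshold_band(band, cutoff)
        # Bands cover disjoint byte ranges, so concurrent writes never overlap.
        with open(dst, "r+b") as out:
            out.seek(offset)
            out.write(result)
        return result.count(255)

    # Data parallelism: bands are processed concurrently by a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, offsets))

# Usage: a small synthetic 64 x 48 image stands in for a multi-gigabyte slide.
width, height = 64, 48
pixels = bytes((x + y) % 256 for y in range(height) for x in range(width))
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.raw")
    dst = os.path.join(tmp, "mask.raw")
    with open(src, "wb") as f:
        f.write(pixels)
    foreground = process_out_of_core(src, dst, width, height, band_rows=8, cutoff=100)
    with open(dst, "rb") as f:
        mask = f.read()
print(foreground, len(mask))
```

Threads keep the sketch short because blocking file I/O releases the interpreter lock; CPU-heavy operations would more likely run in separate processes or on separate cluster nodes, and a full pipeline would chain several such stages, which is where task parallelism comes in.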
Project Researchers
Ashish Sharma, Ph.D.
Project Publications
Vijay S. Kumar, Tahsin M. Kurc, Umit V. Catalyurek, Metin N. Gurcan, Joel H. Saltz, "Performance vs. Accuracy Trade-offs for Large-scale Image Analysis Applications", Proceedings of the IEEE Cluster 2007, 2007.