DataCutter
DataCutter is a framework for processing large scientific datasets in heterogeneous environments. It supports a filter-stream programming model in which application-specific data processing is executed as a network of components, referred to as filters. This model allows task parallelism and data parallelism to be combined. The runtime system executes filters on heterogeneous collections of storage and compute clusters in a distributed environment, and multiple copies of a filter can be instantiated. Because filters and filter copies can be placed on different platforms, processing, network, and data-copying overheads can be reduced. The runtime system also allows multiple instances of a network of filters to execute concurrently.
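The filter-stream model described above can be illustrated with a small sketch. This is not the DataCutter API: the names run_filter and run_pipeline, the use of Python threads and queues as stand-ins for filters and streams, and the two example filters are all assumptions made for illustration. It shows two filters chained into a pipeline (task parallelism) with multiple copies of the second filter sharing one input stream (data parallelism).

```python
import queue
import threading

_END = object()  # hypothetical sentinel marking end of a stream


def run_filter(work, inbox, outbox, copies=1):
    """Start `copies` threads that apply `work` to each buffer read from `inbox`."""
    remaining = [copies]
    lock = threading.Lock()

    def loop():
        while True:
            buf = inbox.get()
            if buf is _END:
                inbox.put(_END)  # pass the marker on so sibling copies also stop
                break
            outbox.put(work(buf))
        with lock:
            remaining[0] -= 1
            last = remaining[0] == 0
        if last:
            outbox.put(_END)  # only the last copy to finish closes the output stream

    threads = [threading.Thread(target=loop) for _ in range(copies)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(chunks, copies=2):
    """Run a two-filter network: drop negative values, then sum each chunk."""
    s1, s2, s3 = queue.Queue(), queue.Queue(), queue.Queue()
    threads = run_filter(lambda c: [x for x in c if x >= 0], s1, s2, copies=1)
    threads += run_filter(sum, s2, s3, copies=copies)  # replicated filter

    for chunk in chunks:
        s1.put(chunk)
    s1.put(_END)

    results = []
    while True:
        item = s3.get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    # Copies may finish in any order, so sort for a deterministic result.
    return sorted(results)


if __name__ == "__main__":
    print(run_pipeline([[1, -2, 3], [4, 5], [-1]]))
```

In this sketch every filter runs in one process. In a distributed setting such as the one described above, each filter copy could instead be placed on a different host, with the queues replaced by network streams.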
The development of DataCutter is funded in part by NPACI, NSF, and DOE. DataCutter is used for application development in projects supported by DOE ASCI, PACI, DOD, and DARPA (HUBS). These applications include a virtual microscope for browsing digitized pathology slides, an isosurface rendering application, analysis of distributed datasets generated from an ensemble of oil reservoir simulations, and an image processing toolkit. The DataCutter infrastructure includes interfaces to Globus, Storage Resource Broker (SRB), and Network Weather Service (NWS) to leverage the security, authentication, remote file access, resource monitoring, and allocation services provided by those toolkits.
Project Researchers
Umit Catalyurek, Ph.D.
Tahsin Kurc, Ph.D.
Shannon Hastings, M.S.
Stephen Langella, M.S.
Project Funding Participation
SOFTWARE: Job Scheduling for Data Centers with Multi-level Storage Systems
Next Generation Software (NGS): An Integrated Middleware and Language/Compiler for Data Intensive Applications in a Grid Environment
Software support for generating data products from very large datasets
Project Publications
Publications
Vijay S. Kumar, Benjamin Rutt, Tahsin M. Kurc, Umit V. Catalyurek, Sunny Chow, Stephan Lamont, Maryann Martone, Joel H. Saltz, "Large Image Correction and Warping in a Cluster Environment", Proceedings of the 2006 ACM/IEEE conference on Supercomputing (SC2006), 2006: pp. 38-38.
Sivaramakrishnan Narayanan, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, "Servicing Seismic and Oil Reservoir Simulation Data", 2005.
Manish Parashar, Vincent Matossian, Wolfgang Bangerth, Hector Klie, Benjamin Rutt, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, Mary F. Wheeler, "Towards Dynamic Data-Driven Optimization of Oil Well Placement", Lecture Notes in Computer Science, 2005: pp. 656-663.
Shannon L. Hastings, Lone Aagesen, Muhammad Ali, Tahsin M. Kurc, Stephen Langella, Umit V. Catalyurek, Tony C. Pan, Joel H. Saltz, "Image Processing for the Grid: A Toolkit for Building Grid-enabled Image Processing Applications", Proceedings of the 3rd International Symposium on Cluster Computing and the Grid, 2003: pp. 36-43.
Umit V. Catalyurek, Michael Gray, Tahsin M. Kurc, Joel H. Saltz, Eric A. Stahlberg, Renato A. Ferreira, "A Component-based Implementation of Multiple Sequence Alignment", Proceedings of the 2003 ACM Symposium on Applied Computing, SAC2003, 2003: pp. 122-126.
Sivaramakrishnan Narayanan, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, "Database Support for Data-Driven Scientific Applications in the Grid", Parallel Processing Letters, 2003: pp. 245-271.
Matthew Spencer, Renato A. Ferreira, Michael Beynon, Tahsin M. Kurc, Umit V. Catalyurek, Alan Sussman, Joel H. Saltz, "Executing Multiple Pipelined Data Analysis Operations in the Grid", Proceedings of the 2002 ACM/IEEE SC02 Conference, 2002: pp. 1-18.
Henrique Andrade, Tahsin M. Kurc, Alan Sussman, Joel H. Saltz, "Exploiting Functional Decomposition for Efficient Parallel Processing of Multiple Data Analysis Queries", Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2003), 2002: pp. 10.
Michael Beynon, Henrique Andrade, Joel H. Saltz, "Low-Cost Non-Intrusive Debugging Strategies for Distributed Parallel Programs", Proceedings of the 4th IEEE International Conference on Cluster Computing, 2002: pp. 439-442.
Henrique Andrade, Tahsin M. Kurc, Alan Sussman, Eugene Borovikov, Joel H. Saltz, "On Cache Replacement Policies for Servicing Mixed Data Intensive Query Workloads", Proceedings of the Second Workshop on Caching, Coherence and Consistency (WC3-02), 2002.
Michael Beynon, Chialin Chang, Umit V. Catalyurek, Tahsin M. Kurc, Alan Sussman, Henrique Andrade, Renato A. Ferreira, Joel H. Saltz, "Processing Large-Scale Multidimensional Data in Parallel and Distributed Environments", Parallel Computing (special issue on parallel data-intensive algorithms and applications), 2002: pp. 827-859.
Henrique Andrade, Tahsin M. Kurc, Joel H. Saltz, Alan Sussman, "Multiple Query Optimization for Data Analysis Applications on Clusters of SMPs", Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid2002), 2002: pp. 154.
Umit V. Catalyurek, Eric A. Stahlberg, Renato A. Ferreira, Joel H. Saltz, "Improving Performance of Multiple Sequence Alignment Analysis in Multi-client Environments", Proceedings of the First International Workshop on High Performance Computational Biology (HiCOMB 2002, IPDPS 2002), 2002: pp. 0183b.
Michael Beynon, Tahsin M. Kurc, Umit V. Catalyurek, Alan Sussman, Joel H. Saltz, "Efficient Manipulation of Large Datasets on Heterogeneous Storage Systems", Proceedings of 16th International Parallel and Distributed Processing Symposium (IPDPS), The 11th Heterogeneous Computing Workshop (HCW 2002), 2002: pp. 0084.
Henrique Andrade, Tahsin M. Kurc, Umit V. Catalyurek, Alan Sussman, Joel H. Saltz, "Persistent Caching in a Multiple Query Optimization Framework", Proceedings of the Sixth Workshop on Languages, Compilers and Run-time Systems for Scalable Computers, 2002.
Henrique Andrade, Tahsin M. Kurc, Alan Sussman, Joel H. Saltz, "Scheduling Multiple Data Visualization Query Workloads on a Shared Memory Machine", Proceedings of the Fifth Merged IPPS/SPDP (International Parallel Processing Symposium & Symposium on Parallel and Distributed Processing), 2002: pp. 11-18.
Michael Beynon, Tahsin M. Kurc, Umit V. Catalyurek, Chialin Chang, Alan Sussman, Joel H. Saltz, "Distributed Processing of Very Large Datasets with DataCutter", Parallel Computing, 2001: pp. 1457-1478.
Michael Beynon, Alan Sussman, Umit V. Catalyurek, Tahsin M. Kurc, Joel H. Saltz, "Performance Optimization for Data Intensive Grid Applications", Proceedings of the Third Annual International Workshop on Active Middleware Services (AMS2001), 2001: pp. 97-105.
Tahsin M. Kurc, Umit V. Catalyurek, Chialin Chang, Alan Sussman, Joel H. Saltz, "Exploration and Visualization of Very Large Datasets with the Active Data Repository", IEEE Computer Graphics & Applications, 2001: pp. 24-33.
Tahsin M. Kurc, Umit V. Catalyurek, Chialin Chang, Joel H. Saltz, Alan Sussman, "Visualization of Large Datasets with the Active Data Repository", IEEE Computer Graphics and Applications, 2001: pp. 24-33.
Tahsin M. Kurc, Henrique Andrade, Joel H. Saltz, Alan Sussman, "Efficient Execution of Multiple Query Workloads in Data Analysis Applications", Proceedings of the 2001 ACM/IEEE conference on Supercomputing (SC2001), 2001: pp. 53.
Alan Sussman, Larry Davis, Eugene Borovikov, "An Efficient System for Multi-Perspective Imaging and Volumetric Shape Analysis", Proceedings of the Workshop on Parallel and Distributed Computing in Image Processing, Video Processing, and Multimedia (PDIVM'2001), 2001.
Henrique Andrade, Tahsin M. Kurc, Alan Sussman, Joel H. Saltz, "Decision Tree Construction for Data Mining on Clusters of Shared-Memory Multiprocessors", 2000.
Tahsin M. Kurc, Michael Beynon, Alan Sussman, Joel H. Saltz, "DataCutter and A Client Interface for the Storage Resource Broker with DataCutter Services", 2000.
Michael Beynon, Tahsin M. Kurc, Alan Sussman, Joel H. Saltz, "Design of a Framework for Data-Intensive Wide-Area Applications", Proceedings of the 9th Heterogeneous Computing Workshop (HCW2000), 2000: pp. 116-130.
Gagan Agrawal, Renato A. Ferreira, Joel H. Saltz, Ruoming Jin, "High Level Programming Methodologies for Data Intensive Applications", Proceedings of the Fifth Workshop on Languages, Compilers and Run-time Systems for Scalable Computers, 2000.
Michael Beynon, Alan Sussman, Tahsin M. Kurc, Joel H. Saltz, "Optimizing Execution of Component-based Applications using Group Instances", Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid2001), 2000: pp. 56-63.
Michael Beynon, Renato A. Ferreira, Tahsin M. Kurc, Alan Sussman, Joel H. Saltz, "DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems", Eighth NASA Goddard Conference on Mass Storage Systems and Technologies/Seventeenth IEEE Symposium on Mass Storage Systems, 2000: pp. 119-133.
Mustafa Uysal, Anurag Acharya, Joel H. Saltz, "Evaluation of Active Disks for Decision Support Databases", Proceedings of the 6th International Symposium on High-Performance Computer Architecture, 2000: pp. 337-348.
Chialin Chang, "Cost Models for Query Processing Strategies in the Active Data Repository", 1999.
Renato A. Ferreira, Alan Sussman, Joel H. Saltz, "Database Methods for Efficient Manipulation of Very Large Datasets", Proceedings of the 1999 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'99), 1999.
Michael Beynon, Alan Sussman, Joel H. Saltz, "Performance Impact of Proxies in Data Intensive Client-Server Applications", Proceedings of the 13th International Conference on Supercomputing, 1999: pp. 383-390.
Chialin Chang, Renato A. Ferreira, Alan Sussman, Joel H. Saltz, "Infrastructure for Building Parallel Database Systems for Multi-dimensional Data", Proceedings of the IEEE Second Merged IPPS/SPDP Symposiums, 1999: pp. 582-588.
Presentations
Joel H. Saltz, "Biomedical Informatics Research", Industry Collaboration Symposium, Columbus, OH, Presented: 2006-12-04 |
Benjamin Rutt, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, "Use of the Teragrid for Sub-surface Modeling and Oil Reservoir Management Studies", Teragrid 2006, Indianapolis, IN, Presented: 2006-06-13 |
Benjamin Rutt, Vijay S. Kumar, Tony C. Pan, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, "Distributed Out-of-Core Preprocessing of Very Large Microscopy Images for Efficient Querying", IEEE International Conference on Cluster Computing, Boston, Massachusetts, USA, Presented: 2005-09-28 |
Sivaramakrishnan Narayanan, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, "Servicing Seismic and Oil Reservoir Simulation Data through Grid Data Services", Very Large Databases (VLDB) Workshop on Data Management in Grids, Trondheim, Norway, Presented: 2005-09-02 |
Benjamin Rutt, "DataCutter (Overview)", First DIALOGUE Workshop: Applications-Driven Issues in Data Grids, Columbus, OH, Presented: 2005-08-02 |
Umit V. Catalyurek, "Supporting Large Scale Data Driven Science in Distributed Environments", Minisymposium on Distributed Data Management Infrastructures for Scalable Computational Science and Engineering Applications, SIAM Conference on Computational Science and Engineering (SIAM CSE '05), Orlando, FL, Presented: 2005-02-13 |
Shannon L. Hastings, Scott Oster, Stephen Langella, Tahsin M. Kurc, Tony C. Pan, Joel H. Saltz, "GridPacs: A Grid-enabled System for Management and Analysis of Large Image Datasets", SuperComputing 2004 (SC2004), Pittsburgh, PA, Presented: 2004-11-09 |
Joel H. Saltz, "Middleware Support for Data Ensemble Analysis", Presentation at UT Austin, Presented: 2003-06-12 |
Joel H. Saltz, "Middleware for Dynamic Data Driven Application System", ICCS 2003, Melbourne, Australia, Presented: 2003-06-02 |
Tahsin M. Kurc, Mario Lauria, Srini Parthasarathy, Joel H. Saltz, Radhakrishnan Sundaresan, "A Slacker Coherence Protocol for Pull-based Monitoring of on-line Data Sources", CCGrid 2003, Presented: 2003-05-15 |
Tahsin M. Kurc, Joel H. Saltz, "DataCutter Overview", Presented: 2003-04-08 |
Tahsin M. Kurc, "Systems Support for Large Scale Data Exploration and Analysis in Earth Systems Sciences", Presented: 2003-03-17 |
Joel H. Saltz, "Middleware Support for Data Ensemble Analysis", Lawrence Livermore National Laboratory, Presented: 2003-03-17 |
Umit V. Catalyurek, "A Component-based Implementation of Multiple Sequence Alignment", 18th ACM Symposium on Applied Computing (SAC2003) Bioinformatics Track, Presented: 2003-03-10 |
Umit V. Catalyurek, "Executing Multiple Pipelined Data Analysis Operations in the Grid", 2002 ACM/IEEE SC2002 Conference, Presented: 2002-10-21 |
Joel H. Saltz, "Dynamic Data Driven Application Systems", Clusters and Computational Grids for Scientific Computing 2002, Presented: 2002-09-10 |
Shannon L. Hastings, Stephen Langella, Joel H. Saltz, "Image Processing for the Grid", Presented: 2002-09-09 |
Umit V. Catalyurek, "A Hypergraph-Partitioning Approach for Coarse-Grain Decomposition", 2001 ACM/IEEE SC2001 Conference, Presented: 2001-10-14 |
Umit V. Catalyurek, "A Fine-Grain Hypergraph Model for 2D Decomposition of Sparse Matrices", International Parallel and Distributed Processing Symposium (IPDPS), Irregular 2001, Presented: 2001-04-23 |
Asif Ahmad, Jyoti Kamal, "Cleansing and Geocoding Spatial Data", Presented: 2001-03-01 |
Tech Reports
Michael Beynon, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, "A Component-based Implementation of Iso-surface Rendering for Visualizing Large Datasets", Issued: 2001-05-01 |
Abstracts
Tahsin M. Kurc, Wolfgang Bangerth, Hector Klie, Mrinal Sen, Paul L. Stoffa, Mary F. Wheeler, Umit V. Catalyurek, Benjamin Rutt, Joel H. Saltz, Manish Parashar, "‘Where is my oil, dude?’ Supporting Dynamic, Data-Driven Oil Reservoir Simulation Studies on the Grid", (2004-11-06 to 2004-11-12), Pittsburgh |
Tony C. Pan, Shannon L. Hastings, Stephen Langella, Tahsin M. Kurc, Umit V. Catalyurek, Joel H. Saltz, "Utilizing Advances in Grid Computation to Enhance Clinical and Scientific Research", (2003-12-02 to 2003-12-05), Chicago |
Henrique Andrade, Tahsin M. Kurc, Alan Sussman, Joel H. Saltz, "Multiple-Query Optimization Support for the Virtual Microscope", (2001-10-04 to 2001-10-06), Pittsburgh
Umit V. Catalyurek, Tahsin M. Kurc, Alan Sussman, Joel H. Saltz, "Improving the Performance and Functionality of the Virtual Microscope", (2000-10-26 to 2000-10-28), Pittsburgh |
Michael Beynon, Joel H. Saltz, Mustafa Uysal, Alan Sussman, "Exploration, Manipulation and Processing of Very Large Data Sets Using Filters", (1999-10-14 to 1999-10-16), Pittsburgh |