To advance our fundamental understanding of the complex behaviors and interactions among the Earth processes that are critical to the health, resilience, and sustainability of water resources, scientists need to be able to use diverse data and integrate models from outside their own disciplines with sufficient model accuracy and predictability. Currently, however, this is very difficult to accomplish: (1) a vast quantity of diverse data is not readily accessible to models; and (2) diverse models developed individually by different research groups are difficult to share and integrate across disciplines. To address these critical challenges, we propose to develop open and sustainable cyberinfrastructure (CI) framework software that enables easy and incremental integration of diverse data and models for knowledge discovery and interdisciplinary teamwork, and that also enables reproducible computing and seamless, on-demand access to the various HPC resources that are essential to research communities. The proposed project addresses an urgent need in enabling new scientific advances on water-related issues.
The goal of this NSF-funded, multi-institution, multi-discipline project is to build an open data, open modeling framework software: a new cyberinfrastructure called CyberWater. CyberWater will expedite fundamental knowledge discovery and significantly reduce the time and effort required of users. It eases diverse data and model integration, as well as model testing, validation, and comparison, while ensuring reproducible computing, and it also enables access to HPC facilities on demand. The proposed framework is based on scientific workflows.
To build an open and sustainable modeling framework for scientific communities, we need to allow research groups and individuals to freely and easily contribute their computational models, use and integrate models developed by others, have the data used by models retrieved directly and automatically from external data sources, easily perform reproducible computing, and have easy, on-demand access to HPC facilities. Individually developed models are often diverse and heterogeneous, each with its own interface requirements regarding input/output data specifications and control parameters. The novel meta-level architecture developed for the project will support communication among an assembly of diverse computational models via model agents in a workflow environment, significantly reducing the current complexity of model integration. We further design a model agent tool that enables users to generate model agents for common model types without coding. In addition, when individual models are integrated, model sharing requires neither the model source code nor recompilation. Besides direct access to external data sources via machine-to-machine communication, the framework provides data fusion, data visualization, and provenance management, as well as intelligent HPC middleware and integration with the NSF-funded Apache Airavata HPC effort, so that users can easily and remotely use HPC facilities (i.e., clusters, grids, and clouds) as needed.
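The model agent idea described above can be illustrated with a minimal sketch. This is not CyberWater's actual API: the `ModelAgent` class, the `run_chain` helper, the JSON file exchange, and the two toy "models" are all assumptions made for illustration. The sketch only shows how an agent might wrap an executable as a black box and pass its outputs to the next model without touching any model source code.

```python
import json
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ModelAgent:
    """Hypothetical agent that runs one model executable as a black box."""
    name: str
    command: list  # executable plus any fixed arguments

    def run(self, inputs: dict) -> dict:
        # Exchange data through files, so the model's source code is never touched.
        with tempfile.TemporaryDirectory() as workdir:
            in_path = Path(workdir) / "input.json"
            out_path = Path(workdir) / "output.json"
            in_path.write_text(json.dumps(inputs))
            subprocess.run([*self.command, str(in_path), str(out_path)], check=True)
            return json.loads(out_path.read_text())


def run_chain(agents, inputs):
    """Pass each agent's outputs to the next agent, like a linear workflow."""
    data = inputs
    for agent in agents:
        data = agent.run(data)
    return data


# Toy stand-ins for compiled models: one-line scripts run by this interpreter.
RUNOFF_SCRIPT = (
    "import json, sys, pathlib; "
    "d = json.loads(pathlib.Path(sys.argv[1]).read_text()); "
    "pathlib.Path(sys.argv[2]).write_text(json.dumps({'runoff': d['rain'] * 0.5}))"
)
ROUTING_SCRIPT = (
    "import json, sys, pathlib; "
    "d = json.loads(pathlib.Path(sys.argv[1]).read_text()); "
    "pathlib.Path(sys.argv[2]).write_text(json.dumps({'discharge': d['runoff'] * 2.0}))"
)

runoff_agent = ModelAgent("runoff", [sys.executable, "-c", RUNOFF_SCRIPT])
routing_agent = ModelAgent("routing", [sys.executable, "-c", ROUTING_SCRIPT])

result = run_chain([runoff_agent, routing_agent], {"rain": 10.0})
print(result)
```

In this sketch each model is reachable only through its command line and its input and output files, which is why a real executable (research code or commercial software) could in principle be substituted for the toy scripts without recompilation. The coefficients in the toy scripts are arbitrary and carry no hydrological meaning.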
The challenges of diverse data and model integration are longstanding. The proposed framework software combines open data access with an open modeling framework and simultaneously addresses several critical issues faced by the water communities and beyond. The proposed project is expected to have broad impacts in diverse fields. Scientists can use the newly developed framework software to greatly simplify data access, model studies and model coupling, reproducible computing, and access to HPC facilities, all of which significantly enhance research productivity. Our diverse team of hydrologists, climate experts, computer scientists, and CI experts, affiliated with several universities and CUAHSI, will ensure that the proposed framework software is developed with high-quality software assurance, maximal community engagement and collaboration, and the broadest impacts. The framework software is adaptable and extensible to other scientific and engineering communities. CyberWater will also provide a valuable educational tool for training students.
CyberWater provides a scientific workflow environment, which is not only particularly suitable for testing different hypotheses, theories, and models for complex data and model integration, but also fosters cross-community development that engages domain scientists and CI experts. Scientists currently do not benefit from the many advantages of modern workflow environments, such as provenance recording and reusable utilities (e.g., for re-gridding, data fusion, structural analysis, and visualization). To foster partnerships and help interdisciplinary groups and individuals in the broader communities freely and easily share and integrate their computational models, CyberWater neither requires researchers to submit their model source code nor changes or recompiles that source code for model integration. Models to be shared only need to be provided as executables, which dramatically simplifies interdisciplinary model applications and supports collaborative work. This feature will also facilitate the coupling of research codes with commercial software, which is often available only as executables. Users (teams or individuals) can operate and manage a model application using CyberWater without any central CyberWater service or administration, which makes our CI approach highly sustainable. Our project team includes hydrologists, climate experts, meteorologists, computer scientists, and CI experts from multiple universities and CUAHSI, and collaborates closely to ensure that CyberWater, once fully developed, will engage the broad communities for the benefit of domain scientists. Moreover, CyberWater will serve as a valuable educational tool for teaching university students how to conduct interdisciplinary and reproducible research, thus enhancing university STEM education for the future workforce.
CyberWater can dramatically lower the hurdle for model users with limited computing skills and help them quickly employ models from other domains. It will advance the state of the art by making it possible to quickly, effectively, and reliably connect diverse models into a comprehensive integrated modeling framework for tackling emerging complex interdisciplinary problems, using diverse sources of data to produce fundamental knowledge discoveries. Currently, a large interdisciplinary team must be formed to modify model code (a difficult and error-prone process) in order to couple diverse domain models. The innovations of CyberWater include:
1. significantly reducing model integration complexity, and allowing easy coupling of diverse models without coding
2. facilitating the use of diverse data by automatically ingesting data of heterogeneous types, formats, and access protocols
3. automating the flow from data to model to output visualization in real time or near real time
4. supporting reproducible computing
5. enabling HPC on demand
- Luna, D., Chen, R., Yuan, C., Liang, Y., Liang, X., Bales, J., Castronova, A.M., Demir, I., Hooper, R.P., Krajewski, W.F. and Lin, L., 2019. CyberWater—An open and sustainable framework for diverse data and model integration. AGUFM, 2019, pp.IN11B-02.
- Liang, X., Liang, Y., Luna, D., Chen, R., Cao, Y., Fu, Y., Pamidighantam, S., Song, F., Bales, J., Castronova, A., Demir, I., Hooper, R., Krajewski, W., Lin, L., Mantilla, R., Zhang, Y., 2020. Collaborative Research: CyberWater - An open and sustainable framework for diverse data and model integration with provenance and access to HPC. NSF CSSI PI Meeting, Seattle, WA.
- Contact / Get Involved
- Xu Liang (University of Pittsburgh), Project lead PI
- Yao Liang (Indiana University - Purdue University Indianapolis), IUPUI PI
- Fengguang Song (Indiana University - Purdue University Indianapolis), IUPUI Co-PI
- Sudhakar Pamidighantam (Indiana University), IUPUI Co-PI
- Dimuthu Upeksha (Indiana University), IUPUI/IU Programmer
- Ibrahim Demir (University of Iowa), Iowa PI
- Witold Krajewski (University of Iowa), Iowa Co-PI
- Ricardo Mantilla (University of Iowa), Iowa Co-PI
- Yang Zhang (North Carolina State University), NC State PI
- Lan Lin (Ball State University), BSU PI
- Anthony Castronova (CUAHSI), CUAHSI PI
- Richard Hooper (Tufts University), Paid Consultant through CUAHSI
- Jennifer Adam (Washington State University), Unpaid Collaborator
- Jonathan L. Goodall (University of Virginia), Unpaid Collaborator
- Nancy Wilkins-Diehr (San Diego Supercomputer Center), Unpaid Consultant
- Ge Sun (U.S. Department of Agriculture), Unpaid Consultant
- Yingping Wang (CSIRO, Australia), Unpaid Consultant
- Dave Meyer (NASA), Unpaid Consultant