Hydroinformatics Blog - Big Data Dreaming! A 42-Year CONUS Hydrologic Retrospective

Posted Jul 13, 2022

Organized by the CUAHSI Informatics Standing Committee. Contributions are welcome; please contact Veronica Sosa Gonzalez.

By: Aubrey Dugger, Research Applications Lab, National Center for Atmospheric Research

The U.S. National Oceanic and Atmospheric Administration (NOAA) National Water Model (NWM) is a hydrologic prediction system that has been running in an official operational mode since 2016, making near-real-time predictions of water states (e.g., snowpack, soil moisture, lake levels) and fluxes (e.g., streamflow, evapotranspiration) across the Contiguous U.S. (CONUS), Hawaii, Puerto Rico, the U.S. Virgin Islands, and soon South-Central Alaska. The system is currently built on the National Center for Atmospheric Research (NCAR) WRF-Hydro community hydrologic modeling system, which blends the highly parallelizable, high-performance-computing capabilities from large regional-to-global weather models with the hydrologic processes and "neighborhood-level" scales necessary for applications like flood forecasting and water management. To understand baseline model performance and provide a resource to the community, as well as for statistical referencing of various NWM forecast products, NOAA and NCAR produced a 42-year (February 1979 through December 2020) National Water Model version 2.1 (NWM v2.1) CONUS retrospective. This approximately 50 TB dataset includes:

  • 3-hourly, 1-km estimates of surface energy budget terms, snowpack attributes, soil moisture, and major water fluxes (e.g., evapotranspiration, snowmelt)

  • 3-hourly, 250-m estimates of shallow water table depth

  • 1-hourly estimates of streamflow at approximately 2.7 million NHDPlus v2 reaches

  • 1-hourly inflows, outflows, and water levels at approximately 5,800 lakes and reservoirs
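
To give a feel for the scale, here is a back-of-the-envelope sketch of the hourly streamflow output alone. The exact first and last timestamps and the 4 bytes per value are my assumptions for illustration, not documented properties of the dataset, and the real files are compressed.

```python
from datetime import datetime

# Assumed bounds: the post says "February 1979 through December 2020";
# the exact first and last timestamps are an assumption here.
START = datetime(1979, 2, 1)
END = datetime(2021, 1, 1)  # exclusive upper bound

N_REACHES = 2_700_000   # "approximately 2.7 million" NHDPlus v2 reaches
BYTES_PER_VALUE = 4     # assumption: one 32-bit value per reach per hour


def hourly_steps(start, end):
    """Number of whole hourly timesteps in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)


def uncompressed_terabytes(n_steps, n_locations, bytes_per_value):
    """Raw size in decimal terabytes of a dense (time, location) array."""
    return n_steps * n_locations * bytes_per_value / 1e12


n_hours = hourly_steps(START, END)
size_tb = uncompressed_terabytes(n_hours, N_REACHES, BYTES_PER_VALUE)
print(f"{n_hours:,} hourly steps, about {size_tb:.1f} TB for one variable")
```

Under these assumptions, a single hourly variable spans 367,440 timesteps and roughly 4 TB uncompressed, which is why pulling one reach's full time series out of per-timestep files is so expensive.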

In addition to hydrologic model outputs, the retrospective data repository also includes the 42-year meteorological forcings based on NOAA's Analysis of Record for Calibration (AORC) product, including:

  • 1-hourly, 1-km estimates of precipitation, air temperature, incoming shortwave and longwave radiation, wind speed (U and V components), humidity, and surface pressure

All inputs/outputs are available in native format as individual NetCDF files per timestep (e.g., hourly, 3-hourly) for the full CONUS model extent. New for NWM v2.1, this data is also available in a cloud-optimized Zarr format (as described below). See the NOAA Open Data Dissemination Program's NWM retrospective page for full details on the dataset.

Methods and Infrastructure

The NWM v2.1 retrospective data repository is publicly available through resources like Amazon Web Services and Google Cloud. However, aspiring explorers into the depths of this huge data repository will need to leverage techniques and tools from hydroinformatics to query, analyze, and visualize the data to answer scientific questions. A number of terrific resources from the community have come online over the past year to help on this journey.

Rich Signell from the U.S. Geological Survey (USGS) released a set of tutorials on how to (1) process an earlier version of the model retrospective (NWM v2.0) channel outputs into Zarr format, and (2) analyze the dataset using Xarray data access and Dask parallelization tools. James McCreight from the USGS and Ishita Srivastava and Yongxin Zhang from NCAR worked with Rich to adapt his workflow for the NWM v2.1 repository and extend the Zarr conversion to other NWM data outputs (e.g., 2D surface fields). They documented their data structures, code, and workflows on GitHub.

Mike Johnson from NOAA/Lynker and Dave Blodgett from the USGS created THREDDS data services for streamflow from the earlier NWM v1.2 and v2.0 retrospectives, available on CUAHSI's HydroShare with code documented on GitHub. This resource allows users to easily query data for targeted locations and time periods without having to download the full data repository. They also developed an R-based API, nwmTools, for streamlining query and analysis of streamflow from all three versions of the NWM retrospective. This work was funded through a CUAHSI Hydroinformatics Innovation Fellowship. Dave also prepared a thinned and streamlined NetCDF version of the NWM v2.1 streamflow time series for about 18,000 USGS streamflow and water quality gauge locations across the U.S. to support a wide range of research applications.

Beyond streamflow, Irene Garousi-Nejad from Utah State University documented on HydroShare her workflows for extracting snowpack outputs from the NWM v2.0 retrospective using Python and Google Cloud utilities, and showed how these extractions could be compared in bulk against in-situ and remote sensing observations.

There are many NWM-related resources I did not capture in this quick round-up. I encourage those who develop strategies, tools, and tutorials around best practices for accessing this rich NWM retrospective dataset to share their resources via CUAHSI HydroShare and CUAHSI Community JupyterHub.

NWM Retrospective Resource Round-Up:

NOAA National Water Model Retrospective Dataset:

Zarr + Xarray Workflows:

THREDDS Data Services:

HydroShare Workflows:

Post-Processed Data Sources:


I am merely a link collector here, so I would like to thank the scientists who developed these terrific data resources. These individuals have gone the extra mile to document their data sources, code, and workflows in accessible, reproducible, and extensible ways so that the broader community can build on them, accelerating scientific progress. I would also like to thank Fernando Salas (NOAA Office of Water Prediction), Brian Cosgrove (NOAA Office of Water Prediction), and Dave Gochis (NCAR Research Applications Lab) for contributions to this posting as well as their broader work facilitating community engagement with the NWM.

About the author: Aubrey is a scientist and project manager at NCAR. She uses models to explore human impacts on hydrologic systems and is an incorrigible advocate for the good work of others.