Many granting agencies, such as the National Science Foundation, require a Data Management Plan for all proposals. Browse the information below to learn how CUAHSI can support your next proposal.

  • Overview

    CUAHSI offers several services to the community that address many aspects of data management. It has also developed software, and guidance for using commercial and standards-based software, that can help you manage your own data to fulfill this requirement.

    To meet NSF's Data Management Policy guidelines, your data management plan should include the following information. Browse the tabs below to learn about, and see example language for, each aspect of a data management plan:

    • Data Inventory
    • Data and Metadata Standards
    • Data Use, Privacy, and Sharing Policies
    • Data Life Cycle Management

    For each section, we have provided an overview of what kind of information should be included and some simple example language and content. In addition, we have compiled specific guidance for NSF Principal Investigators based on the type of data a given project generates.

    We also encourage investigators to consult their universities about data curation and management options, and to consult NSF program officers and other appropriate resources about their specific data needs.

  • Data Inventory

    Each Data Management Plan should include a data inventory that identifies and describes the structure of the data that will result from the proposed project. You should identify the type of data you will generate (e.g., observations, model output, etc.) and provide detail on its spatial and temporal structure. The type(s) of data you work with determine in large part the structure of the remainder of the Data Management Plan. For example, your data inventory may include:

    • Data generated from fixed point, repeated sampling, such as time series data. The sampling scheme (i.e., regular or irregular) and intervals should be identified. Examples of this type of data include river discharge, soil moisture at a point, etc.
    • Synoptic or survey sampled data, such as distributed points sampled simultaneously once or at regular or irregular intervals. Scope, location and sampling information should be included. Examples may include wet weather sampling events or ecosystem surveys.
    • GIS coverages or other spatial datasets. Data structure (i.e., vector or raster), resolution and timestamp information should be included. Examples may include land use coverages or stream networks.
    • Time varying regular or irregular gridded data. Examples may include radar precipitation products or output from groundwater models. Your data inventory should include spatial and temporal resolution of the data sets, as well as model or instrument specifications.
    • High-resolution data clouds, such as high-resolution ground-based or airborne LIDAR.

    Example Language

    If your project generates standard time series data, your data inventory section might include information such as:

    • The project will generate daily observations at 10 point locations for a period of two years. Sampling will occur at a set time each day, and observations will be recorded by a data logger. Data will be downloaded weekly and stored as an Excel file on a project server.
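
    The sampling scheme in the example language can also be turned into a rough estimate of expected data volume. The following is a minimal sketch, not a CUAHSI tool: the class name `SeriesPlan` and its fields are hypothetical, and a two-year period is assumed to be 2 x 365 days.

```python
from dataclasses import dataclass


@dataclass
class SeriesPlan:
    """One planned time series in a data inventory (hypothetical fields)."""
    sites: int            # number of fixed sampling locations
    samples_per_day: int  # observations recorded per site per day
    duration_days: int    # length of the monitoring period in days

    def total_observations(self) -> int:
        """Expected number of observations over the whole project."""
        return self.sites * self.samples_per_day * self.duration_days


# Values from the example language: 10 locations, daily sampling, two years.
# Assumption: two years is treated as 2 x 365 days.
plan = SeriesPlan(sites=10, samples_per_day=1, duration_days=2 * 365)
print(plan.total_observations())  # 7300
```

    An estimate like this can be quoted directly in the inventory so reviewers can judge whether the planned storage is adequate.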
  • Data and Metadata Standards

    Depending on the type of data you generate, you should specify the standards for both the data and the metadata you will maintain. In many cases, listing the existing standards for your data type should adequately meet this requirement. Existing standards for many types of data can be found at the following resources.

    • Time Series and synoptic data: CUAHSI Hydrologic Information System (HIS) established standards for data and metadata based on the Observations Data Model (ODM). The ODM is a standard database schema for storing point observations in a relational database; full data and metadata specifications are here

      PIs may cite that they will use ODM and follow HIS data and metadata standards:

      Horsburgh, J. S., D. G. Tarboton, D. R. Maidment, and I. Zaslavsky (2008), A relational model for environmental and water resources data, Water Resour. Res., 44, W05406, doi:10.1029/2007WR006392

    • GIS coverages and spatial data: The Federal Geographic Data Committee (FGDC) has established a U.S. Federal Metadata standard, the Content Standard for Digital Geospatial Metadata (CSDGM), Version 2.

      A workbook with user-friendly information and full documentation of the standards is available here.

    • Regular Gridded data and imagery: NASA has been working with the FGDC to establish data and metadata standards for imagery and gridded data sets.

      PIs may wish to look at NASA's report on standards for Earth Remote Sensing Data

    Example Language

    If your project generates standard time series data, your data standards and metadata section might include:

    • This project will use an existing standard for time series data, the Observations Data Model (ODM) (Horsburgh et al., 2008, Water Resour. Res., 44, W05406, doi:10.1029/2007WR006392), which is supported by the CUAHSI Water Data Center (https://www.cuahsi.org/wdc).

  • Data Use, Privacy, and Sharing Policies

    Your Data Management Plan should include a data use policy, including when the data will be made available, and the procedure for data access. While the default may be free and open access, you should consider any special aspects of your data that may require restrictions in your data use policy, such as sensitive data or collaborations with international partners that may have their own data use policies.

    Example Language

    If your project generates standard time series data that does not have sensitive security or release procedures, you might specify your data use policy in this section:

    • This project will develop and abide by a data release policy. Data will remain within the project for two years or until publication (whichever is first), at which point it will be available to the broader community for academic use using CUAHSI's services. Any changes in this policy will be widely announced, and will not be retroactively applied to data already collected.
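
    The "two years or until publication (whichever is first)" rule in the example language can be written down as a small calculation. This is a hedged sketch only: the function name `release_date` is hypothetical, and it assumes the two-year period is counted from the date the data were collected, which the example language does not specify.

```python
from datetime import date
from typing import Optional


def release_date(collected: date,
                 published: Optional[date] = None,
                 embargo_years: int = 2) -> date:
    """Date on which data become available to the broader community.

    Assumption: the embargo is counted from the collection date.
    """
    target_year = collected.year + embargo_years
    try:
        embargo_end = collected.replace(year=target_year)
    except ValueError:
        # Collected on Feb 29 and the target year is not a leap year.
        embargo_end = collected.replace(year=target_year, day=28)
    if published is None:
        return embargo_end
    # "Whichever is first": publication can shorten the embargo.
    return min(embargo_end, published)


print(release_date(date(2024, 3, 1)))                    # embargo only
print(release_date(date(2024, 3, 1), date(2025, 1, 15)))  # published early
```

    Stating the starting point of the embargo explicitly in the plan avoids this ambiguity.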
  • Data Life Cycle Management

    This section will be the most detailed of your plan. The data management life cycle covers production, management and archiving of your data. It is best for data management to commence at the beginning of a project and to become an integral part of carrying out the project and managing the research. The life cycle should include details on each step, from data storage to publication.

    Data Storage

    Your Data Management Plan should include information on how your data will be stored during the project lifetime. Information to include in the plan may include types of storage and databases, database schemas, expected data volumes, storage management, and data backup procedures. The ODM (Horsburgh et al., 2008) is a data schema for time-series data.
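
    As an illustration of what a storage description with a schema and a backup step might look like, the sketch below uses Python's built-in `sqlite3` module. The two-table schema, the site code `SITE01`, and the function names are hypothetical and deliberately simplified; this is not the ODM schema, and a real project would follow the full ODM specification.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; NOT the ODM schema.
SCHEMA = """
CREATE TABLE sites (
    site_id   INTEGER PRIMARY KEY,
    site_code TEXT NOT NULL UNIQUE,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE data_values (
    value_id      INTEGER PRIMARY KEY,
    site_id       INTEGER NOT NULL REFERENCES sites(site_id),
    variable_name TEXT NOT NULL,
    utc_datetime  TEXT NOT NULL,  -- ISO 8601 timestamp
    data_value    REAL NOT NULL
);
"""


def create_store(path=":memory:"):
    """Create a database with the illustrative schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def backup_store(source, target_path=":memory:"):
    """Copy the whole database to another SQLite database and return it."""
    target = sqlite3.connect(target_path)
    source.backup(target)
    return target


store = create_store()
store.execute(
    "INSERT INTO sites (site_code, latitude, longitude) VALUES (?, ?, ?)",
    ("SITE01", 41.74, -111.83),
)
store.execute(
    "INSERT INTO data_values (site_id, variable_name, utc_datetime, data_value) "
    "VALUES (?, ?, ?, ?)",
    (1, "discharge", "2024-01-01T12:00:00Z", 3.2),
)
store.commit()

copy = backup_store(store)
row_count = copy.execute("SELECT COUNT(*) FROM data_values").fetchone()[0]
print(row_count)
```

    Passing file paths instead of `:memory:` would write the working database and its backup to disk; the in-memory default only keeps the sketch self-contained.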

    Data Curation

    A Data Management Plan should include information on quality control and assurance. Many examples of quality assurance plans are available from different organizations. For example, proposers may wish to consult EPA's extensive information on its Quality Assurance Project Plans, or DataONE's Best Practices.

    Long term Archiving and Data Publication

    This facet of the Data Management Plan addresses NSF EAR's policy that: "Investigators...to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing." You must detail how you will archive your data and make it available for further use after the funding period ends. For compatible data, this need may be met by using CUAHSI's services or software for data publication and archiving.

    Example Language

    If your project generates standard time series data, your data management life cycle section might include information such as:

    • Storage: Project data will be stored before publication as Excel files on university servers. The servers are secure and are backed up nightly. Tape backups are stored off site.
    • Curation: Graduate students will perform quality control on the data when they are downloaded from the data logger following protocols documented at http://www.******.edu. The measurement equipment will be checked daily by a graduate student and any problems will be noted in the record. Suspect or outlier data will be flagged by the research assistant and checked by the PI. Missing data will be noted as -9999 along with a reason code.
    • Archiving and Publication: This project will submit all time series data to CUAHSI for publication and archiving. In this way, all data will be easily discoverable via the web, and CUAHSI will maintain the data sets past the end date of the project.
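
    The curation step in the example language (flagging suspect values and recording missing data as -9999) could be automated along the following lines. This is a minimal sketch under stated assumptions: the function name `quality_control`, the flag strings, and the plausible-range thresholds are hypothetical, and flagged values would still be reviewed by the PI as the example language describes.

```python
MISSING = -9999  # missing-value sentinel from the example language


def quality_control(values, low, high):
    """Return (value, flag) pairs for a list of raw readings.

    Assumptions: a reading of None means the logger recorded nothing,
    and values outside [low, high] are treated as suspect.
    The flag strings are hypothetical codes for illustration.
    """
    results = []
    for v in values:
        if v is None:
            results.append((MISSING, "missing"))
        elif v < low or v > high:
            results.append((v, "suspect"))
        else:
            results.append((v, "ok"))
    return results


readings = [12.1, None, 250.0, 11.8]
checked = quality_control(readings, low=0.0, high=50.0)
for value, flag in checked:
    print(value, flag)
```

    Suspect values are kept rather than deleted so that the original record remains available for the PI's check.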
  • Time Series Data Archive and Publication Options for NSF Principal Investigators

    The CUAHSI Water Data Center (WDC) offers the following options for archiving and publishing time series data:

    1. CUAHSI can host hydrologic data and metadata on CUAHSI servers.

    Data is loaded into a HydroServer maintained by CUAHSI in the Microsoft Azure Cloud. HydroServer capability currently includes support for time series data using the Observations Data Model and WaterOneFlow web services. Support for additional data types may be added as the need arises. Under this model, CUAHSI works with you to ingest your data, and CUAHSI maintains the data archive after that. Request a database here.

    2. For ongoing projects with an existing data management system, that system can be coupled to CUAHSI's data service through an agreed-upon interface.

    An example of this type of model is the one adopted by the Critical Zone Observatory (CZO) projects. An ASCII format data publication specification has been agreed upon for CZOs to periodically publish some of their data. This data is then ingested into a CUAHSI ODM where it is archived and published using WaterOneFlow services and WaterML. The same interface could be used by others, or a similar interface worked out in collaboration with the CUAHSI WDC. Under this model, CUAHSI enables publication of the data in a way that allows projects to periodically add to the data archive systematically.

    3. You may wish to host data yourself using CUAHSI software.

    The HydroServer software stack developed by the CUAHSI HIS community may be used for publishing time series of hydrologic observations on the Internet. Additional components can enable additional capabilities for spatial data using standards-based web feature, map, and coverage services. HydroServer software is open source and may be downloaded here. HydroServer also includes software for data editing and quality control and can be the basis for a complete data management system. Under this model, data archiving and publication is handled by the individual project. HydroServer was originally developed for a Windows environment, but has also been written in PHP and MySQL.

    4. If you have an established internal data management system, making your data available in the CUAHSI WDC catalog can be achieved by adapting your system to support our interfaces.

    Data sources are compatible with, and can be made discoverable through, CUAHSI catalogs when they are published using the standards that CUAHSI has adopted. These include WaterOneFlow Web Services and WaterML. This approach requires programming and database expertise on your team, or it can be done in collaboration with the WDC team.

  • Verification of Data Publication

    For time series data published in the CUAHSI HIS, the National Science Foundation will soon require you to include a URL in the final project report that points to a catalog summary of the data funded by a particular grant.

    To meet this requirement, a first version of a tool for citing funding information and retrieving catalog summaries by granting agency and grant number has been designed, implemented, and deployed on CUAHSI's Catalog for time series data. The CUAHSI Catalog’s website lets data publishers provide publicly visible information about their funding sources, and lets granting agencies retrieve information about data published in the catalog by entering the applicable grant number.

  • Additional Instructions for NSF Principal Investigators

    Depending upon an individual project, there are a number of options for using CUAHSI resources to fulfill National Science Foundation Data Management Plans. We encourage you to contact CUAHSI staff if you need additional information or advice on how to use CUAHSI services to publish your data. Although projects may generate many different types of data, CUAHSI's services are best suited for:

    For other data types, see the list of suggestions below:

    These other data types and files can be referenced in the metadata for data archived with CUAHSI.