JupyterHub Data Loss Incident

Posted Apr 23, 2026


CUAHSI — Community Statement | April 2025

In early April 2025, we experienced a data loss incident in our JupyterHub system. The incident did not affect data stored in the HydroShare data repository, only data within sandboxed JupyterHub instances. In an effort to be fully transparent about what occurred, we’re providing a brief summary of the event, the changes we’ve made to our maintenance practices, and a reminder of how to best manage your scientific data.

CUAHSI's JupyterHub is designed to make data archived in scientific repositories actionable and reusable within sandboxed computational environments. Because the platform is intended as a working environment rather than a permanent archive, it is not held to the same rigorous data preservation standards as repositories such as HydroShare.

While we do not guarantee the permanent preservation of data stored in our JupyterHub, we fully recognize that continuity of access to in-progress work is an essential component of scientific research. We understand the effect that a sudden loss of data may have on members of our community, and we’ve taken the following steps to reduce the risk of similar incidents in the future:

  • Revised our system upgrade procedures to include a review and verification of data integrity.

  • Extended backup retention periods to provide a longer recovery window in the event of future incidents.

Who Is Affected

Users who stored data or work in progress directly within their JupyterHub sandbox instances may have been affected by this incident. Users who regularly archived their work to external repositories such as HydroShare or GitHub are less likely to have experienced data loss.

Data stored in the HydroShare repository were not affected.

Best Practices for Data Archival

We encourage all members of our community to adopt the following best practices to safeguard their research data:

  • Use software version control (such as GitHub) to track and archive code. Regularly committing your work ensures that code is preserved independently of any particular computing platform.

  • Archive data in domain-specific repositories such as HydroShare. These platforms are built to meet rigorous data preservation standards and are the appropriate long-term home for scientific data.

  • Treat JupyterHub as a working environment, not a permanent archive. Regularly export or sync important files to an external repository or your computer.
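As one illustration of the last practice, the sketch below bundles a working directory into a timestamped zip file that can then be downloaded or uploaded elsewhere. It is a minimal example and not an official CUAHSI tool: the function names `archive_workdir` and `snapshot_contents` and the example paths are hypothetical, and it uses only the Python standard library. It assumes the destination folder sits outside the directory being archived.

```python
import shutil
import zipfile
from datetime import datetime, timezone
from pathlib import Path


def archive_workdir(src, dest_dir):
    """Write a timestamped zip snapshot of `src` into `dest_dir`; return its path.

    Hypothetical helper for illustration. `dest_dir` should live outside
    `src` so the snapshot does not try to include itself.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp keeps snapshot names unique and sortable
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base = dest_dir / f"{src.name}-{stamp}"
    # shutil.make_archive appends ".zip" to the base name and returns the full path
    return Path(shutil.make_archive(str(base), "zip", root_dir=str(src)))


def snapshot_contents(archive):
    """List the entries stored in a snapshot, as a quick sanity check."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()
```

For example, calling `archive_workdir("my_project", "snapshots")` from a notebook or terminal would produce a file such as `snapshots/my_project-<timestamp>.zip`. A snapshot only protects your work once it has been copied off the JupyterHub instance, for example to your own computer or into a HydroShare resource.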


The CUAHSI Team