Data access and time-series statistics

2019 Cyberseminar Series

Presenter(s):
Emilio Mayorga / University of Washington
Yifan Cheng / University of Washington

Talk Description

Data about water come in many formats, are distributed by many different sources, and depict different spatial representations such as points, polygons and grids. How do we find and explore the data we need for a specific research question or application? This seminar will present common challenges and strategies for finding and accessing relevant datasets, focusing on time series data from sites commonly represented as fixed geographic points. Such data may come from automated monitoring stations such as river gauges and weather stations, from repeated in-person field observations and samples, or from model output and processed data products.

We will present and explore useful data catalogs, including the CUAHSI HIS catalog accessible via HydroClient, CUAHSI HydroShare, the EarthCube Data Discovery Studio, Google Dataset Search, and agency-specific catalogs. We will also discuss programmatic data access approaches and tools in Python, particularly the ulmo data access package, and touch on the role of community standards for data formats and data access protocols.

Once we have accessed the datasets we are interested in, the next steps are typically exploratory, focusing on visualization and statistical summaries. This seminar will illustrate useful approaches and Python libraries for processing and exploring time series data, with an emphasis on the distinctive needs posed by temporal data. Core Python packages used include Pandas, GeoPandas, Matplotlib and the geospatial visualization tools introduced at the last seminar. The approaches presented can be applied to other data types that can be summarized as a single time series, such as averages over a watershed or data extracted from a single cell in a gridded dataset, which is the topic of the next seminar.
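The sketch below illustrates the kind of exploratory statistical summaries that suit temporal data. It assumes pandas and NumPy are installed. The daily series is synthetic and stands in for a record downloaded from a gauge, and the name `flow` is hypothetical. It is not the seminar's own material, only an illustration of common pandas time-series operations.

```python
import numpy as np
import pandas as pd

# Hypothetical, synthetic daily series covering two non-leap years.
# It stands in for a downloaded gauge record (arbitrary units).
dates = pd.date_range("2017-01-01", "2018-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
# A constant baseline plus a simple seasonal cycle.
values = 100 + 50 * np.sin(2 * np.pi * (doy - 1) / 365.25)
flow = pd.Series(values, index=dates, name="flow")

# Overall summary statistics (count, mean, std, min, quartiles, max).
summary = flow.describe()

# Monthly means: resample the daily series to calendar months
# ("MS" labels each bin by the month's start date).
monthly = flow.resample("MS").mean()

# Monthly climatology: average all Januaries together, all Februaries, etc.
climatology = flow.groupby(flow.index.month).mean()

# A 30-day centered rolling mean smooths day-to-day variability.
# Edge values are NaN where the full window is unavailable.
smoothed = flow.rolling(window=30, center=True).mean()

print(summary)
print(climatology)
```

A datetime index lets `resample` and `groupby(index.month)` separate two questions. The first is how values change over the record. The second is what a typical seasonal cycle looks like.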

CUAHSI's 2019 Cyberseminar Series: Waterhackweek

Hosted by CUAHSI and University of Washington

Studies of water and environmental systems are becoming increasingly complex and require the integration of knowledge across multiple domains. At the same time, technological advances have enabled the collection of massive quantities of data for studying earth system changes. Fully leveraging these datasets and software tools requires fundamentally new approaches to the way researchers store, access and process data. Waterhackweek, supported by the National Science Foundation Cybertraining program, serves the national interest by motivating a culture shift within the hydrologic and, more broadly, earth science communities toward open and reproducible software practices. These practices will enhance interdisciplinary collaboration and increase capacity for addressing complex science challenges around the availability, risks and use of water. This cyberseminar series consists of presentations from the Cybertraining investigators and research software developers, each focusing on a specific water-related use case, tool, or library. Topics will include both introductory and advanced concepts that are relevant to a wide range of water and informatics use cases, e.g., publishing large datasets, running numerical models, organizing collaborative research projects, and meeting journal requirements by following open data standards. The goal of the 2019 series is to prepare the incoming Waterhackweek (March 25-29, 2019) participants for the in-person capstone event, in which their skills and creativity will be used to address natural hazards. However, these topics and technologies are also relevant to the broader water science community. We welcome all undergraduate, graduate, and early career scientists to join us in this public cyberseminar series.