Database Overview – Shale Network

Making Knowledge from Numbers

Shale Network publishes data online from all types of sources, provided the data have undergone quality assurance and quality control. For example, since 2010, interest has boomed among watershed associations in collecting stream data in the region of Marcellus Shale development. Toward that end, two members of the Shale Network steering committee, Candie Wilderman and Julie Vastine (ALLARM, Dickinson College), have been developing a volunteer-based Marcellus Monitoring protocol. More than 700 volunteers across the Marcellus Shale region are collecting data, and these data are published online as part of the Shale Network database, accessible through HydroDesktop. Shale Network also publishes data from county, state, and federal agencies, and from members of colleges, universities, nonprofit groups, and private companies, including members of the gas industry. We focus on finding, organizing, using, and interpreting the data. The Shale Network database, covering water quality and quantity in the northeastern gas production region, will be housed permanently online. The team anticipates that this database will bring together the network of researchers and citizen scientists and allow a better understanding of water issues: the team “makes knowledge from the numbers”.

Shale Network Data Focus

The database contains data on surface water, groundwater, formation water, flowback water, and produced water from hydrocarbon extraction regions. Such data will enable investigations into the interplay between water and energy.

Quality Assurance and Quality Control

We accept project data in any format from associations, groups, and agencies. Data destined for the Shale Network database are generally stored first as Excel files. We assemble these files for entry into the CUAHSI Hydrologic Information System (HIS), which publishes them permanently online and makes them accessible through HydroClient and HydroShare.
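Provider spreadsheets often arrive in a wide layout, with one row per sampling visit and one column per measured parameter, whereas an ODM-style database stores one value per row. The sketch below shows that reshaping step in Python, assuming the sheet has been exported to CSV. The column names (`SiteCode`, `DateTime`, and the parameter columns) and the sample values are hypothetical, and this is an illustration rather than the team's actual preparation procedure.

```python
import csv
import io

# Hypothetical identifier columns in a provider sheet exported from Excel to CSV.
ID_COLUMNS = ("SiteCode", "DateTime")


def wide_to_long(csv_text):
    """Reshape one-row-per-visit data into one record per measured value."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, cell in row.items():
            # Skip identifier columns and overflow fields (DictReader keys them as None).
            if column is None or column in ID_COLUMNS:
                continue
            # Skip blank or missing cells rather than storing empty values.
            if cell is None or cell.strip() == "":
                continue
            records.append({
                "SiteCode": row["SiteCode"],
                "DateTime": row["DateTime"],
                "Variable": column,
                "Value": float(cell),
            })
    return records


# Invented example data: the second visit has no pH reading.
SAMPLE = """SiteCode,DateTime,SpecificConductance_uS_cm,pH
SC-01,2012-05-14 10:30,245,7.2
SC-01,2012-06-11 09:45,260,
"""

for record in wide_to_long(SAMPLE):
    print(record)
```

Blank cells are dropped rather than stored as zeros, so a missing reading never masquerades as a measurement.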

The Shale Network database team accepts data, organizes them, and places them online so that citizens and researchers can judge whether shale gas development is affecting water quality and quantity. The data are housed in a database, funded by the National Science Foundation, that is designed for permanent online access. Data in Shale Network can be accessed alongside all other data published through the CUAHSI HIS, including data from the EPA, the USGS, and academic researchers. Assessing data quality is very difficult, whether the data are collected by scientists or by volunteers. Our overall philosophy is therefore to accept data and place them online for the community of citizens and researchers to evaluate, as long as the data provider applies some level of quality assurance and quality control. The level of data quality is recorded in the database as a metadata field, so not all data in the database will be of the same quality. However, because finding data becomes much easier, researchers can spend their resources assessing data quality rather than searching for data.
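Because the quality level travels with each value as metadata, a user can decide how strict to be when pulling data. The minimal sketch below filters records by a quality-level code and by data source. The `Observation` type, its fields, and the sample records are hypothetical; the level codes assume ODM's convention of 0 for raw data and 1 for quality-controlled data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Observation:
    site_code: str
    variable: str
    value: float
    source: str    # organization that provided the value
    qc_level: int  # assumed ODM-style code: 0 = raw data, 1 = quality controlled


def select(observations: Iterable[Observation],
           min_qc_level: int = 0,
           sources: Optional[Set[str]] = None) -> List[Observation]:
    """Keep observations at or above a QC level, optionally from chosen sources."""
    return [
        obs for obs in observations
        if obs.qc_level >= min_qc_level
        and (sources is None or obs.source in sources)
    ]


data = [
    Observation("SC-01", "Specific conductance", 245.0, "Volunteer group A", 1),
    Observation("SC-02", "Specific conductance", 1210.0, "Agency B", 0),
    Observation("SC-03", "Specific conductance", 310.0, "Agency B", 1),
]

print([o.site_code for o in select(data, min_qc_level=1)])
print([o.site_code for o in select(data, sources={"Agency B"})])
```

Keeping raw values in the database and filtering at query time lets each researcher apply their own threshold instead of having one imposed at load time.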

Quality assurance refers to the measures taken to ensure that data meet data quality standards; quality control refers to the actions implemented to achieve that quality. Our data quality objectives require that the data be credible and of sufficient value to support a timely response to problems. We accept data from volunteers who have received training from service providers in Pennsylvania such as ALLARM. This training generally consists of close examination of the monitoring manuals, laboratory training on equipment, and field training that covers chemical monitoring, flow measurement, and visual assessment. Conductivity meters are calibrated with standard solutions before each use and are stored according to the manufacturer's specifications between uses.

Volunteers generally work with ALLARM to pass a split-sample quality control test each year. Specifically, monitors test the water with their meters and then collect an extra set of water samples to send to the ALLARM lab. At the lab, the water is tested with both the monitors' equipment and laboratory equipment, and the results are compared with the volunteer data. If precision is acceptable, the volunteers have passed quality control and can continue monitoring and providing data. If precision is not acceptable (outside limits), ALLARM retrains the volunteers. All methods are documented. ALLARM is responsible for QA/QC with volunteer groups as appropriate. Some ALLARM groups also send samples to commercial laboratories.
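The split-sample comparison can be expressed as a precision check. The sketch below uses relative percent difference, RPD = |V - L| / ((V + L) / 2) × 100, a common precision measure for paired results, with a hypothetical 20% acceptance limit. ALLARM's actual limits, parameters, and metric are not stated here, so the metric, the threshold, and the example results are all assumptions.

```python
def relative_percent_difference(volunteer: float, lab: float) -> float:
    """RPD = |V - L| / ((V + L) / 2) * 100 for a volunteer/lab result pair."""
    mean = (volunteer + lab) / 2
    if mean == 0:
        # Identical zeros agree perfectly; otherwise the RPD is undefined (treated as infinite).
        return 0.0 if volunteer == lab else float("inf")
    return abs(volunteer - lab) / abs(mean) * 100


def split_sample_check(pairs, limit_percent=20.0):
    """pairs maps parameter -> (volunteer result, lab result).

    The 20% default is a hypothetical placeholder, not ALLARM's actual limit.
    Returns (passed, failures), where failures maps parameter -> RPD rounded to 0.1.
    """
    failures = {}
    for parameter, (volunteer, lab) in pairs.items():
        rpd = relative_percent_difference(volunteer, lab)
        if rpd > limit_percent:
            failures[parameter] = round(rpd, 1)
    return not failures, failures


# Invented example: conductivity agrees closely, TDS does not.
results = {
    "conductivity (uS/cm)": (250.0, 240.0),
    "total dissolved solids (mg/L)": (180.0, 130.0),
}
print(split_sample_check(results))
```

Here conductivity differs by about 4% and passes, while the TDS pair differs by about 32% and would trigger retraining under the assumed limit.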

Shale Network Database Technologies

The Shale Network database is growing to include data from published and unpublished sources, citizen scientists, county, state, and federal agencies, industry, and researchers. Metadata within the database indicate the source of the data. At this time, the source information is not included when data are downloaded with HydroDesktop, but it can be found on the ShaleNetwork.org website. The database will remain easily accessible through CUAHSI's HydroDesktop even after NSF funding for Shale Network runs out. HydroDesktop accesses many databases besides the Shale Network database, and many of these will also be useful for understanding water quality and quantity issues related to the Devonian shale gas plays. The Shale Network team is working with other entities, such as the Susquehanna River Basin Commission, to help make their data available appropriately, either within the Shale Network database or at least through HydroDesktop.

The database standard used by the Shale Network database is evolving from the Observations Data Model (ODM) (Horsburgh et al., 2008) to ODM2. We also follow the CUAHSI HIS data and metadata standards described in the same reference. For GIS coverages and spatial data, we follow the Federal Geographic Data Committee (FGDC) metadata standard, the Content Standard for Digital Geospatial Metadata (CSDGM), Version 2.
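To make the data model more concrete, the sketch below builds a drastically simplified, ODM-inspired schema in SQLite and shows how a query joins each value to its site, variable, and source, which is how source metadata stays attached to the data. The table and column names echo ODM 1.1, but many ODM tables and fields (for example methods, a separate units table, and censoring codes) are omitted, and the site and organization are invented.

```python
import sqlite3

# Simplified, ODM-inspired tables; many real ODM tables and columns are omitted.
SCHEMA = """
CREATE TABLE Sites (
    SiteID INTEGER PRIMARY KEY,
    SiteCode TEXT UNIQUE NOT NULL,
    SiteName TEXT,
    Latitude REAL,
    Longitude REAL
);
CREATE TABLE Variables (
    VariableID INTEGER PRIMARY KEY,
    VariableCode TEXT UNIQUE NOT NULL,
    VariableName TEXT,
    Units TEXT  -- simplification: ODM references a separate Units table instead
);
CREATE TABLE Sources (
    SourceID INTEGER PRIMARY KEY,
    Organization TEXT NOT NULL
);
CREATE TABLE DataValues (
    ValueID INTEGER PRIMARY KEY,
    DataValue REAL NOT NULL,
    LocalDateTime TEXT NOT NULL,
    SiteID INTEGER NOT NULL REFERENCES Sites(SiteID),
    VariableID INTEGER NOT NULL REFERENCES Variables(VariableID),
    SourceID INTEGER NOT NULL REFERENCES Sources(SourceID),
    QualityControlLevelID INTEGER NOT NULL
);
"""

# Every value is reported together with the organization that supplied it.
QUERY = """
SELECT s.SiteCode, v.VariableName, d.LocalDateTime, d.DataValue, v.Units, src.Organization
FROM DataValues AS d
JOIN Sites AS s ON s.SiteID = d.SiteID
JOIN Variables AS v ON v.VariableID = d.VariableID
JOIN Sources AS src ON src.SourceID = d.SourceID
ORDER BY d.LocalDateTime
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one invented value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Sites VALUES (1, 'SC-01', 'Example Creek', 41.5, -76.5)")
    conn.execute("INSERT INTO Variables VALUES (1, 'SpCond', 'Specific conductance', 'uS/cm')")
    conn.execute("INSERT INTO Sources VALUES (1, 'Example watershed association')")
    conn.execute("INSERT INTO DataValues VALUES (1, 245.0, '2012-05-14 10:30', 1, 1, 1, 1)")
    return conn


for row in build_demo_db().execute(QUERY):
    print(row)
```

Storing sites, variables, and sources once and referencing them by ID is what lets a single value carry full provenance without repeating it on every row.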

Publishing and Accessing Data with CUAHSI HIS

The CUAHSI HIS (Hydrologic Information System) is designed to manage and publish data collected at fixed points, such as wells and surface water sampling stations. We use this service because we have data from various sources that we want to make accessible online in a standard way, so that everyone can access and use them; this is called publishing the data. CUAHSI runs a HydroServer that hosts the Shale Network database. CUAHSI already works closely with the USGS, and USGS data are already part of the CUAHSI HIS or can easily be “ingested”. CUAHSI has also worked with the EPA so that EPA data are available through HydroClient, and with several state agencies around the country to format state data for the HIS. Agencies interested in making their data available in HydroClient are encouraged to contact CUAHSI directly or to discuss the issue with members of Shale Network.

A HydroServer is essentially a Windows XP computer running HIS software designed for data publication. Because it does not require a commercial license, we are using HydroServer Lite. HydroServer Lite uses the Observations Data Model for data storage and a WaterOneFlow web service for data publication.
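The split between storage (ODM) and publication (WaterOneFlow) can be illustrated by turning stored rows into a WaterML-style XML document. The sketch below is not HydroServer Lite's code: it skips the web service layer entirely, emits only a trimmed subset of elements, and assumes the WaterML 1.1 namespace URI; the helper names `q` and `to_waterml` are hypothetical.

```python
import xml.etree.ElementTree as ET

WML = "http://www.cuahsi.org/waterML/1.1/"  # assumed WaterML 1.1 namespace URI


def q(tag: str) -> str:
    """Qualify a tag name with the WaterML namespace (ElementTree's {uri}tag form)."""
    return f"{{{WML}}}{tag}"


def to_waterml(site_code, variable_code, rows):
    """Serialize (dateTime, value) rows as a trimmed WaterML-style time series."""
    ET.register_namespace("", WML)  # write the namespace as the default xmlns
    response = ET.Element(q("timeSeriesResponse"))
    series = ET.SubElement(response, q("timeSeries"))
    source = ET.SubElement(series, q("sourceInfo"))
    ET.SubElement(source, q("siteCode")).text = site_code
    variable = ET.SubElement(series, q("variable"))
    ET.SubElement(variable, q("variableCode")).text = variable_code
    values = ET.SubElement(series, q("values"))
    for date_time, value in rows:
        # One <value> element per stored row, timestamp as an attribute.
        ET.SubElement(values, q("value"), dateTime=date_time).text = str(value)
    return ET.tostring(response, encoding="unicode")


print(to_waterml("SC-01", "SpCond", [("2012-05-14T10:30:00", 245.0)]))
```

Because the publishing layer only reads from the database, the same stored values can be served in a standard format without changing how they are loaded.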

For some datasets received from data providers, we use the ODM Data Loader (ODMDL) application to load data files into an ODM database. This method suits data from a project or study that has been completed and will not need periodic updating. For data that are continuously updated (e.g., data streaming from sensors in the field), we use the ODM Streaming Data Loader (also free software). ODMDL accepts input data in table format (Excel, CSV, or tab-separated) and can load data into ODM tables either one table at a time or in bulk, from a single file into multiple tables at once. Once data are loaded into an ODM database, we can examine them with the ODM Tools application, which provides query and visualization capabilities. If the data look good, we then publish them with a WaterOneFlow web service. This service hooks directly into an ODM database to publish data from it, and data published this way are returned in a standard output format called WaterML.
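On the client side, a WaterML response can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed, hand-written WaterML 1.1-style fragment. Real responses carry far more metadata, and the namespace URI and element paths are assumptions that should be checked against an actual service response; `read_values` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"wml": "http://www.cuahsi.org/waterML/1.1/"}  # assumed WaterML 1.1 namespace


def read_values(waterml_text):
    """Return [(site code, variable code, [(dateTime, value), ...]), ...]."""
    root = ET.fromstring(waterml_text)
    results = []
    for series in root.iterfind("wml:timeSeries", NS):
        site = series.findtext("wml:sourceInfo/wml:siteCode", namespaces=NS)
        variable = series.findtext("wml:variable/wml:variableCode", namespaces=NS)
        points = [
            (value.get("dateTime"), float(value.text))
            for value in series.iterfind("wml:values/wml:value", NS)
        ]
        results.append((site, variable, points))
    return results


# Trimmed, hand-written fragment; real WaterML responses contain much more metadata.
SAMPLE = """<timeSeriesResponse xmlns="http://www.cuahsi.org/waterML/1.1/">
  <timeSeries>
    <sourceInfo><siteCode network="DEMO">SC-01</siteCode></sourceInfo>
    <variable><variableCode vocabulary="DEMO">SpCond</variableCode></variable>
    <values>
      <value dateTime="2012-05-14T10:30:00">245</value>
      <value dateTime="2012-06-11T09:45:00">260</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

print(read_values(SAMPLE))
```

The namespace map is required because every WaterML element lives in the WaterML namespace; unqualified paths such as `timeSeries` would match nothing.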

The next step after publishing is to register our WaterOneFlow web service. Registration is necessary so that people can find our data (data discovery). We register the service with HIS Central, a website maintained by the CUAHSI HIS team. Our service then becomes discoverable along with dozens of other web services registered with the system, including services for the USGS NWIS and EPA STORET datasets. HIS Central is the largest single catalog of the nation's water data. HydroDesktop, a free and open-source desktop application, works with HIS Central and allows users to search for data across all registered data sources at once.
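HIS Central's own search interface is not shown here. As a hypothetical illustration of what searching across all registered sources at once means, the sketch below runs one keyword and bounding-box query over an in-memory catalog that mixes entries from several services. The `RegisteredSeries` type, the `search` function, the service names, and the site codes are all invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegisteredSeries:
    service: str      # hypothetical name of a registered web service
    site_code: str
    latitude: float
    longitude: float
    variable: str


def search(catalog: List[RegisteredSeries], keyword: str,
           south: float, west: float, north: float, east: float) -> List[RegisteredSeries]:
    """One query across every registered service: keyword match plus bounding box."""
    keyword = keyword.lower()
    return [
        entry for entry in catalog
        if keyword in entry.variable.lower()
        and south <= entry.latitude <= north
        and west <= entry.longitude <= east
    ]


catalog = [
    RegisteredSeries("demo-volunteer", "V-01", 41.6, -76.4, "Specific conductance"),
    RegisteredSeries("demo-federal", "F-07", 41.2, -77.0, "Specific conductance"),
    RegisteredSeries("demo-state", "S-03", 39.0, -80.0, "Specific conductance"),
    RegisteredSeries("demo-federal", "F-08", 41.3, -76.8, "Discharge"),
]

for hit in search(catalog, "conductance", south=40.5, west=-78.0, north=42.0, east=-75.0):
    print(hit.service, hit.site_code)
```

The simple bounding-box test assumes the box does not cross the antimeridian, which is adequate for searches at the scale of the Marcellus region.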