Imperative IV

Provide Comprehensive Data Services, Open Access, and Long-Term Stewardship of Data

One of NSF’s core expectations in the NSF-UCAR Cooperative Agreement concerns data, specifically calling for NCAR to “serve as stewards of high quality scientific data on behalf of the community through maintenance, enhancement and curation.” For EOL, this charge falls to the Computing, Data and Software Facility (CDS). CDS is responsible for developing and maintaining EOL’s computing infrastructure, data and metadata services, collaborative tools, and software engineering, all of which are integral to Imperative IV. CDS serves as the umbrella for all data management activities within EOL. It also works proactively with PIs to help them meet NSF’s requirement that, beginning in January 2011, grant proposals include comprehensive data management plans.

Developments in Data Services

EMDAC

In addition to providing and supporting instrumentation that helps scientists obtain critical observational data, EOL also supports data packaging, storage, and management through the EOL Metadata Database and Cyberinfrastructure (EMDAC). EMDAC is a comprehensive metadata database and integrated cyberinfrastructure that will eventually serve as the hub of all EOL data services. It will allow us to deliver accurate and timely data to the science community.

The scope of EOL's support for the science community has expanded to include datasets from new sources, such as satellite and network datasets. We quality-assure these datasets and convert them to common formats to simplify our users' analyses. Advances in satellite communications and internet distribution have both pushed and enabled us to publish data, information, and imagery to the science community much faster. Through EMDAC, EOL will build bridges to multi-agency data portals by developing compatible metadata and data access infrastructure. This will connect EOL to the common services available today, and a modular, extensible architecture will let us meet future needs.

In addition to unifying our databases, completing EMDAC will require other specific tasks identified by our user community, including:

  • Improve interfaces for data access, ingest, and distribution
  • Develop, distribute, and support interfaces to fourth-generation languages for the community-developed codes used to analyze our datasets
  • Build a scalable database to handle growing data volumes and file types
  • Centralize and improve data metrics collection and reporting
  • Establish long-term data stewardship processes

Prototype versions of the next-generation Field Catalog and of the Cooperative Distributed Interactive Atmospheric Catalog System (CODIAC) data distribution tools were developed in FY 2011; both are integral components of EMDAC. Local and remote in-field data distribution was added during the July 2011 Ice in Clouds Experiment-Tropical (ICE-T) campaign. The new Field Catalog will be deployed alongside the prior version during the Tropical Ocean Troposphere Exchange of Reactive Halogen Species and Oxygenated VOC (TORERO) campaign in January–February 2012.

Cross-Platform Software

EOL expects the research community to benefit in many ways from common, cross-platform software development and management. Among these benefits:

  • EOL can make efficient use of limited software engineering resources by drawing on staff expertise in many areas (e.g., software architecture, real-time data acquisition, signal processing, web application frameworks).
  • Team-developed code ensures that EOL can still respond when a particular software engineer is unavailable.
  • Multi-platform (e.g., Linux, Mac, and Windows) development allows EOL to meet the software needs of a broad community.
  • Using common tools to manage EOL’s software engineering process reduces the overhead of important functions such as revision control, bug tracking, and software packaging.
  • By sharing software libraries across the lab, EOL can leverage the extensive testing those libraries receive and use the most reliable, proven software in our observing systems.

For example, the EOL-developed Aeros airborne display software was ported to Mac OS for use in ICE-T. Also in FY 2011, EOL adapted the NCAR In-situ Data System (NIDAS) so that the Integrated Surface Flux System (ISFS) could acquire data via cell-phone links, and ported the SD3C architecture to the new HCR platform and Ka-band systems.

Local Traditional Knowledge (LTK) Data Collection Projects

An exciting new project is EOL’s participation in collecting Local Traditional Knowledge (LTK) data from indigenous peoples. This activity addresses two of the NSF Cooperative Agreement core expectations: to “serve as stewards of high quality scientific data on behalf of the community through maintenance, enhancement and curation,” and to “address challenging scientific problems that require long term focus and integration across global, regional and local scales.” It is part of a long-standing collaboration among NSF, the Arctic research community, the National Snow and Ice Data Center (NSIDC), Unidata, NCAR’s Computational Information Systems Laboratory (CISL), and EOL.

ACADIS

LTK activities in FY 2011 focused on the Bering Sea. It is one of the world's most biologically productive marine environments, includes habitats of global importance, and is undergoing rapid change due to climate change. NSF has funded EOL’s participation in several programs, including the Bering Sea Ecosystem Study (BEST), the Bering Sea Integrated Ecosystem Research Program (BSIERP), and the Advanced Cooperative Arctic Data and Information Service (ACADIS) for the Arctic Observing Network (AON). Within these programs, the LTK projects emphasized the value of a variety of historical records and of the stories of tribal elders. These sources help document the changes that climate change has brought to the region. The projects are dedicated to fostering understanding and sharing knowledge among Arctic residents, scientists, educators, policy makers, and the general public.

The data from the region are eclectic. They include written interview transcripts, audio and video tapes and files, photographs, artwork, illustrations, and maps. They also include digital geographic information such as GPS tracks and data created with geographic information systems (GIS), along with quantitative data such as temperature, snow thickness, and wind speed and direction. CDS staff played an important role in designing and developing the GIS map server site that hosts these data.