Science gateway services

CISL develops science gateways and other Grid-based technologies that support the development of virtual organizations. Our projects and initiatives share cyberinfrastructure and span climate science, regional climate change, Arctic science, solar science, digital preservation, and international efforts to develop metadata and knowledge infrastructure. Many efforts are tied to major interagency, national, and international initiatives, including the World Meteorological Organization (WMO), the Intergovernmental Panel on Climate Change (IPCC), the International Polar Year (IPY), the World Climate Research Programme (WCRP), and the Library of Congress’ National Digital Information Infrastructure and Preservation Program (NDIIPP). Most of these projects use an open source web portal infrastructure called the Science Gateway Framework (SGF). CISL’s work on all of the projects in this suite of science gateway services is supported through NSF Core funding and augmented by special funding as noted below.

Climate model workflow using a science gateway
This image shows an integrated climate model workflow built on shared science gateway cyberinfrastructure. The CESM portal workflow interface, coupled with SGF-based gateway services, allows users to configure model runs, then publish model output data and related model metadata via a single interface. This integration work demonstrates higher-level capabilities spanning TeraGrid-XSEDE, ESG-CET, and Curator projects via an end-to-end science gateway workflow.

Our contributions to science gateways support CISL’s computing imperative for software cyberinfrastructure by developing software specific to the simulation, analysis, and forecasting needs of the atmospheric and related sciences. They also address CISL’s computing frontier for center virtualization by developing science gateways and other Grid-based technologies that provide critical cyberinfrastructure (CI) to broad communities. Finally, work accomplished in ESG-CET, CADIS, WMO, and other collaborations advances CISL’s strategic action item to meet the challenges posed by large and heterogeneous environmental data, and to establish metadata standards for diverse collections of data and models.

Science Gateway Framework

The SGF is shared cyberinfrastructure that was developed to support many of our portal activities in providing scientific data access and services to a broad community of users. In FY2011 we continued significant development of the SGF-based gateway services with a vigorous release schedule focused heavily on the operational, production needs of the IPCC AR5/CMIP5 data management system. We also continued to develop and operate the NCAR CDP, ESG, and CADIS gateways and the NCAR Chronopolis node. Together, these production-level gateway services serve over 25,000 registered users and manage over 10 million files. The NCAR ESG gateway alone provides access to over 1 PB of managed holdings and delivers over 25 TB of scientific data to the community each month. In the sections that follow, we provide a brief summary of each of the related science gateway initiatives.

The SGF effort is supported through NSF Core funding, augmented by special projects including ESG-CET, NOAA GIP, CADIS, OPERA/ACADIS, and TeraGrid.

Earth System Grid Center for Enabling Technologies (ESG-CET)

This long-term, DOE-funded initiative develops a globally distributed petascale data management environment for CMIP5/IPCC-AR5 and U.S. climate science. The ESG-CET, or simply ESG, is a gateway to scientific data collections that are hosted at eight sites around the globe including NCAR. The ESG gateway is a complex software package with many stakeholders and must be easily deployable at sites around the world. NCAR/CISL leads the gateway development effort, and ESG was built using SGF components.

Early in FY2011 we began development efforts in support of CMIP5 data management priorities. We released three beta versions and three release candidates and reached a significant milestone in February 2011 with the 1.2.0 CMIP5 baseline release. This release included significant improvements to the installation of the SGF ESG Gateway application, enhanced documentation, security updates, and basic data replication support. The 1.2 version significantly reduced the maintenance burden of the gateway application. Version 1.3.0 was released in June 2011 under the Apache 2 open source software license and included a Spring framework upgrade, REST-based services, and security workflow enhancements.

Version 1.3.2 was released in August 2011. A notable feature of this version is the ingestion and display of model metadata from the METAFOR Conceptual Information Model (CIM). This work was done in the context of the Earth System Curator project (with NOAA GIP support) and involved significant collaboration across ESG-CET and the EU-based METAFOR project. Version 1.3.2 is the most recent software release and is in production use at CMIP5 collaborator sites worldwide, serving thousands of IPCC AR5 researchers.

During Q3 and Q4 we collaborated closely with Argonne National Laboratory to develop a new data access feature that integrates the GlobusOnline service within the SGF gateway. During the same period we substantially reimplemented the SGF search infrastructure, based primarily on end-user feedback from the CMIP5 community. This new search capability is available as a 2.0 Beta release, and it provides a more flexible platform for current and future search needs, including metadata handling for observational data. Lastly, NCL-based product services were integrated within the SGF/ESG Gateway using NOAA’s Live Access Server (LAS). The NCL/LAS product services, the GlobusOnline integration, and the gateway 2.0 search capabilities will be delivered to the community early in FY2012.

ESG-CET has been supported through NSF Core funds as well as DOE’s SciDAC-2 program. DOE funding for ESG-CET ended in September 2011, and NSF Core funds will support the operational ESG data management system in the future. We continue to pursue additional funding sources to operate this petascale system, a critical community resource for CCSM, CESM, and CMIP5/IPCC.

Global Organization for Earth System Science Portals (GO-ESSP) and Earth System Grid Federation (ESGF)

This is an international collaboration building federated data systems for CMIP5/IPCC-AR5. NCAR is a founding member and contributes SGF technology to this project. In FY2011 CISL staff served on the GO-ESSP Steering Committee, co-organized the May 2011 GO-ESSP workshop in Asheville, North Carolina, participated in the ESGF/CMIP5 planning/technical meeting, and presented two talks at the workshop on the SGF and CADIS projects.

CISL’s contributions to GO-ESSP this year were supported by a combination of NSF Core funds and ESG-CET support.

Cooperative Arctic Data and Information Service (CADIS)

CADIS is an NSF project to manage data from over 30 Arctic Observing Network (AON) programs to support the IPY. The CADIS gateway is a collaboration between CISL and NCAR’s Earth Observing Laboratory, the National Snow and Ice Data Center, and Unidata. CADIS provides an end-to-end service where AON projects upload and publish their data, publish and manage their metadata, and users discover and access data in near-real time. Accomplishments in FY2011 include integration of automated, real-time FTP upload services, search enhancements, user interface improvements, dedicated community support, and a pilot Chronopolis-based data preservation service.

The follow-on Advanced Cooperative Arctic Data and Information Service (ACADIS) project started in July 2011 and will continue this work. CADIS project funding ended in August 2011. In both cases, this community data service is supported by NSF Core and NSF Special funds.

Earth System Curator (ESC)

This project extends the ESG gateways to integrate modeling, data, and knowledge. We continued to develop and extend the model “trackback” capability, which provides a connection between datasets and the models that created them. One of the highlights of this year’s progress was highly collaborative work that spanned CIRES/NOAA, the EU-based METAFOR project, ESG-CET, and NCAR/CISL. The focus was on ingesting and handling the output of the METAFOR Conceptual Information Model (CIM), which is being produced by the METAFOR CMIP5 Questionnaire. In February 2011 SGF version 1.3.0 was released with a number of ESC project-related features and enhancements, including a CIM Atom feed client for ingesting CIM documents, RDF triple store lookup performance improvements, and user interface updates based on user feedback. We also explored technical options for managing CIM document interface changes and kept the CIM harvesting and trackback user interface up to date as the CIM evolved. SGF version 1.3.2, released in August 2011, delivered CIM model metadata ingestion and trackback display in a production-ready form. This work involved significant collaboration across ESG-CET and the EU-based METAFOR project, and it is in use in the eight ESG federation gateways worldwide.
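As a rough illustration of what an Atom feed client for document ingest involves, the following Python sketch parses a small Atom feed and lists the entries that changed after a given time. It is not the SGF implementation: the sample feed, the example.org URLs, and the function name list_feed_entries are all hypothetical, and a real CIM feed may expose different fields.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, invented for this sketch.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example CIM documents</title>
  <entry>
    <title>Model A description</title>
    <id>urn:example:cim:model-a</id>
    <updated>2011-06-01T00:00:00Z</updated>
    <link rel="alternate" href="http://example.org/cim/model-a.xml"/>
  </entry>
  <entry>
    <title>Model B description</title>
    <id>urn:example:cim:model-b</id>
    <updated>2011-08-15T12:30:00Z</updated>
    <link rel="alternate" href="http://example.org/cim/model-b.xml"/>
  </entry>
</feed>"""


def list_feed_entries(feed_xml, since=None):
    """Return one dict per Atom entry, optionally only those updated after `since`.

    `since` is a UTC timestamp string in the same format as the feed
    (YYYY-MM-DDTHH:MM:SSZ), so plain string comparison orders it correctly.
    """
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        if since is not None and updated <= since:
            continue  # unchanged since the last harvest
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": updated,
            "href": link.get("href") if link is not None else None,
        })
    return entries


if __name__ == "__main__":
    for item in list_feed_entries(SAMPLE_FEED, since="2011-07-01T00:00:00Z"):
        print(item["updated"], item["href"])
```

A harvester built along these lines would remember the newest timestamp it has seen and pass it as since on the next run, so that only new or revised documents are fetched.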

This work is supported by NOAA/GIP via the Earth System Curator project.

Community Data Portal (CDP)

The CDP offers a broad range of scientific data collections that includes observations, climate, atmospheric chemistry, space weather, field programs, models, analyses, and more. Many programs and projects at NCAR, UCAR, and UCAR Community Programs (UCP) are represented in the portal. CDP provides a self-publishing model that offers data management tools directly to projects and PIs. CDP automatically shares the resulting metadata with other portals and international centers, which enhances data discovery worldwide. Roughly 2,000 registered CDP users are discovering, accessing, and using 8,000 collections. In FY2011 we prioritized support of the AR5/CMIP5 activity during a critical high-use phase and provided basic operational and critical bug fix support for CDP.

CDP is supported by NSF Core funding.

WMO Information System (WMO-WIS)

CISL is a major contributor to the development of the World Meteorological Organization (WMO) Information System (WIS). Under the auspices of the United Nations, the WMO is designing, developing, and deploying WIS as a next-generation globally federated information system for weather, climate, hydrology, oceanography, and many other disciplines. CISL plays a strong role in the management and technical direction of WIS and has contributed ideas, strategies, and services developed through our work with CDP, ESG, CADIS, and other efforts. CISL staff serve on many WIS committees, including the driving Intercommission Coordination Group (ICG-WIS), the Expert Team on GISC and DCPC Demonstration Program (ET-GDDP), the ICG-WIS subcommittee on WIS Center Identification, and the Inter-Programme Expert Team on Metadata and Data Interoperability (IPET-MDI). Beyond our committee participation, in FY2011 CISL was an active member of the IPET-MDI expert team and collaborated to produce the WMO Core Metadata Profile Version 1.2.

WMO-WIS is supported by NSF Core funding.

North American Regional Climate Change Assessment Program (NARCCAP)

This project shares regional climate simulation data via the SGF and the ESG data network. NARCCAP is an international program that supports regional climate assessments for the U.S. and Canada, and is served through the ESG-CET. The assets are nine high-resolution regional model outputs forced by various global models and provided by multiple PIs. The core software CI is now mature, and the archive continues to grow as experimental runs are completed and published via the portal. The volume of NARCCAP-published data more than doubled in FY2011, growing from 8 TB to 18 TB in managed holdings. CISL contributed by supporting operations and data publication and by providing user interface bug fixes and improvements.

CISL’s contributions to NARCCAP are supported by NSF Special funds, and the project will complete in the first quarter of FY2012.

TeraGrid Science Gateways: Prototyping an Environmental Science Gateway

This project is a collaborative effort across Purdue University, CU/CIRES/NOAA, and NCAR/CISL aimed at prototyping an end-to-end modeling and data management capability that incorporates distributed workflows. The strategy is to develop the new capability by building on Purdue’s CCSM/CESM modeling portal, the ESG Gateway, and new metadata capabilities developed in the Earth System Curator project. In FY2011, REST-based services and an automated end-to-end data publication workflow from the Purdue CESM portal to an ESG Gateway were demonstrated for the TeraGrid Science Gateways project. This highly collaborative work was presented as a paper at this year’s TeraGrid Forum in Salt Lake City and garnered the best paper award in the Science Gateways track.

This science gateway prototyping effort is supported by NSF’s TeraGrid project via a GIG award.

Chronopolis: Federated Digital Preservation over Space and Time

There is a critical and growing need to organize, preserve, and make accessible the increasing number of digital holdings that represent vital intellectual capital, much of which is precious and irreplaceable. Chronopolis is a strategic collaboration among the San Diego Supercomputer Center (SDSC, lead organization), NCAR/CISL, the University of California Library System, and the University of Maryland. It aims to develop national-scale digital preservation infrastructure that has the potential to broadly serve any community with digital assets for science, engineering, humanities, and more. In addition to community collections, Chronopolis CI is being used to provide pilot digital preservation services for the NCAR Library and the CADIS project.

In FY2011 Chronopolis expanded its production-ready capabilities. Managed holdings more than tripled from 10 TB to a total of 32 TB. The NCAR Chronopolis node was upgraded to new server hardware with 100 TB of usable storage. The underlying data transfer infrastructure was upgraded from SRB to iRODS, and the entire repository was refreshed in the new infrastructure. A pilot Chronopolis-based data preservation service was demonstrated in collaboration with the CADIS project. Lastly, a TRAC audit to certify Chronopolis as a trusted digital repository was undertaken and was near completion at the end of FY2011.
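Refreshing a repository on new infrastructure raises the question of how to confirm that every file arrived unchanged. One common general technique is a fixity check: record a checksum for each file, then recompute and compare later. The Python sketch below illustrates that idea only. It does not describe how Chronopolis, SRB, or iRODS verify data, and the function names build_manifest and audit are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root, manifest):
    """Compare files under root against a manifest and report any problems.

    Returns a dict of relative path -> "missing" or "checksum mismatch".
    An empty dict means every file listed in the manifest is intact.
    """
    root = Path(root)
    problems = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != expected:
            problems[rel_path] = "checksum mismatch"
    return problems
```

In a migration, the manifest would be built on the source copy before transfer and audit would be run against the destination copy afterward. Note that this sketch reports only files listed in the manifest; files that appear only at the destination are not flagged.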

This gateway data preservation service is supported by the Chronopolis project.

2011 CISL Annual Report