Yellowstone data-intensive computing environment

In early FY2012, CISL completed the procurement process for Yellowstone, the initial high-performance computing environment for the NCAR-Wyoming Supercomputing Center (NWSC), and deployed the system later in the fiscal year. The Yellowstone environment encompasses a petascale high-performance computing (HPC) resource, a centralized file system and data storage resource, and data analysis and visualization resources.

Yellowstone
The 63 racks of the Yellowstone HPC system (pictured) and the other components of the data-centric environment, including an 11-petabyte disk resource, were installed at NWSC in June and July 2012.

Yellowstone targets the current era of data-intensive computing needs by providing a data-centric environment in which all the computing and support systems required for scientific workflows are attached to a shared, high-speed, central file system. This design will improve scientific productivity and reduce costs by eliminating the need to move or maintain multiple copies of data. The diagram below shows the foundational architecture of NWSC’s data-centric design, with the large, high-speed central file system (GLADE, NCAR’s Globally Accessible Data Environment) as its centerpiece.

Yellowstone architecture
This diagram shows the overall architecture for the Yellowstone environment. It shows the integration and relative sizes of systems for computing, data analysis and visualization, online and archived data, data management, external interfaces, and networks.

The environment’s specifications were driven by NCAR and community science requirements and informed by the science goals in the NCAR Strategic Plan, the NWSC Science Justification, the CISL Strategic Plan, and the NCAR-Wyoming Partnership, as well as by CISL user surveys and a major workflow study conducted by CISL. The Yellowstone system meets or exceeds all the technical specifications and performance goals that were defined and refined by the science, technical, and business evaluation teams.

Yellowstone is a 1.5-PFLOPS HPC system that provides nearly 30 times the computing capacity of NCAR’s current IBM POWER6 supercomputer, Bluefire. To match this increase in computing capacity, the Yellowstone environment includes a shared file system with 12 times the storage capacity and 15 times the sustained I/O bandwidth of the current system, as well as data analysis and visualization clusters with 20 times the capacity of the analogous resources at the Mesa Lab Computing Facility (MLCF).

  • The Yellowstone HPC resource is an IBM iDataPlex cluster comprising 72,288 Intel Sandy Bridge EP processor cores in 4,518 16-core nodes, each with 32 GB of memory and all connected to an FDR InfiniBand fabric. The HPC resource has a peak performance of 1.504 PFLOPS and has demonstrated a computational capability of 1.2576 PFLOPS as measured by the High-Performance LINPACK (HPL) benchmark. (One PFLOPS is one quadrillion, or 1,000,000,000,000,000, floating-point operations per second.) As measured by a CISL-defined benchmark suite, Yellowstone is expected to deliver more than the 28.9 times Bluefire’s computational capacity that IBM committed to. (A rough consistency check of these figures appears in the sketch after this list.)

  • The GLADE central disk resource has an initial usable storage capacity of 11 PB and will expand to 16 PB in the first calendar quarter of 2014. Consisting initially of 6,840 2-TB disk drives and connected to the FDR InfiniBand fabric, GLADE will have a sustained aggregate I/O bandwidth of 90 GB/s. CISL’s data-centric approach for Yellowstone expands on the original GLADE file system, deployed in FY2010 to confirm the value of this design at NCAR’s MLCF. (The sketch after this list also checks these capacity figures.)

  • The Geyser data analysis resource is a 640-core cluster of 16 nodes, each with 1 TB of memory. The Caldera computational cluster has 256 cores in 16 nodes, each node equipped with two Graphics Processing Units (GPUs) that can be used as either computational processors or graphics accelerators.

  • In support of the operational forecasts of the Antarctic Mesoscale Prediction System (AMPS) for NSF’s Office of Polar Programs, and with additional funding from NSF, CISL procured a separate, smaller iDataPlex cluster named Erebus. Erebus has 84 nodes similar to Yellowstone’s, an FDR-10 InfiniBand interconnect, and a dedicated 58-TB file system. If needed, Yellowstone will serve as backup for Erebus.

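The hardware and storage figures quoted in the list above are internally consistent, as the brief back-of-envelope sketch below illustrates. The per-core clock rate (2.6 GHz), the eight double-precision operations per cycle, and the 8+2 parity layout used for the GLADE estimate are illustrative assumptions rather than specifications drawn from this report.

```python
# Back-of-envelope consistency check of the Yellowstone and GLADE figures
# quoted above. The clock rate, FLOPs per cycle, and parity scheme are
# assumptions for illustration, not specifications from this report.

cores_per_node = 16
nodes = 4_518
cores = cores_per_node * nodes                    # 72,288 cores, as stated

# Assumed per-core figures for a Sandy Bridge EP part: 2.6 GHz clock and
# 8 double-precision floating-point operations per cycle with AVX.
clock_hz = 2.6e9
flops_per_cycle = 8
peak_pflops = cores * clock_hz * flops_per_cycle / 1e15
print(f"Peak: {peak_pflops:.3f} PFLOPS")          # ~1.504 PFLOPS

hpl_pflops = 1.2576
print(f"HPL efficiency: {hpl_pflops / peak_pflops:.1%}")   # ~84%

# GLADE: 6,840 x 2-TB drives of raw capacity, reduced by parity overhead.
# An 8+2 RAID-6-style layout (80% usable) is assumed here for illustration.
raw_tb = 6_840 * 2
usable_pb = raw_tb * 0.8 / 1_000
print(f"GLADE usable: ~{usable_pb:.1f} PB")       # ~10.9 PB, i.e. ~11 PB
```

Running the sketch reproduces the 1.504 PFLOPS peak, an HPL efficiency of roughly 84%, and approximately 11 PB of usable GLADE capacity.
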
CISL completed subcontract negotiations with IBM on 8 September 2011, and NSF approved the subcontract on 27 October. The Yellowstone system was officially named and announced on 7 November 2011. Because CISL elected to deploy the system with an FDR-14 InfiniBand interconnect, which offered significant performance and “time-to-science” benefits over the life of the system, the hardware delivery schedule was delayed. The system components, including the test equipment, arrived at NWSC between 9 May and 29 June 2012. The bulk of the hardware installation was completed by mid-July, and IBM and its partners finalized the installation, repaired or replaced problem components, installed system software, and worked through performance issues until the end of August. Yellowstone officially entered its three-week Acceptance Testing Period (ATP) on 4 September 2012 and successfully completed the ATP by 30 September. In FY2013, CISL will take ownership of the system, which will then begin its four-year production period.

CISL’s commitment to a data-intensive computing strategy extends beyond Yellowstone and includes a full suite of community data services. CISL is leading the community in developing data services that address the future challenges of data growth, preservation, and management, and it also leads in supporting NSF’s new requirement for data management plans. Our disk- and tape-based storage systems provide an efficient, safe, and reliable environment for hosting datasets, and CISL anticipates that its data services will be further streamlined, improved, and expanded through the data-centric design of the Yellowstone environment.

The Yellowstone procurement was made possible through NSF Core funds, including CSL funding, and one-time NSF EaSM funding to augment NWSC resources.
