Chronopolis: Federated digital preservation across space and time
This image of the Chronopolis Operational Status Page shows the integrated, real-time state of the Chronopolis system. The Chronopolis software consists of a suite of replication, control, and validation packages running at the NCAR, SDSC, and UMD sites. The page lets users assess the distributed system at a glance and provides tools to drill down into the details as required.
There is a critical and growing need to organize, preserve, and make accessible the increasing number of digital holdings that represent vital intellectual capital, much of which is precious and irreplaceable. Chronopolis is a strategic collaboration among the San Diego Supercomputer Center (SDSC, lead organization), NCAR/CISL, the University of California Library System, and the University of Maryland. Its aim is to develop national-scale digital preservation infrastructure that could serve any community with digital assets: science, engineering, the humanities, and more. The effort studies viable models and effective systems for establishing standard reference datasets, preserving collections that evolve over time, and providing preservation resources "of last resort" for digital assets that might otherwise be lost. Digital collections that must persist for 100 years or more are one important focus of this activity. Approaching the problem requires an unusual combination of people and capabilities: scientists, librarians, curators, and computer scientists, together with long-term distributed cyberinfrastructure.
The problem spans academic scientific disciplines, historical collections, and digital library content. Although the new capabilities developed in Chronopolis are broadly useful, we expect them to become powerful services that we can offer to the Earth System sciences community through, for example, NCAR's Community Data Portal (CDP). This activity supports CISL's computing frontier for center virtualization by advancing grid-based data preservation technologies.
In FY2009 the Chronopolis digital preservation systems were brought to full operational status, and storage and wide-area data transfer performance were optimized. Chronopolis hardware was received from SDSC and installed, the Chronopolis software suite was installed, and the interconnections between storage zones were established. A simulated disaster recovery test was successfully completed using the TeraGrid as the data transmission medium. Network performance between NCAR and SDSC on the TeraGrid was tested and tuned to reach 5 Gbps. A status page was designed and implemented to provide an integrated view of the Chronopolis software components across the geographically separated archive sites. The page uses Web 2.0 technologies to gather key high-level information of interest to Chronopolis users, which is otherwise stored separately in the individual Chronopolis software packages, from across Chronopolis zones and applications and display it in one place. Chronopolis currently has four library partners, who have collectively contributed about 25 TB to the archive.
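The two figures above, a tuned 5 Gbps NCAR-SDSC path and about 25 TB of archived data, allow a rough estimate of how long re-replicating the entire archive to one site might take in a recovery scenario. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s) and a hypothetical `efficiency` factor for the fraction of line rate actually sustained. The function name `transfer_hours` is illustrative and not part of the Chronopolis software.

```python
# Back-of-the-envelope estimate (assumptions, not measurements): time to move
# the whole ~25 TB archive over the 5 Gbps NCAR-SDSC TeraGrid path.


def transfer_hours(size_tb: float, rate_gbps: float, efficiency: float = 1.0) -> float:
    """Return the transfer time in hours.

    size_tb    -- data volume in decimal terabytes (1 TB = 10**12 bytes), assumed
    rate_gbps  -- line rate in gigabits per second (1 Gbps = 10**9 bit/s)
    efficiency -- hypothetical fraction of the line rate actually sustained
    """
    bits = size_tb * 1e12 * 8                          # bytes -> bits
    seconds = bits / (rate_gbps * 1e9 * efficiency)    # bits / (bits per second)
    return seconds / 3600                              # seconds -> hours


if __name__ == "__main__":
    # 25 TB at the full 5 Gbps: 2e14 bits / 5e9 bit/s = 40,000 s
    print(f"{transfer_hours(25, 5):.1f} h at full line rate")
    # Same transfer if only half the line rate is sustained (illustrative)
    print(f"{transfer_hours(25, 5, 0.5):.1f} h at 50% efficiency")
```

At the full tuned rate the whole archive would move in roughly 11 hours. Real transfers would take longer once protocol overhead, storage throughput, and the replication software's own validation steps are accounted for, which is why the sketch exposes an efficiency factor rather than assuming the line rate is always achieved.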
In FY2010 Chronopolis will move toward becoming a more robust operational system, adding the TRAC certification, life cycle management, business models, and governance structures needed to complement the technical development. Technical development will continue in both infrastructure and usability. As iRODS technology becomes available, it will join and possibly supplant SRB as a replication mechanism. User interfaces and infrastructure that let data providers deposit data into the archive and request data from it with less manual effort will be adapted and developed. New data providers will be sought to bring a wider variety of data to Chronopolis. An alliance with the MetaArchive organization is being explored; it promises to increase the technical robustness of Chronopolis by diversifying the techniques used to store and retrieve data.
CISL is engaging in Chronopolis as an important strategic thrust, supporting it through a combination of NSF Core funding, NCAR's Cyberinfrastructure Strategic Initiative (CSI), and focused funding from the Library of Congress.
