
Hierarchical Archival Description Meets the Data Grid


Archival description of records and other documentary materials often involves the use of descriptive hierarchies. For example, at the National Archives and Records Administration (NARA), Federal records are described at the record group, series, file unit, and item levels. This type of description presents particular challenges when linking the descriptions to the materials they describe, which are stored in an archival data grid. The University of Maryland (UMD) and the San Diego Supercomputer Center (SDSC) have extensive experience in developing data grids to store and retrieve digital objects for diverse user communities, but they had not previously encountered hierarchical descriptions such as those used by the archival profession. As part of their work with NARA, they developed a prototype to explore the integration of a data grid with hierarchical descriptions.

For the purposes of this research, SDSC and UMD used a collection of digital images known as the EAP collection. The EAP collection consists of over 120,000 high-resolution TIFF images and over 250,000 smaller access and thumbnail images stored in an archival data grid based on the Storage Resource Broker from SDSC. The collection was originally housed on WORM media and CD-ROM at Archives II in College Park, MD. The rescue of this collection at the University of Maryland is documented in UMIACS-TR-2003-105. The collection was originally created to provide Internet access to surrogates of select NARA holdings. In addition to the images, the collection included metadata about the images.

Arranging and describing the collection was a joint effort between the University of Maryland and the San Diego Supercomputer Center. SDSC provided data mining software and database design, while UMD developed browsing software to view collection metadata and status within an archival data grid. Some of the issues encountered during this research were:

  • Developing tools to automate the capture, parsing, and loading of the metadata into a descriptive hierarchy
  • Validating the metadata in addition to the data itself
  • Linking descriptions across levels of the hierarchy
  • Linking the multi-level descriptions to the images in the data grid
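
The last two issues can be illustrated with a minimal Python sketch. It is not the prototype's actual design, which this page does not describe. It assumes that each description records its level, a link to its parent, and, at the item level, one or more logical names of images in the data grid. The class name, the identifiers, the example path, and the rule that a child must sit at a narrower level than its parent are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Description levels named in the text, ordered from broadest to narrowest.
LEVELS = ["record group", "series", "file unit", "item"]


@dataclass(eq=False)  # compare by identity so parent/child links never recurse
class Description:
    identifier: str  # hypothetical local identifier, not a NARA scheme
    level: str
    title: str
    parent: Optional["Description"] = field(default=None, repr=False)
    children: List["Description"] = field(default_factory=list, repr=False)
    # Hypothetical logical names of the images this description points to.
    grid_paths: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")

    def add_child(self, child: "Description") -> "Description":
        # Assumed rule: a child must be at a narrower level than its parent.
        if LEVELS.index(child.level) <= LEVELS.index(self.level):
            raise ValueError(f"{child.level} cannot sit under {self.level}")
        child.parent = self
        self.children.append(child)
        return child

    def ancestry(self) -> List["Description"]:
        # Walk the parent links up to the root, then return root-first order.
        chain = []
        node: Optional["Description"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain[::-1]


def unlinked_items(root: Description) -> List[Description]:
    """Return item-level descriptions under root that have no grid path."""
    missing = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.level == "item" and not node.grid_paths:
            missing.append(node)
        stack.extend(node.children)
    return missing


# Illustrative data only; none of these names come from the EAP collection.
rg = Description("rg-1", "record group", "Example record group")
series = rg.add_child(Description("s-1", "series", "Example series"))
unit = series.add_child(Description("fu-1", "file unit", "Example file unit"))
item = unit.add_child(
    Description("i-1", "item", "Example item",
                grid_paths=["/example/zone/eap/i-1.tif"])
)
orphan = unit.add_child(Description("i-2", "item", "Item with no image yet"))

print(" > ".join(d.title for d in item.ancestry()))
print([d.identifier for d in unlinked_items(rg)])
```

Keeping the grid reference as a plain logical name keeps the descriptive hierarchy independent of any particular storage client, and a check such as `unlinked_items` is one simple way to validate that every item description resolves to at least one stored image.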

Available hierarchies:

  • Presidential and donated collections
  • Federal Records
