Hierarchical Archival Description Meets the Data Grid
Archival description of records and other documentary materials often
involves the use of descriptive hierarchies. For example, at the National
Archives and Records Administration (NARA), Federal records are described at
the record group, series, file unit, and item levels. This type of
description presents particular challenges for linking the descriptions to
the actual materials they describe, which are stored in an archival data grid. The
University of Maryland (UMD) and the San Diego Supercomputer Center (SDSC)
have extensive experience in developing data grids to store and retrieve
digital objects for diverse user communities. They had not previously
encountered hierarchical descriptions such as those used by the archival
profession. As part of their work with NARA, they developed a prototype to
explore how a data grid can be integrated with hierarchical
descriptions.
For the purposes of this research, SDSC and UMD used a collection of digital
images known as the EAP collection. The EAP collection consists of over
120,000 high-resolution TIFF images and over 250,000 smaller access and
thumbnail images stored in an archival data grid based on the Storage
Resource Broker from SDSC. The collection was originally housed on WORM
media and CD-ROM at Archives II in College Park, MD. The rescue of this
collection at the University of Maryland is documented in
UMIACS-TR-2003-105. This collection was originally created to provide
Internet access to surrogates of select NARA holdings. In addition to the
images, the collection included metadata about the images.
Arranging and describing the collection was a joint effort between the
University of Maryland and the San Diego Supercomputer Center. SDSC provided
data mining software and database design while UMD developed browsing
software to view collection metadata and status within an archival data
grid. Some of the issues encountered during this research were:
- Developing tools to automate the capture, parsing, and loading of the
metadata into a descriptive hierarchy
- Validating the metadata in addition to the data itself
- Linking descriptions across levels of the hierarchy
- Linking the multi-level descriptions to the images in the data grid.
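
The last two issues can be illustrated with a minimal sketch. It is not the
prototype built by SDSC and UMD; the class name, field names, and data grid
path format below are hypothetical, and the rule that each description sits
exactly one level below its parent is a simplifying assumption. The sketch
links descriptions across the record group, series, file unit, and item
levels, and maps each stored object back to the item that describes it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Levels named in the text, ordered from broadest to most specific.
LEVELS = ["record group", "series", "file unit", "item"]


@dataclass
class Description:
    """One node in a hypothetical descriptive hierarchy."""
    identifier: str
    level: str
    title: str
    # Excluded from repr/compare so parent <-> child links do not recurse.
    parent: Optional["Description"] = field(default=None, repr=False, compare=False)
    children: List["Description"] = field(default_factory=list)
    # Hypothetical logical paths of objects stored in the data grid.
    grid_paths: List[str] = field(default_factory=list)

    def add_child(self, child: "Description") -> "Description":
        # Assumption: a child sits exactly one level below its parent.
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level!r} cannot be nested under {self.level!r}")
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Titles from the top of the hierarchy down to this description."""
        titles = []
        node: Optional["Description"] = self
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return list(reversed(titles))


def index_grid_paths(root: Description) -> Dict[str, Description]:
    """Map each data grid path to the description that references it."""
    index: Dict[str, Description] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for path in node.grid_paths:
            index[path] = node
        stack.extend(node.children)
    return index
```

Given such an index, a browsing tool could start from an image in the data
grid, look up its item-level description, and call lineage() to show the
file unit, series, and record group that give the image its context.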