Cleaning Up Bad Data and Finding Hidden Collections: How ArchivesSpace Makes Our Archives Accessible

As is the case in so many libraries and archives, the manuscript collections at the Historical Medical Library used to be difficult to find, let alone search. Some were available through the College website as (essentially) text files. Unless a researcher knew the name of the collection he or she wanted to consult, it was virtually impossible to find the correct information.

The old interface researchers saw when searching our finding aids.

Many of the Library’s archival collections could be found through the library catalog – but these catalog records don’t display all the necessary descriptive metadata that archives need to be truly accessible. Archival collections are a different sort of beast than books. For books, one has author, title, publication date, publisher, and subject headings; perhaps even the number of pages, who wrote the foreword, and whether there are illustrations. Generally, the metadata about copies of the same book found in different libraries will be nearly, if not exactly, identical.

However, each archival collection is unique: it exists in only one location and in one form (short of photocopying or digitizing each item). Not every collection has basic information available; sometimes we just don’t know or can’t discern who created it, when it was created, or why it was created. How do we make materials like this accessible to, and usable by, researchers?

Archivists catalog archival collections by processing them. Processing involves arranging and describing the collection materials in a comprehensible way. The catalog records for archival collections are the finding aids. Finding aids document as much available information as possible about the collection: a brief biography of the person or organization who created it, the dates of the materials, a detailed description and inventory of the contents, the reasons for the current arrangement, the language of the materials, from whom or where the collection came, related collections, and subject headings. All of this metadata helps researchers find relevant material and sort through even the largest of collections. With such a great amount of information to present, a standard library catalog record isn’t sufficient.

The library catalog view of a finding aid (left) vs an EAD finding aid, hosted by PACSCL (right).

The Library is fortunate to be a member of PACSCL (Philadelphia Area Consortium of Special Collections Libraries). With the help of grants, PACSCL was able to assist its members with processing long-hidden collections and providing access to finding aids encoded in EAD (Encoded Archival Description, an XML-based mark-up standard that looks similar to HTML), which are hosted by the University of Pennsylvania Libraries. However, we can’t expect another institution to manage and host all of our 500+ finding aids.
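Because an EAD finding aid is structured mark-up rather than free text, a program can pull the descriptive pieces out of it. The sketch below is only an illustration: the fragment is invented and heavily simplified (a real finding aid also carries a header, usually declares an XML namespace, and holds a full inventory), and the function name `summarize_ead` is our own, not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up EAD-style fragment (not one of our real finding aids).
# It omits the header and namespace that a real file would normally have.
SAMPLE_EAD = """\
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Physician papers</unittitle>
      <unitdate>1850-1900</unitdate>
      <origination><persname>Example, Physician</persname></origination>
    </did>
    <scopecontent><p>Correspondence and case notes on cholera.</p></scopecontent>
  </archdesc>
</ead>
"""


def summarize_ead(xml_text):
    """Return the collection-level title, dates, creator, and scope note."""
    root = ET.fromstring(xml_text)
    did = root.find("./archdesc/did")

    def text_of(parent, path):
        # Gather all text inside the element, or None if it is missing.
        node = parent.find(path) if parent is not None else None
        return "".join(node.itertext()).strip() if node is not None else None

    return {
        "title": text_of(did, "unittitle"),
        "dates": text_of(did, "unitdate"),
        "creator": text_of(did, "origination"),
        "scope": text_of(root, "./archdesc/scopecontent"),
    }


summary = summarize_ead(SAMPLE_EAD)
print(summary["title"], "|", summary["dates"])
```

Each value in `summary` corresponds to one of the descriptive elements a finding aid records, which is what lets a system index and display them separately.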

The question remained: How can we make our finding aids easily accessible in a meaningful manner? We found a solution to our finding aid dilemma in ArchivesSpace. ArchivesSpace (AS) is an open-source archives information management system. It’s designed to support everyday archives tasks such as accessioning, creation of finding aids, and management of authorities and rights. Like its predecessors, Archivists’ Toolkit and Archon, AS provides a database for creating and storing finding aids and accession records. A huge draw of AS is that it offers a web interface out of the box from which users can access all the finding aids a library has made available.

The new ArchivesSpace interface for the Library’s finding aids.

ArchivesSpace is available as free software, but if you need technical support, you’ll have to become a paid member of the community. For those who choose not to pay, there’s a public Google Group made up of other users that is dedicated to working on bugs and customized features. AS is open source, which means that users have access to the source code. Once the installation file is downloaded and set up on a server – well, if you can dream it, you can customize it.

Customizing AS is not for the faint of heart. AS is written in Ruby and uses the Ruby on Rails web framework, which can take some getting used to. Fortunately, many simpler changes, such as adding a logo to the front page or changing the name of a field (like “Custodial History” to “Provenance”), can be made by either adding a plug-in to the installation or editing existing files with a few lines of code. We’ve found the “Customizing and Theming ArchivesSpace” documentation from the official AS GitHub page to be useful with its step-by-step instructions. If you plan on using the free version of AS, be sure to get a good understanding of how it’s built, and plan on learning how to write some code.

The Library has made some changes in our ArchivesSpace instance, specifically to the home page that researchers see when arriving on the site, but it’s still a work in progress. The more we work in it, the more we realize that AS has the potential to be extremely robust. One of the greatest features of AS is the full-text search. When a user searches for a term, AS searches the entire finding aid, not just titles, subject headings, or names – and this will return more results. Just like on Amazon, our AS users can sort and filter their search results. One customization that we’re working on will allow users to sort results by date.
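To see why searching the whole finding aid returns more results than searching titles alone, here is a toy sketch. It is not how ArchivesSpace implements search; the records, field names, and functions are all invented for illustration, and a real system relies on a search index rather than a loop over records.

```python
# Toy records standing in for finding aids; every title and note is invented.
RECORDS = [
    {"title": "Board of Health minutes", "year": 1849,
     "notes": "Includes reports on the cholera epidemic."},
    {"title": "Cholera hospital register", "year": 1832,
     "notes": "Patient admissions and discharges."},
    {"title": "Lecture notebooks", "year": 1870,
     "notes": "Student notes on anatomy."},
]


def title_search(records, term):
    """Match the term against titles only."""
    term = term.lower()
    return [r for r in records if term in r["title"].lower()]


def full_text_search(records, term):
    """Match the term against every field of the record."""
    term = term.lower()
    return [r for r in records
            if any(term in str(value).lower() for value in r.values())]


def sort_by_date(records):
    """Order results from earliest to latest year."""
    return sorted(records, key=lambda r: r["year"])


title_hits = title_search(RECORDS, "cholera")
full_hits = sort_by_date(full_text_search(RECORDS, "cholera"))
print(len(title_hits), "title match;", len(full_hits), "full-text matches")
```

The title-only search misses the minutes whose notes mention cholera, while the full-text search finds them. The `sort_by_date` helper hints at the kind of date ordering we would like to offer.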

A search for “cholera” returns collections and items in collections. Users can sort by relevance, title, or record type.

ArchivesSpace makes our finding aids easily accessible and searchable, but how difficult was it to enter all these records? Luckily, the finding aids hosted on the PACSCL site could be imported into AS in EAD format. But the majority of our finding aids are legacy finding aids, that is, finding aids available only in old text formats or in hard copy. While there are tools available for converting text to EAD, we decided to skip that step and simply cut and paste from old documents directly into AS. Yes, it took as long as you are imagining! This meant a lot of clean-up, as older word processor documents and scanned materials often have issues in rendering certain characters.
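Some of that character clean-up can be scripted before pasting. The following is a minimal sketch rather than the process we actually followed: the function name `clean_pasted_text` and the replacement table are assumptions chosen for illustration, and any real list of problem characters depends on the source documents.

```python
import re
import unicodedata

# Characters that often go wrong when pasting from old word processor files
# or scanned text. This table is illustrative, not exhaustive.
REPLACEMENTS = {
    "\u00a0": " ",    # non-breaking space
    "\u00ad": "",     # soft hyphen
    "\ufb01": "fi",   # "fi" ligature
    "\ufb02": "fl",   # "fl" ligature
}


def clean_pasted_text(text):
    """Normalize pasted text: fix odd characters, line-break hyphens, spacing."""
    # Compose accented letters stored as letter + combining mark.
    text = unicodedata.normalize("NFC", text)
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    # Drop control characters (such as form feeds) but keep newlines and tabs.
    text = "".join(ch for ch in text
                   if ch in "\n\t" or unicodedata.category(ch) != "Cc")
    # Rejoin words hyphenated across a line break. Caution: this also joins
    # genuinely hyphenated words that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


sample = "Corre-\nspondence\u00a0 of  the \ufb01rst  of\ufb01ce\x0c"
print(clean_pasted_text(sample))
```

A script like this only handles the predictable problems; a person still needs to read the result, since scanned text can contain errors that no character table will catch.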

Entering legacy finding aids into AS isn’t only cut-and-paste, however. Another great feature of AS is the ability to “link” authority records to access headings. When a term from a controlled vocabulary such as the Library of Congress Name Authority File (LCNAF), the Getty Art & Architecture Thesaurus (AAT), or Medical Subject Headings (MeSH) is entered into AS for the first time, the record includes a field for the authority ID number. We have yet to explore this feature fully, but are hoping it paves the way for our finding aids to take advantage of linked data.
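The idea behind storing an authority ID can be sketched in a few lines: once a heading carries its source vocabulary and ID, a program can build a link to the published authority record. Everything here is hypothetical, including the `Heading` class, the placeholder ID, and the base URIs, which reflect our understanding of how each vocabulary publishes its terms and should be checked before use. This is not the ArchivesSpace data model.

```python
from dataclasses import dataclass

# Assumed base URIs for each vocabulary's published terms (verify before use).
AUTHORITY_BASE_URIS = {
    "lcnaf": "http://id.loc.gov/authorities/names/",
    "aat": "http://vocab.getty.edu/aat/",
    "mesh": "http://id.nlm.nih.gov/mesh/",
}


@dataclass
class Heading:
    """A subject or name heading with an optional authority ID."""
    label: str
    source: str
    authority_id: str = ""

    def uri(self):
        """Build a link to the authority record, or None if we cannot."""
        base = AUTHORITY_BASE_URIS.get(self.source.lower())
        if not base or not self.authority_id:
            return None
        return base + self.authority_id


# "EXAMPLE123" is a placeholder, not a real MeSH identifier.
heading = Heading(label="Cholera", source="mesh", authority_id="EXAMPLE123")
print(heading.uri())
```

Headings without an ID, or from a local vocabulary, simply return no link, which mirrors the situation for many legacy headings.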

Creating a new finding aid record in AS (left) and creating a new name authority/subject heading record in AS (right).

Any change in creating, storing, and retrieving information is a huge task, especially when dealing with legacy data and new software. Although we have encountered issues in using ArchivesSpace to its full extent, we’ve enjoyed the challenge of “How do we make this a better tool for us and our researchers?” We’ve learned some coding skills, become masters at cut-and-paste, assigned subject headings and name authority headings to our hearts’ content, and created an easily accessible, searchable, and (we hope) useful tool for anyone wishing to learn more about our archival collections.

Browse through our finding aids in ArchivesSpace at cpparchives.org and let us know what you think!