Born-Digital Archives in
Collecting Repositories:
Turning Challenges into Byte-Size
Opportunities
Gretchen Gueguen, Mark A. Matienzo,
Simon Wilson, and Peter Chan
Session 502, 27 August 2011
Society of American Archivists Annual Meeting
AIMS
AIMS Project
"Born-Digital Collections: An Inter-
Institutional Model for Stewardship"
Two-year project to create a framework for
stewardship of born-digital archival
records in collecting repositories
Funded by the Andrew W. Mellon
Foundation
AIMS
Partners
AIMS
Grant Goals
Processing of Hybrid Collections
Software Development
Community Development
Unconference (May 2011, Charlottesville, VA)
UK Symposium (June 2011, London, England)
Workshop (August 2011, Chicago, IL)
White Paper and Project Report
AIMS
Framework Development
A framework for collecting and delivering
the born-digital materials that are quickly
beginning to constitute the collections of
contemporary scholarly, literary, and political
figures and organizations.
University of Virginia
AIMS
What is Collection Development?
Actions and policies of institutions to bring in
material for end users (both current and future);
includes prioritizing, developing relationships with
creators, assessments, negotiating agreements and
preparing for accessioning.
Within the AIMS framework
Viable, practical method to capture/process born-
digital material from hybrid collections requires
sound work at the beginning (i.e. policies, practices,
agreements with donors, etc.) to set up later work
AIMS
Elements of Collection
Development
1. Prerequisites
2. Establish relationship with donor
3. Analyze Feasibility
4. Negotiate Agreements
5. Prepare for Accessioning
AIMS
Prerequisites
Neil Beagrie, "Plenty of Room at the Bottom? Personal Digital Libraries and
Collections," D-Lib Magazine (June 2005)
Blagofaire. http://xkcd.com/239/
AIMS
Donor Relationship
AIMS
Enhanced Curation
AIMS
Analyzing Feasibility
AIMS
Negotiate Agreements
All rights reserved by Chevrolet UK
AIMS
Prepare for Accessioning...
Scope and extent determined?
Coordination with acquisition of analog material?
Method and time determined?
Pre-acquisition appraisal performed?
Enhanced curation carried out?
Test capture if needed?
Development of new methodologies undertaken as needed/possible?
AIMS
Accessioning
Mark A. Matienzo, Yale University
AIMS
What is Accessioning?
Archival institution takes physical and legal custody
of a group of records from a donor and documents
the transfer in a register or other representation of
the institution's holdings
Within AIMS Framework
Processes which establish physical, administrative
and intellectual control over transferred records;
assessment and documentation of future needs;
documentation of actions taken; beginning of safe
storage and maintenance
AIMS
Elements of Accessioning
1. Prerequisites
2. Transfer records and gain administrative control
3. Physical control and stabilization
4. Intellectual control and documentation to
support further processes
5. Maintain accessioned records
AIMS
Case Study:
Re-Accessioning at Yale
Collaborative capacity building across two
repositories
Manuscripts and Archives
Beinecke Rare Book and Manuscript Library
Addressing previously received accessions
containing electronic records on media
Still in testing phase, but working towards
implementing in production
AIMS
Types of Records and Media
Wide variety of records creators
Literary authors
University faculty
University offices
Architectural firms
Common types of media
Floppy disks: 5.25 inch and 3.5 inch
Optical media: CDROM, CD-R, DVD-R, etc.
Zip disks
USB flash drives
AIMS
Goals of Re-Accessioning
Identify, document, and register media
Mitigate risk of media deterioration and
obsolescence
Extract basic metadata from filesystems on media
and files contained on filesystems
AIMS
Re-Accessioning Workflow
Start accessioning process
Retrieve media
Assign identifiers to media
Write-protect media
Record identifying characteristics of media in media log
Create image
Verify image
Extract filesystem- and file-level metadata
Package images and metadata for ingest
Ingest transfer package
Document accessioning process
End accessioning process
Data objects in the workflow: media, disk images, media metadata,
filesystem/file metadata, transfer package
AIMS
Disk Imaging
Using forensic (bit-level) imaging process
Ensure data on media is not manipulated using
write-protection
Uses software to acquire images
Includes hash-based verification process
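The hash-based verification step can be illustrated with a minimal Python sketch. It is not the imaging software used at Yale; the function names and the choice of SHA-256 are assumptions made for illustration. The idea is to record a digest when the image is acquired and recompute it later to confirm the image is unchanged.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large disk image is never loaded whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(image_path, recorded_digest, algorithm="sha256"):
    """Return True when the image still matches the digest recorded at acquisition."""
    return file_digest(image_path, algorithm) == recorded_digest.strip().lower()
```

A digest recorded at imaging time (for example in the media log) can then be passed to verify_image whenever the image is copied or ingested.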
AIMS
Media Log
Using SharePoint list
Contains unique identifier of media
Records physical/logical characteristics of media
Documents success, failure, or status of various
processes and additional notes
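As a rough illustration of what one row of such a log might hold, here is a hypothetical Python record. The field names, identifier format, and status values are assumptions for this sketch and do not describe the actual SharePoint list.

```python
from dataclasses import dataclass, field


@dataclass
class MediaLogEntry:
    """One piece of media; every field name here is hypothetical."""
    media_id: str                 # unique identifier assigned to the media
    media_type: str               # physical characteristic, e.g. "3.5 inch floppy"
    filesystem: str = ""          # logical characteristic, e.g. "FAT12"
    process_status: dict = field(default_factory=dict)
    notes: list = field(default_factory=list)

    def record(self, process, status, note=""):
        """Store the latest status ("success", "failure", ...) for a process."""
        self.process_status[process] = status
        if note:
            self.notes.append(f"{process}: {note}")

    def failed_processes(self):
        """List the processes whose latest recorded status is "failure"."""
        return sorted(p for p, s in self.process_status.items() if s == "failure")
```

For example, calling entry.record("imaging", "failure", "unreadable sectors") marks the disk for follow-up while keeping the note alongside the status.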
AIMS
Metadata Extraction
Can be repurposed for descriptive, administrative,
and technical metadata
Uses command-line tools (Sleuthkit, fiwalk)
Outputs XML document
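To show how the XML output might be repurposed, the sketch below reads a small, simplified sample modeled on the Digital Forensics XML that fiwalk produces and pulls out per-file name, size, and hash. The sample document and the helper names are assumptions for illustration; real fiwalk output carries namespaces and many more elements, which is why the code matches on local tag names only.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample; not actual fiwalk output.
SAMPLE = """<dfxml>
  <volume>
    <fileobject>
      <filename>LETTERS/DRAFT1.WPD</filename>
      <filesize>20480</filesize>
      <hashdigest type="md5">0123456789abcdef0123456789abcdef</hashdigest>
    </fileobject>
    <fileobject>
      <filename>README.TXT</filename>
      <filesize>512</filesize>
    </fileobject>
  </volume>
</dfxml>"""


def local_name(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_files(xml_text):
    """Return one dict per fileobject with its filename, size, and hashes."""
    files = []
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "fileobject":
            continue
        info = {"filename": None, "filesize": None, "hashes": {}}
        for child in elem:
            name = local_name(child.tag)
            if name == "filename":
                info["filename"] = child.text
            elif name == "filesize":
                info["filesize"] = int(child.text)
            elif name == "hashdigest":
                info["hashes"][child.get("type", "").lower()] = child.text
        files.append(info)
    return files
```

The resulting dictionaries could feed a technical metadata record or a first-pass file listing for description.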
AIMS
Packaging and Transfer
Using BagIt packages/Bagger application
Packages contain disk images, extracted metadata,
imaging logs, and high-level accession information
Transfer to storage is verified by comparison
against manifest
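Verification against a manifest can be sketched as follows. This is not the Bagger application; it is a minimal standard-library check that assumes a BagIt-style manifest file (manifest-sha256.txt) whose lines pair a checksum with a path relative to the bag directory. The function name and return format are assumptions.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, algorithm="sha256"):
    """Compare each file listed in the manifest with its recorded checksum.

    Returns a list of (path, problem) tuples; an empty list means every
    listed file is present and matches.
    """
    bag = Path(bag_dir)
    manifest = bag / f"manifest-{algorithm}.txt"
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(None, 1)
        target = bag / relative_path.strip()
        if not target.is_file():
            problems.append((relative_path.strip(), "missing"))
            continue
        # Reads the whole file at once for brevity; hash large disk
        # images in chunks instead.
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append((relative_path.strip(), "checksum mismatch"))
    return problems
```

Running the check after transfer to storage and logging any returned problems gives a simple record that the package arrived intact.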
AIMS
Arrangement & Description
Simon Wilson
Hull University Archives
AIMS
Purpose of Arrangement & Description
The general objectives for Arrangement & Description are:
- to preserve context
- to establish intellectual control of the material
- to provide a means of discovery
SAA definition, emphasis on minimizing the amount of handling
Within the AIMS framework
Processes which establish intellectual control of the material including
implementation of policies and agreements with donors etc. to enable
subsequent discovery and access
AIMS
Elements of Arrangement
and Description
1. Prerequisites
2. Plan for processing
- gather supporting information; files captured from media
(accessioning); convert files (for viewing); appraisal strategy;
assess arrangement options; consider preservation issues
3. Processing
- implement arrangement strategy; add descriptive metadata and
wider context (e.g. Collection Level Description); copyright &
other legal considerations
4. Prepare for Discovery & Access
- remove the access restrictions applied to born-digital material during processing
AIMS
Case Study - Stephen Gallagher
Background:
2005: 42 boxes paper archives
2010: born-digital material:
14,320 files (13.6GB) transferred
to us via external hard drive and
a box of Amstrad disks
Create integrated catalogue
to accommodate paper, born-
digital and future accruals
AIMS
Case Study - Stephen Gallagher
Approach:
- current work higher priority in filing system
- considered each work a distinct project
- structure reflects his way of working & the
archival principles of control, so that creator,
archivist & user can all understand it
Series level was most logical solution
- all related files placed in the series
- reasonable return for our effort
AIMS
Case Study - Stephen Gallagher
300 files created using Final Draft
screenwriting software
- view file (as created) to identify
appropriate format for long term
preservation
Other issues:
- copyright/third-party content
- commercial implications: access
via repository = publication?
- re-purposing of work from one
(unsuccessful) project to another
AIMS
Challenges faced
Each collection is unique, approach will vary:
- integrate born-digital material with existing material/arrangement?
- one-off collection (e.g. project) or likely to be subsequent accruals?
- collection type; differs for personal papers & organisational records
- same personnel work on paper and born-digital components?
- can we appraise without knowing the contents?
similar to paper material that is in a different language?
AIMS
Challenges faced
Volume of material :
- depositor perception that 'storage is cheap' - does this mean
we shouldn't appraise the material we receive?
- wide range of file types encountered
- not practical to describe each and every file
- risk management - if you don't check every
file for sensitive information
- we need to automate as much of the processing as possible
AIMS
Hypatia
Digital archivists identified a gap in current tools and used their
experiences to define the requirements for a new tool
Key features identified:
- need an intuitive (for archivists) graphical interface
- drag'n'drop to create the intellectual arrangement
- ability to return to original order of the material
- view some file types, add descriptive metadata etc
- high level of granularity when applying rights & permissions
Technical (acquired at accessioning) and descriptive metadata feed into the
Discovery & Access process
AIMS
Discovery and Access
Peter Chan
Stanford University
AIMS
What is Discovery & Access?
Discovery and Access refers to the systems
and workflows that make processed or
unprocessed material and the metadata that
support it available to users.
Discovery and Access
Arrangement and Description
Accessioning
Collection Development
AIMS
Goals of D&A
To make material available to user communities by
ensuring that they can:
find out about material
understand whether it is available for consultation and if so,
how
access material.
To apply appropriate access restrictions in order to
protect private and sensitive information as well as
intellectual property.
To provide access to material in a format and/or
environment that presents the original's significant
properties.
AIMS
Case Study - Stephen Jay Gould
Papers
Analog component: 550 linear feet of papers (789 boxes, 119 cartons,
30 flat boxes, and 14 map folders).
File size and number: 59.7 MB and 2,567 files.
Media formats: 98 3.5 inch floppy diskettes; 61 5.25 inch floppy diskettes; 4 sets
of punch cards*; 3 computer tapes
File Types: Computer Programs; Data sets; Documents; Spreadsheets
File Formats: ASCII Text; WordPerfect 4.2, 5.0, 5.1, 6.0, 6.1; Microsoft
Word 2.0, 6.0, 97, 2000; Microsoft RTF; Microsoft Excel 4.0; Lotus 1-2-3
2.0, etc.
* During processing of the analog papers in 2011, another 21 sets of
punch cards and more floppy diskettes were found.
AIMS
D&A EAD
AIMS
D&A Facet Browsing
AIMS
D&A Full text search
AIMS
D&A See Contents on Web
AIMS
D&A Tag & Annotation by
Invited Persons / Public
Annotation:
AIMS
Impacts from
Collection Development
File formats: no restriction
Computer medium: no restriction (punch card,
open reel tape, 5.25 inch floppy, 3.5 inch floppy)
File type: no restriction (computer program, data
set, document, spreadsheet)
Agreement: permission to post contents online
AIMS
Impacts from
Accessioning
Built 5.25 inch floppy capture station
Asked the Computer History Museum to read
punch cards
Open reel tapes still outstanding
AIMS
Impacts from
Processing
AccessData FTK was used to search for files with restricted information,
to annotate files with appropriate descriptive metadata (book title,
articles, etc.) and rights metadata (access restriction), and to generate
technical metadata for the delivery platform to act upon.
Transit Solution was used to transform files to HTML format for display
on the web.
An XSLT program was written to transform the XSL-FO output from
FTK into an XML content document. A Ruby program was written to
ingest the XML content document, original files, and the display
derivatives into Fedora.
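The search for restricted information can be illustrated with a small pattern-matching sketch. This is not how FTK works internally; the two regular expressions (a US Social Security number shape and a simple email shape) and the function name are assumptions chosen only to show the idea of flagging text for review.

```python
import re

# Hypothetical patterns; a real review would need a broader, tested set.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def flag_sensitive(text):
    """Return the matches found for each pattern, keyed by pattern label."""
    hits = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

Files with any hits could be labeled for restricted access and reviewed by an archivist before delivery.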
AIMS
FTK Bookmark and Label
AIMS
FTK Full Text, Pattern Search &
Fuzzy Hash
AIMS
Emulation Design Files
AIMS
Network Diagram for 50,000
Creeley Emails
AIMS
MUSE: Sentiment Analysis for
Emails
AIMS
MUSE: See Individual Email
AIMS
Want to know more?
http://born-digital-archives.blogspot.com
Gretchen Gueguen: *****@********.***
Mark Matienzo: ****.********@****.***
Simon Wilson: *.******@****.**.**
Peter Chan: ******@********.***
AIMS