Processing Capstone Email Using Predictive Coding


The Illinois State Archives, in partnership with the University of Illinois and with three-year funding from the National Historical Publications and Records Commission (NHPRC), is launching a project called Processing Capstone Email Using Predictive Coding (a.k.a. the Capstone Email Project). The project seeks to develop and demonstrate a reliable and sustainable method of identifying, and providing appropriate access to, state agency email messages that have enduring value.

Following the lead of the National Archives and Records Administration, we will start by using a Capstone[i] approach to identify email messages that have enduring value. This means the project will identify and secure the email messages of senior administrative officers in state agencies, according to the priorities of the Director of the State Archives. Once the email is secured, the project will work with experts in text analytics and electronic discovery to explore tools that use technology-assisted review techniques (predictive coding in particular) to parse and classify the email.

We envision that the tools will assist in identifying sensitive content and prioritizing its review, generating descriptive metadata, aggregating email threads, identifying near-duplicates, and providing some level of automatic appraisal and redaction. Once the tools have been selected and configured, we will batch process the email so it can be ingested into a digital repository. From there, the email will be made available to the public in person at an offline computer terminal.
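As a rough illustration of one capability named above, near-duplicate identification, the following is a minimal sketch that compares message bodies by their overlapping word sequences (shingles) and Jaccard similarity. It is not the tool the project has selected; the function names, the three-word shingle size, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles for a message body (hypothetical helper)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short messages become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(messages, threshold=0.8, k=3):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    `messages` maps a message identifier to its body text. The threshold is an
    assumed value; a real workflow would tune it against reviewed samples.
    """
    sets = {msg_id: shingles(body, k) for msg_id, body in messages.items()}
    pairs = []
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


if __name__ == "__main__":
    sample = {
        "m1": "The budget meeting is moved to Friday at 10 am in room 204",
        "m2": "The budget meeting is moved to Friday at 10 am in room 204 Thanks",
        "m3": "Please review the attached records retention schedule",
    }
    for id_a, id_b, score in near_duplicates(sample):
        print(id_a, id_b, round(score, 3))
```

Comparing every pair of messages is workable only for small batches; a collection on the scale of the project's 20 GB target would likely need an indexing technique such as MinHash, which this sketch does not attempt.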

Plan of Work

Phase 1 – Kick-Off and Initial Explorations

Phase 2 – De-duplication and Assessment

Phase 3 – Auto-categorization Tools Assessment

Phase 4 – Restrictions and Redaction Tool Assessment

Phase 5 – Enhancement Tools Assessment

Phase 6 – Batch Email Processing

Phase 7 – Search and Access Tools Evaluation

Phase 8 – Rollout Process

Performance Objectives

  1. Establish proven workflows for the processing of Capstone email.
  2. Process at least 20 GB of email, including the email of at least one senior state official.
  3. Demonstrate processing efficiency exceeding that of manual human review.
  4. Provide public access to Capstone email.

Team Members

1. Project Director – David Joens
E-Records Archivist and Director, Illinois State Archives
(217) 782-3492

2. Co-Principal Investigator – Joanne Kaczmarek
Associate Professor and Archivist for Electronic Records, University of Illinois
(217) 333-6834

3. Co-Principal Investigator – Brent West
Asst. Director for Records and Information Management Services, University of Illinois
(217) 265-9190

4. Project Manager – Amanda Myers
Records Archivist, Illinois State Archives
(217) 524-7528

5. Text Analytics Expert – Dan Roth
Professor of Computer Science, University of Illinois
(217) 244-7068

6. IT Infrastructure Expert – Tom Habing
Software Development Manager, University of Illinois
(217) 244-4425

7. Research Assistant – Jiayue Niu
Tools Assessment, Workflow Development, and Email Processing, University of Illinois


iPRES Conference – October 2016
Digital Library Federation Forum – November 2016

[i] Informational Session: Capstone, A New Approach to Managing Email Records (February 4, 2014)