Processing Capstone Email Using Predictive Coding

Introduction

The Illinois State Archives, in partnership with the University of Illinois and with three-year funding from the National Historical Publications and Records Commission (NHPRC), is launching a project called Processing Capstone Email Using Predictive Coding (a.k.a. the Capstone Email Project). The project seeks to develop and demonstrate a reliable and sustainable method of identifying state agency email messages that have enduring value and providing appropriate access to them.

Following the lead of the National Archives and Records Administration, we will start by using a Capstone[i] approach to identify email messages that have enduring value. This means the project will identify and secure the email messages of senior administrative officers from state agencies according to the priorities of the Director of the State Archives. Once the email is secured, the project will work with experts in text analytics and electronic discovery to explore tools that use technology-assisted review techniques (predictive coding in particular) to parse and classify the email.

We envision that the tools will assist in identifying and prioritizing review of sensitive content, generating descriptive metadata, aggregating email threads, identifying near-duplicates, and providing some level of automatic appraisal and redaction. Once the tools have been selected and configured, we will batch process the email so it can be ingested into a digital repository. From there, the email will be made available to the public in person at an offline computer terminal.
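As a rough illustration of two of the tasks named above, near-duplicate identification and thread aggregation, the following is a minimal sketch only. It does not represent the tools the project will select. It assumes messages are already available as plain-text bodies and subject lines, scores near-duplicates with Jaccard similarity over three-word shingles, and groups threads by normalized subject. All function names, the 0.8 threshold, and the sample messages are hypothetical.

```python
import re
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles (word n-grams) in a message body."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(messages, threshold=0.8):
    """messages maps id -> body. Return (id1, id2, score) for similar pairs.

    The threshold is an assumed value; a real workflow would tune it.
    """
    ids = sorted(messages)
    sets = {mid: shingles(messages[mid]) for mid in ids}
    pairs = []
    for x in range(len(ids)):
        for y in range(x + 1, len(ids)):
            score = jaccard(sets[ids[x]], sets[ids[y]])
            if score >= threshold:
                pairs.append((ids[x], ids[y], score))
    return pairs


def normalize_subject(subject):
    """Strip leading Re:/Fw:/Fwd: prefixes and lowercase the remainder."""
    stripped = re.sub(r"^(\s*(re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def group_threads(subjects):
    """subjects maps id -> subject line. Return normalized subject -> ids."""
    threads = defaultdict(list)
    for mid in sorted(subjects):
        threads[normalize_subject(subjects[mid])].append(mid)
    return dict(threads)


# Hypothetical sample data for illustration only.
bodies = {
    "m1": "The budget meeting is moved to Friday at noon in room 204.",
    "m2": "The budget meeting is moved to Friday at noon in room 204. Thanks",
    "m3": "Please review the attached records retention schedule.",
}
subject_lines = {
    "m1": "Budget meeting",
    "m2": "RE: Fwd: Budget meeting",
    "m3": "Retention schedule",
}

duplicate_pairs = near_duplicates(bodies)
threads = group_threads(subject_lines)
```

Pairwise comparison is quadratic in the number of messages, so a sketch like this suits only small samples. Grouping by subject alone is also a simplification, since production tools typically rely on message headers as well.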

Plan of Work

Phase 1 – Kick-Off and Initial Explorations
Phase 2 – De-duplication and Assessment
Phase 3 – Auto-categorization Tools Assessment
Phase 4 – Restrictions and Redaction Tool Assessment
Phase 5 – Enhancement Tools Assessment
Phase 6 – Batch Email Processing
Phase 7 – Search and Access Tools Evaluation
Phase 8 – Rollout Process

Performance Objectives

1. Establish proven workflows for the processing of Capstone email.
2. Process at least 20 GB of email, including email from at least one senior state official.
3. Demonstrate processing efficiency exceeding that of manual review.
4. Provide public access to Capstone email.
 

Team Members

 
1. Project Director – David Joens
E-Records Archivist and Director | Illinois State Archives
(217) 782-3492, djoens@ilsos.net

 
2. Co-Principal Investigator – Joanne Kaczmarek
Associate Professor and Archivist for Electronic Records | University of Illinois
(217) 333-6834, jkaczmar@illinois.edu

 
3. Co-Principal Investigator – Brent West
Asst. Director for Records and Information Management Services | University of Illinois
(217) 265-9190, bmwest@uillinois.edu

 
4. Project Manager – Amanda Myers
Records Archivist | Illinois State Archives
(217) 524-7528, AMyers@ilsos.net

 
5. Text Analytics Expert – Dan Roth
Professor of Computer Science | University of Illinois
(217) 244-7068, danr@illinois.edu

 
6. IT Infrastructure Expert – Tom Habing
Software Development Manager | University of Illinois
(217) 244-4425, thabing@illinois.edu

 
7. Archival Email Expert (January 2017 - October 2019) – Chris Prom
Assistant Archivist | University of Illinois
prom@illinois.edu

 
8. Archival Advisor (June 2017 - October 2019) – William Maher
Director of Archives | University of Illinois
w-maher@illinois.edu

 
9. Research Assistant (October 2016 - May 2017) – Jiayue Niu
Tools Assessment, Workflow Development, and Email Processing | University of Illinois
jniu6@illinois.edu

 
10. Research Assistant (June 2017 - May 2018) – Mei Mei
Tools Assessment and Workflow Development | University of Illinois
meim2@illinois.edu

 
11. Research Assistant (June 2017 - May 2018) – Aarthi Shankar
Tools Assessment and Workflow Development | University of Illinois
shankar9@illinois.edu

References

[i] Informational Session: Capstone, A New Approach to Managing Email Records (February 4, 2014)