Processing Capstone Email Using Predictive Coding


The Illinois State Archives, in partnership with the University of Illinois and with three years of funding from the National Historical Publications and Records Commission (NHPRC), is launching a project called Processing Capstone Email Using Predictive Coding (a.k.a. the Capstone Email Project). The project seeks to develop and demonstrate a reliable and sustainable method for identifying the email messages of state agencies that have enduring value and for providing appropriate access to them.

Following the lead of the National Archives and Records Administration, we will begin by using a Capstone[i] approach to identify email messages that have enduring value. This means the project will identify and secure the email messages of senior administrative officers from state agencies according to the priorities of the Director of the State Archives. Once the email is secured, the project will work with experts in text analytics and electronic discovery to explore tools that use technology-assisted review techniques (predictive coding in particular) to parse and classify the email.

We envision that the tools will assist in identifying sensitive content and prioritizing its review, generating descriptive metadata, aggregating email threads, identifying near-duplicates, and providing some level of automatic appraisal and redaction. Once the appropriate tools have been selected and configured, we will batch process the email so it can be ingested into a digital repository. From there, the email will be made available to the public in person through an offline computer terminal.

Plan of Work

 Phase 1 – Kick-Off and Initial Explorations
 Phase 2 – De-duplication and Assessment
 Phase 3 – Auto-categorization Tools Assessment
 Phase 4 – Restrictions and Redaction Tool Assessment
 Phase 5 – Enhancement Tools Assessment
 Phase 6 – Batch Email Processing
 Phase 7 – Search and Access Tools Evaluation
 Phase 8 – Rollout Process

Performance Objectives

1. Establish proven workflows for the processing of Capstone email.
2. Process at least 20 GB of email, including the email of at least one senior state official.
3. Demonstrate processing efficiency that exceeds that of manual human review.
4. Provide public access to Capstone email.

Team Members

  • 1. Project Director – David Joens
    E-Records Archivist and Director | Illinois State Archives
    (217) 782-3492
  • 2. Co-Principal Investigator – Joanne Kaczmarek
    Associate Professor and Archivist for Electronic Records | University of Illinois
    (217) 333-6834
  • 3. Co-Principal Investigator – Brent West
    Asst. Director for Records and Information Management Services | University of Illinois
    (217) 265-9190
  • 4. Project Manager – Amanda Myers
    Records Archivist | Illinois State Archives
    (217) 524-7528
  • 5. Text Analytics Expert (October 2016 - May 2017) – Dan Roth
    Professor of Computer Science | University of Illinois
    (217) 244-7068
  • 6. IT Infrastructure Expert (October 2016 - February 2019) – Tom Habing
    Software Development Manager | University of Illinois
    (217) 244-4425
  • 7. Archival Email Expert (January 2017 - October 2019) – Chris Prom
    Assistant Archivist | University of Illinois
  • 8. Archival Advisor (June 2017 - October 2019) – William Maher
    Director of Archives | University of Illinois
  • 9. Research Assistant (October 2016 - May 2017) – Jiayue Niu
    Tools Assessment, Workflow Development, and Email Processing | University of Illinois
  • 10. Research Assistant (June 2017 - November 2018) – Mei Mei
    Tools Assessment and Workflow Development | University of Illinois
  • 11. Research Assistant (June 2017 - August 2018) – Aarthi Shankar
    Tools Assessment and Workflow Development | University of Illinois
  • 12. Research Assistant (January 2019 - Present) – Tara Trentalange
    Tools Assessment and Workflow Development | University of Illinois
  • 13. Research Assistant (January 2019 - Present) – Joshua Hackel
    Tools Assessment and Workflow Development | University of Illinois


Research Assistant Tara Trentalange tests out the public access computer


[i] Informational Session: Capstone, A New Approach to Managing Email Records (February 4, 2014)