Transforming Text into Discovery: OCR Enrichment of Irish Emigrant Collections
Since the summer, the Open and Digital Research Team in the Library have been testing and refining an Optical Character Recognition (OCR) pipeline. The pipeline aims to convert scanned images of textual archival records into a machine-readable format, at scale. This is an especially important enhancement to the University’s digitised heritage collections, making its holdings more searchable and supporting diverse areas of research interest. The first test case for the pipeline has been the Irish emigrant letters from the Kerby A. Miller Collection, as published online in the Library’s Digital Collections and in Imirce, a standalone digital repository for materials relating to Irish emigrants to North America. In December 2024, the first batch of OCR-enriched material was launched on both the Digital Collections and Imirce, and further heritage collections are scheduled for processing in 2025.
What this means for users
The Digital ...