Curating a Digital-First Collection: Prof. Kerby Miller's Collection of Irish Emigrant Letters


In 2023, the University of Galway Library digitised the research collection donated by Professor Kerby Miller (Professor Emeritus of History, University of Missouri). His collection offers a deep reading into the lives of Irish emigrants to North America and the development of Irish diaspora identities across more than 250 years. Phase 1 of making this important collection available to the public online is underway, with the aim of publishing all collected Irish emigrant letters on a dedicated online portal in early 2024. The digital curation workflow involves processing, selecting and describing the letters as unique items. The curation methodology is built on an interdisciplinary approach, and this article summarises some of the strategies and advancements made to date.


The collection digitisation concluded in March 2023, with 150,000+ pages scanned to high-resolution TIF format in line with international digital preservation standards. In anticipation of the workflow's item selection and description stages, the entire TIF collection was converted to a JPEG 2000/JPF version (for publication online) and a JPEG version (for day-to-day access and reference). The conversion took 3-4 weeks, using Adobe Photoshop to run conversion actions in batches. Digital file and folder names were also revised to align with the Library’s naming conventions.

A simultaneous review of the collection’s box list determined which areas of the collection are IN and OUT of scope for selecting emigrant letters. The collection is arranged across 125 boxes, 1,491 subseries and 1,729 digital folders. The subseries are arranged around an individual, a family or another discernible grouping relating to a common association, such as location or research topic. Each folder holds a mix of material types, ranging from emigrant correspondence, memoirs, poems and newspaper clippings to research notes, researcher/donor correspondence, genealogical records and more.

Through the box list review, 817 of 1,491 subseries were determined to be IN-scope, which supported planning and goal setting for the letter item selection and description stages. The digital extent of the collection IN-scope totals 976 folders and 77,087 pages.
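
The proportions implied by these figures can be checked with a short calculation. The sketch below uses only the counts reported in this article; the variable and function names are illustrative and are not part of the project's tooling.

```python
# Counts reported from the box list review.
total_subseries = 1491
in_scope_subseries = 817
total_folders = 1729
in_scope_folders = 976
in_scope_pages = 77087


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


subseries_share = share(in_scope_subseries, total_subseries)
folder_share = share(in_scope_folders, total_folders)
pages_per_folder = round(in_scope_pages / in_scope_folders)

print(f"IN-scope subseries: {subseries_share}%")
print(f"IN-scope folders: {folder_share}%")
print(f"Average pages per IN-scope folder: {pages_per_folder}")
```

On these counts, a little over half of the subseries and folders are IN-scope, at an average of roughly 79 pages per IN-scope folder.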


The letters that Miller collected stem from library and archives holdings and private collections. While a precise number is currently unknown, the selection stage will quantify the number of unique letter items. It is estimated that Miller collected 8,000-10,000 letters over five decades. 

a) Naming 

The workflow's selection stage centres on identifying unique letter items. The letters are named using a file naming convention that supports bulk ingestion to the Library’s Digital Asset Management (DAM) system. Any file that is not a letter page is also removed from the working folder of digital assets destined for publication.  

In the example provided below, the folder on the left shows letter items named by inserting an identifier (_d00x) into the file name. The folder on the right shows the complete folder as it was digitised, holding a mix of material types. The last digit in the file name represents the page number in sequence – this unit does not change when identifying letter items. This ensures that the intellectual link between the arrangement of the physical collection and the curated digital collection is maintained. 
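
The renaming step described above can be sketched in a few lines of Python. This is a minimal illustration and not the Library's actual tooling. It assumes that file names are underscore-separated, that the final unit is the page sequence number, that the identifier is inserted immediately before that unit, and that it is zero-padded to three digits. The sample file names are hypothetical and are modelled on the folder name 'p155_0003_0001_0001'.

```python
from pathlib import PurePath


def add_item_identifier(file_name: str, item_number: int) -> str:
    """Insert a letter-item identifier (e.g. _d001) before the page unit.

    Assumption: the stem is underscore-separated and its final unit is
    the page sequence number, which is left unchanged.
    """
    path = PurePath(file_name)
    prefix, sep, page_unit = path.stem.rpartition("_")
    if not sep:
        raise ValueError(f"No page unit found in {file_name!r}")
    return f"{prefix}_d{item_number:03d}_{page_unit}{path.suffix}"


# Hypothetical pages belonging to the first letter item in a folder.
pages = ["p155_0003_0001_0001_0004.jpg", "p155_0003_0001_0001_0005.jpg"]
renamed = [add_item_identifier(name, 1) for name in pages]
print(renamed)
```

Because the page unit is untouched, each renamed file can still be traced back to its position in the digitised folder.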

b) Letter Type

To guide the letter selection stage, two primary letter types have been determined: typed transcript and reproduction. These terms are used according to the definitions provided by the Dictionary of Archives Terminology:
  • Transcript: A handwritten or typed copy of a document.
  • Reproduction: A duplicate made from an original; a copy. 

Letter items curated for online publication include a typed transcript version of a letter, a reproduction (photocopy) of an original letter, or a combination of both types, where available. One of the primary curation challenges is the existence of duplicate letters across the collection. Duplicated transcripts show revisions or corrections by the Millers (Kerby and Patricia) and research assistants. There are also duplicate reproductions that may have been collected from different sources or reproduced again to improve legibility.

Combining a typed transcript and a reproduction version of a letter into a single letter item is the preferred standard for publication. Typed transcript versions are essential for Optical Character Recognition (OCR) scanning to extract keywords for search and filtering. This will allow users to access and navigate the letters without restriction, supporting diverse areas of research interest. By contrast, the reproduction versions connect users to the records' materiality. Encountering the handwritten letters, as penned by the authors, gives readers an emotive insight into the past.

The letters will be published according to the typical standards of the University of Galway Library digital collections. Each letter item will be displayed as digital images in the IIIF document viewer, arranged according to individual author or family group. Below is an example of a digitised letter shown as both a reproduction (left) and a typed transcript (right):

c) Letter Quality 

In addition to the type of letter, the selection methodology defines specific quality criteria for letter items. When comparing duplicates, the criteria are applied to select the most desirable version of a letter transcript. Less desirable letters include the following: 
  • Excerpt: transcripts that include only part of the letter’s contents. 
  • Run on: transcripts that share start and/or end pages with other transcripts. 
  • Annotations: transcripts that include handwritten explanatory notes, commentary or corrections to the text. 
  • Handwritten: transcripts that are not typed; written by hand. 
  • Publication: transcripts that are photocopied from a publication, such as a book, journal or newspaper. 
These versions are less desirable for different reasons. They may present technical challenges, such as impeding the application of OCR (annotations, handwritten, run on). They may have limited informational or evidential value if they lack the integral context of the letter (excerpts). They may also be subject to copyright restrictions (publication). These quality criteria are flexible and can be adapted as discoveries are made while the work is ongoing. At this stage, the best version is identified; the final decision to publish or not publish any letter will be made at the digital collections access stage in late 2023.
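
One way to picture how the criteria support a comparison of duplicates is to record them as flags on each transcript version and to prefer the version with the fewest flags. The Python sketch below is illustrative only. The class, the function and the equal weighting of the five criteria are assumptions, and in practice the choice remains a curatorial judgement.

```python
from dataclasses import dataclass, field

# The five "less desirable" criteria named in the list above.
CRITERIA = {"excerpt", "run_on", "annotations", "handwritten", "publication"}


@dataclass
class TranscriptVersion:
    file_ref: str  # hypothetical reference to one duplicate version
    flags: set = field(default_factory=set)

    def __post_init__(self):
        unknown = self.flags - CRITERIA
        if unknown:
            raise ValueError(f"Unknown criteria: {sorted(unknown)}")


def rank_versions(versions):
    """Order duplicates so that versions with the fewest flags come first."""
    return sorted(versions, key=lambda version: len(version.flags))


# Hypothetical duplicates of the same letter.
duplicates = [
    TranscriptVersion("version_a", {"annotations", "run_on"}),
    TranscriptVersion("version_b", set()),
    TranscriptVersion("version_c", {"excerpt"}),
]
best = rank_versions(duplicates)[0]
print(best.file_ref)
```

Here the unflagged version is preferred. A weighted score could replace the simple count if some criteria prove more limiting than others.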


Each letter item is described according to the typical Metadata Object Description Schema (MODS) defined by the University of Galway Library. In addition to the standard metadata used across all digital collections, such as Title, Description and Date, the following metadata is being captured for the Miller collection to facilitate data visualisations and advanced search filtering: 
  • Sender (author) first name 
  • Sender (author) last name 
  • Sender (author) gender 
  • Recipient first name 
  • Recipient last name 
  • Recipient gender 
  • Sender (author) location  
  • Recipient location 
These descriptive elements have been selected with reference to existing digital collections of emigrant correspondence from other international institutions. The Miller collection will support a robust and engaging search experience for academic researchers and members of the public alike. 
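
To illustrate how these elements could drive search filtering, the Python sketch below models them as a simple record. The field names are hypothetical and are not the Library's MODS element names. The sample values come from the caption for Image 4, and gender is left unset because the caption does not state it.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class LetterParties:
    """The eight collection-specific elements listed above."""
    sender_first_name: str
    sender_last_name: str
    sender_gender: Optional[str]
    recipient_first_name: str
    recipient_last_name: str
    recipient_gender: Optional[str]
    sender_location: str
    recipient_location: str


record = LetterParties(
    sender_first_name="James",
    sender_last_name="Scott",
    sender_gender=None,  # not stated in the caption
    recipient_first_name="James",
    recipient_last_name="Smyth",
    recipient_gender=None,  # not stated in the caption
    sender_location="Philadelphia",
    recipient_location="Moycraig, County Antrim",
)


def matches(rec: LetterParties, **filters: str) -> bool:
    """Return True when every named field equals the requested value."""
    values = asdict(rec)
    return all(values.get(name) == wanted for name, wanted in filters.items())


print(matches(record, sender_last_name="Scott", sender_location="Philadelphia"))
```

Keeping first and last names, gender and location as separate fields is what allows filters of this kind to be combined freely.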

Geographic Metadata 

To ensure consistency and for letter tagging purposes, all geographic locations are listed using comma-separated values, ordered from smallest location to largest location. The following are examples of this convention: 
  • Fort Warren, Boston, Massachusetts, United States 
  • Fallagh (townland), Kilmacthomas, Waterford (county), Ireland 
In consultation with project stakeholders, the database was selected as the primary vocabulary reference for contemporary Irish place names. When naming Irish locations, the distinction of townland, civil parish, county etc. is also defined to accurately group location tags. This will be especially useful for users to refine their search where place names are reused in hierarchical localisation, such as Galway (city) and Galway (county).  
Similarly, a second database was selected as the vocabulary for North America and other international locations. In addition to capturing the full-text version of place names, the collection MODS includes the Unique IDentifier (UID) for each location from these two databases. The UID offers a stable geographic reference for future data modelling.
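
The location convention lends itself to simple parsing. The sketch below is a minimal Python illustration and not part of the project's tooling. It splits a location string into its levels, from smallest to largest, and separates any bracketed qualifier such as (townland) or (county). The names `PlacePart` and `parse_location` are hypothetical, and the UIDs are omitted because their values are not given in this article.

```python
import re
from typing import List, NamedTuple, Optional


class PlacePart(NamedTuple):
    name: str
    qualifier: Optional[str]  # e.g. "townland" or "county"; None if absent


# A place name, optionally followed by a qualifier in parentheses.
_PART = re.compile(r"^(?P<name>[^()]+?)\s*(?:\((?P<qualifier>[^()]+)\))?$")


def parse_location(value: str) -> List[PlacePart]:
    """Split a comma-separated location into parts, smallest place first."""
    parts = []
    for raw in value.split(","):
        match = _PART.match(raw.strip())
        if match is None:
            raise ValueError(f"Unrecognised place segment: {raw!r}")
        parts.append(PlacePart(match["name"].strip(), match["qualifier"]))
    return parts


parsed = parse_location("Fallagh (townland), Kilmacthomas, Waterford (county), Ireland")
for part in parsed:
    print(part.name, part.qualifier)

# The qualifier keeps reused names apart, e.g. Galway (city) and Galway (county).
city = parse_location("Galway (city)")[0]
county = parse_location("Galway (county)")[0]
print(city != county)
```

Storing the qualifier alongside the name means that tags for places sharing a name can be grouped and filtered separately.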

Tracking the Workflow 

Stage 1 (Processing) is complete. Stages 2 and 3 (Selection and Description) are ongoing, with a deadline of December 2023. Over the last six weeks, 32% of the IN-scope digital folders have been reviewed, yielding 834 unique letter items, totalling 3,200+ pages and representing 250+ subseries. These letters have travelled to and from 184 distinct locations on both sides of the Atlantic, and these figures will continue to grow over the coming months.


Marie-Louise Rouget is the Project Digital Archivist for the Kerby Miller Collection. In 2023, she published her graduate research, titled 'Grave Concerns: the state of public cemetery records management in South Africa'. 

Image Captions 

Image 1: Letter written by Waddy Clarke to his mother, 18 December 1888. Type: reproduction. Archives reference ID: p155/57/1. 
Image 2: Screenshot of folders ‘p155_0003_0001_0001 - McClurg-McClorg' and ‘p155_0003_0001_0001’ demonstrating the file naming convention used at the letter selection stage for the Kerby Miller Collection. Archives reference ID: p155/3/1/1. 
Image 3: Letter from Joseph and Marey McClorg in Pittsburg to Mr. David McClorg in the county of Londonderry, and to the care of the Poast Master in Newtownlimavady Bovevah, 28 August 1822. Letter reproduction (left) and letter transcript (right). Kerby Miller Collection, University of Galway. Archives reference ID: p155/113/1/1. 
Image 4: Letter from James Scott, Philadelphia, to James Smyth, Moycraig, County Antrim, 7 March 1847. Type: transcript with annotations. Archives reference ID: p155/1/3/1. 
Image 5: Screenshot of and UIDs in the Kerby Miller project MODS. 
Image 6: Screenshot of tab in the Digital Curation - Tracking sheet.