AI and Libraries 3: AI and Collection Management

This post was written by Teresa Curtin, Cataloguing and Metadata Librarian at University of Galway Library.

The following explores the prospects of using untrained generative AI to assist in cataloguing bibliographic items using MARC 21, RDA and Library of Congress Subject Headings (LCSH), through a comparative analysis of outputs from ChatGPT versions GPT 3.5 and 4.0.

Introduction

Potential use cases for machine learning (ML) and artificial intelligence (AI) solutions in libraries are a key area of research across the field of library science. Incorporating AI into library services such as chatbots, research methods guidance and topic exploration is one way in which AI could assist library stakeholders, users and researchers. Linked to this, AI could also assist librarians in areas such as collection management.

AI may impact metadata and bibliographic ranking in the years to come. For example, AI could quickly make batch changes across numerous records to improve their overall quality or supply missing data. Trained machine learning solutions may also allow for the improvement of metadata across records, the implementation of standardised metadata and/or the application of controlled vocabulary, which in turn may improve resource discovery and retrieval.

During the process of researching AI use cases in libraries, the potential uses and reliability of existing generative AI solutions were noted as an area for further examination. While untrained on individual repositories’ rules and schemas, generative AI can produce responses in library index formats. To test the usability and reliability of this output, we performed tests on two different versions of ChatGPT.

Methodology

Five physical books from our collection were selected for testing using the following criteria:
  • Books were published recently (after 1990)
  • All books had an ISBN
  • Each book represented an area that is commonly accessed from the library, covering different genres including fiction and non-fiction
  • A mixture of monographs and serials

Based on the above criteria, the following were selected:

  • The Amber Spyglass by Philip Pullman -- special edition of a popular young adult fantasy novel. Third book in the "His Dark Materials" series.
  • Science fiction before 1900 by Paul Alkon -- literary criticism and exploration of science fiction novels. This item is part of a larger series.
  • Coalisland, County Tyrone, in the industrial revolution 1800 - 1901 by Austin Steward -- local economic history focused on County Tyrone, Ireland.
  • Wetlands by William J. Mitsch and James G. Gosselink -- fourth edition, volume detailing the ecology and natural sciences of wetlands.
  • Learning Python by Mark Lutz -- fifth edition, computer science instructional manual on Python.

For the initial test, the free version of ChatGPT was used in April 2024. The second test used a newer, updated free version in June 2024.

Prompt engineering is a key aspect of using generative AI tools. Carefully phrased prompts deserve time and consideration, as clarity and focus are key to avoiding hallucinations, incorrect outputs or the introduction of unintended bias into the results of a query. For this experiment, the in-house cataloguing rules and schema were used.

For each book, ChatGPT was provided with specific bibliographic details to assist in cataloguing and retrieval. The details provided for each were:

  • Title
  • Year of publication
  • ISBN
  • Author

A sample prompt provided to ChatGPT was:

“Can you create a MARC21 record using RDA for the following book <inserted book details here> with 520, 650, 082 and 490 fields completed. Please use the Dewey decimal classification system for the 082 field. Please use library of congress subject headings for the 600 fields.”
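
To keep the wording identical across all five books and both test rounds, the prompt could be generated from a template. The following is a minimal sketch only: the `Book` class, the `build_prompt` function and the way the four details are laid out are illustrative assumptions, not the procedure used in the tests, and the example values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """The four details supplied to ChatGPT for each title."""
    title: str
    year: int
    isbn: str
    author: str


# Wording copied from the sample prompt above; {details} stands in for
# "<inserted book details here>".
PROMPT_TEMPLATE = (
    "Can you create a MARC21 record using RDA for the following book "
    "{details} with 520, 650, 082 and 490 fields completed. "
    "Please use the Dewey decimal classification system for the 082 field. "
    "Please use library of congress subject headings for the 600 fields."
)


def build_prompt(book: Book) -> str:
    """Fill the template with one book's details (layout is an assumption)."""
    details = (
        f"Title: {book.title}; Author: {book.author}; "
        f"Year of publication: {book.year}; ISBN: {book.isbn}"
    )
    return PROMPT_TEMPLATE.format(details=details)


# Placeholder values, not one of the books tested.
example = Book(title="Example title", year=2001, isbn="0000000000",
               author="Example Author")
print(build_prompt(example))
```

Reusing one template means that any difference between two outputs can be attributed to the tool or the book rather than to a change in phrasing.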

Results

In the original test, conducted in April, the initial results were of acceptable quality. Records were returned with the requested information, including correctly assigned and retrieved titles and authors, accurate subject headings and 500 fields. Further analysis of all suggested subject headings would be recommended to ensure that they comply with the LCSH controlled vocabulary.

Of note, it predicted the correct Dewey Decimal Classification (DDC) number for four of the five books. The incorrect DDC was assigned to the book on the economic history of a local area: ChatGPT selected a history number rather than an economics number, which would have suited the book’s content better. The tool was also unable to handle series statements. When asked to insert 490 and 830 fields for titles that were part of a series, it supplied the field tags with the correct notation but did not insert any series information into them. Finally, when asked to add associated people who were not authors, such as an illustrator, it was not able to retrieve this information. This experiment suggests that the fast generation and retrieval of information could be very helpful for initial record creation, but all data needs to be closely reviewed to ensure accuracy. It may help to increase throughput, but a professional will still need to review the output. Potential bias in the data it draws on, and hallucinations, must always be considered.
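
The empty 490 and 830 fields seen in this round are the kind of gap that a simple script could flag before a cataloguer begins a manual review. The sketch below is illustrative only. It assumes the record is returned as plain text with one field per line, the tag first and `$` as the subfield delimiter; the function names and the sample record are hypothetical, and this is not a full MARC parser.

```python
import re

# Fields requested in the sample prompt.
REQUESTED_TAGS = ("082", "490", "520", "650")

# Assumed layout: "490 1_ $a Series title ; $v 3" (optionally "=490 ...").
FIELD_LINE = re.compile(r"^\s*=?(\d{3})\b\s*(.*)$")
SUBFIELD = re.compile(r"\$([a-z0-9])\s*([^$]*)")


def parse_fields(record_text):
    """Map each tag to a list of occurrences, each a list of non-empty subfield values."""
    fields = {}
    for line in record_text.splitlines():
        match = FIELD_LINE.match(line)
        if not match:
            continue  # not a field line under the assumed layout
        tag, rest = match.groups()
        values = [value.strip() for _code, value in SUBFIELD.findall(rest)]
        fields.setdefault(tag, []).append([v for v in values if v])
    return fields


def review_flags(record_text, requested=REQUESTED_TAGS):
    """Return {tag: 'missing' | 'empty'} for requested tags that need attention."""
    fields = parse_fields(record_text)
    flags = {}
    for tag in requested:
        if tag not in fields:
            flags[tag] = "missing"
        elif not any(fields[tag]):
            flags[tag] = "empty"  # tag present, but no subfield content
    return flags


# Hypothetical output, shaped like the empty series fields described above.
sample = """245 10 $a Example title / $c Example Author.
490 1_ $a
520 __ $a A short summary.
650 _0 $a Example heading.
830 _0 $a"""

print(review_flags(sample))
print(review_flags(sample, requested=("490", "830")))
```

A check like this only shows whether a field was filled in. Whether the content is correct, for example the choice of DDC number, still requires review by a professional.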

In the second test, conducted in June on the newer free version of ChatGPT, the same parameters, prompts and titles were used. Overall, the output from ChatGPT had improved. Results were returned faster and with greater accuracy, and areas where it had previously struggled were now completed correctly. For example, it completed the 490 and 830 series statements without issue, including the volume or number of the item within the series. It again selected the incorrect DDC for the economics book, but when queried it provided an in-depth explanation of its logic and then assigned the correct DDC. As before, all results would still need to be reviewed and analysed by trained personnel, but the overall quality and speed of record return had increased.

Discussion and Conclusion

Potential use cases for AI in collection management and cataloguing are an increasingly active area of development. While use cases for untrained generative AI tools such as ChatGPT in the library space may be limited, trained AI within libraries could lead to enhanced workflows and throughput. In relation to cataloguing, AI solutions such as Annif and Finto AI are already being implemented in libraries across Europe. Annif is trained on an institution’s records, while Finto AI is a pretrained version of Annif for subject indexing in several languages.

Untrained AI such as ChatGPT may assist in cataloguing and the generation of initial records, but its output will require analysis and review to ensure it is accurate for the record being created. This initial assistance may remove some of the more time-consuming parts of cataloguing original records, but it should be used with the knowledge that the output will need to be edited. Potential bias and hallucinations must also be kept in mind. A further area for examination when using generative AI for record creation is copyright and intellectual property. If the information is being reproduced from material published on the web, how different is the information provided to the user from the original source? This question raises potential issues of intellectual property or copyright infringement, which is another key consideration when using generative AI and could also be an area for further research.

Despite these concerns, AI has the potential to have a positive impact on cataloguing and on those who choose to engage with it.


Future Research 

This experiment will be run periodically throughout the coming months to continue to assess the overall quality and progress being made in relation to generative AI tools and cataloguing abilities.

Of note, a means of assessing the quality of the returned records against a standardised level is also being considered, to highlight areas of progress and areas needing improvement within generative AI tools.
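
No method for this assessment has been decided. As one possible shape it could take, the sketch below compares each generated field with a reference record created by a cataloguer and reports the share of fields that match. The function names, the comparison rule (case and punctuation ignored) and the sample values are all assumptions made for illustration.

```python
import string

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalise(value: str) -> str:
    """Lower-case, drop ASCII punctuation and collapse whitespace before comparing."""
    return " ".join(value.lower().translate(_PUNCTUATION).split())


def score_record(generated: dict, reference: dict) -> dict:
    """Compare generated field values with a cataloguer's reference, tag by tag."""
    per_tag = {}
    for tag, expected in reference.items():
        produced = generated.get(tag, "")  # a missing field counts as a mismatch
        per_tag[tag] = normalise(produced) == normalise(expected)
    total = len(per_tag)
    matched = sum(per_tag.values())
    return {"per_tag": per_tag, "score": matched / total if total else 0.0}


# Placeholder values only, not results from the tests described above.
reference = {"082": "000.0", "490": "Example series ; 3"}
generated = {"082": "000.0", "490": ""}

print(score_record(generated, reference))
```

Exact matching is a deliberately strict rule. Fields such as the 520 summary or 650 subject headings would probably need a more forgiving comparison, or a cataloguer's judgement, before a score could be treated as meaningful.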

 

Activity

Following the above methodology, try running a cataloguing test of your own using ChatGPT. Record your results and overall findings. What worked well? What didn’t work as well? As with other titles in this blog series, you can share your results with us and submit them for the contest (by close of business Monday, 25th November).





