AI and Libraries 4: Demystifying AI for Researchers

AI promises to change how we work, including many aspects of how third-level institutions function. This includes research and research support. In this blog post, Oksana Dereza, author of this guide to AI for Research, delves a bit deeper into demystifying AI, offering ideas on how to support, and what to say to, the slightly-to-somewhat-more advanced AI user. Oksana is a Digital Library Developer at University of Galway Library.

Since the release of ChatGPT, interest in AI has surged like never before. Businesses, researchers, and government bodies are diving head-first into the possibilities AI offers, eager not to miss out on what some believe is the next big technological revolution. This trend has grown so strong that many companies are now rebranding their software as “AI” solutions — a practice often called “AI washing.” But what exactly is AI, and is it truly the answer to all the world's problems?

Image source: https://www.cortical.io/blog/chatgpt-and-large-language-models-the-holy-grail-of-enterprise-ai/  

What is AI?

The term Artificial Intelligence was coined in the 1950s to refer to machines that would be able to simulate human intelligence and problem-solving. However, this goal remains distant, and systems that could theoretically achieve it have been given a new, more specific name: Artificial General Intelligence (AGI). What we encounter under the name of AI today are various Generative AI (GenAI) solutions that — as the name suggests — can produce high-quality text, images, video and other content. However smart and versatile they may seem, these systems remain task-specific and lack reasoning capabilities, which makes them an example of Artificial Narrow Intelligence (ANI). As it is really developments in GenAI that have been driving excitement around AI, let’s dig a little deeper into what it consists of.

How does GenAI work?

Any GenAI system has a Large Language Model (LLM) at its core. For example, ChatGPT runs on a model called GPT that comes in a few different versions: GPT-3, GPT-4, GPT-4o. Like any other model, LLMs are an approximation — in this case, of human language. This approximation is achieved by deducing patterns from a large corpus of data and memorising them through a process known as machine learning. The ultimate objective of any language model, regardless of its complexity, is the same: to predict the most likely next word (or missing word) in a sequence. It is as simple as that.
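
To make this concrete, below is a minimal sketch of next-word prediction using the small, openly available GPT-2 model and the Hugging Face transformers library (an illustrative choice; ChatGPT runs on much larger models, but the principle is the same). Given everything it has seen so far, the model assigns a probability to every possible next token:

```python
# A minimal sketch of next-word prediction with GPT-2; the prompt is invented.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The cat is on the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Turn the scores at the last position into probabilities for the next token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, 5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {float(prob):.3f}")
```

Plausible continuations such as “mat” or “floor” should come out with higher probabilities than unrelated words, and that, one token at a time, is all the model ever does.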

Of course, state-of-the-art LLMs work in a somewhat more complex way than the diagram above illustrates. They are sophisticated neural networks made up of multiple layers of interconnected elements, or “neurons”, that can pass information to each other. Yet the core idea remains the same.

The current AI boom is largely driven by a new architecture for LLMs, introduced in 2017 and called the transformer. The revolutionary feature of the transformer is that it relies entirely on the so-called self-attention mechanism, which simplifies the computation and, at the same time, allows the model to capture more complex relationships between words beyond just left and right context. Below is the visualisation of attention in the BERT model for the sentences “The cat is on the mat. It is not on the moon”, created using the BertViz tool. Here, “it” is strongly connected not only to the previous element (a sentence separator), but also to the word “cat,” which appears much earlier in the sequence.
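
For readers who want to reproduce the visualisation, the sketch below is one way to do it, assuming a Jupyter notebook, the bert-base-uncased checkpoint and BertViz’s head_view function:

```python
# A minimal sketch of visualising self-attention with BertViz in a notebook;
# the checkpoint and the example sentence follow the text above.
from transformers import AutoTokenizer, AutoModel
from bertviz import head_view

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

sentence = "The cat is on the mat. It is not on the moon."
input_ids = tokenizer.encode(sentence, return_tensors="pt")
attention = model(input_ids).attentions        # one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

head_view(attention, tokens)                   # renders an interactive view
```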


Bare-bones LLMs can be fine-tuned to perform more specific tasks, such as question answering, text summarisation or translation. Fine-tuning is a bit like learning to sing when you already know how to speak: a new skill built upon existing ones with some additional information and exercises.
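
To give a feel for what fine-tuning looks like in practice, here is a rough sketch using the Hugging Face Trainer: a pre-trained model is loaded and then trained a little further on task-specific labelled data. The dataset (movie-review sentiment), checkpoint and hyperparameters are purely illustrative, not a recipe.

```python
# A rough sketch of fine-tuning: reuse pre-trained weights, then continue
# training on a small labelled dataset. All names here are illustrative.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")                        # example task: sentiment
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)          # "already knows how to speak"

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train_data = dataset["train"].shuffle(seed=42).select(range(2000)).map(
    tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_data,
)
trainer.train()                                       # the "learning to sing" step
```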

In addition to fine-tuning, GenAI developers can implement internal instructions that guide a model's responses. Finally, most GenAI tools use an ensemble of different models, which explains their versatility. For example, the latest version of ChatGPT combines a multimodal LLM called GPT-4o, the image generation model DALL-E, the speech recognition model Whisper, a set of text-to-speech (TTS) models that convert written text to audio, and a set of moderation models that detect sensitive or harmful content.

A DALL-E generation of “an AI chat-bot, which is essentially a few models in a trench coat”. 
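
To illustrate how such an ensemble might be wired together, here is a speculative sketch using the OpenAI Python client: a spoken question is transcribed by Whisper, screened by a moderation model, and only then answered by GPT-4o. The file name is hypothetical, and the actual orchestration inside ChatGPT is not public.

```python
# A speculative sketch of "a few models in a trench coat": chaining separate
# speech-recognition, moderation and language models via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# 1. Speech recognition (Whisper) turns an audio question into text
with open("question.mp3", "rb") as audio:          # hypothetical file
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# 2. A moderation model screens the text for harmful content
moderation = client.moderations.create(input=transcript.text)
if moderation.results[0].flagged:
    raise ValueError("Input flagged by the moderation model")

# 3. The LLM (GPT-4o) generates the actual answer
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
print(reply.choices[0].message.content)
```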

In his brilliant Introduction to Large Language Models without the Hype, computer science professor Mark Riedl describes 9 things you should keep in mind when using GenAI. I asked ChatGPT to shorten his comprehensive explanations, and then manually checked the outputs and changed some wording to make sure they read well and remained fair to the original. Can you guess how much rewriting was involved to transform the raw ChatGPT outputs into the final result?

  1. LLMs are trained on vast internet data, including harmful content like racism, sexism, insults, stereotypes, and misinformation, meaning their responses may occasionally reflect these biases. 
  2. LLMs don’t have “core beliefs”; they simply predict the next word based on patterns from internet data. They’ll generate responses for or against any topic, depending on the prompt, without holding any stance. If certain perspectives appear more often in their training data, the model may echo those more frequently, as it aims to reflect the most common responses. 
  3. LLMs have no sense of truth or morality. While they may often reflect widely accepted facts, they can just as easily generate incorrect information if prompted, as they lack any inherent understanding of right or wrong. 
  4. LLMs can make mistakes due to inconsistent training data or algorithmic imperfections. They may generate incorrect or unrelated answers — a phenomenon called “hallucination” — especially favouring familiar or frequently seen words, like small numbers or common names. This is why LLMs often struggle with precise math and factual accuracy. 
  5. LLMs are auto-regressive, meaning each word guess becomes part of the input for the next guess, allowing errors to accumulate. Even a single mistake can cascade, with the model building further errors on top of it. Transformers can’t self-correct or revise their outputs; they simply follow the sequence as it unfolds. 
  6. Always verify the outputs of a GenAI system, especially if you are asking it to do things in which you aren’t competent yourself. While mistakes may be acceptable for low-stakes tasks like writing a story, they could lead to significant losses and damage in high-stakes tasks like providing stock market advice or analysing job applications. 
  7. Self-attention allows an LLM to generate more specialised responses based on the amount of information provided in the input prompt. The quality of the output directly depends on the quality of the input; better prompts yield better results. Experiment with different prompts to find what works best. 
  8. You aren't truly "having a conversation" with an LLM, as it doesn't retain memory of previous exchanges. Instead, it processes each input and response as a new interaction, creating the illusion of continuity through a programming trick that logs the conversation. While this allows for temporary coherence, there is a word limit on inputs, and once exceeded, earlier parts of the conversation are discarded, leading the model to "forget" previous details. 
  9. LLMs aren’t capable of true problem-solving or planning, as they lack goals and the ability to look ahead. Instead, they can generate plans and solutions based on patterns they've learned from training data. While their outputs may resemble structured plans, they are essentially making educated guesses based on previous examples rather than actively evaluating alternatives or considering outcomes. 

What are the implications of GenAI for researchers?

So, GenAI tools like ChatGPT are powered by LLMs that have memorised a vast amount of data. Yet they still can’t work with and build on this information in the same way that humans do. While GenAI tools excel at pattern-based tasks, like spelling correction, image captioning or generating templated text, they aren’t designed to function as encyclopedias, oracles or experts on any topic. LLMs lack human-like reasoning abilities and can’t verify the information they generate — unless paired with a knowledge graph or some other database. For example, research discovery tools like Elicit or Research Rabbit usually combine databases of research papers with GenAI-powered text search and analysis.


A knowledge graph. Image source: https://arxiv.org/html/2304.01311
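
As a rough illustration of how an LLM can be paired with a database, the sketch below retrieves the passage most relevant to a question from a tiny document collection and then builds a grounded prompt from it. The documents, question and embedding model are invented for the example; real research discovery tools are considerably more sophisticated.

```python
# A toy sketch of the "LLM plus database" idea behind retrieval-augmented tools:
# find the most relevant source first, then let the model answer from it alone.
from sentence_transformers import SentenceTransformer, util

documents = [
    "Vaswani et al. (2017) introduced the transformer architecture.",
    "BERT is a transformer model pre-trained with a masked-word objective.",
    "Knowledge graphs store facts as nodes and typed relations between them.",
]
question = "Who introduced the transformer architecture?"

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = encoder.encode(documents, convert_to_tensor=True)
query_vector = encoder.encode(question, convert_to_tensor=True)

# Rank the documents by semantic similarity to the question
scores = util.cos_sim(query_vector, doc_vectors)[0]
best = documents[int(scores.argmax())]

# The retrieved passage would then be passed to an LLM as grounding context
prompt = f"Answer using only this source:\n{best}\n\nQuestion: {question}"
print(prompt)
```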

When incorporating GenAI in research, it’s important to approach it with extra caution, as research is inherently about innovation, originality, and critical thought. Drawing on my research experience, my expertise in Natural Language Processing (NLP), and recommendations from major academic institutions, I’ve outlined a set of best practices to guide the use of GenAI in research scenarios, as follows:

  1. Never present AI-generated work as your own. Be transparent about which tools you use and how you have used them in your research. 
  2. Always cite and reference material included in your research papers that is not your own work, including AI-generated content. 
  3. Meticulously fact-check all of the information produced by generative AI and verify the source of all citations the AI uses to support its claims. 
  4. Critically evaluate all AI output for any possible biases that can skew the presented information. 
  5. Do not ask AI to generate experimental data and then present it as the data you have collected, whether in its raw form or after analysis. This is data fabrication and is considered serious misconduct. 
  6. Avoid asking general-purpose AI virtual assistants like ChatGPT to produce a list of sources on a specific topic: such prompts may result in the tools fabricating false references. Use specialised tools for literature search and mapping. 
  7. Try to keep different versions of your work as it develops, along with any notes you make along the way. Should you be suspected of violating the research integrity policy, you can use such notes and archived files to demonstrate the progression of your work. 
  8. When available, consult developers' notes to find out if the tool's information is up-to-date, and if it has access to a knowledge graph or a database for fact-checking. 
  9. Keep in mind that GenAI virtual assistants like ChatGPT are not designed to function as a calculator, search engine, encyclopedia, oracle or expert on any topic. They use large amounts of data to generate responses constructed to "make sense", but they lack human-like reasoning abilities and information verification mechanisms. 
  10. Keep sensitive data away from GenAI to avoid personal and experimental data leakage. Once your data is fed to a proprietary GenAI tool, there is no guarantee that it stays private and no way to check it. 

Does open-source GenAI exist?

While OpenAI’s ChatGPT has almost become a synonym for GenAI, there are many free, open-source models and tools available. For example, HuggingChat offers a similar interface and lets users select the model running in the backend from the following: Llama 2 70B, CodeLlama 34B, Falcon 180B, Mistral 7B, Cohere Command R+, Google Gemma 7B. Although proprietary LLMs slightly outperform open-source ones at the moment, this gap seems to be closing rapidly.
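
For those who want to go beyond hosted chat interfaces, open-weights models can also be run locally. Below is a minimal sketch using the transformers library; the checkpoint is just one example, the [INST] markers are that model’s instruction format, and a reasonably capable GPU (or a quantised variant of the model) is assumed.

```python
# A minimal sketch of running an open-weights instruction model locally.
# The checkpoint name is one example among many open models.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",            # place the model on a GPU if one is available
)

prompt = "[INST] Explain self-attention in one sentence. [/INST]"
result = generator(prompt, max_new_tokens=80, do_sample=False)
print(result[0]["generated_text"])
```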

Task

You are approached by a researcher with little experience of AI, who is seeking advice on whether and how to use AI in their research. What would your advice to them be? If you could draw on Mark Riedl's 9 things on GenAI, which of these do you think it would be most useful to draw their attention to (choose 2-3)?

Further Reading

Alkaissi, H., McFarlane, S. I. (2023). Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 15(2).

Balloccu, S., Schmidtová, P., Lango, M., Dušek, O. (2024). Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 67–93, St. Julian’s, Malta. Association for Computational Linguistics.
 
Bsharat, S. M., Myrzakhan, A., Shen, Z. (2023). Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4. arXiv:2312.16171.


Edwards, B. (2023). Why ChatGPT and Bing Chat are so good at making things up. Ars Technica 

Falconer, S. (2023). Privacy in the age of generative AI. StackOverflow Blog 

Gumaan, E. A. (2024). Transformers (Community Article). HuggingFace 

Kardys, D. (2024). Demystifying AI: Going Beyond the Hype. Diagram Views 

Long, D., & Magerko, B. (2020). What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1–16).

Messeri, L., Crockett, M.J. (2024). Artificial intelligence and illusions of understanding in scientific research. Nature 627, pp. 49–58.  


Rose, D. (2023). Generative AI vs. Traditional AI. [Video] LinkedIn Learning.

Salvaggio, E. (2024). Challenging The Myths of Generative AI. Tech Policy Press 

Slattery, P., Saeri, A. K., Grundy, E. A., Graham, J., Noetel, M., Uuk, R., ... & Thompson, N. (2024). The AI Risk Repository: A Comprehensive Meta-Review, Database, and Taxonomy of Risks From Artificial Intelligence. arXiv:2408.12622.  

 
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In I. Guyon, U. von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 30). Curran Associates, Inc. 

Yin, Z., Sun, Q., Guo, Q., Wu, J., Qiu, X., Huang, X. (2023). Do Large Language Models Know What They Don’t Know? In Findings of the Association for Computational Linguistics: ACL 2023, pages 8653–8665, Toronto, Canada. Association for Computational Linguistics. 
