Q&A Retrieval Augmented Generation (RAG) with LangChain and Postgres

Large language models (OpenAI's ChatGPT in particular) are all the rage at the moment. Like many developers, I am interested in exploring what is possible with this new technology. This post documents how I implemented Q&A Retrieval Augmented Generation (RAG) using LangChain and Postgres with the pgvector extension.

This post was originally written as a Jupyter Notebook, which can be downloaded here.

!pip install langchain openai datasets pgvector psycopg2-binary

Loading a Dataset into Postgres

We start by loading a dataset (the first ten articles of the Simple English Wikipedia corpus) and splitting it into chunks that we can feed into the vector database.

from datasets import load_dataset

data = load_dataset("wikipedia", "20220301.simple", split='train[:10]')

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=20,
)

text_splitter.split_text(data[6]['text'])[:3]

['Alan Mathison Turing OBE FRS (London, 23 June 1912 – Wilmslow, Cheshire, 7 June 1954) was an English mathematician and computer scientist. He was born in Maida Vale, London.\n\nEarly life and family \nAlan Turing was born in Maida Vale, London on 23 June 1912. His father was part of a family of merchants from Scotland. His mother, Ethel Sara, was the daughter of an engineer.',
 'Education \nTuring went to St. Michael\'s, a school at 20 Charles Road, St Leonards-on-sea, when he was five years old.\n"This is only a foretaste of what is to come, and only the shadow of what is going to be.” – Alan Turing.',
 'The Stoney family were once prominent landlords, here in North Tipperary. His mother Ethel Sara Stoney (1881–1976) was daughter of Edward Waller Stoney (Borrisokane, North Tipperary) and Sarah Crawford (Cartron Abbey, Co. Longford); Protestant Anglo-Irish gentry.']
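
To build some intuition for what chunk_size and chunk_overlap control, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a deliberate simplification and not how RecursiveCharacterTextSplitter is implemented (the real splitter also tries to break on separators such as paragraph and line boundaries). The chunk_text helper and the sample string are made up for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 400, chunk_overlap: int = 20) -> List[str]:
    """Hypothetical helper: slide a fixed-size window over the text.

    Consecutive windows share `chunk_overlap` characters, so a sentence cut
    at a boundary still appears (partly) in both neighbouring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Illustrative input: 1,000 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(1000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

With a 1,000-character input, a chunk size of 400 and an overlap of 20, this yields three chunks, and the last 20 characters of each chunk reappear at the start of the next.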

We can now wrap each chunk in a LangChain Document, attaching an ID and the source URL as metadata, ready to be embedded and stored within Postgres.

from langchain.docstore.document import Document

documents = \
    [Document(page_content=chunk_text, metadata={"id": record['id'] + str(chunk_idx), "source": record['url']})
     for record in data
     for (chunk_idx, chunk_text) in enumerate(text_splitter.split_text(record['text']))]

Next, we start a local Postgres instance using a Docker image that ships with the pgvector extension.

docker run --rm -it \
  --name vector-store \
  -p 5432:5432 \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=postgres \
  ankane/pgvector -c log_statement=all

from langchain.vectorstores.pgvector import PGVector
import sqlalchemy

connection_string = PGVector.connection_string_from_db_params(
    driver="psycopg2",
    host="localhost",
    port=5432,
    database="postgres",
    user="postgres",
    password="postgres",
)

engine = sqlalchemy.create_engine(connection_string)
with engine.connect() as conn:
    conn.execute(sqlalchemy.sql.text('CREATE EXTENSION IF NOT EXISTS vector; COMMIT;'))

from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')

store = PGVector.from_documents(
    embedding=embeddings,
    documents=documents,
    collection_name="wikipedia",
    connection_string=connection_string,
)

We can then search the vector database for the chunks most relevant to a given query.

query = "Where was Alan Turing born and what is an Enigma machine?"

store.similarity_search(query)

[Document(page_content='Alan Mathison Turing OBE FRS (London, 23 June 1912 – Wilmslow, Cheshire, 7 June 1954) was an English mathematician and computer scientist. He was born in Maida Vale, London.\n\nEarly life and family \nAlan Turing was born in Maida Vale, London on 23 June 1912. His father was part of a family of merchants from Scotland. His mother, Ethel Sara, was the daughter of an engineer.', metadata={'id': '130', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'}),
 Document(page_content='Educated in Dublin at Alexandra School and College; on October 1st 1907 she married Julius Mathison Turing, latter son of Reverend John Robert Turing and Fanny Boyd, in Dublin. Born on June 23rd 1912, Alan Turing would go on to be regarded as one of the greatest figures of the twentieth century.', metadata={'id': '133', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'}),
 Document(page_content='A brilliant mathematician and cryptographer Alan was to become the founder of modern-day computer science and artificial intelligence; designing a machine at Bletchley Park to break secret Enigma encrypted messages used by the Nazi German war machine to protect sensitive commercial, diplomatic and military communications during World War 2. Thus, Turing made the single biggest contribution to the', metadata={'id': '134', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'}),
 Document(page_content='Education \nTuring went to St. Michael\'s, a school at 20 Charles Road, St Leonards-on-sea, when he was five years old.\n"This is only a foretaste of what is to come, and only the shadow of what is going to be.” – Alan Turing.', metadata={'id': '131', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'})]
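
Under the hood, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to the query vector. The sketch below illustrates that ranking step using cosine similarity over tiny hand-made vectors. The three-dimensional vectors, the toy_store list and the top_k helper are all hypothetical; real text-embedding-ada-002 vectors have 1,536 dimensions, and in this setup the comparison happens inside Postgres via pgvector rather than in Python.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          stored: List[Tuple[str, Sequence[float]]],
          k: int = 4) -> List[str]:
    """Hypothetical helper: return the k texts whose vectors best match the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up "embeddings": each axis loosely stands for a topic.
toy_store = [
    ("Turing was born in Maida Vale", [0.9, 0.1, 0.0]),
    ("Enigma was a cipher machine", [0.1, 0.9, 0.0]),
    ("April is the fourth month", [0.0, 0.1, 0.9]),
]
toy_query = [0.7, 0.6, 0.0]  # a query touching on both Turing and Enigma

results = top_k(toy_query, toy_store, k=2)
print(results)
```

The unrelated "April" entry scores close to zero and is dropped, which is the same effect we see above: only chunks from the Alan Turing article come back for a Turing-related query.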

Integrating the Data Source with an LLM

Now we can wire up the LLM to use the Postgres vector database within a RetrievalQA chain.

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    model_name='gpt-3.5-turbo',
    temperature=0
)

There are several ways to combine the retrieved documents into a prompt and put the question to the LLM.

Stuff

The stuff documents chain (“stuff” as in “to stuff” or “to fill”) is the most straightforward of the document chains. It takes a list of documents, inserts them all into a prompt and passes that prompt to an LLM.

from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=store.as_retriever()
)

qa.run(query)

Alan Turing was born in Maida Vale, London. An Enigma machine was a cipher machine used by the Nazi German military during World War II to encrypt and decrypt secret messages. It was considered highly secure at the time and was used to protect sensitive communications. Turing played a crucial role in breaking the Enigma code, which greatly aided the Allied forces in their efforts during the war.
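
To make the "stuff" strategy concrete, here is a rough sketch of what stuffing amounts to: every retrieved chunk is concatenated into one prompt alongside the question. The build_stuff_prompt helper and the prompt wording are my own assumptions for illustration and are not the template LangChain uses internally.

```python
from typing import List


def build_stuff_prompt(question: str, chunks: List[str]) -> str:
    """Hypothetical helper: place all retrieved chunks into a single prompt."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Illustrative chunks, standing in for the retriever's results.
retrieved = [
    "Alan Turing was born in Maida Vale, London.",
    "Turing worked at Bletchley Park during World War 2.",
]
prompt = build_stuff_prompt("Where was Alan Turing born?", retrieved)
print(prompt)
```

Because everything goes into a single prompt, this approach needs only one LLM call, but the prompt grows with the number and size of the chunks and has to fit within the model's context window.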

We can also ask the chain to cite the sources it used via RetrievalQAWithSourcesChain.

from langchain.chains import RetrievalQAWithSourcesChain

qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=store.as_retriever()
)

qa_with_sources(query)

{'question': 'Where was Alan Turing born and what is an Enigma machine?',
 'answer': 'Alan Turing was born in Maida Vale, London. An Enigma machine was a device designed by Turing at Bletchley Park to break secret Enigma encrypted messages used by the Nazi German war machine during World War 2.\n',
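 'sources': '\n- https://simple.wikipedia.org/wiki/Alan%20Turing'}

Note that this second answer is wrong: the Enigma machine was the German cipher device, not the machine Turing designed to break it. The model appears to have conflated the two when paraphrasing the retrieved chunk.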

Map-Reduce

The map reduce documents chain first applies an LLM chain to each document individually (the Map step), treating the chain output as a new document. It then passes all the new documents to a separate combine documents chain to get a single output (the Reduce step).

from langchain.chains.question_answering import load_qa_chain

qa = RetrievalQA(
    combine_documents_chain=load_qa_chain(llm, chain_type="map_reduce"),
    retriever=store.as_retriever()
)

qa.run(query)

The given portion of the document does not provide any information about where Alan Turing was born or what an Enigma machine is.

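Sadly, this chain type did not return the desired result here. One plausible explanation is that each chunk is queried in isolation during the Map step, so a chunk that lacks the answer produces a "no information" response that can carry through to the final output.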
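
To make the Map and Reduce steps concrete, here is a small sketch of the call pattern using a stand-in for the LLM. The map_reduce_answer helper, the prompt wording and fake_llm are all hypothetical; LangChain's actual chain uses its own prompts and may add further steps, so treat this purely as an illustration of the flow.

```python
from typing import Callable, List


def map_reduce_answer(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Hypothetical helper showing the map-reduce call pattern."""
    # Map step: query each chunk on its own, with no view of the other chunks.
    partial_answers = [
        llm(f"Context: {chunk}\nQuestion: {question}") for chunk in chunks
    ]
    # Reduce step: combine the per-chunk answers into one final answer.
    combined = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{combined}\nQuestion: {question}")


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a numbered reply."""
    calls.append(prompt)
    return f"response {len(calls)}"


final = map_reduce_answer(
    "Where was Alan Turing born?",
    ["chunk A", "chunk B", "chunk C"],
    fake_llm,
)
print(len(calls), final)
```

With three chunks, the sketch makes four LLM calls (three Map calls plus one Reduce call), compared with the single call made by the stuff approach.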

Building a Q&A Chatbot

We can also use the ConversationalRetrievalChain to build a chatbot that answers questions interactively. Adding a memory lets us ask follow-up questions that build on previous interactions.

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(llm, store.as_retriever(), memory=memory)

qa({"question": "Where was Alan Turing born?"})
qa({"question": "What year was he born?"})

{'question': 'What year was he born?',
 'chat_history': [HumanMessage(content='Where was Alan Turing born?', additional_kwargs={}, example=False),
  AIMessage(content='Alan Turing was born in Maida Vale, London.', additional_kwargs={}, example=False),
  HumanMessage(content='What year was he born?', additional_kwargs={}, example=False),
  AIMessage(content='Alan Turing was born in 1912.', additional_kwargs={}, example=False)],
 'answer': 'Alan Turing was born in 1912.'}

qa = ConversationalRetrievalChain.from_llm(llm, store.as_retriever(), return_source_documents=True)

qa({"question": query, "chat_history": []})

{'question': 'Where was Alan Turing born and what is an Enigma machine?',
 'chat_history': [],
 'answer': 'Alan Turing was born in Maida Vale, London. \n\nAn Enigma machine was a cipher machine used by the Nazi German military during World War II to encrypt and decrypt secret messages. It was considered highly secure at the time and was used to protect sensitive communications. Turing played a crucial role in breaking the Enigma code, which greatly aided the Allied forces in their efforts during the war.',
 'source_documents': [Document(page_content='Alan Mathison Turing OBE FRS (London, 23 June 1912 – Wilmslow, Cheshire, 7 June 1954) was an English mathematician and computer scientist. He was born in Maida Vale, London.\n\nEarly life and family \nAlan Turing was born in Maida Vale, London on 23 June 1912. His father was part of a family of merchants from Scotland. His mother, Ethel Sara, was the daughter of an engineer.', metadata={'id': '130', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'}),
  Document(page_content='Educated in Dublin at Alexandra School and College; on October 1st 1907 she married Julius Mathison Turing, latter son of Reverend John Robert Turing and Fanny Boyd, in Dublin. Born on June 23rd 1912, Alan Turing would go on to be regarded as one of the greatest figures of the twentieth century.', metadata={'id': '133', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'}),
  Document(page_content='A brilliant mathematician and cryptographer Alan was to become the founder of modern-day computer science and artificial intelligence; designing a machine at Bletchley Park to break secret Enigma encrypted messages used by the Nazi German war machine to protect sensitive commercial, diplomatic and military communications during World War 2. Thus, Turing made the single biggest contribution to the', metadata={'id': '134', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'}),
  Document(page_content='Education \nTuring went to St. Michael\'s, a school at 20 Charles Road, St Leonards-on-sea, when he was five years old.\n"This is only a foretaste of what is to come, and only the shadow of what is going to be.” – Alan Turing.', metadata={'id': '131', 'source': 'https://simple.wikipedia.org/wiki/Alan%20Turing'})]}
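
A follow-up such as "What year was he born?" only makes sense alongside the earlier conversation. As far as I understand it, the chain handles this by first rewriting the follow-up into a standalone question using the chat history, then retrieving and answering as usual. The sketch below mimics that flow with stub functions. ToyConversationalQA and the toy_* helpers are hypothetical stand-ins, not LangChain classes.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (question, answer) pairs


class ToyConversationalQA:
    """Hypothetical sketch of a retrieval chain with conversation memory."""

    def __init__(self,
                 condense: Callable[[History, str], str],
                 retrieve: Callable[[str], List[str]],
                 answer: Callable[[str, List[str]], str]) -> None:
        self.condense = condense
        self.retrieve = retrieve
        self.answer = answer
        self.history: History = []

    def ask(self, question: str) -> str:
        # With no history there is nothing to resolve, so use the question as-is.
        standalone = self.condense(self.history, question) if self.history else question
        docs = self.retrieve(standalone)
        reply = self.answer(standalone, docs)
        self.history.append((question, reply))
        return reply


seen_by_retriever: List[str] = []


def toy_condense(history: History, question: str) -> str:
    """Stub for the LLM call that rewrites a follow-up as a standalone question."""
    last_question, _ = history[-1]
    return f"{question} (follow-up to: {last_question})"


def toy_retrieve(standalone_question: str) -> List[str]:
    """Stub retriever: records the query and returns one fixed chunk."""
    seen_by_retriever.append(standalone_question)
    return ["Alan Turing was born in Maida Vale, London on 23 June 1912."]


def toy_answer(standalone_question: str, docs: List[str]) -> str:
    """Stub for the answering LLM call."""
    return f"Based on {len(docs)} chunk(s): {docs[0]}"


bot = ToyConversationalQA(toy_condense, toy_retrieve, toy_answer)
bot.ask("Where was Alan Turing born?")
bot.ask("What year was he born?")
print(seen_by_retriever)
```

The point to notice is that the retriever never sees the bare "What year was he born?". It receives a rewritten question that carries the context it needs to find the right chunks.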

Adding a UI

Thanks to Gradio, we can front this chain with a simple chat UI.

!pip install gradio

import gradio as gr

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(llm, store.as_retriever(), memory=memory)

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.ClearButton([msg, chatbot])

    def respond(message, chat_history):
        result = qa({"question": message})
        chat_history.append((message, result['answer']))
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

demo.launch()
