---
title: "Build a Fully Local RAG App With PostgreSQL, Mistral, and Ollama"
published: 2024-08-27T06:31:15.000-04:00
updated: 2024-11-08T05:04:39.000-05:00
excerpt: "Worried about sharing private information with LLMs? See how to build a fully local RAG application using PostgreSQL, Mistral, and Ollama."
tags: AI
authors: Haziqa Sajid, Hervé Ishimwe
---

> **TimescaleDB is now Tiger Data.**

Large language models (LLMs) are data-intensive systems that generate responses by analyzing vast amounts of information. However, as statistical models, they don't truly understand it. For example, LLMs don't inherently "know" that the sun rises in the east; they simply generate this fact based on patterns in the data they've been trained on. When faced with less common queries, LLMs may produce inaccurate or fabricated information, a problem known as hallucination.

Retrieval-augmented generation (RAG) helps mitigate hallucinations by combining information retrieval with text generation. Instead of relying solely on the model's internal knowledge, RAG pulls relevant documents or data to enhance the response. However, when using proprietary or confidential documents, you need to ensure their security during the retrieval process.

So, in this post, we will build a fully local RAG application to avoid sending private information to the LLM. We'll use **PostgreSQL** to store documents and **Ollama** to host a local model like **Mistral**.

## How to Build a Local RAG Application: Definition and Tools

In this section, we will discuss RAG and the tools required to build it locally: PostgreSQL will act as our vector store, while Ollama provides the embedding and generation models.

### What is retrieval-augmented generation?

RAG, or retrieval-augmented generation, combines AI models with search algorithms to retrieve information from external sources and incorporate it into a pre-trained LLM. 

RAG takes input, retrieves relevant documents (e.g., from Wikipedia), and concatenates them with the input prompt. This combined context is fed to a text generator, producing the final output. This approach adapts to evolving facts, bypassing the need for retraining static LLMs and enabling access to up-to-date information.

RAG has now emerged as a go-to solution for question-answering tasks in natural language processing (NLP). However, many enterprises hesitate due to data privacy and security concerns. Building RAG locally addresses these issues and offers several advantages:

-   **Privacy**: keeps sensitive information on local hardware, ensuring data security
-   **Latency**: reduces response times by processing data locally
-   **Control**: provides greater control over data and model management
-   **Cost**: eliminates costs associated with cloud-based services
-   **Customization**: allows for tailored adjustments and fine-tuning specific to local needs
-   **Reliability**: operates independently of internet connectivity or remote server availability

RAG serves many applications, including:

-   **Healthcare**: RAG models assist in diagnosing diseases by retrieving and summarizing relevant medical literature, enhancing decision-making for healthcare professionals.
-   **Journalism**: News agencies use RAG to automatically generate articles and summarize reports, improving efficiency and content accuracy.
-   **Customer service**: Chatbots use RAG to fetch relevant information, providing accurate and context-aware responses to customer inquiries.
-   **Software development:** Tools like **OpenSauced** use RAG to build a "copilot" for git history. By retrieving relevant code snippets, commit messages and documentation, developers can gain insights into past decisions and better understand the evolution of their projects. [Learn more about how OpenSauced leverages RAG with Timescale and pgvector](https://timescale.ghost.io/blog/how-opensauced-is-building-a-copilot-for-git-history-with-pgvector-and-timescale/).

## Introduction to PostgreSQL, Mistral, and Ollama

Let’s shift our focus to the technologies required to build the local RAG system:

### PostgreSQL

Unlike many other relational database management systems (RDBMS), PostgreSQL supports both relational and non-relational data types, making it one of the most compliant, stable, and mature relational databases available today. Here are some of its features:

-   Multi-version concurrency control for concurrent reads and writes
-   Highly scalable with Unicode and international character sets
-   Cross-platform compatibility (Linux, Windows, macOS, FreeBSD, Solaris)
-   Dynamic loading for user-written code
-   [Full ACID (Atomicity, Consistency, Isolation, and Durability) compliance](https://www.timescale.com/learn/understanding-acid-compliance)
-   Extensible with custom functions and procedural languages
-   Rich data types (arrays, JSON, PostGIS)
-   Built-in replication and partitioning for large datasets
-   Versatility via a rich ecosystem of extensions: for example, [pgvector](https://github.com/pgvector/pgvector) turns PostgreSQL into a vector database

Building RAG with PostgreSQL used to be challenging because each component required separate tooling. [**pgai**](https://github.com/timescale/pgai/) changes that: it builds on top of pgvector and simplifies RAG implementation by integrating embedding and generative AI models with PostgreSQL. This way, no additional libraries are required to generate embeddings or invoke a generation model: everything is managed by PostgreSQL, in PostgreSQL. We will use pgai in our RAG application workflow for this tutorial.

### Mistral

Mistral 7B is a 7.3B parameter model that outperforms **Llama 2 13B** on all benchmarks and Llama 1 34B on many benchmarks. It excels in code tasks, approaching CodeLlama 7B performance while remaining strong in English tasks.

![Performance comparison of Mistral with other models on different benchmarks ](https://timescale.ghost.io/blog/content/images/2024/08/Build-a-Fully-Local-RAG-App-With-PostgreSQL--Mistral--and-Ollama_mistral.png)

Performance comparison of Mistral with other models on different benchmarks ([Source](https://mistral.ai/news/announcing-mistral-7b/))

Mistral 7B uses grouped-query attention for faster inference and sliding window attention for handling longer sequences efficiently. Released under the Apache 2.0 license, it can be used without restrictions, downloaded, and deployed on any cloud. It’s easy to fine-tune and includes a chat model that surpasses Llama 2 13B in performance.

Its small size (~4.1 GB) makes it appropriate for this tutorial, as it can run on consumer-grade machines without requiring a GPU.

### Ollama

Ollama is a tool that enables users to run LLMs locally on their machines. Developed to empower individuals and organizations, Ollama offers a user-friendly interface and seamless integration capabilities.

-   **Local execution**: LLMs are run locally, ensuring data privacy, faster processing, and reduced reliance on external servers. With the help of **pgai**, [the models can run from PostgreSQL](https://timescale.ghost.io/blog/use-open-source-llms-in-postgresql-with-ollama-and-pgai/).
-   **Extensive model library**: Access to a vast library of pre-trained LLMs, including popular models like Llama 3, tailored for various tasks and hardware.
-   **Seamless integration**: Integrates with tools like Python, LangChain, pgai, and LlamaIndex for easy AI application development.
-   **Customization and fine-tuning**: Allows users to customize and fine-tune LLMs for specific needs through prompt engineering and few-shot learning.

## Setup

To build local RAG, we need two things: PostgreSQL with [**pgai**](https://github.com/timescale/pgai/) and models hosted on Ollama. Let’s start with setting up PostgreSQL:

### Setting up PostgreSQL

In this tutorial, we will use the fastest way to set up PostgreSQL: a pre-built Docker container.

#### Prerequisites

To connect to a PostgreSQL installation on Docker, you need:

-   [Docker](https://docs.docker.com/get-started/get-docker/)
-   [psql](https://timescale.ghost.io/blog/how-to-install-psql-on-mac-ubuntu-debian-windows/)

#### Create a Docker network

First, let’s create a [Docker network](https://docs.docker.com/engine/network/) that TimescaleDB and Ollama containers will use to communicate. 

```
docker network create local-rag
```

#### Install and configure TimescaleDB on PostgreSQL

Follow these steps to install PostgreSQL and TimescaleDB using Timescale's provided packages:

1.  **Pull the TimescaleDB Docker image**:

```
docker pull timescale/timescaledb-ha:pg16
```

The TimescaleDB extension is pre-created in the default `postgres` database in both the `-ha` and `non-ha` Docker images. TimescaleDB is added to any new database you create in these images by default.

2.  **Run the Docker container**:

```
docker run -d --network local-rag --name timescaledb -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb-ha:pg16
```

Here, the `--network local-rag` option ensures that the container uses the Docker network created above. If you run multiple container instances, remember to map each one to a different host port.

3.  **Connect to a database on your PostgreSQL instance**. If `psql` is not installed on the system, follow this [guide](https://timescale.ghost.io/blog/how-to-install-psql-on-mac-ubuntu-debian-windows/). For the `timescaledb-ha:pg16` image, the default user and database are both `postgres`. The default command to connect to PostgreSQL in this image is:

```
psql -d "postgres://postgres:password@localhost:5432/postgres"
```

4.  **Install pgai**. Use the following query to install the pgai extension (latest version at the time of writing: 0.4.0):

```SQL
CREATE EXTENSION IF NOT EXISTS "ai" VERSION '0.4.0' CASCADE;
```

The `CASCADE` option automatically installs the `pgvector` and `plpython3u` extensions. After installation, you can ensure that the correct versions were installed successfully by listing all extensions using this command in `psql`:

```
\dx
```

Then exit the connection using the following command:

```
\q
```

Let’s look at setting up Ollama.

### Setting up Ollama

The following steps will set up **Ollama** on the local machine:

1.  **Download Ollama.** Ollama supports macOS, Windows, and Linux. You can download it from the [official GitHub repo](https://github.com/ollama/ollama) or the [official website](https://ollama.com).

![Ollama homepage](https://timescale.ghost.io/blog/content/images/2024/08/Build-a-fully-local-RAG-app-with-postgres--ollama-and-mistral_ollama-homepage.png)

[Ollama homepage](https://ollama.com/)

In this tutorial, we'll install Ollama using Docker following this [guide](https://hub.docker.com/r/ollama/ollama). We first pull the official Ollama Docker image:

```
docker pull ollama/ollama
```

Then, run an Ollama container on the same network as our TimescaleDB container (i.e., `local-rag`) and note the ports:

```
docker run -d --network local-rag -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

2.  **Get the model.** Visit the [model library](https://ollama.com/models?ref=timescale.com) to see all available models; by default, the latest version of a model is downloaded. Pull Mistral, the generation model we are using:

```
docker exec -it ollama ollama pull mistral
```

(Mistral has 7B parameters and weighs in at roughly 4.1 GB, so downloading takes a few minutes.)

For the [nomic-embed-text-v1.5](https://blog.nomic.ai/posts/nomic-embed-matryoshka?ref=timescale.com) model, the embedding model we are using:

```
docker exec -it ollama ollama pull nomic-embed-text
```

If Ollama is installed natively on your machine rather than in Docker, you can drop the `docker exec -it ollama` prefix and run these commands directly:

```
ollama pull mistral
ollama pull nomic-embed-text
```

3.  **Run the model.** Since Ollama is running inside a container, start the model with:

```
docker exec -it ollama ollama run mistral
```

This starts an Ollama REPL where you can interact with the Mistral model. 
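Besides the REPL, Ollama also serves a REST API on port 11434, the port we published when starting the container. As a quick sanity check from Python, here is a minimal sketch using only the standard library; the `/api/generate` endpoint and payload fields follow Ollama's REST API:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama REST endpoint

def build_request(model: str, prompt: str) -> bytes:
    """JSON payload for Ollama's /api/generate endpoint (stream=False returns one JSON object)."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires the Ollama container to be running:
# print(generate("mistral", "Why is the sky blue?"))
```

We won't need this HTTP client in the tutorial itself, since pgai calls Ollama for us from inside PostgreSQL, but it is a handy way to confirm the server is reachable.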

## Development of Local RAG

We have completed the setup; let’s start developing now.

### Architecture overview

Before getting into the details, let’s take a look at the architecture:

![Architecture diagram for local RAG application using PostgreSQL and Ollama](https://timescale.ghost.io/blog/content/images/2024/08/Build-a-fully-local-RAG-application-with-postgresql.png)

Architecture diagram for local RAG application using PostgreSQL and Ollama

Here’s a step-by-step explanation of the process, following the stages in the architecture:

1\. **Documents**: The process begins with collecting documents that must be indexed and stored.

2\. **Data indexing**: These documents are indexed and stored in a vector database: in our case, the PostgreSQL instance running in the TimescaleDB container.

3\. **Query**: A user query is received, which initiates the retrieval and generation process.

4\. **Embedding models**: The query is sent to the embedding models running on `ollama:11434`. These models process the query to generate embeddings, which are numerical representations of the query's content. In this tutorial, we will use [nomic-embed-text-v1.5](https://blog.nomic.ai/posts/nomic-embed-matryoshka), a high-performing embedding model with a large token context window. The embedding dimension is `768`.

5\. **Vector database**: The generated embeddings are then used to search the vector database for the top results that are most relevant to the query.

6\. **Top results**: The top results from the vector database are retrieved and passed to the generation models.

7\. **Generation models**: The generation models, also running on `ollama:11434` and managed by Ollama, take the top results and generate a final response. The model used will be Mistral, a 7B model.

8\. **Result**: The generated response is returned as the result, completing the query process.

This architecture ensures data is processed locally, leveraging both embedding and generation models to provide accurate and relevant responses to user queries.
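To make these steps concrete before we wire up the real components, here is a self-contained toy sketch of steps 3–7 in plain Python. The two-dimensional vectors stand in for the real 768-dimensional embeddings; in the actual app, embedding and generation are handled by Ollama through pgai:

```python
import math

def cosine_similarity(a, b):
    """Similarity score used to rank documents (steps 5-6)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, store, k=3):
    """Return the k most similar document texts from the (text, vector) store."""
    ranked = sorted(store, key=lambda doc: cosine_similarity(query_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, context_docs):
    """Step 7: combine the retrieved context with the user query."""
    return f"Query: {query}\nContext: " + "\n\n".join(context_docs)

# Toy store: (text, embedding) pairs with made-up 2-D embeddings
store = [("Seoul Tower", [1.0, 0.0]), ("Myeong-dong", [0.0, 1.0])]
top = retrieve([0.9, 0.1], store, k=1)
prompt = build_prompt("Tell me about landmarks in Seoul.", top)
```

In the real application, `retrieve` becomes a SQL query against pgvector and the prompt is passed to Mistral for generation.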

### Development

In this section, we will develop the components one by one with an explanation of the code. You can also access the code in this [Jupyter Notebook](https://github.com/timescale/private-rag-example/blob/main/local_rag_mistral.ipynb).

1.  **Install and import libraries**

First, install `psycopg2`, the library we will use for PostgreSQL database operations, and import it.

```Python
%pip install psycopg2-binary
```

```Python
import psycopg2
```

2.  **Data preparation**

In this tutorial, we will create dummy data: a handful of locations, each with a short description. Here’s what it looks like:

```Python
dummy_data = [
    {"title": "Seoul Tower", "content": "Seoul Tower is a communication and observation tower located on Namsan Mountain in central Seoul, South Korea."},
    {"title": "Gwanghwamun Gate", "content": "Gwanghwamun is the main and largest gate of Gyeongbokgung Palace, in Jongno-gu, Seoul, South Korea."},
    {"title": "Bukchon Hanok Village", "content": "Bukchon Hanok Village is a Korean traditional village in Seoul with a long history."},
    {"title": "Myeong-dong Shopping Street", "content": "Myeong-dong is one of the primary shopping districts in Seoul, South Korea."},
    {"title": "Dongdaemun Design Plaza", "content": "The Dongdaemun Design Plaza is a major urban development landmark in Seoul, South Korea."}
]
```

In a real-world scenario, the data might be confidential and private documents for a local RAG application.
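When we insert these rows later, the text that actually gets embedded is the title and content joined together (`concat(title, ' - ', content)` in SQL). This small hypothetical helper mirrors that, which is handy if you ever need to reproduce an embedding input outside the database:

```python
def embedding_input(doc: dict) -> str:
    """Mirror the SQL concat(title, ' - ', content) used at insertion time."""
    return f"{doc['title']} - {doc['content']}"

# For example, with the first dummy document:
doc = {"title": "Seoul Tower",
       "content": "Seoul Tower is a communication and observation tower located on Namsan Mountain in central Seoul, South Korea."}
text = embedding_input(doc)
```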

3.  **Database connection component**

This component establishes a connection to the PostgreSQL database using the `psycopg2` library.

```Python
def connect_db():
    return psycopg2.connect(
        host="your_host",
        database="your_database",
        user="your_user",
        password="your_password",
        port="your_port"
    )
```

We can extract the information from the connection string as follows:

```
postgres://postgres:password@localhost:5432/postgres
```

**Username**: `postgres`  
**Password**: `password`  
**Host**: `localhost`  
**Port**: `5432`  
**Database name**: `postgres`
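If you would rather not split these fields out by hand, Python's standard library can parse them from the connection URL. A small sketch:

```python
from urllib.parse import urlparse

def parse_conn_string(conn_str: str) -> dict:
    """Split a PostgreSQL connection URL into the fields psycopg2.connect accepts."""
    parts = urlparse(conn_str)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

fields = parse_conn_string("postgres://postgres:password@localhost:5432/postgres")
# fields == {"user": "postgres", "password": "password",
#            "host": "localhost", "port": 5432, "database": "postgres"}
```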

We can also pass the connection string to the function like this:

```Python
def connect_db():
    return psycopg2.connect(
        "postgres://postgres:password@localhost:5432/postgres"
    )
```

4.  **Table creation component**

This component creates the **documents** table if it does not already exist. The table includes columns for `id`, `title`, `content`, and `embedding`.

```Python
with connect_db() as conn:
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS documents (
                id SERIAL PRIMARY KEY,
                title TEXT,
                content TEXT,
                embedding VECTOR(768)
            );
        """)
```

5.  **Data insertion component**

This component inserts data into the **documents** table. It first inserts the `title` and `content` then updates the `embedding` field using the [`ollama_embed`](https://github.com/timescale/pgai/blob/main/docs/ollama.md#embed) function from **pgai**.

```Python
with connect_db() as conn:
    with conn.cursor() as cur:
        for doc in dummy_data:
            cur.execute("""
                INSERT INTO documents (title, content, embedding)
                VALUES (
                    %(title)s,
                    %(content)s,
                    ai.ollama_embed(
                            'nomic-embed-text', 
                            concat(%(title)s, ' - ', %(content)s), 
                            host=>'http://ollama:11434'
                    )
                );
            """, doc)
```

The key advantage here is that the embeddings are generated directly within the database, eliminating the need for any additional libraries.

6.   **Data fetching component**

This component fetches and prints all data from the **documents** table, including a preview of each embedding. This lets us verify that the insertion succeeded:

```Python
with connect_db() as conn:
    with conn.cursor() as cur:    
        cur.execute("""
            SELECT title, content, embedding 
            FROM documents;
        """)
            
        rows = cur.fetchall()
        for row in rows:
            print(f"Title: {row[0]}\nContent: {row[1]}\nEmbedding: {row[2][0:150]}...\n\n")
```

7.   **Retrieve and generate response component**

This component takes a query, embeds it, retrieves the most relevant documents based on cosine similarity, and generates a response using the [`ollama_generate`](https://github.com/timescale/pgai/blob/main/docs/ollama.md#generate) function.

```Python
query = 'Tell me about landmarks in Seoul.'

with connect_db() as conn:
    with conn.cursor() as cur:
        # Embed the query using the ollama_embed function
        cur.execute("""
            SELECT ai.ollama_embed('nomic-embed-text', %s, host=>'http://ollama:11434');
        """, (query,))
        query_embedding = cur.fetchone()[0]

        # Retrieve relevant documents based on cosine distance
        cur.execute("""
            SELECT title, content, 1 - (embedding <=> %s) AS similarity
            FROM documents
            ORDER BY similarity DESC
            LIMIT 3;
        """, (query_embedding,))
        rows = cur.fetchall()
                
        # Prepare the context for generating the response
        context = "\n\n".join([f"Landmark: {row[0]}\nDescription: {row[1]}" for row in rows])

        # Generate the response using the ollama_generate function
        cur.execute("""
            SELECT ai.ollama_generate('mistral', %s, host=>'http://ollama:11434');
        """, (f"Query: {query}\nContext: {context}",))
            
        model_response = cur.fetchone()[0]
        print(model_response['response'])
```

8.   **Results**

After querying, here is the result:

```
>> Seoul is home to several iconic landmarks that are must-visit attractions. One of the prominent landmarks is Seoul Tower, also known as Namsan Tower, which offers panoramic views of the city from its vantage point on Namsan Mountain. Another significant landmark is Gwanghwamun Gate, the main and largest gate of Gyeongbokgung Palace, showcasing traditional Korean architecture and historical significance.
```

Here we have it: our very own local RAG system powered by PostgreSQL and Ollama. No data is sent to public LLMs, ensuring complete privacy and security for your information.

Our friend Dave Ebbelaar has also put together a great video on this topic, so be sure to check it out.

<iframe width="200" height="113" src="https://www.youtube.com/embed/hAdEuDBN57g?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" title="Build high-performance RAG using just PostgreSQL (Full Tutorial)"></iframe>

## Conclusion

Large language models rely on vast data but can hallucinate when queried with uncommon topics. RAG combines information retrieval with text generation to mitigate this, allowing for efficient updates without full retraining. However, standard RAG methods often send data to external LLMs, risking confidentiality breaches. Local RAG addresses this challenge by processing and generating responses entirely within a secure local environment, ensuring data privacy and security. 

In this article, we created a local RAG application using PostgreSQL with pgai, Mistral, and Ollama. With the provided code, you can develop your own RAG application for note-taking, code comprehension, or simply as a personal AI on your laptop.

Start using [**pgai** to simplify your AI pipelines](https://timescale.ghost.io/blog/pgai-giving-postgresql-developers-ai-engineering-superpowers/) and use AI within the database—no need to resort to other libraries. To get started, head over to the [pgai GitHub repo](https://github.com/timescale/pgai/tree/main) and [check out our tutorial to learn more about **Ollama and pgai**](https://timescale.ghost.io/blog/use-open-source-llms-in-postgresql-with-ollama-and-pgai/) so you can start building more private, lower-cost AI applications today. 

[Create a free Timescale](https://console.cloud.timescale.com/signup) account to access [pgvector](https://github.com/pgvector/pgvector?tab=readme-ov-file#pgvector), [pgai](https://github.com/timescale/pgai), and [pgvectorscale](https://github.com/timescale/pgvectorscale?tab=readme-ov-file#pgvectorscale) in one place—with the full benefits of a mature cloud PostgreSQL platform.