---
title: "Best Open-Source AI Model: Experimenting With Phi-4 and Ollama in PostgreSQL"
published: 2025-01-23T19:38:55.000-05:00
updated: 2025-01-23T19:38:55.000-05:00
excerpt: "Picking the best open-source AI model is easy with Ollama and pgai. Learn how to do it and see how to implement a RAG system with Microsoft’s Phi-4."
tags: AI
authors: Hervé Ishimwe
---

> **TimescaleDB is now Tiger Data.**

The emergence of lightweight, powerful open-source AI models like [Microsoft’s Phi-4](https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090) and [Meta’s Llama 3.2](https://ollama.com/library/llama3.2) has transformed the AI landscape, making it more accessible to developers. 

Open-source AI models are artificial intelligence models whose code, weights, and architecture are publicly available for anyone to view, use, modify, and distribute. These open-source tools for AI applications are cost-effective, highly customizable, and allow complete control over the data flow, making them ideal for building privacy-focused, LLM-powered systems.

(If you want to learn more about this, we recently built [a fully local retrieval-augmented generation (RAG) application using Mistral](https://www.timescale.com/blog/build-a-fully-local-rag-app-with-postgresql-mistral-and-ollama), another leading open-source model.)

However, identifying the best open-source embedding or generative model for your AI use case remains challenging. Running these models locally requires computational resources, technical expertise, and time to establish robust evaluation workflows. These hurdles can slow development progress and discourage adoption despite the advantages of open-source tools. 

But it doesn’t have to be this way. In this blog post, you’ll learn how to simplify this process using [Ollama](https://ollama.com/) and [pgai](https://github.com/timescale/pgai). This will enable you to experiment with different models and quickly implement a RAG system using Microsoft’s Phi-4 in PostgreSQL.

## Exploring Open-Source AI Tools: Phi-4, Microsoft’s Cutting-Edge Open-Source LLM

[Phi-4](https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090), developed by [Microsoft](https://azure.microsoft.com/en-us/products/phi/), is a compact, open-source large language model with **14.7 billion parameters**, a **16K-token context window**, and **a size of just 9.1 GB**. It is designed for research on large language models and use in general AI systems, [primarily focusing on English](https://huggingface.co/microsoft/phi-4#intended-use).

Phi-4 excels in reasoning tasks, especially mathematics, outperforming even larger models like [Gemini Pro 1.5](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/). Its superior performance stems from its high-quality training data, including synthetic datasets, Q&A datasets, curated academic publications, and filtered public domain websites. This diversity makes Phi-4 a unique option for developers to experiment with, especially for those seeking accuracy and efficiency.

![Phi-4 performance on math competition problems. Source: Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning ](https://timescale.ghost.io/blog/content/images/2025/01/Open-Source-AI-Model-Experiment_Building-With-Phi-4-and-Ollama-in-PostgreSQL.png)

__Phi-4 performance on math competition problems.__ _Source:_ [__Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning__](https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090)

## AI Model Experimentation: Integrating Ollama and Pgai

To fully appreciate the strength of combining Ollama and pgai, let’s first understand what each tool offers individually and explore the potential of their integration.

### Ollama: Open-source models at your fingertips

[Ollama](https://github.com/ollama/ollama) is an open-source tool that provides a unified interface for accessing and running embedding models and LLMs locally. By abstracting API complexities, it lets developers focus on building applications without worrying about handling different model endpoints.

[With Ollama](https://ollama.com/library), you can easily access [Phi-4](https://ollama.com/library/phi4) and other models like [Llama 3.2](https://ollama.com/library/llama3.2) and [Mistral](https://ollama.com/library/mistral), making comparisons and experimenting straightforward. Additionally, Ollama simplifies downloading and managing embedding models, such as [nomic-embed-text](https://ollama.com/library/nomic-embed-text) and [mxbai-embed-large](https://ollama.com/library/mxbai-embed-large), which can be integrated seamlessly into your workflows.

```shell
echo "Downloading embeddings models for experimenting...."
ollama pull nomic-embed-text
ollama pull mxbai-embed-large

echo "Downloading generative models for experimenting...."
ollama pull phi4
ollama pull llama3.2
```

### Pgai: The AI engine within PostgreSQL  

[Pgai](https://github.com/timescale/pgai) is an open-source PostgreSQL extension that integrates embedding generation and response workflows into the database. This approach eliminates the need for external pipelines, enabling seamless interaction with your data. Pgai supports various model providers, including [Ollama](https://github.com/timescale/pgai/blob/main/docs/ollama.md), [OpenAI](https://github.com/timescale/pgai/blob/main/docs/openai.md), and [Cohere](https://github.com/timescale/pgai/blob/main/docs/cohere.md), making it a versatile choice for AI development.

#### Why use pgai?

1.  **Familiarity and ease of use**

Pgai leverages PostgreSQL, [a popular open-source database recognized as the “Best Database” in Stack Overflow’s Developer Survey for two consecutive years](https://survey.stackoverflow.co/2024/technology#most-popular-technologies-database). Its SQL-based interface ensures that new or experienced developers can execute AI-related functions intuitively, smoothing the transition to AI development. Moreover, using PostgreSQL as your vector database ensures efficient data management by eliminating redundancy and allowing you to store your data alongside embeddings in a unified system.

2.  [**Ollama API integration**](https://github.com/timescale/pgai/blob/90e07b1ab45275215a69e7c190a23d08f1482666/docs/ollama.md) 

Pgai abstracts the complexities of [Ollama’s API](https://github.com/ollama/ollama/blob/main/docs/api.md), allowing developers to focus on building applications without technical hurdles. This integration ensures steady progress, whether you’re a novice or a seasoned engineer.

Before continuing, let’s set up our PostgreSQL database and install the necessary extensions. This tutorial uses [Paul Graham's essays](https://huggingface.co/datasets/sgoel9/paul_graham_essays) as the source data. Pgai functions live in the `ai` schema.

```sql
-- Install the pgai extension
CREATE EXTENSION IF NOT EXISTS ai CASCADE;

-- Configure at the session level the host from which Ollama is served 
SELECT set_config('ai.ollama_host', 'http://host.for.ollama:port', false);

-- Create the source table
CREATE TABLE IF NOT EXISTS essays (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    title TEXT NOT NULL,
    date TEXT,
    text TEXT NOT NULL
);

-- Load dataset from Hugging Face
SELECT ai.load_dataset(
    'sgoel9/paul_graham_essays', 
    table_name => 'essays', 
    if_table_exists => 'append'
);
```

### Pgai Vectorizer: Experiment with embedding models

Pgai also includes [pgai Vectorizer](https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md#vectorizer-quick-start), which [automates embedding generation and keeps embeddings in sync with your source data using a single SQL command](https://www.timescale.com/blog/how-to-automatically-create-update-embeddings-in-postgresql). This feature is invaluable for testing multiple embedding models, saving time and computational resources. With Ollama, you can configure vectorizers that use open-source embedding models and then compare how each performs, both on its own and in combination with different generative models, in any AI-powered system. You can read about [the simple evaluation workflow we recently used to find the best open-source embedding model for RAG](https://www.timescale.com/blog/finding-the-best-open-source-embedding-model-for-rag).

Here’s how to configure vectorizers for two embedding models, `nomic-embed-text` and `mxbai-embed-large`. For more information about setting up the environment, please check out this [pgai Vectorizer quick start with Ollama](https://github.com/timescale/pgai/blob/main/docs/vectorizer-quick-start.md).

```sql
-- Configure vectorizer using nomic-embed-text
SELECT ai.create_vectorizer(
    'essays'::regclass,
    destination => 'essays_nomic_embed_embeddings', 
    embedding => ai.embedding_ollama('nomic-embed-text', 768),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50),
    formatting => ai.formatting_python_template('title: $title $chunk')
);

-- Configure vectorizer using mxbai-embed-large
SELECT ai.create_vectorizer(
    'essays'::regclass,
    destination => 'essays_mxbai_embed_large_embeddings', -- name of the view
    embedding => ai.embedding_ollama('mxbai-embed-large', 1024),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50),
    formatting => ai.formatting_python_template('title: $title $chunk')
);
```
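To build intuition for the chunking parameters above (a 512-character chunk size with a 50-character overlap), here is a simplified Python sketch. It is an approximation only: the actual recursive character text splitter also prefers to break on separators such as newlines and spaces, so real chunk boundaries will differ.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size overlapping chunks (simplified sketch;
    the real recursive splitter also respects separator boundaries)."""
    step = chunk_size - overlap  # each chunk starts 462 characters after the last
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1,200-character "essay" yields three chunks of at most 512 characters,
# with each pair of neighbors sharing a 50-character overlap.
essay = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(essay)
print(len(chunks), [len(c) for c in chunks])  # 3 [512, 512, 276]
```

The overlap ensures that a sentence falling on a chunk boundary still appears whole in at least one chunk, which helps retrieval quality.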

At this point, each vectorizer creates **an embeddings table**, named after the `destination` parameter with a `_store` suffix, which stores the generated vector embeddings, and **a view**, named after the `destination` parameter itself, that **combines the source table, `essays`, with the newly created embeddings table**.

The following SQL function, `generate_rag_response`, facilitates vector search on a specified view linked to one of the generated embedding tables. It leverages the vector representation of the user’s query, created using the pgai function, [`ai.ollama_embed`](https://github.com/timescale/pgai/blob/main/docs/ollama.md#embed), and the same embedding model as the table it searches. This ensures consistency in representation and retrieves the most relevant chunks.

Then, the retrieved chunks are passed to the generative model alongside the user’s query for response generation using another pgai function, [`ai.ollama_chat_complete`](https://github.com/timescale/pgai/blob/main/docs/ollama.md#chat-complete), which hits Ollama’s [`/api/chat`](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion) endpoint.

```sql
CREATE OR REPLACE FUNCTION generate_rag_response(
    query_text TEXT,
    embeddings_view TEXT DEFAULT 'essays_nomic_embed_embeddings',
    embedding_model TEXT DEFAULT 'nomic-embed-text', 
    generative_model TEXT DEFAULT 'phi4',
    chunk_limit INTEGER DEFAULT 3
)
RETURNS TEXT AS $$
DECLARE
   context_chunks TEXT;
   response TEXT;
   system_prompt TEXT := 'You are a helpful assistant. Provide accurate, well-reasoned responses based on the given context. If the context is insufficient to answer the question, say so.';
BEGIN
   -- Perform similarity search to find relevant text chunks.
   -- The view name arrives as a parameter, so we must use dynamic SQL,
   -- interpolating it safely as an identifier with format(%I).
   EXECUTE format(
       $sql$
       SELECT string_agg(title || ': ' || chunk, E'\n')
       FROM (
           SELECT title, chunk
           FROM %I
           ORDER BY embedding <=> ai.ollama_embed($1, $2)
           LIMIT $3
       ) AS relevant_posts
       $sql$, embeddings_view)
   INTO context_chunks
   USING embedding_model, query_text, chunk_limit;

   -- Generate a chat response using the chosen generative model (Phi-4 by default)
   SELECT ai.ollama_chat_complete
   ( generative_model,
     jsonb_build_array
     ( jsonb_build_object('role', 'system', 'content', system_prompt)
     , jsonb_build_object
       ('role', 'user'
       , 'content', query_text || E'\nUse the following context to respond.\n' || context_chunks
       )
     )
   )->'message'->>'content' INTO response;

   RETURN response;
END;
$$ LANGUAGE plpgsql;
```
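The `ORDER BY embedding <=> ...` clause relies on pgvector’s `<=>` operator, which computes cosine distance (one minus cosine similarity), so the most similar chunks sort first. A minimal Python sketch of the same computation:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as pgvector's <=> operator computes it:
    1 - (a . b) / (|a| * |b|). Smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Vectors pointing the same way have distance 0; orthogonal vectors have distance 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

This is why the query and the stored chunks must be embedded with the same model: distances are only meaningful between vectors from the same embedding space.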

This function is versatile and supports experimentation with both embedding and generative models. **To experiment with embedding models**, you can fix the generative model while switching between different embedding-generated views. Conversely, **to test various generative models**, you can fix the embedding model and then alternate between different generative models.

Here are examples for each use case:

```sql
-- Experimenting with embedding models (mxbai-embed-large and Phi-4)
SELECT generate_rag_response(
    'Give me some startup advice',
     embeddings_view := 'essays_mxbai_embed_large_embeddings', 
     embedding_model := 'mxbai-embed-large'  
);

-- Experimenting with generative models (nomic-embed-text and Llama 3.2)
SELECT generate_rag_response(
    'Give me some startup advice', 
     generative_model := 'llama3.2' 
);
```
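To run such comparisons systematically from application code, you can sweep the full grid of embedding and generative model combinations. Here is a sketch that only builds the parameterized SQL calls; the view and model names come from the setup above, and handing the pairs to a PostgreSQL driver such as psycopg is left as an assumption:

```python
from itertools import product

# Embedding views and their matching embedding models (from the vectorizers above)
embedding_configs = [
    ("essays_nomic_embed_embeddings", "nomic-embed-text"),
    ("essays_mxbai_embed_large_embeddings", "mxbai-embed-large"),
]
generative_models = ["phi4", "llama3.2"]

SQL = (
    "SELECT generate_rag_response(%s, "
    "embeddings_view := %s, embedding_model := %s, generative_model := %s);"
)

def build_experiment_grid(question: str) -> list[tuple[str, tuple]]:
    """Return one (sql, params) pair per model combination, ready to
    execute with a PostgreSQL driver such as psycopg."""
    return [
        (SQL, (question, view, emb, gen))
        for (view, emb), gen in product(embedding_configs, generative_models)
    ]

grid = build_experiment_grid("Give me some startup advice")
print(len(grid))  # 2 embedding models x 2 generative models = 4 runs
```

Collecting the four responses side by side makes it easy to judge which embedding/generative pairing best fits your use case.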

## Conclusion

Open-source AI tools like Ollama and pgai make experimenting with embedding and generative models intuitive and efficient. By leveraging these AI tools alongside Microsoft’s Phi-4 and PostgreSQL, you can rapidly prototype and implement AI-powered applications while retaining complete control over your data. Whether comparing open-source AI models or deploying a robust RAG system, this stack allows you to innovate with ease and speed. 

If you’re building AI applications, explore [Timescale’s complete open-source AI stack](https://www.timescale.com/ai) or head to [pgai](https://github.com/timescale/pgai)’s GitHub repository to start bringing AI workflows into PostgreSQL—no need to leave your database. If you find it helpful, we would love your support—leave us a ⭐!  

### Further reading

Want to learn more? Check out these blog posts and resources about open-source AI models and tools:

-   [Stop Paying the OpenAI Tax: The Emerging Open-Source AI Stack](https://www.timescale.com/blog/the-emerging-open-source-ai-stack)
-   [Evaluating Open-Source vs. OpenAI Embeddings for RAG: A How-To Guide](https://www.timescale.com/blog/open-source-vs-openai-embeddings-for-rag)
-   [Which OpenAI Embedding Model Is Best for Your RAG App With Pgvector?](https://www.timescale.com/blog/which-openai-embedding-model-is-best)
-   [Local RAG Using Llama 3, Ollama, and PostgreSQL](https://youtu.be/-ikCYKcPoqU?si=vS_WE7-6UbGFRz3U)
-   [Vector Databases Are the Wrong Abstraction](https://www.timescale.com/blog/vector-databases-are-the-wrong-abstraction)