May 16, 2025
Posted by Justin Murray
During Launch Week, we unveiled a suite of new Timescale Cloud features—among them, SQL Assistant—in our post “Speed Up Your Workflows: Introducing SQL Assistant, Recommendations, and Insights”. That announcement showcased how we’re delivering AI-powered tooling, right where developers work. This follow-up dives into the engineering journey behind SQL Assistant, discussing the architecture, trade-offs, and culture that made it possible.
On the PopSQL team within Timescale, our experience has taught us that many of our users feel real pain when trying to extract insights from their data. The SQL experts on their team are overloaded with requests, and folks in product and business roles hit a lot of friction when trying to get answers.
The rise of generative AI and large language models (LLMs) adds a new tool to our arsenal to help with this problem. They are very good at understanding business requirements, interpreting the schema of your database, and generating SQL queries—if you give them enough context. As powerful as these models are, it can be difficult, especially for less technical folks, to integrate them into your workflow and use them effectively to save time and effort.
We’ve embedded SQL Assistant where you work—whether that is Timescale’s Data Mode console or the standalone PopSQL application—so you can leverage AI without the friction.
We want to show you a peek behind the curtain, walking you through the decisions we made and the principles we used to build it.
SQL Assistant is an integration of several LLM providers into the PopSQL application as a chat interface with advanced capabilities. Not only do we bring the power and convenience of general-purpose LLM conversation directly into the application, but we also extend its abilities through the use of tools that provide tight integration with some of PopSQL’s most powerful features. Then we made it agentic, so it can quickly cruise through complex tasks and troubleshoot its own problems.
This product and feature set comes with some unique challenges. Sticking to the following principles led us through our decision-making process and gave us priorities to fall back on through any ambiguity.
As impressive as these LLMs can be, they still make frequent mistakes. There is a tradeoff between moving fast and staying safe. By default, the SQL Assistant won’t take any actions that might put your data at risk.
We want this experience to be fast and easy for our users. The product is a success when you can come to the SQL Assistant with your data questions, and it’ll figure out how to answer them correctly using your databases, with minimal additional interaction. The goal is a zero-friction experience that is always at your fingertips, but never in your way. Eliminate the context-switching, copy-pasting, and CLIs that other tools require.
The world of LLMs is evolving quickly, so we also wanted to build a provider-agnostic platform where we could quickly adopt new models and compare and contrast how well they perform at various tasks. A big part of the design of this system involves abstracting away the differences between the various models and providers, and conforming to a common interface.
Our vision for this project is quite large, and it continues to be a moving target. While the foundational models rapidly improve over time to deliver increased speed and accuracy, we have built out the integration incrementally as we learn more about what these models are good at, and how they can best benefit our users. We started small, and continue to evolve the tooling over time.
With these principles in mind, we can take a big-picture look at how the architecture is laid out, and the technical challenges we faced along the way.
Viewed from the top, the architecture is quite simple. The application talks directly to our backend through a unified WebSocket interface to submit requests and receive incremental responses. This was chosen merely out of convenience, as the app already had a WebSocket connection to our backend. We could just as easily have used an HTTP request with an event-stream response, as many of the LLM providers do for their APIs, and we will likely add that option as we look to support additional use cases with this infrastructure.
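To make the shape of that stream concrete, the events flowing back over the WebSocket can be thought of as a small discriminated union. This is an illustrative sketch (the type name and payload shapes are simplified for this post); the event kinds mirror the pseudocode later in this article:

// Illustrative sketch of the streamed WebSocket events; names and payload
// shapes are simplified for this post.
type AssistantStreamEvent =
  // An incremental piece of the assistant's streamed text
  | { type: 'chunk'; chunk: { delta: { content?: string } } }
  // A complete message to append to the conversation (e.g. a tool result)
  | { type: 'message'; message: { id: string; role: string; content: string } }
  // A request for the client to invoke a client-side tool and report back
  | { type: 'clientToolCall'; tool: { id: string; function: { name: string; arguments: string } } };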
The backend proxies the requests to the API of the correct service provider for the selected model (with whatever reshaping is needed). This provider-agnostic adapter layer hides differences across OpenAI, Anthropic, Groq, Fireworks, Gemini, and more, keeping our application layer much simpler than it would be otherwise.
Fortunately for us, nearly all LLM providers seem to have settled on APIs that are very similar to each other (though definitely not the same!). All the models we have integrated support a “chat completion” API, which effectively boils down to: send the conversation so far as a list of messages (plus any tool definitions), and receive the assistant’s reply streamed back as incremental chunks.
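Our common interface reflects that shape. Here is a rough sketch (the names and types are illustrative, not our exact production interface):

// Illustrative sketch of a provider-agnostic "chat completion" interface;
// names and shapes are simplified, not our exact production types.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

interface ToolDefinition {
  name: string;
  description: string;
  // JSON Schema describing the function parameters
  parameters: Record<string, unknown>;
}

interface ChatCompletionChunk {
  delta: {
    content?: string;        // incremental assistant text
    tool_calls?: unknown[];  // incremental tool-call fragments
  };
}

interface ChatCompletionService {
  // Each provider adapter implements this single method
  streamChatCompletion(args: {
    model: string;
    messages: ChatMessage[];
    tools?: ToolDefinition[];
  }): AsyncGenerator<ChatCompletionChunk>;
}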
The differences between model interfaces appear when you get down into the nitty-gritty details of the communication. They might use slightly different property names in communication objects, they might emit control flow signals in a slightly different order, they might take a different set of configuration options, with differing requirements, and they might support different features.
At this point, anyone familiar with this space might now be thinking: Why not use OpenRouter/LangChain/etc.? It’s a fair question, and for simple enough use cases, these tools can help accomplish similar goals while saving time.
We seriously considered these options, and decided that the potential up-front time savings are not worth the long-term downsides for us. After all, this wrapper layer isn’t that large (a few thousand lines, a very small percentage of PopSQL’s codebase), and it comes with some strong advantages.
Our approach to building this was very pragmatic: start with one integration, and expand it incrementally. Here’s what a rough roadmap looked like.
We started with OpenAI as the obvious first choice, given its popularity and the fact that many other providers modeled their APIs after it. The first version of our interface was essentially identical to their client.chat.completions.create API, and so isn’t particularly interesting to talk about in detail here. The more interesting part was building a tool-calling loop.
Instrumenting tool use
The crash course on function calling for LLMs is actually pretty straightforward. The models are trained to understand the idea of a “tool”, and that it should try to use them, when deemed relevant, to answer prompts. A tool can do essentially anything that software can do—it is up to the application developer to define it and provide an implementation.
You provide the LLM with definitions of the available tools, including their name, a detailed description, and a formal definition of the function parameters. It is up to the model to decide if/when it uses a tool. When it does, it returns the name (e.g. function_a) of the tool and the arguments (e.g. foo, bar) to pass to it.
It is then entirely up to the application to actually invoke the function, and feed the results back into the LLM for another turn. This is complicated by the fact that a model may decide to use several tools at once, either in parallel (if supported) or in sequence.
Here’s some pseudo-code to help illustrate this:
// Perform the following at least once, looping if there are tool calls.
// addOrUpdateToolCall assembles streaming deltas into the toolCalls array;
// pendingClientToolCalls is a Map shared with the WebSocket handler (not shown).
let toolCalls = [];
do {
  // Start each turn with a fresh set of tool calls
  toolCalls = [];
  for await (const chunk of service.streamChatCompletion({ messages /* , tools, model, ... */ })) {
    // Emit the chunk to the client, for the sake of rendering the text
    // asap as it streams in
    yield { chunk, type: 'chunk' };
    // Assemble any function call as it streams in
    if (chunk.delta.tool_calls) {
      for (const tcDelta of chunk.delta.tool_calls) {
        addOrUpdateToolCall(tcDelta);
      }
    }
  }
  // There's nothing more to do, unless there is a tool call
  if (toolCalls.length === 0) break;
  // The tool calls request becomes part of the message history as an assistant
  // message, and needs to be present in the next iteration of the loop
  messages.push({
    role: 'assistant',
    toolCalls,
    // ...other message fields omitted
  });
  // Tools are invoked in parallel
  const toolPromises = [];
  for (const tool of toolCalls) {
    // Check if this is a server-side tool
    const fnDef = functionsMap[tool.function.name];
    if (fnDef) {
      toolPromises.push(
        invokeTool(fnDef, tool),
      );
      continue;
    }
    // If the function is not defined on the server,
    // it must be a client-side tool
    const deferred = new Deferred();
    pendingClientToolCalls.set(tool.id, deferred);
    // Notify the client there is a tool to call
    yield {
      tool,
      type: 'clientToolCall',
    };
    // The client will call back to the server with the result (not shown)
    toolPromises.push(
      Promise.race([
        deferred.promise,
        fiveMinuteTimeout(),
      ]).finally(() => {
        pendingClientToolCalls.delete(tool.id);
      }),
    );
  }
  // Wait for all tool invocations to resolve
  const toolMessages = await Promise.all(toolPromises);
  // Add results to the message history, and send to the client for display
  for (const toolMessage of toolMessages) {
    messages.push(toolMessage);
    yield {
      message: toolMessage,
      type: 'message',
    };
  }
  // Prepare a new message for the next LLM turn
  yield {
    message: {
      content: '',
      id: nanoid(),
      role: 'assistant',
    },
    type: 'message',
  };
} while (toolCalls.length > 0);
A note on AsyncGenerator
If you haven’t come across async generators before in JavaScript, they are a powerful feature to learn. They are a perfect fit for these LLM APIs, as they asynchronously generate chunks of results. We use these in multiple layers of our implementation. When a mapping layer is needed for a model, it is implemented as an async iterator that applies a transformation to the chunks as they are produced, and yields them upwards. This allows us to have a single implementation of the control flow (shown above) using a similar iterator/generator pattern. The messages yielded from there are emitted to the client as WebSocket events, which are consumed to update the UI in real time.
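As a small illustration of that pattern, a mapping layer can be written as an async generator that wraps the provider’s stream and yields chunks in our common shape (a sketch with illustrative types, not our actual adapter code):

// Sketch of a mapping layer: wrap a provider-specific stream and yield
// chunks in our common shape. The input/output types are illustrative.
async function* mapProviderStream(
  providerStream: AsyncIterable<{ text?: string }>,
): AsyncGenerator<{ delta: { content?: string } }> {
  for await (const part of providerStream) {
    if (part.text) {
      // Transform each provider-specific chunk into our common chunk format
      yield { delta: { content: part.text } };
    }
  }
}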
Adding more providers
With the above structure in place, adding additional providers was just an exercise in data transformation. We chose Anthropic next (due to its popularity), which proved to be the most different from OpenAI (to date). I’m going to avoid going into exhaustive detail here, and will just cherry-pick a few illustrative examples.
// Reshape our common message format into Anthropic's expected shape.
// `system` collects system prompts; `messages` is the Anthropic-format list.
let system = '';
let prevToolMsg = null;
const messages = [];
for (const msg of args.messages) {
  const { content, role } = msg;
  if (role !== 'tool') prevToolMsg = null;
  switch (role) {
    case 'system': {
      // System messages are concatenated into a separate top-level parameter
      if (system) {
        system += '\n\n';
      }
      system += content;
      break;
    }
    case 'user': {
      if (content) {
        messages.push({ content, role });
      }
      break;
    }
    case 'assistant': {
      const {
        reasoning,
        reasoning_signature,
        redacted_reasoning,
        tool_calls,
      } = msg;
      // ...
Messages have a few major differences, so we have to iterate over them and reshape them. System messages have to be extracted and concatenated to be passed as a totally separate parameter. User messages are identical, but with a subset of fields. Assistant message objects have to be broken apart and reshaped into a sub-array.
tool_calls.map(
  (tool): ToolUseBlockParam => ({
    id: tool.id,
    // Anthropic expects the arguments as a JSON object, not a JSON string
    input: tryParseJson(tool.function.arguments) ?? {},
    name: tool.function.name,
    type: 'tool_use',
  }),
)
The arguments for tools are expected to be a JSON object rather than a JSON-encoded string, hence the tryParseJson call above.
for await (const part of stream) {
  signal.throwIfAborted();
  switch (part.type) {
    case 'message_start': {
      prompt_tokens = part.message.usage.input_tokens;
      completion_tokens = part.message.usage.output_tokens;
      break;
    }
    case 'content_block_start': {
      switch (part.content_block.type) {
        case 'text': {
          if (part.content_block.text) {
            yield { delta: { content: part.content_block.text } };
          }
          break;
        }
        case 'thinking': {
          if (part.content_block.thinking) {
            yield { delta: { reasoning: part.content_block.thinking } };
          }
          break;
        }
        // ...
The output stream needs some reshaping too. Some messages arrive in a different order, so we capture those and emit them at the end, to match the OpenAI behavior.
While there is no doubt some intricate complexity here, the total footprint is comfortably small. Our entire Anthropic adapter is only 350 lines of code, on top of the common code shared by all providers.
As we add each provider, we may also make some small tweaks to the common interface to support any additional, unique features. This makes our interface a superset of all the provider interfaces, but again, this isn’t so bad, since these providers are all racing to adopt any new ideas from their competitors.
After Anthropic, it was straightforward to add support for Groq in ~100 lines with some very minor reshaping. Adding Fireworks.ai was trivial, since they are directly compatible with the OpenAI interface. Most recently, we added support for Gemini, with its own quirks, but still just 250 lines of adaptation.
With all of our adapter code written, we began building out the UI.
A big part of the SQL Assistant experience is how you interact with it. We wanted to provide an open-ended conversation experience, as has become quite familiar to many from the likes of ChatGPT. We also wanted to make this experience feel well integrated with the rest of the application, while remaining light and intuitive. We think there is still some way to go to achieve this vision fully, but we have made some great strides so far.
For now, I’ll call out a few of the challenges:
Rendering streaming text in real-time
These conversational LLMs respond with markdown formatted text, which users expect to see well formatted and appear as quickly as the model can output it. Frontend performance becomes a problem as the output gets very long over time. We have a few optimizations in place, but know that more can be done here.
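One kind of optimization that helps here (sketched below with assumed names; this is not our exact implementation) is to buffer incoming chunks and re-render at most once per animation frame, rather than on every chunk:

// Sketch: buffer streamed chunks and flush them to the UI at most once per
// animation frame, so a long response doesn't force a re-render per chunk.
// `renderMarkdown` is a placeholder for whatever updates the message view.
function createThrottledRenderer(renderMarkdown: (text: string) => void) {
  let buffer = '';
  let scheduled = false;
  return (chunkText: string) => {
    buffer += chunkText;
    if (scheduled) return;
    scheduled = true;
    requestAnimationFrame(() => {
      scheduled = false;
      renderMarkdown(buffer);
    });
  };
}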
Visualizing tool use
It is important that the user can understand what tools are being used, and what kind of results those had. There is a tradeoff between keeping it simple and readable as a summary, while still exposing the technical details for power users. We have a framework in place to support custom widgets for specific tools, while also falling back to a default of a collapsed view of the JSON payloads.
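Conceptually, that framework boils down to a lookup with a fallback, along these lines (all names here are illustrative):

// Sketch: resolve a display for a tool call, falling back to a collapsed
// JSON view of the payloads. All names here are illustrative.
type ToolRenderer = (args: unknown, result: unknown) => string;

const customToolRenderers: Record<string, ToolRenderer> = {
  // e.g. schemaSearch: (args, result) => summarizeSchemaSearch(result),
};

function renderToolCall(name: string, args: unknown, result: unknown): string {
  const custom = customToolRenderers[name];
  if (custom) return custom(args, result);
  // Default: the raw payloads, shown collapsed until the user expands them
  return JSON.stringify({ args, result }, null, 2);
}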
Choosing a model
There are some stark differences between models, and we offer a lot of options. Choosing the right one can be a daunting task for the user. For now, we try to provide detailed descriptions of each model to help inform the user. In the future, we hope to automatically select the right model for the task at hand.
A big advantage of using a built-in assistant over a standalone LLM application like ChatGPT is that the application can automatically include relevant contextual information without you having to think about it. For example, when you have a SQL document open, a snapshot of the current code is sent to the model with every request. This leads to very intuitive interactions with the LLM; you can converse with the assistant assuming it can see what you’re seeing—there is no need to explain what you’ve already done.
The way we do this is largely straightforward. At the time you submit your prompt, the application pulls together the current context from the open tab and other global configurations (like the set of available database connections), and transforms that into a markdown-formatted system prompt that is injected into the request.
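In spirit, the context assembly looks something like the following sketch (the field names and prompt layout are illustrative):

// Sketch: gather context from the open tab and global state, and turn it
// into a markdown-formatted system prompt. The shapes are illustrative.
interface EditorContext {
  documentName: string;
  sql: string;           // snapshot of the code in the open SQL document
  connections: string[]; // names of the available database connections
}

function buildContextPrompt(ctx: EditorContext): string {
  return [
    '## Current document',
    `Name: ${ctx.documentName}`,
    '### Current SQL',
    ctx.sql,
    '## Available connections',
    ...ctx.connections.map((name) => `- ${name}`),
  ].join('\n');
}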
From the user’s perspective, this is much faster and easier than manually constructing the equivalent prompt through some tedious copy-pasting.
Managing the context window limits
Where this gets tricky is that LLMs don’t have unlimited space for context. Each model defines its limit, and will reject your request if you exceed it. So, we have to carefully consider what we include, and have logic in place to trim things down when necessary, to stay within the limits.
We use a combination of strategies to prioritize what goes into the context.
Token estimation
This leads us into the world of tokenizers, where things can get complicated fast. Given the state of available tooling, we’ve been somewhat forced into estimating token use rather than knowing a precise number. We’ve adopted a strategy of over-estimating (and therefore not fully using the available window size): we would rather keep functioning, perhaps at the expense of “forgetting” something and losing a little accuracy or quality, than fail to respond outright.
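As a rough illustration of that approach (the ratio below is a common rule of thumb, not our exact tuning):

// Sketch: conservative token estimation using a characters-per-token rule
// of thumb, deliberately pessimistic so we over-estimate token counts and
// trim early rather than exceed the model's context limit.
function estimateTokens(text: string): number {
  const CHARS_PER_TOKEN = 3.5; // pessimistic; ~4 is typical for English text
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function fitsInContext(parts: string[], tokenLimit: number): boolean {
  const estimated = parts.reduce((sum, part) => sum + estimateTokens(part), 0);
  return estimated <= tokenLimit;
}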
Fortunately, these context limits on models are quite large, and keep getting larger. So our ability to include what is most relevant will keep getting easier with time. That said, careful moderation of the context will continue to be relevant to keep costs and latency low.
With all of the work around summarizing and context-window management in place, we also need to dial in how (and when) the LLM calls the tools we give it.
Out of the box, an LLM can be very helpful, but its interactions are limited to outputting some formatted text for the user to read—it can’t directly do anything for you, or anything to help itself answer your prompt.
Tools change that entirely, giving the model the ability to take any action the app developers choose to grant it. This can be as powerful and open-ended as anything that software could do.
So, what things should a SQL Assistant be able to do? Our first few tools focused on ways to allow the LLM to pull in more context as needed. We give it the ability to use its own discretion, and choose to request relevant data from the application to help answer the prompt. These are the tools we developed for it to begin helping itself.
schemaSearch
We already had a schema searching API that powers several application features. It turns out that wiring this up to the LLM is a huge win. We explain (in the tool definition) that this function can be used to find the correct table and column names for the database, so that SQL can be written with less risk of guessing the names incorrectly.
Most models do an excellent job understanding how and when to use this tool. This directly leads to many more queries being generated that can be run without error on the first shot.
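For a sense of what that looks like, a tool definition along these lines is handed to the model (the exact wording and parameters here are illustrative, not our production definition):

// Illustrative definition for a schema search tool; the production
// description and parameter names may differ.
const schemaSearchTool = {
  name: 'schemaSearch',
  description:
    'Search the connected database schema for tables and columns matching ' +
    'a keyword. Use this to find the correct table and column names before ' +
    'writing SQL, instead of guessing them.',
  parameters: {
    type: 'object',
    properties: {
      query: {
        type: 'string',
        description: 'Keyword(s) to search for, such as a table or column name',
      },
    },
    required: ['query'],
  },
};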
querySearch
One of PopSQL’s value propositions is that it is a place for your team to collaborate, and build a catalog of “good” shared queries for your databases. These can encode the nuances of your databases in ways that aren’t documented anywhere else—how tables might be joined with not-so-obvious foreign keys, or how one column might be preferred over another similar-sounding one.
We unlock that potential by allowing the LLM to perform a keyword search of your organization’s query catalog. It can then read through these as examples, informing it how to better construct a query like the others your team has already written.
This has the additional benefit of starting a small data-flywheel. If SQL Assistant generates a great query, you can add it to the catalog for future use/learning.
Putting the “assist” in assistant
After giving the LLM the ability to help itself gather context, it was then time to give it the ability to directly do work for the user. This truly takes things to the next level, making “assistant” an appropriate name.
updateQueryText
Before this function was added, the model would helpfully output some SQL code in the conversation, and then it was up to the user to incorporate that into their document.
This function gives the model the ability to directly edit the document itself.
We give the LLM an API that allows it to make precise adjustments, or do a complete rewrite if necessary. We visualize the changes as a diff, so you can easily see exactly what was changed by the model.
If you aren’t happy with the result, it is as easy as cmd+z to undo. PopSQL’s version history also makes it a breeze to go back in time, avoiding any risk of losing a valuable query.
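For a sense of the API we expose to the model, the tool’s parameters look roughly like this (illustrative names; the production schema differs in its details):

// Illustrative parameter schema for the edit tool: either a set of precise
// find-and-replace edits, or a complete rewrite of the document.
const updateQueryTextParams = {
  type: 'object',
  properties: {
    edits: {
      type: 'array',
      description: 'Precise adjustments to apply to the current SQL',
      items: {
        type: 'object',
        properties: {
          find: { type: 'string', description: 'Existing text to replace' },
          replace: { type: 'string', description: 'Replacement text' },
        },
        required: ['find', 'replace'],
      },
    },
    fullText: {
      type: 'string',
      description: 'A complete replacement for the document, for full rewrites',
    },
  },
};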
executeQueryDocument
After making changes to the SQL code, the logical next step is to run it. When the model is satisfied with the code, it will ask for your permission to execute it.
We have safety in mind by default, so we want to give you a chance to review the code before running it (spoiler alert: you can turn this off).
An advantage over the normal Run button is that we automatically feed the result back into the model for another turn.
Often the model will give you a nice summary of the result. Where this gets fascinating, however, is when the model makes a mistake.
When the query execution fails, we feed back the error message as the tool result, which gives the model a chance to correct its mistake and try again!
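Putting those pieces together, the server-side handling of this tool looks roughly like the following sketch (the helper names are assumed for illustration, and the approval check is skipped when you opt into the agentic behavior described next):

// Sketch of handling an executeQueryDocument tool call: ask for approval
// (unless the user has opted out), run the query, and feed either a result
// summary or the error message back to the model as the tool result.
// requestUserApproval, runQuery, and summarizeResult are illustrative names.
async function handleExecuteQueryDocument(tool, options) {
  if (options.requireApproval) {
    const approved = await requestUserApproval(tool);
    if (!approved) {
      return { role: 'tool', tool_call_id: tool.id, content: 'The user declined to run the query.' };
    }
  }
  try {
    const result = await runQuery(tool.function.arguments);
    // Success: feed a summary back so the model can report on it. Whether
    // actual row data is included depends on the data-sampling preference
    // discussed later in this post.
    return { role: 'tool', tool_call_id: tool.id, content: summarizeResult(result) };
  } catch (err) {
    // Failure: the error message gives the model a chance to correct the SQL
    // and try again on its next turn
    const message = err instanceof Error ? err.message : String(err);
    return { role: 'tool', tool_call_id: tool.id, content: `Query failed: ${message}` };
  }
}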
The final step in this journey is to fully hand over the reins to the LLM.
It’s a simple change: we remove the approval step and just execute the query whenever the model requests it.
The result is very powerful: you go from asking a data question to getting an answer, without any interactions in between.
Great, but can it really do X?
Don’t let this simple example fool you into thinking this is only suitable for simple tasks. We have already seen some impressive examples when working with our internal databases. We’ve asked the assistant to do some anomaly analysis on our analytics data and watched it spend several minutes churning through dozens of queries and analyzing results, ultimately arriving at an answer that our staff admits would have taken them hours to figure out without this assistance.
While we don’t see this replacing our data team any time soon, we certainly see it as an augmentation that can drastically improve the turnaround time on certain tasks.
By default, SQL Assistant asks for confirmation before taking risky actions, such as executing SQL code against your database. We strongly recommend creating read-only credentials for your databases, and using a connection with those credentials whenever turning on Agent mode for SQL Assistant.
Similarly, by default we do not include any data from your query results in the context sent to the LLM provider. You can choose to enable data sampling in the application preferences. Doing so can greatly improve the accuracy and helpfulness of the Assistant, but comes with the same risks that arise any time data is shared with a 3rd party.
While these models are capable of some impressive results, they can still make mistakes and do unwanted things. We leave it up to you to decide your risk tolerance level, but advise that LLMs are not yet trustworthy enough to give them unguarded access to your valuable data.
We hope you’ve enjoyed this glimpse into the journey of building SQL Assistant. We started simple and incrementally built out its capabilities in a compounding way. We think we have already delivered on our vision of making SQL Assistant fast and easy to use, so you can get answers to your data questions out of your database with zero friction.
However, we won’t be stopping here. We still have many ideas to improve the accuracy of the results. We are currently prototyping a vector search for our data catalog to improve semantic awareness of your database schemas.
We also plan to continue to expand assistant capabilities into more of the application’s features. We are currently experimenting with charts and dashboard generation. Stay tuned for frequent updates!
We’ve only just begun to appreciate the power of this tooling internally, for our own team’s use. We are excited to see what else it can do, and how you all will use it too!
Sign up (for free) today and try SQL Assistant in the Data Mode view in your Timescale Console. Data Mode can be found at the top of the left side navigation. You’re in Ops mode by default, but you can easily switch back and forth between the two modes.
Please send us your feedback, with any suggestions or edge-cases!