MindsDB Query Engine · Open-source

One SQL dialect across 200+ data sources.

Name: MindsDB Query Engine
Author: MindsDB

MindsDB is the open-source query engine that gives AI agents a single way to read from databases, warehouses, SaaS apps, document stores, and vector indexes — with built-in Knowledge Bases for unstructured data and Jobs & Triggers for automation.


200+ Data sources
39K+ GitHub stars
500K+ Deployments
GPL-3.0 Open license

What it does

Connect. Unify. Automate.

Three primitives — three SQL statements — turn a sprawl of databases, documents, and APIs into a single queryable surface an agent can reason over.

01 · Connect

200+ integrations behind one SQL dialect.

Wire up a Postgres warehouse, a Salesforce account, an S3 bucket of PDFs, and a vector store — each as a "database" inside MindsDB. Then query across them with standard SQL: joins, aggregates, subqueries, the whole vocabulary. No data movement, no ETL pipelines to maintain. Handlers ship in the open and new ones merge from the community.

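-- Register a Postgres instance as a "database" inside MindsDB.
-- "${POSTGRES_PWD}" is a placeholder; substitute the real secret before running.
CREATE DATABASE postgres_prod
WITH ENGINE = 'postgres',
PARAMETERS = {
  "host": "db.internal",
  "user": "readonly",
  "password": "${POSTGRES_PWD}",
  "database": "analytics"
};

-- Query a table by prefixing it with the connection name.
SELECT customer_id, total_arr
FROM postgres_prod.accounts
WHERE region = 'EU';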
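
To make the cross-source claim concrete, here is a minimal sketch of a join between the Postgres connection above and a second source. It assumes a Salesforce connection has already been created under the hypothetical name salesforce_crm, and that it exposes an opportunities table with account_id and amount columns. None of those names come from MindsDB itself.

```sql
-- Sketch only: salesforce_crm, opportunities, account_id, and amount are
-- hypothetical names chosen for illustration.
SELECT a.customer_id,
       a.total_arr,
       SUM(o.amount) AS open_pipeline
FROM postgres_prod.accounts AS a
JOIN salesforce_crm.opportunities AS o
  ON o.account_id = a.customer_id
WHERE a.region = 'EU'
GROUP BY a.customer_id, a.total_arr;
```

Each side of the join is fetched through its own connection, so neither system needs to know about the other.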

02 · Unify

Knowledge Bases turn unstructured data into queryable rows.

Point a Knowledge Base at a folder of PDFs, a Confluence space, or a stream of support tickets. MindsDB chunks the content, vectorizes it, and stores metadata you choose — author, source URL, last-updated, anything. Then query it with the same SQL: semantic search, metadata filters, joins against structured tables. The hard part of building a RAG pipeline becomes one CREATE statement.

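-- Declare the embedding model and which columns hold content vs. metadata.
CREATE KNOWLEDGE_BASE customer_docs
USING
  embedding_model = 'openai.text-embedding-3-small',
  content_columns = ['body'],
  metadata_columns = ['author', 'source_url', 'updated_at'];

-- Load documents; the body column is chunked and vectorized on insert.
INSERT INTO customer_docs
SELECT body, author, source_url, updated_at
FROM s3_bucket.support_pdfs;

-- Semantic search on content, narrowed by a metadata filter.
SELECT chunk_content, source_url
FROM customer_docs
WHERE content LIKE 'invoice dispute resolution'
  AND author = 'support-team'
LIMIT 5;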
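
The join against structured tables mentioned above might look like the following sketch. It assumes the Knowledge Base exposes its author metadata column for joining, and it uses a hypothetical postgres_prod.doc_owners table (with author and escalation_contact columns) that is not part of the original example.

```sql
-- Sketch only: postgres_prod.doc_owners and its columns are hypothetical.
SELECT kb.chunk_content,
       kb.source_url,
       o.escalation_contact
FROM customer_docs AS kb
JOIN postgres_prod.doc_owners AS o
  ON o.author = kb.author
WHERE kb.content LIKE 'invoice dispute resolution'
LIMIT 5;
```

The semantic filter stays on the Knowledge Base side of the join, while the structured table contributes columns the documents never contained.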

03 · Automate

Jobs and Triggers keep the data layer alive.

Jobs run on a schedule — refresh a Knowledge Base nightly, sync a derived table every hour, recompute a feature view every five minutes. Triggers fire on data changes — when a new row lands in Postgres, run a follow-up query that vectorizes it into the right Knowledge Base. Together they turn the query engine into a self-maintaining data layer agents can rely on.

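-- Every hour, insert only the documents updated since the newest one
-- already stored in the Knowledge Base.
CREATE JOB refresh_docs (
  INSERT INTO customer_docs
  SELECT body, author, source_url, updated_at
  FROM s3_bucket.support_pdfs
  WHERE updated_at > (
    SELECT MAX(updated_at) FROM customer_docs
  )
)
EVERY 1 hour;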
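
The statement above covers Jobs. A Trigger for the "new row lands in Postgres" case could be sketched as follows. This assumes the CREATE TRIGGER ... ON syntax and the TABLE_DELTA placeholder for changed rows as described in the MindsDB documentation, plus a hypothetical postgres_prod.support_tickets table whose columns mirror the Knowledge Base. Trigger support varies by handler, so verify it for your source before relying on this.

```sql
-- Sketch only: support_tickets is a hypothetical table. TABLE_DELTA is
-- assumed to stand for the rows that changed and fired the trigger.
CREATE TRIGGER embed_new_tickets
ON postgres_prod.support_tickets
(
  INSERT INTO customer_docs
  SELECT body, author, source_url, updated_at
  FROM TABLE_DELTA
);
```

A Job polls on a fixed interval, whereas a Trigger reacts to the change itself, so the two can be combined: the Trigger for freshness, the Job as a periodic catch-up.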

How it fits together

One engine, many sources, agent-ready.

MindsDB sits between agents and the systems where data actually lives. Agents speak SQL or MCP to MindsDB; MindsDB speaks each source's native protocol to fetch, join, and return rows.

Agents

OpenClaw, OpenClaw + GBrain, NanoClaw, Anton, Hermes, any SQL client, any MCP client

MindsDB Query Engine

Federated query, Knowledge Bases, Jobs & Triggers, Models & views

Data sources · 200+

Postgres, MySQL, Mongo, Snowflake, BigQuery, Salesforce, HubSpot, S3, GCS, files, Pinecone, pgvector

Agents (or any SQL client) connect to MindsDB once. MindsDB dispatches queries to the right handler — Postgres, Snowflake, Salesforce, S3, a Knowledge Base — and returns a unified result set.
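
As a rough illustration of what "connect once" means for an agent, the sketch below lists the connected sources and inspects one of them before querying it. SHOW DATABASES and SHOW TABLES FROM are assumed to behave as they do in MySQL-style clients; postgres_prod is the connection name from the Connect example.

```sql
-- List every connected source (each appears as a database).
SHOW DATABASES;

-- Inspect the tables that one source exposes.
SHOW TABLES FROM postgres_prod;

-- Query a table through the same connection.
SELECT customer_id, total_arr
FROM postgres_prod.accounts
LIMIT 10;
```

An agent that can run these three statements has enough to discover and read any connected source without a provider-specific tool.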

200+ integrations

Connectors for every system worth talking to.

Each integration is an open-source handler in the main repo — merged in the open, with a consistent SQL interface across the whole fleet.

Databases & warehouses

Postgres, MySQL, MongoDB, Snowflake, BigQuery, ClickHouse, Redshift, Databricks

Cloud & SaaS platforms

Salesforce, HubSpot, Stripe, Shopify, Slack, Notion, Jira, GitHub

Documents & file-based systems

S3, GCS, Azure Blob, local files, PDF, HTML, Markdown

Enterprise apps & APIs

SAP, Oracle, NetSuite, ServiceNow, custom REST endpoints

Vector stores & AI infrastructure

Pinecone, Weaviate, Chroma, pgvector, OpenAI, Anthropic, Hugging Face

Knowledge Bases

RAG, without the pipeline.

A traditional RAG stack means stitching together a chunker, an embedding model, a vector store, a metadata layer, and a retrieval API. A MindsDB Knowledge Base is one SQL statement: you declare the embedding model and the metadata columns, and the engine handles chunking, vectorization, storage, re-embedding on update, and hybrid retrieval.

Same SQL surface. Query a Knowledge Base like any table — semantic search via LIKE, structured filters via WHERE.
Metadata you choose. Author, source, timestamp, custom tags — used for filtering and explainability.
Re-embed automatically. A Job can refresh the index on a schedule; a Trigger can re-embed on row change.
Embedding-model neutral. Pick OpenAI, an open model, or a self-hosted endpoint per Knowledge Base.

Unstructured input

PDF, MD, HTML, Confluence

1. Chunk
2. Vectorize
3. Tag with metadata

Queryable table

content	author	source
invoice dispute…	support	s3://docs
refund policy…	legal	confluence
onboarding step…	cs-team	s3://docs

SELECT … WHERE content LIKE '…'

Why agents need this

A reliable data layer is the hard part of agent-building.

Most agent demos fail in production for the same reason: the data layer behind them isn't shaped for what agents actually do.

Agents don't want N APIs.

One uniform SQL interface to Postgres, S3, Salesforce, and a vector store — instead of one bespoke tool per provider. Smaller prompt, fewer tool-use failures.

Agents need fresh context.

Jobs and Triggers keep Knowledge Bases and derived tables in sync without a separate orchestrator. The data the agent retrieves at run time is current, not from last week's ETL.

Agents mix structured and unstructured.

A useful answer often joins the row from your warehouse with a paragraph from a PDF. MindsDB lets you write that join in one SQL statement — instead of two systems and a glue layer.

Agents need governance you trust.

Permissions, audit, and source-of-truth live where they should — in your databases — and MindsDB enforces them on every query. The agent inherits your existing access model.

FAQ

Common questions.

What is MindsDB Query Engine?

MindsDB is an open-source SQL query engine that unifies access to 200+ data sources (databases, warehouses, SaaS applications, document stores, and vector indexes) under one SQL dialect. It exposes Knowledge Bases for unstructured data and Jobs & Triggers for automation, so an AI agent has a single, consistent surface to query whatever data it needs.

Is MindsDB still maintained?

Yes. MindsDB is actively developed under the GPL-3.0 license on GitHub (github.com/mindsdb/engine). It is a standalone open-source project that continues to ship features and connectors. For hosted agent building, the same team also offers MindsHub, which is a separate product.

How is MindsDB different from MindsHub?

They are two separate products from the same team. MindsDB is a standalone open-source data engine — install it, point it at your sources, and write SQL. MindsHub is a separate hosted platform purpose-built for running open-source agents (OpenClaw, NanoClaw, Anton, Hermes) with a Model Router, a credentials vault, and persistent execution. MindsHub is not built on top of MindsDB; the two are complementary surfaces for different jobs. If you want to build a data layer and an agent runtime yourself, MindsDB is one option; if you want a hosted runtime ready to go, use MindsHub.

How does the query engine compare to a traditional database?

MindsDB is not itself a storage engine — it does not own the data. It is a federated query layer that speaks SQL to clients and each source's native protocol to the underlying systems. Think of it as a uniform read/write surface across the systems you already run, rather than a place to copy data into.

How do AI agents use MindsDB?

Agents connect to MindsDB once — via SQL or the Model Context Protocol (MCP) — and get a single tool for reading every connected source. Instead of giving the agent one bespoke tool per provider (which blows up the prompt and the failure surface), the agent issues SELECT statements and MindsDB dispatches them to the right handler.

How do Knowledge Bases work?

A Knowledge Base is created with a single CREATE statement that declares an embedding model, the content columns, and a set of metadata columns. MindsDB then handles chunking, vectorization, storage, and re-embedding on updates. You query the Knowledge Base like any table: semantic search via LIKE on the content column, structured filters via WHERE on the metadata columns. It collapses a typical RAG stack into one SQL primitive.

Can I self-host MindsDB?

Yes. MindsDB is open-source under GPL-3.0 and runs on Linux, macOS, Windows, and Docker. The repo is at github.com/mindsdb/engine. Self-hosting puts you in control of where data flows; the trade-off is operating the runtime yourself rather than letting MindsHub manage it.

Which AI models does MindsDB support?

MindsDB is model-neutral. You can plug in Anthropic, OpenAI, Google, Hugging Face, or any self-hosted endpoint — both for embeddings inside Knowledge Bases and for LLM-backed queries. Switching providers is a configuration change, not a rewrite.

Where can I find the documentation?

Docs are at docs.mindsdb.com/mindsdb. Source is at github.com/mindsdb/engine. For the hosted product, see mindshub.ai or read /mindshub-vs-mindsdb for the rebrand story.

Looking for a hosted agent runtime?

MindsHub is a separate platform from the same team — open-source agents, model routing, a credentials vault, and tool access, ready to run.
