From 04e370661d5b90c3be6f51d065d8147e8d8f6f15 Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Mon, 29 Jun 2026 21:35:43 +0000
Subject: [PATCH 1/3] refactor: use Steps component on vector-search KB page

---
 .../general-faqs/vector-search.mdx | 69 ++++++++++---------
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
index edf0628cd..a3339e037 100644
--- a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
+++ b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
@@ -21,44 +21,47 @@ The main advantages of using ClickHouse for vector search compared to using more
 Here is a quick tutorial on how to use ClickHouse for vector search.
-## 1. Create embeddings {#1-create-embeddings}
-Your data (documents, images, or structured data) must be converted to _embeddings_. We recommend creating embeddings using the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) or using the open-source Python library [SentenceTransformers](https://www.sbert.net/).
+
+
+ Your data (documents, images, or structured data) must be converted to _embeddings_. We recommend creating embeddings using the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) or using the open-source Python library [SentenceTransformers](https://www.sbert.net/).
-You can think of an embedding as a large array of floating-point numbers that represent your data. [Check out this guide from OpenAI to learn more about embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
+ You can think of an embedding as a large array of floating-point numbers that represent your data. [Check out this guide from OpenAI to learn more about embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
+
+
+ Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
-## 2. Store the embeddings {#2-store-the-embeddings}
-Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
+ ```sql
+ CREATE TABLE images
+ (
+ `_file` LowCardinality(String),
+ `caption` String,
+ `image_embedding` Array(Float32)
+ )
+ ENGINE = MergeTree;
+ ```
+
+
+ Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like `cosineDistance` to take an embedding of a dog image and search for related images:
-```sql
-CREATE TABLE images
-(
- `_file` LowCardinality(String),
- `caption` String,
- `image_embedding` Array(Float32)
-)
-ENGINE = MergeTree;
-```
+ ```sql
+ SELECT
+ _file,
+ caption,
+ cosineDistance(
+ -- An embedding of your "input" dog picture
+ [0.5736801028251648, 0.2516217529773712, ..., -0.6825592517852783],
+ image_embedding
+ ) AS score
+ FROM images
+ ORDER BY score ASC
+ LIMIT 10
+ ```
-## 3. Search for related embeddings {#3-search-for-related-embeddings}
-Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like `cosineDistance` to take an embedding of a dog image and search for related images:
+ This query returns the `_file` names and `caption` of the top 10 images most likely to be related to your provided dog image.
+
+
-```sql
-SELECT
- _file,
- caption,
- cosineDistance(
- -- An embedding of your "input" dog picture
- [0.5736801028251648, 0.2516217529773712, ..., -0.6825592517852783],
- image_embedding
- ) AS score
-FROM images
-ORDER BY score ASC
-LIMIT 10
-```
-
-This query returns the `_file` names and `caption` of the top 10 images most likely to be related to your provided dog image.
-
-## Further Reading {#further-reading}
+## Further reading {#further-reading}
 To follow a more in-depth tutorial on vector search using ClickHouse, please see:
 - [Exact and Approximate Vector Search](/reference/engines/table-engines/mergetree-family/annindexes)

From 1063ada978a500cef338dd6c65556bca6287c4b9 Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Mon, 29 Jun 2026 21:43:19 +0000
Subject: [PATCH 2/3] refactor: wrap existing headings with Step, drop numbering, keep anchors

---
 .../knowledge-base/general-faqs/vector-search.mdx | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
index a3339e037..078d4fd5d 100644
--- a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
+++ b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
@@ -22,12 +22,16 @@ The main advantages of using ClickHouse for vector search compared to using more
 Here is a quick tutorial on how to use ClickHouse for vector search.
-
+
+ ## Create embeddings {#1-create-embeddings}
+
 Your data (documents, images, or structured data) must be converted to _embeddings_. We recommend creating embeddings using the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) or using the open-source Python library [SentenceTransformers](https://www.sbert.net/).
 You can think of an embedding as a large array of floating-point numbers that represent your data. [Check out this guide from OpenAI to learn more about embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
-
+
+ ## Store the embeddings {#2-store-the-embeddings}
+
 Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
 ```sql
@@ -40,7 +44,9 @@ Here is a quick tutorial on how to use ClickHouse for vector search.
 ENGINE = MergeTree;
 ```
-
+
+ ## Search for related embeddings {#3-search-for-related-embeddings}
+
 Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like `cosineDistance` to take an embedding of a dog image and search for related images:
 ```sql

From 793544290b5da17c0fc8317f5d66356101c90152 Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Tue, 30 Jun 2026 08:21:07 +0000
Subject: [PATCH 3/3] fix: use Step title prop and unindent markdown in vector-search Steps

---
 .../general-faqs/vector-search.mdx | 86 +++++++++---------
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
index 078d4fd5d..70abfa34a 100644
--- a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
+++ b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
@@ -22,49 +22,49 @@ The main advantages of using ClickHouse for vector search compared to using more
 Here is a quick tutorial on how to use ClickHouse for vector search.
-
- ## Create embeddings {#1-create-embeddings}
-
- Your data (documents, images, or structured data) must be converted to _embeddings_. We recommend creating embeddings using the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) or using the open-source Python library [SentenceTransformers](https://www.sbert.net/).
-
- You can think of an embedding as a large array of floating-point numbers that represent your data. [Check out this guide from OpenAI to learn more about embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
-
- ## Store the embeddings {#2-store-the-embeddings}
-
- Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
-
- ```sql
- CREATE TABLE images
- (
- `_file` LowCardinality(String),
- `caption` String,
- `image_embedding` Array(Float32)
- )
- ENGINE = MergeTree;
- ```
-
- ## Search for related embeddings {#3-search-for-related-embeddings}
-
- Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like `cosineDistance` to take an embedding of a dog image and search for related images:
-
- ```sql
- SELECT
- _file,
- caption,
- cosineDistance(
- -- An embedding of your "input" dog picture
- [0.5736801028251648, 0.2516217529773712, ..., -0.6825592517852783],
- image_embedding
- ) AS score
- FROM images
- ORDER BY score ASC
- LIMIT 10
- ```
-
- This query returns the `_file` names and `caption` of the top 10 images most likely to be related to your provided dog image.
-
+
+ Your data (documents, images, or structured data) must be converted to _embeddings_. We recommend creating embeddings using the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) or using the open-source Python library [SentenceTransformers](https://www.sbert.net/).
+
+You can think of an embedding as a large array of floating-point numbers that represent your data. [Check out this guide from OpenAI to learn more about embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
+
+
+
+
+Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
+
+```sql
+CREATE TABLE images
+(
+ `_file` LowCardinality(String),
+ `caption` String,
+ `image_embedding` Array(Float32)
+)
+ENGINE = MergeTree;
+```
+
+
+
+
+Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like `cosineDistance` to take an embedding of a dog image and search for related images:
+
+```sql
+SELECT
+ _file,
+ caption,
+ cosineDistance(
+ -- An embedding of your "input" dog picture
+ [0.5736801028251648, 0.2516217529773712, ..., -0.6825592517852783],
+ image_embedding
+ ) AS score
+FROM images
+ORDER BY score ASC
+LIMIT 10
+```
+
+This query returns the `_file` names and `caption` of the top 10 images most likely to be related to your provided dog image.
+
+
 ## Further reading {#further-reading}
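
The search step in these patches ranks rows by `cosineDistance` in ascending order. The following pure-Python sketch is a rough, non-authoritative illustration of that brute-force scan. It assumes that `cosineDistance` means 1 minus cosine similarity, which is how the ClickHouse function reference describes it. The helpers `cosine_distance` and `top_k`, the `images` rows, and the 3-dimensional vectors are all hypothetical stand-ins for real embeddings, such as those produced by SentenceTransformers or the OpenAI Embeddings API.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical row shape mirroring the `images` table: (_file, caption, image_embedding)
Row = Tuple[str, str, List[float]]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (assumed to match ClickHouse's cosineDistance)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # This sketch rejects zero vectors rather than guessing how ClickHouse handles them.
        raise ValueError("cosine distance is undefined for zero-length vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], rows: Sequence[Row], k: int = 10) -> List[Tuple[str, str, float]]:
    """Brute-force analogue of:
    SELECT _file, caption, cosineDistance(query, image_embedding) AS score
    FROM images ORDER BY score ASC LIMIT k
    """
    scored = [(file, caption, cosine_distance(query, emb)) for file, caption, emb in rows]
    scored.sort(key=lambda row: row[2])  # ORDER BY score ASC
    return scored[:k]  # LIMIT k


# Toy 3-dimensional "embeddings"; real embeddings typically have hundreds of dimensions or more.
images: List[Row] = [
    ("dog1.jpg", "a dog on the grass", [1.0, 0.0, 0.0]),
    ("cat.jpg", "a cat on a sofa", [0.0, 1.0, 0.0]),
    ("dog2.jpg", "a puppy playing", [0.9, 0.1, 0.0]),
]
query_embedding = [1.0, 0.05, 0.0]  # stand-in for the embedding of the "input" dog picture

for file, caption, score in top_k(query_embedding, images, k=2):
    print(f"{file}\t{caption}\t{score:.4f}")
```

With these toy vectors, the two dog images score closer to the query than the cat image, so they are the two rows returned. Like the SQL query in the patches, this sketch compares the query against every row. The "Exact and Approximate Vector Search" page linked under Further reading covers index-based alternatives for larger tables.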