diff --git a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
index edf0628cd..078d4fd5d 100644
--- a/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
+++ b/resources/support-center/knowledge-base/general-faqs/vector-search.mdx
@@ -21,44 +21,53 @@ The main advantages of using ClickHouse for vector search compared to using more
 
 Here is a quick tutorial on how to use ClickHouse for vector search.
 
-## 1. Create embeddings {#1-create-embeddings}
-Your data (documents, images, or structured data) must be converted to _embeddings_. We recommend creating embeddings using the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) or using the open-source Python library [SentenceTransformers](https://www.sbert.net/).
+
+
+ ## Create embeddings {#1-create-embeddings}
 
-You can think of an embedding as a large array of floating-point numbers that represent your data. [Check out this guide from OpenAI to learn more about embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
+ Your data (documents, images, or structured data) must be converted to _embeddings_. We recommend creating embeddings using the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) or using the open-source Python library [SentenceTransformers](https://www.sbert.net/).
 
-## 2. Store the embeddings {#2-store-the-embeddings}
-Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
+ You can think of an embedding as a large array of floating-point numbers that represent your data. [Check out this guide from OpenAI to learn more about embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).
+
+
+ ## Store the embeddings {#2-store-the-embeddings}
 
-```sql
-CREATE TABLE images
-(
-    `_file` LowCardinality(String),
-    `caption` String,
-    `image_embedding` Array(Float32)
-)
-ENGINE = MergeTree;
-```
+ Once you have generated embeddings, you need to store them in ClickHouse. Each embedding should be stored in a separate row and can include metadata for filtering, aggregations, or analytics. Here's an example of a table that can store images with captions:
 
-## 3. Search for related embeddings {#3-search-for-related-embeddings}
-Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like `cosineDistance` to take an embedding of a dog image and search for related images:
+ ```sql
+ CREATE TABLE images
+ (
+     `_file` LowCardinality(String),
+     `caption` String,
+     `image_embedding` Array(Float32)
+ )
+ ENGINE = MergeTree;
+ ```
+
+
+ ## Search for related embeddings {#3-search-for-related-embeddings}
 
-```sql
-SELECT
-    _file,
-    caption,
-    cosineDistance(
-        -- An embedding of your "input" dog picture
-        [0.5736801028251648, 0.2516217529773712, ..., -0.6825592517852783],
-        image_embedding
-    ) AS score
-FROM images
-ORDER BY score ASC
-LIMIT 10
-```
+ Let's say you want to search for pictures of dogs in your dataset. You can use a distance function like `cosineDistance` to take an embedding of a dog image and search for related images:
 
-This query returns the `_file` names and `caption` of the top 10 images most likely to be related to your provided dog image.
+ ```sql
+ SELECT
+     _file,
+     caption,
+     cosineDistance(
+         -- An embedding of your "input" dog picture
+         [0.5736801028251648, 0.2516217529773712, ..., -0.6825592517852783],
+         image_embedding
+     ) AS score
+ FROM images
+ ORDER BY score ASC
+ LIMIT 10
+ ```
 
-## Further Reading {#further-reading}
+ This query returns the `_file` names and `caption` of the top 10 images most likely to be related to your provided dog image.
+
+
+
+## Further reading {#further-reading}
 
 To follow a more in-depth tutorial on vector search using ClickHouse, please see:
 - [Exact and Approximate Vector Search](/reference/engines/table-engines/mergetree-family/annindexes)
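
Note on the search step touched by this diff: the query ranks rows by `cosineDistance` in ascending order, so the smallest score is the closest match. The sketch below reproduces that ranking in plain Python, assuming the usual definition of cosine distance as one minus cosine similarity. The `cosine_distance` and `top_matches` helpers, the file names, and the three-dimensional vectors are all hypothetical and exist only to illustrate the idea; real embeddings are much longer.

```python
import math


def cosine_distance(a, b):
    """Hypothetical helper: 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up rows standing in for the `images` table:
# (_file, caption, image_embedding)
rows = [
    ("a.jpg", "a dog on a beach", [0.9, 0.1, 0.0]),
    ("b.jpg", "a city skyline", [0.0, 0.2, 0.9]),
    ("c.jpg", "a puppy in grass", [0.8, 0.3, 0.1]),
]

# Stand-in for the embedding of the "input" dog picture.
query_embedding = [1.0, 0.0, 0.0]


def top_matches(rows, query, limit=10):
    """Score every row, then mimic ORDER BY score ASC LIMIT n."""
    scored = [
        (file_name, caption, cosine_distance(query, embedding))
        for file_name, caption, embedding in rows
    ]
    return sorted(scored, key=lambda row: row[2])[:limit]


for file_name, caption, score in top_matches(rows, query_embedding):
    print(file_name, caption, round(score, 4))
```

This scans every row, which mirrors the exact (brute-force) search the query performs; it does not model approximate indexes.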