Beef up MEVD docs: expanded conceptual article, new how-to guide, and working code snippets #51846
Draft: Copilot wants to merge 7 commits into main from copilot/beef-up-mevd-docs
+5,227
−52
Commits:
ffd718c Initial plan (Copilot)
424793f Beef up MEVD docs: expand vector-databases.md, add use-vector-stores … (Copilot)
fe8c80c Port SK vector store docs: attribute params tables, auto-embedding, h… (Copilot)
1ddb6a1 Address review feedback: xref links, ai-usage frontmatter, remove .gi… (Copilot)
ab4c26a Add xref links to attribute parameters in tables, GetCollection, Upse… (Copilot)
21b049a Apply suggestions from code review (gewarren)
cd8eada docs port in progress... (gewarren)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,117 @@ | ||
| --- | ||
| title: Define your Vector Store data model | ||
| description: Describes how to create a data model with Microsoft.Extensions.VectorData to use when writing to or reading from a Vector Store. | ||
| ms.topic: reference | ||
| ms.date: 07/08/2024 | ||
| --- | ||
| # Define your data model | ||
|
|
||
| ## Overview | ||
|
|
||
| The Vector Store connectors use a model-first approach to interacting with databases. | ||
|
|
||
| All methods to upsert or get records use strongly typed model classes. | ||
| The properties on these classes are decorated with attributes that indicate the purpose of each property. | ||
|
|
||
| > [!TIP] | ||
| > For an alternative to using attributes, see [defining your schema with a record definition](./schema-with-record-definition.md). | ||
|
|
||
| > [!TIP] | ||
| > For an alternative to defining your own data model, see [using Vector Store abstractions without defining your own data model](./generic-data-model.md). | ||
|
|
||
| Here is an example of a model that is decorated with these attributes. | ||
|
|
||
| ```csharp | ||
| using Microsoft.Extensions.VectorData; | ||
|
|
||
| public class Hotel | ||
| { | ||
| [VectorStoreKey] | ||
| public ulong HotelId { get; set; } | ||
|
|
||
| [VectorStoreData(IsIndexed = true)] | ||
| public string HotelName { get; set; } | ||
|
|
||
| [VectorStoreData(IsFullTextIndexed = true)] | ||
| public string Description { get; set; } | ||
|
|
||
| [VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)] | ||
| public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; } | ||
|
|
||
| [VectorStoreData(IsIndexed = true)] | ||
| public string[] Tags { get; set; } | ||
| } | ||
| ``` | ||
|
|
||
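As a rough sketch of how a model like this might be used, the following code gets a strongly typed collection, creates it if needed, upserts a `Hotel`, and reads it back. The `HotelSample` class, the `RoundTripAsync` method, the `"hotels"` collection name, and the sample values are assumptions for illustration only. The `VectorStore` instance is assumed to come from whichever connector you use.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;

// Trimmed copy of the Hotel model from this article.
public class Hotel
{
    [VectorStoreKey]
    public ulong HotelId { get; set; }

    [VectorStoreData(IsIndexed = true)]
    public string HotelName { get; set; } = string.Empty;

    [VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)]
    public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }
}

// Hypothetical helper, not part of the library.
public static class HotelSample
{
    public static async Task<Hotel?> RoundTripAsync(VectorStore vectorStore)
    {
        // The key type and record type are both fixed at compile time.
        var collection = vectorStore.GetCollection<ulong, Hotel>("hotels");

        // Create the collection if it doesn't exist yet.
        await collection.EnsureCollectionExistsAsync();

        // Upsert a record with made-up sample values.
        await collection.UpsertAsync(new Hotel
        {
            HotelId = 1UL,
            HotelName = "Sample Hotel",
            DescriptionEmbedding = new ReadOnlyMemory<float>(new float[] { 0.1f, 0.2f, 0.3f, 0.4f })
        });

        // Read the record back by key; the result is null if no record is found.
        return await collection.GetAsync(1UL);
    }
}
```

Because the collection is typed as `<ulong, Hotel>`, the compiler checks both the key passed to `GetAsync` and the record passed to `UpsertAsync`.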
| ## Attributes | ||
|
|
||
| ### VectorStoreKeyAttribute | ||
|
|
||
| Use the <xref:Microsoft.Extensions.VectorData.VectorStoreKeyAttribute> attribute to indicate that your property is the key of the record. | ||
|
|
||
| ```csharp | ||
| [VectorStoreKey] | ||
| public ulong HotelId { get; set; } | ||
| ``` | ||
|
|
||
| #### VectorStoreKeyAttribute parameters | ||
|
|
||
| | Parameter | Required | Description | | ||
| |---------------|:--------:|-------------| | ||
| | <xref:Microsoft.Extensions.VectorData.VectorStoreKeyAttribute.StorageName> | No | Supplies an alternative name for the property in the database. Not all connectors support this parameter; for example, some connectors support alternatives like `JsonPropertyNameAttribute` instead. | | ||
|
|
||
| ### VectorStoreDataAttribute | ||
|
|
||
| Use the <xref:Microsoft.Extensions.VectorData.VectorStoreDataAttribute> attribute to indicate that your property contains general data that is not a key or a vector. | ||
|
|
||
| ```csharp | ||
| [VectorStoreData(IsIndexed = true)] | ||
| public string HotelName { get; set; } | ||
| ``` | ||
|
|
||
| #### VectorStoreDataAttribute parameters | ||
|
|
||
| | Parameter | Required | Description | | ||
| |-------------|:--------:|-------------| | ||
| | <xref:Microsoft.Extensions.VectorData.VectorStoreDataAttribute.IsIndexed> | No | Indicates whether the property should be indexed for filtering in cases where a database requires opting in to indexing per property. The default is `false`. | | ||
| | <xref:Microsoft.Extensions.VectorData.VectorStoreDataAttribute.IsFullTextIndexed> | No | Indicates whether the property should be indexed for full text search for databases that support full text search. The default is `false`. | | ||
| | <xref:Microsoft.Extensions.VectorData.VectorStoreDataAttribute.StorageName> | No | Supplies an alternative name for the property in the database. Not all connectors support this parameter; for example, some connectors support alternatives like `JsonPropertyNameAttribute` instead. | | ||
|
|
||
| > [!TIP] | ||
| > For more information on which connectors support <xref:Microsoft.Extensions.VectorData.VectorStoreDataAttribute.StorageName> and what alternatives are available, see [the documentation for each connector](./out-of-the-box-connectors/index.md). | ||
|
|
||
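The following is a minimal sketch of how `StorageName` might be set on key and data properties, and how the configured value can be read back through reflection. The `HotelWithStorageNames` and `StorageNameSample` types and the `hotel_id` and `hotel_name` storage names are made up for illustration. Whether a given connector honors `StorageName` depends on that connector.

```csharp
using System;
using System.Reflection;
using Microsoft.Extensions.VectorData;

// Hypothetical model that maps property names to different storage names.
public class HotelWithStorageNames
{
    [VectorStoreKey(StorageName = "hotel_id")]
    public ulong HotelId { get; set; }

    [VectorStoreData(IsIndexed = true, StorageName = "hotel_name")]
    public string HotelName { get; set; } = string.Empty;
}

// Hypothetical helper that inspects the attribute on a data property.
public static class StorageNameSample
{
    // Returns the configured storage name, or null if the property
    // doesn't exist or has no VectorStoreData attribute.
    public static string? GetDataStorageName(string propertyName) =>
        typeof(HotelWithStorageNames)
            .GetProperty(propertyName)
            ?.GetCustomAttribute<VectorStoreDataAttribute>()
            ?.StorageName;
}
```

In this sketch, `StorageNameSample.GetDataStorageName("HotelName")` returns the value set on the attribute, while the C# property keeps its original name in code.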
| ### VectorStoreVectorAttribute | ||
|
|
||
| Use the <xref:Microsoft.Extensions.VectorData.VectorStoreVectorAttribute> attribute to indicate that your property contains a vector. | ||
|
|
||
| ```csharp | ||
| [VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)] | ||
| public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; } | ||
| ``` | ||
|
|
||
| You can also use <xref:Microsoft.Extensions.VectorData.VectorStoreVectorAttribute> on properties that don't have a vector type, for example, a property of type `string`. | ||
| When a property is decorated in this way, you need to provide an <xref:Microsoft.Extensions.AI.IEmbeddingGenerator> instance to the vector store. | ||
| When you upsert the record, the text in the `string` property is automatically converted to a vector and stored in the database. | ||
| Because the property holds the source text rather than the vector, you can't retrieve the vector using this mechanism. | ||
|
|
||
| ```csharp | ||
| [VectorStoreVector(Dimensions: 4, DistanceFunction = DistanceFunction.CosineSimilarity, IndexKind = IndexKind.Hnsw)] | ||
| public string DescriptionEmbedding { get; set; } | ||
| ``` | ||
|
|
||
| > [!TIP] | ||
| > For more information on how to use built-in embedding generation, see [Let the Vector Store generate embeddings](./embedding-generation.md#letting-the-vector-store-generate-embeddings). | ||
|
|
||
| #### VectorStoreVectorAttribute parameters | ||
|
|
||
| | Parameter | Required | Description | | ||
| |------------|:--------:|-------------| | ||
| | `Dimensions` | Yes | The number of dimensions that the vector has. This is required when creating a vector index for a collection. | | ||
| | <xref:Microsoft.Extensions.VectorData.IndexKind> | No | The type of index to index the vector with. Default varies by vector store type. | | ||
| | <xref:Microsoft.Extensions.VectorData.DistanceFunction> | No | The type of function to use when doing vector comparison during vector search over this vector. Default varies by vector store type. | | ||
| | <xref:Microsoft.Extensions.VectorData.VectorStoreVectorAttribute.StorageName> | No | Supplies an alternative name for the property in the database. Not all connectors support this parameter; for example, some connectors support alternatives like `JsonPropertyNameAttribute` instead. | | ||
|
|
||
| Common index kinds and distance function types are supplied as static values on the <xref:Microsoft.Extensions.VectorData.IndexKind> and <xref:Microsoft.Extensions.VectorData.DistanceFunction> classes. | ||
| Individual Vector Store implementations might also use their own index kinds and distance functions, where the database supports unusual types. | ||
|
|
||
| > [!TIP] | ||
| > For more information on which connectors support <xref:Microsoft.Extensions.VectorData.VectorStoreVectorAttribute.StorageName> and what alternatives are available, see [the documentation for each connector](./out-of-the-box-connectors/index.md). |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| --- | ||
| title: Use Vector Store abstractions without defining your own data model | ||
| description: Describes how to use Vector Store abstractions without defining your own data model. | ||
| ms.topic: reference | ||
| ms.date: 10/16/2024 | ||
| --- | ||
| # Use Vector Store abstractions without defining your own data model | ||
|
|
||
| The Vector Store connectors use a model-first approach to interact with databases. This makes the connectors easy to use, since | ||
| your data model reflects the schema of your database records, and you can supply any additional schema information by adding attributes to your data model properties. | ||
|
|
||
| However, there are cases where it isn't desirable or possible to define your own data model. For example, imagine that you don't know at compile time what your | ||
| database schema looks like, and the schema is only provided via configuration. Creating a data model that reflects the schema would be impossible in this case. | ||
|
|
||
| In this case, you can use a `Dictionary<string, object?>` for the record type. Each property is added to the `Dictionary` with the property name as the key and the property value as the value. | ||
|
|
||
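As an illustration of what such a record might look like, the following sketch builds a `Dictionary<string, object?>` whose keys are property names. The `DynamicRecordSample` class and the placeholder values are assumptions. The property names match the glossary example later in this article, and the vector is a zero-filled placeholder with 1536 dimensions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class DynamicRecordSample
{
    // Builds one record: each dictionary key is a property name
    // and each dictionary value is that property's value.
    public static Dictionary<string, object?> CreateGlossaryRecord() => new()
    {
        ["Key"] = "SK",
        ["Term"] = "Example term",
        ["Definition"] = "Example definition text.",
        // Placeholder vector; a real one would come from an embedding generator.
        ["DefinitionEmbedding"] = new ReadOnlyMemory<float>(new float[1536]),
    };
}
```

Because values are typed as `object?`, callers need to cast when reading them back, for example `(string?)record["Definition"]`.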
| ## Supply schema information when using `Dictionary` | ||
|
|
||
| When using a `Dictionary`, connectors still need to know what the database schema looks like. Without the schema information, | ||
| the connector can't create a collection or map to and from the storage representation that each database uses. | ||
|
|
||
| A record definition can be used to provide the schema information. Unlike a data model, a record definition can be created from configuration at runtime, providing a solution for when schema information is not known at compile time. | ||
|
|
||
| > [!TIP] | ||
| > To see how to create a record definition, see [defining your schema with a record definition](./schema-with-record-definition.md). | ||
|
|
||
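To make the runtime scenario more concrete, here is a minimal sketch that builds a record definition from a list of field descriptions, such as ones read from configuration. The `FieldConfig` record, the `DefinitionBuilder` class, and the `"key"`, `"data"`, and `"vector"` kind strings are hypothetical and not part of the library. The sketch also assumes every key and data field is a `string` and every vector is a `ReadOnlyMemory<float>`.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.VectorData;

// Hypothetical shape of one schema entry loaded from configuration.
public sealed record FieldConfig(string Name, string Kind, int Dimensions = 0);

// Hypothetical helper that turns configuration entries into a record definition.
public static class DefinitionBuilder
{
    public static VectorStoreCollectionDefinition Build(IEnumerable<FieldConfig> fields)
    {
        var properties = new List<VectorStoreProperty>();

        foreach (var field in fields)
        {
            // Map each configured kind to the matching property type.
            VectorStoreProperty property = field.Kind switch
            {
                "key" => new VectorStoreKeyProperty(field.Name, typeof(string)),
                "data" => new VectorStoreDataProperty(field.Name, typeof(string)),
                "vector" => new VectorStoreVectorProperty(
                    field.Name, typeof(ReadOnlyMemory<float>), dimensions: field.Dimensions),
                _ => throw new ArgumentException($"Unknown field kind '{field.Kind}'.")
            };

            properties.Add(property);
        }

        return new VectorStoreCollectionDefinition { Properties = properties };
    }
}
```

A real implementation would likely also read the property types, index settings, and distance functions from configuration rather than hard-coding them.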
| ## Example | ||
|
|
||
| To use the `Dictionary` with a connector, simply specify it as your data model when creating a collection, and simultaneously provide a record definition. | ||
|
|
||
| ```csharp | ||
| // Create the definition to define the schema. | ||
| VectorStoreCollectionDefinition definition = new() | ||
| { | ||
| Properties = new List<VectorStoreProperty> | ||
| { | ||
| new VectorStoreKeyProperty("Key", typeof(string)), | ||
| new VectorStoreDataProperty("Term", typeof(string)), | ||
| new VectorStoreDataProperty("Definition", typeof(string)), | ||
| new VectorStoreVectorProperty("DefinitionEmbedding", typeof(ReadOnlyMemory<float>), dimensions: 1536) | ||
| } | ||
| }; | ||
|
|
||
| // To get a collection that uses Dictionary<string, object?> as its record type, | ||
| // call GetDynamicCollection instead of the regular GetCollection method | ||
| // and pass your record definition. | ||
| // The dynamic collection uses object as the key type for your database. | ||
| var dynamicDataModelCollection = vectorStore.GetDynamicCollection( | ||
| "glossary", | ||
| definition); | ||
|
|
||
| // Since schema information is available from the record definition | ||
| // it's possible to create a collection with the right vectors, | ||
| // dimensions, indexes, and distance functions. | ||
| await dynamicDataModelCollection.EnsureCollectionExistsAsync(); | ||
|
|
||
| // When retrieving a record from the collection, key, data, and vector values can | ||
| // now be accessed via the dictionary entries. | ||
| var record = await dynamicDataModelCollection.GetAsync("SK"); | ||
| // GetAsync returns null when no record with that key exists. | ||
| Console.WriteLine(record?["Definition"]); | ||
| ``` | ||
|
|
||
| Each vector store collection implementation has a separate `*DynamicCollection` | ||
| class that can be used with `Dictionary<string, object?>`. | ||
| This is because these implementations might support Native AOT and trimming. | ||
|
|
||
| When constructing a collection instance directly, the record definition | ||
| is passed as an option. The following example constructs | ||
| an Azure AI Search collection instance with `Dictionary`. | ||
|
|
||
| ```csharp | ||
| new AzureAISearchDynamicCollection( | ||
| searchIndexClient, | ||
| "glossary", | ||
| new() { Definition = definition }); | ||
| ``` |