Conversation
|
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
There was a problem hiding this comment.
Pull request overview
This PR introduces SQL query support to redisvl by adding a new SQLQuery class that translates SQL-like queries into Redis FT.SEARCH and FT.AGGREGATE commands. The implementation wraps the external sql-redis package to provide a more familiar SQL interface for querying Redis.
Changes:
- Added SQLQuery class that translates SQL SELECT statements to Redis queries
- Integrated SQLQuery execution into SearchIndex.query() method
- Added comprehensive integration tests covering various SQL operators and aggregations
- Created documentation notebook demonstrating SQL query usage
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Adds sql-redis as optional dependency for [sql] extra |
| uv.lock | Updates lock file with sql-redis and sqlglot dependencies |
| redisvl/query/sql.py | New SQLQuery class implementation with redis_query_string() method |
| redisvl/query/init.py | Exports SQLQuery from redisvl.query module |
| redisvl/index/index.py | Adds _sql_query() method and integrates SQLQuery into query() dispatcher |
| tests/integration/test_sql_redis.py | Comprehensive integration tests for SQL query functionality |
| docs/user_guide/12_sql_to_redis_queries.ipynb | New user guide notebook demonstrating SQL query usage |
| docs/user_guide/02_hybrid_queries.ipynb | Unintentional execution artifacts from re-running cells |
Comments suppressed due to low confidence (1)
redisvl/index/index.py:1150
- The docstring for the query method still references the old type hint "Union[BaseQuery, AggregateQuery, HybridQuery]" in the Args section (line 1150), but the actual type hint now includes SQLQuery. Update the docstring to reflect that SQLQuery is also accepted.
"""Execute a query on the index.
This method takes a BaseQuery, AggregationQuery, or HybridQuery object directly, and
handles post-processing of the search.
Args:
query (Union[BaseQuery, AggregateQuery, HybridQuery]): The query to run.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Execute a query on the index. | ||
|
|
There was a problem hiding this comment.
The docstring for the query method states "This method takes a BaseQuery, AggregationQuery, or HybridQuery object directly" but should also mention SQLQuery since it's now a supported query type. Additionally, the Args section references the old type hint without SQLQuery.
| # Substitute non-bytes params in SQL before translation | ||
| sql = self.sql | ||
| for key, value in self.params.items(): | ||
| placeholder = f":{key}" | ||
| if isinstance(value, (int, float)): | ||
| sql = sql.replace(placeholder, str(value)) | ||
| elif isinstance(value, str): | ||
| sql = sql.replace(placeholder, f"'{value}'") | ||
| # bytes (vectors) are handled separately |
There was a problem hiding this comment.
The parameter substitution logic only handles int, float, and str types, but doesn't handle bytes (vectors) which are mentioned in the comment. However, looking at the tests, bytes parameters like vectors are passed in the params dict and seem to work. This suggests sql-redis handles bytes parameters internally. Consider adding a code comment explaining that bytes parameters are passed through to sql-redis's translate method rather than being substituted in the SQL string, to clarify the intent.
| # Decode key if bytes | ||
| str_key = key.decode("utf-8") if isinstance(key, bytes) else key | ||
| # Decode value if bytes | ||
| str_value = value.decode("utf-8") if isinstance(value, bytes) else value |
There was a problem hiding this comment.
The decoding logic assumes all bytes values should be decoded to strings. However, this may incorrectly decode values that should remain as bytes (e.g., binary data, vector embeddings). Consider checking if the value is actually a UTF-8 string before decoding, or preserve the original type for non-string data. You could wrap the decode in a try-except to handle non-UTF-8 bytes gracefully.
| # Decode key if bytes | |
| str_key = key.decode("utf-8") if isinstance(key, bytes) else key | |
| # Decode value if bytes | |
| str_value = value.decode("utf-8") if isinstance(value, bytes) else value | |
| # Decode key if bytes, but preserve non-UTF-8 bytes | |
| if isinstance(key, bytes): | |
| try: | |
| str_key = key.decode("utf-8") | |
| except UnicodeDecodeError: | |
| str_key = key | |
| else: | |
| str_key = key | |
| # Decode value if bytes, but preserve non-UTF-8 bytes | |
| if isinstance(value, bytes): | |
| try: | |
| str_value = value.decode("utf-8") | |
| except UnicodeDecodeError: | |
| str_value = value | |
| else: | |
| str_value = value |
| " MAX(age) as max_age,\n", | ||
| " AVG(age) as avg_age,\n", | ||
| " STDEV(age) as std_age,\n", | ||
| " FIRST_VALUE(age) as fist_value_age,\n", |
There was a problem hiding this comment.
Typo in the alias name: "fist_value_age" should be "first_value_age" (missing 'r' in 'first'). This appears in both the SQL query and the output, suggesting it's consistently misspelled throughout.
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Butm the index is still in place\n", |
There was a problem hiding this comment.
Typo in comment: "Butm" should be "But" (extra 'm' at the end).
| "# Butm the index is still in place\n", | |
| "# But the index is still in place\n", |
| "# await index.clear()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 19, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Butm the index is still in place\n", | ||
| "# await index.exists()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 20, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Remove / delete the index in its entirety\n", | ||
| "# await index.delete()" |
There was a problem hiding this comment.
The commented cleanup code uses await keywords (async syntax) but this notebook doesn't appear to be using async/await elsewhere. The SearchIndex methods like clear(), exists(), and delete() are synchronous methods, not async. Remove the await keywords from these commented lines.
pyproject.toml
Outdated
| "pillow>=11.3.0", | ||
| ] | ||
| sql = [ | ||
| "sql-redis @ file:///Users/robert.shelton/Documents/sql-redis/dist/sql_redis-0.1.0-py3-none-any.whl", |
There was a problem hiding this comment.
The sql-redis dependency is using an absolute local file path specific to a developer's machine (/Users/robert.shelton/Documents/sql-redis/...). This will break on other systems and in CI/CD environments. According to the PR description, this is blocked on a release/update of sql-redis package. Before merging, this should be changed to reference a published version on PyPI (e.g., "sql-redis>=0.1.0").
a7d4e1c to
7f3f5ff
Compare
| :exclude-members: add_filter,get_args,highlight,return_field,summarize | ||
|
|
||
|
|
||
| SQLQuery |
There was a problem hiding this comment.
check this example it's the most conclusive
| [project] | ||
| name = "redisvl" | ||
| version = "0.13.2" | ||
| version = "0.14.0" |
There was a problem hiding this comment.
I think this would qualify as minor release
There was a problem hiding this comment.
test file is what large comment is base off of
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Get or create Redis client | ||
| if redis_client is None: | ||
| from redis import Redis | ||
|
|
||
| redis_client = Redis.from_url(redis_url) | ||
|
|
||
| # Load schemas from Redis | ||
| registry = SchemaRegistry(redis_client) | ||
| registry.load_all() | ||
|
|
||
| # Translate SQL to Redis command | ||
| translator = Translator(registry) | ||
|
|
||
| # Substitute non-bytes params in SQL before translation | ||
| sql = self.sql | ||
| for key, value in self.params.items(): | ||
| placeholder = f":{key}" | ||
| if isinstance(value, (int, float)): | ||
| sql = sql.replace(placeholder, str(value)) | ||
| elif isinstance(value, str): | ||
| sql = sql.replace(placeholder, f"'{value}'") | ||
| # bytes (vectors) are handled separately | ||
|
|
||
| translated = translator.translate(sql) | ||
| return translated.to_command_string() |
There was a problem hiding this comment.
The method creates a temporary Redis client if redis_client is None, but never closes it. This can lead to resource leaks, especially when this method is called multiple times. The Redis connection should be properly closed after use, for example using a context manager or try-finally block.
| # Get or create Redis client | |
| if redis_client is None: | |
| from redis import Redis | |
| redis_client = Redis.from_url(redis_url) | |
| # Load schemas from Redis | |
| registry = SchemaRegistry(redis_client) | |
| registry.load_all() | |
| # Translate SQL to Redis command | |
| translator = Translator(registry) | |
| # Substitute non-bytes params in SQL before translation | |
| sql = self.sql | |
| for key, value in self.params.items(): | |
| placeholder = f":{key}" | |
| if isinstance(value, (int, float)): | |
| sql = sql.replace(placeholder, str(value)) | |
| elif isinstance(value, str): | |
| sql = sql.replace(placeholder, f"'{value}'") | |
| # bytes (vectors) are handled separately | |
| translated = translator.translate(sql) | |
| return translated.to_command_string() | |
| created_client = False | |
| # Get or create Redis client | |
| if redis_client is None: | |
| from redis import Redis | |
| redis_client = Redis.from_url(redis_url) | |
| created_client = True | |
| try: | |
| # Load schemas from Redis | |
| registry = SchemaRegistry(redis_client) | |
| registry.load_all() | |
| # Translate SQL to Redis command | |
| translator = Translator(registry) | |
| # Substitute non-bytes params in SQL before translation | |
| sql = self.sql | |
| for key, value in self.params.items(): | |
| placeholder = f":{key}" | |
| if isinstance(value, (int, float)): | |
| sql = sql.replace(placeholder, str(value)) | |
| elif isinstance(value, str): | |
| sql = sql.replace(placeholder, f"'{value}'") | |
| # bytes (vectors) are handled separately | |
| translated = translator.translate(sql) | |
| return translated.to_command_string() | |
| finally: | |
| if created_client and redis_client is not None: | |
| # Only close clients created within this method | |
| redis_client.close() |
| # Decode bytes to strings in the results (Redis may return bytes) | ||
| decoded_rows = [] | ||
| for row in result.rows: | ||
| decoded_row = {} | ||
| for key, value in row.items(): | ||
| # Decode key if bytes | ||
| str_key = key.decode("utf-8") if isinstance(key, bytes) else key | ||
| # Decode value if bytes | ||
| str_value = value.decode("utf-8") if isinstance(value, bytes) else value | ||
| decoded_row[str_key] = str_value |
There was a problem hiding this comment.
The decoding logic for result values only handles bytes-to-string conversion, but doesn't preserve other types. If a value is a list (like in TOLIST/ARRAY_AGG results shown in tests), it will remain as a list potentially containing bytes elements. The code should recursively decode bytes within lists and other nested structures, or document that certain result types may contain raw bytes.
| Requires the optional `sql-redis` package. Install with: | ||
| ``pip install redisvl[sql]`` |
There was a problem hiding this comment.
The documentation string states to install with pip install redisvl[sql], but the pyproject.toml defines the extra as sql-redis. This inconsistency will cause user confusion. The install command should be pip install redisvl[sql-redis] to match the extras definition.
| raise ImportError( | ||
| "sql-redis is required for SQL query support. " | ||
| "Install it with: pip install redisvl[sql]" |
There was a problem hiding this comment.
The error message states to install with pip install redisvl[sql], but this is inconsistent with the extras definition sql-redis in pyproject.toml. The error message should say pip install redisvl[sql-redis] to match the defined extra name.
| class SQLQuery: | ||
| """A query class that translates SQL-like syntax into Redis queries. | ||
|
|
||
| This class allows users to write SQL SELECT statements that are | ||
| automatically translated into Redis FT.SEARCH or FT.AGGREGATE commands. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from redisvl.query import SQLQuery | ||
| from redisvl.index import SearchIndex | ||
|
|
||
| index = SearchIndex.from_existing("products", redis_url="redis://localhost:6379") | ||
|
|
||
| sql_query = SQLQuery(''' | ||
| SELECT title, price, category | ||
| FROM products | ||
| WHERE category = 'electronics' AND price < 100 | ||
| ''') | ||
|
|
||
| results = index.query(sql_query) | ||
|
|
||
| Note: | ||
| Requires the optional `sql-redis` package. Install with: | ||
| ``pip install redisvl[sql]`` | ||
| """ | ||
|
|
||
| def __init__(self, sql: str, params: Optional[Dict[str, Any]] = None): | ||
| """Initialize a SQLQuery. | ||
|
|
||
| Args: | ||
| sql: The SQL SELECT statement to execute. | ||
| params: Optional dictionary of parameters for parameterized queries. | ||
| Useful for passing vector data for similarity searches. | ||
| """ | ||
| self.sql = sql | ||
| self.params = params or {} |
There was a problem hiding this comment.
The SQLQuery class does not inherit from BaseQuery, unlike other query classes in the codebase (VectorQuery, FilterQuery, etc.). This breaks consistency with the query class hierarchy. Additionally, the _sql_query method in SearchIndex.query() doesn't validate or process the query through the standard query pipeline, which could lead to inconsistent behavior. Consider either: 1) making SQLQuery inherit from BaseQuery if possible, or 2) documenting why this design decision was made and ensuring the interface is compatible with the expected query contract.
Spec for SQLQuery class
Make sql-like commands available to be translated into Redis queries via redisvl to cut down on syntax overhead for engineers.
Ex:
This code would then produce the equivalent redis query to be executed against the database:
Expose a method on the object:
.redis_query_string()such that you can easily inspect the resulting redis query constructed from SQLQuery class invocation.Ex:
Packaging and dependencies
In order to use the
SQLQueryclass, user will have to install the optional dependency on sql-redis. This can be accomplished with the commandpip install redisvl[sql-redis].Tested/supported operators
SELECT title, category FROM {index} WHERE category = 'electronics'FT.SEARCH test_index "@category:{electronics}" RETURN 2 title categorySELECT title, category FROM {index} WHERE category != 'electronics'FT.SEARCH test_index "-@category:{electronics}" RETURN 2 title categorySELECT title, category FROM {index} WHERE category IN ('books', 'accessories')FT.SEARCH test_index "@category:{books|accessories}" RETURN 2 title categorySELECT title, price FROM {index} WHERE price > 100FT.SEARCH test_index "@price:[(100 +inf]" RETURN 2 title priceSELECT title, price FROM {index} WHERE price >= 25 AND price <= 50FT.SEARCH test_index "@price:[25 +inf] @price:[-inf 50]" RETURN 2 title priceSELECT title, price FROM {index} WHERE price = 45FT.SEARCH test_index "@price:[45 45]" RETURN 2 title priceSELECT title, price FROM {index} WHERE price != 45FT.SEARCH test_index "-@price:[45 45]" RETURN 2 title priceSELECT title, price FROM {index} WHERE price < 50FT.SEARCH test_index "@price:[-inf (50]" RETURN 2 title priceSELECT title, price FROM {index} WHERE price >= 25 AND price <= 50FT.SEARCH test_index "@price:[25 +inf] @price:[-inf 50]" RETURN 2 title priceSELECT title, price FROM {index} WHERE price BETWEEN 40 AND 60FT.SEARCH test_index "@price:[40 60]" RETURN 2 title priceSELECT title, name FROM {index} WHERE title = 'laptop'FT.SEARCH test_index "@title:laptop" RETURN 2 title nameSELECT title, name FROM {index} WHERE title != 'laptop'FT.SEARCH test_index "-@title:laptop" RETURN 2 title nameSELECT title, name FROM {index} WHERE title = 'lap*'FT.SEARCH test_index "@title:lap*" RETURN 2 title nameSELECT title, name FROM {index} WHERE name = '*book'FT.SEARCH test_index "@name:*book" RETURN 2 title nameSELECT title, name FROM {index} WHERE title = '%laptap%'FT.SEARCH test_index "@title:%laptap%" RETURN 2 title nameSELECT title, name FROM {index} WHERE title = 'gaming laptop'FT.SEARCH test_index "@title:gaming laptop" RETURN 2 title nameSELECT title, name FROM {index} WHERE title = 'laptop and keyboard'FT.SEARCH test_index "@title:laptop keyboard" RETURN 2 title nameSELECT title, name FROM {index} WHERE title IN ('Python', 'Redis')SELECT title, vector_distance(embedding, :vec) AS score FROM {index} LIMIT 3FT.SEARCH test_index "*=>[KNN 3 @embedding $vector AS score]" PARAMS 2 vector $vector DIALECT 2 RETURN 2 title score LIMIT 0 3SELECT title, cosine_distance(embedding, :vec) AS vector_distance FROM {index} LIMIT 3FT.SEARCH test_index "*=>[KNN 3 @embedding $vector AS vector_distance]" PARAMS 2 vector $vector DIALECT 2 RETURN 2 title vector_distance LIMIT 0 3Tested/supported aggregation reducer functions
SELECT COUNT(*) as total FROM {index}FT.AGGREGATE test_index "*" GROUPBY 0 REDUCE COUNT 0 AS totalSELECT category, COUNT(*) as count FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 1 category GROUPBY 1 @category REDUCE COUNT 0 AS countSELECT category, COUNT_DISTINCT(title) as unique_titles FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 category title GROUPBY 1 @category REDUCE COUNT_DISTINCT 1 @title AS unique_titlesSELECT SUM(price) as total FROM {index}FT.AGGREGATE test_index "*" LOAD 1 price GROUPBY 0 REDUCE SUM 1 @price AS totalSELECT category, SUM(price) as total_price FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 category price GROUPBY 1 @category REDUCE SUM 1 @price AS total_priceSELECT MIN(price) as min_price FROM {index}FT.AGGREGATE test_index "*" LOAD 1 price GROUPBY 0 REDUCE MIN 1 @price AS min_priceSELECT category, MIN(price) as min_price FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 category price GROUPBY 1 @category REDUCE MIN 1 @price AS min_priceSELECT MAX(price) as max_price FROM {index}FT.AGGREGATE test_index "*" LOAD 1 price GROUPBY 0 REDUCE MAX 1 @price AS max_priceSELECT category, MAX(price) as max_price FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 category price GROUPBY 1 @category REDUCE MAX 1 @price AS max_priceSELECT category, AVG(price) as avg_price FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 category price GROUPBY 1 @category REDUCE AVG 1 @price AS avg_priceSELECT STDDEV(price) as price_stddev FROM {index}FT.AGGREGATE test_index "*" LOAD 1 price GROUPBY 0 REDUCE STDDEV 1 @price AS price_stddevSELECT category, QUANTILE(price, 0.5) as median_price FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 price category GROUPBY 1 @category REDUCE QUANTILE 2 @price 0.5 AS median_priceSELECT category, ARRAY_AGG(title) as titles FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 category title GROUPBY 1 @category REDUCE TOLIST 1 @title AS titlesSELECT category, FIRST_VALUE(title) as first_title FROM {index} GROUP BY categoryFT.AGGREGATE test_index "*" LOAD 2 category title GROUPBY 1 @category REDUCE FIRST_VALUE 1 @title AS first_titleTODO: subsequent work that will follow in separate PRs