feat: add get_task_documents to retrieve a task's documents#1252
feat: add get_task_documents to retrieve a task's documents#1252tcferreira wants to merge 1 commit into
Conversation
Meilisearch v1.13 added `GET /tasks/{uid}/documents` to fetch the documents
associated with a task. Add `Client.get_task_documents` (and the underlying
`TaskHandler` method), backed by a streaming GET (`HttpRequests.get_stream`) and
a parser that normalizes the JSON array / NDJSON / concatenated-JSON payload the
endpoint can return.
Adds unit tests for the parser, a request-shape test for the method, and a
`get_task_documents_1` code sample.
Closes meilisearch#1221
📝 WalkthroughWalkthroughAdds experimental support for the Changesget_task_documents feature
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Client
participant TaskHandler
participant HttpRequests
participant parse_task_documents
Caller->>Client: get_task_documents(uid)
Client->>TaskHandler: get_task_documents(uid)
TaskHandler->>HttpRequests: get_stream("tasks/{uid}/documents")
HttpRequests->>HttpRequests: requests.get(url, stream=True)
HttpRequests-->>TaskHandler: Response object
TaskHandler->>parse_task_documents: response.text
parse_task_documents-->>TaskHandler: List[Dict[str, Any]]
TaskHandler-->>Client: List[Dict[str, Any]]
Client-->>Caller: List[Dict[str, Any]]
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@meilisearch/_httprequests.py`:
- Around line 242-248: Add an exception handler for
`requests.exceptions.InvalidSchema` to the `get_stream` method's exception
handling block (after the ConnectionError handler) to match the pattern used in
`send_request` and `post_stream`. The handler should catch
`requests.exceptions.InvalidSchema` and raise `MeilisearchCommunicationError`
wrapping the error message, maintaining consistency across all HTTP request
methods in the SDK.
In `@meilisearch/_utils.py`:
- Around line 65-67: The splitlines() loop that processes the payload and splits
on _CONCATENATED_JSON regex pattern incorrectly handles valid JSON objects
containing "}{" within string values. Replace the naive regex-based splitting
logic (the for loops that iterate through payload.splitlines() and
_CONCATENATED_JSON.split(line)) with a JSON-aware parser that properly
understands JSON structure and correctly identifies object boundaries by
tracking quote context and brace nesting, rather than using simple string
pattern matching.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e377971-df13-4605-b8a2-a5528e9b29db
📒 Files selected for processing (7)
.code-samples.meilisearch.yamlmeilisearch/_httprequests.pymeilisearch/_utils.pymeilisearch/client.pymeilisearch/task.pytests/client/test_client_task_meilisearch.pytests/test_utils.py
| except requests.exceptions.Timeout as err: | ||
| raise MeilisearchTimeoutError(str(err)) from err | ||
| except requests.exceptions.ConnectionError as err: | ||
| raise MeilisearchCommunicationError(str(err)) from err | ||
| except requests.exceptions.HTTPError as err: | ||
| raise MeilisearchApiError(str(err), response) from err | ||
|
|
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
cd /tmp && find . -name "_httprequests.py" -o -name "*http*requests*" 2>/dev/null | head -20Repository: meilisearch/meilisearch-python
Length of output: 168
🏁 Script executed:
cd /tmp && git ls-files | grep -i httpRepository: meilisearch/meilisearch-python
Length of output: 237
🏁 Script executed:
cd /tmp && git ls-files | head -30Repository: meilisearch/meilisearch-python
Length of output: 237
🏁 Script executed:
find . -type f -name "_httprequests.py" 2>/dev/nullRepository: meilisearch/meilisearch-python
Length of output: 199
🏁 Script executed:
find . -type f -name "*.py" | grep -i http | head -20Repository: meilisearch/meilisearch-python
Length of output: 236
🏁 Script executed:
ls -laRepository: meilisearch/meilisearch-python
Length of output: 1413
🏁 Script executed:
git ls-files | grep -E "\.(py|js|ts)$" | head -30Repository: meilisearch/meilisearch-python
Length of output: 1078
🏁 Script executed:
wc -l ./meilisearch/_httprequests.pyRepository: meilisearch/meilisearch-python
Length of output: 203
🏁 Script executed:
ast-grep outline ./meilisearch/_httprequests.pyRepository: meilisearch/meilisearch-python
Length of output: 425
🏁 Script executed:
sed -n '1,270p' ./meilisearch/_httprequests.py | cat -nRepository: meilisearch/meilisearch-python
Length of output: 11223
Add InvalidSchema exception handler to get_stream for consistency.
The get_stream method at lines 242-247 lacks an InvalidSchema handler that exists in both send_request and post_stream. Malformed base URLs currently raise raw requests.exceptions.InvalidSchema instead of wrapping it in MeilisearchCommunicationError, breaking SDK-level exception behavior consistency.
Suggested fix
except requests.exceptions.Timeout as err:
raise MeilisearchTimeoutError(str(err)) from err
except requests.exceptions.ConnectionError as err:
raise MeilisearchCommunicationError(str(err)) from err
except requests.exceptions.HTTPError as err:
raise MeilisearchApiError(str(err), response) from err
+ except requests.exceptions.InvalidSchema as err:
+ if "://" not in self.config.url:
+ raise MeilisearchCommunicationError(
+ f"""
+ Invalid URL {self.config.url}, no scheme/protocol supplied.
+ Did you mean https://{self.config.url}?
+ """
+ ) from err
+
+ raise MeilisearchCommunicationError(str(err)) from err🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@meilisearch/_httprequests.py` around lines 242 - 248, Add an exception
handler for `requests.exceptions.InvalidSchema` to the `get_stream` method's
exception handling block (after the ConnectionError handler) to match the
pattern used in `send_request` and `post_stream`. The handler should catch
`requests.exceptions.InvalidSchema` and raise `MeilisearchCommunicationError`
wrapping the error message, maintaining consistency across all HTTP request
methods in the SDK.
| for line in payload.splitlines(): | ||
| for chunk in _CONCATENATED_JSON.split(line): | ||
| stripped = chunk.strip() |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use a JSON-aware concatenation parser instead of regex boundary splitting.
Lines 65-67 can mis-split valid payloads when a document string contains "}{" (e.g., {"text":"a}{b"}{"id":2}), causing false decode failures.
Suggested fix
except json.JSONDecodeError:
- documents: List[Dict[str, Any]] = []
- for line in payload.splitlines():
- for chunk in _CONCATENATED_JSON.split(line):
- stripped = chunk.strip()
- if stripped:
- documents.append(json.loads(stripped))
+ decoder = json.JSONDecoder()
+ documents: List[Dict[str, Any]] = []
+ idx = 0
+ while idx < len(payload):
+ while idx < len(payload) and payload[idx].isspace():
+ idx += 1
+ if idx >= len(payload):
+ break
+ document, idx = decoder.raw_decode(payload, idx)
+ documents.append(document)
return documents🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@meilisearch/_utils.py` around lines 65 - 67, The splitlines() loop that
processes the payload and splits on _CONCATENATED_JSON regex pattern incorrectly
handles valid JSON objects containing "}{" within string values. Replace the
naive regex-based splitting logic (the for loops that iterate through
payload.splitlines() and _CONCATENATED_JSON.split(line)) with a JSON-aware
parser that properly understands JSON structure and correctly identifies object
boundaries by tracking quote context and brace nesting, rather than using simple
string pattern matching.
Summary
Meilisearch v1.13 introduced
GET /tasks/{uid}/documentsto retrieve the documentsassociated with a task. This adds
Client.get_task_documents(uid)(and the underlyingTaskHandlermethod). Closes #1221.Changes
HttpRequests.get_stream— a streaming GET (mirrors the existingpost_stream),used to read the raw payload.
_utils.parse_task_documents— normalizes the payload into a list of documents.The endpoint can return a JSON array, a single JSON object, NDJSON, or several JSON
objects concatenated without a separator, so the parser handles all of those.
TaskHandler.get_task_documents/Client.get_task_documents— call the endpoint andreturn the parsed documents.
empty) and a request-shape test asserting the method hits
tasks/{uid}/documentsandparses the response.
get_task_documents_1code sample.Notes
getTaskDocumentsRoute), noted in thedocstrings.
meilisearch-jsSDK for the sameendpoint, for cross-SDK consistency.
Summary by CodeRabbit
New Features
Tests