feat: add get_task_documents to retrieve a task's documents #1252

Open
tcferreira wants to merge 1 commit into meilisearch:main from tcferreira:feat/get-task-documents

tcferreira commented Jun 23, 2026

Summary

Meilisearch v1.13 introduced GET /tasks/{uid}/documents to retrieve the documents
associated with a task. This adds Client.get_task_documents(uid) (and the underlying
TaskHandler method). Closes #1221.

Changes

  • HttpRequests.get_stream — a streaming GET (mirrors the existing post_stream),
    used to read the raw payload.
  • _utils.parse_task_documents — normalizes the payload into a list of documents.
    The endpoint can return a JSON array, a single JSON object, NDJSON, or several JSON
    objects concatenated without a separator, so the parser handles all of those.
  • TaskHandler.get_task_documents / Client.get_task_documents — call the endpoint and
    return the parsed documents.
  • Tests: parametrized unit tests for the parser (array / object / NDJSON / concatenated /
    empty) and a request-shape test asserting the method hits tasks/{uid}/documents and
    parses the response.
  • get_task_documents_1 code sample.
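
The payload shapes listed above can be illustrated with a small normalizer. This is a minimal sketch, not the PR's `parse_task_documents`: the name `normalize_documents` is hypothetical, and the fallback for NDJSON and concatenated objects uses `json.JSONDecoder.raw_decode` from the Python standard library, which may differ from what the PR does.

```python
import json
from typing import Any, Dict, List


def normalize_documents(payload: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the PR's parser; returns a list of documents."""
    if not payload.strip():
        return []  # blank input -> no documents

    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        # Not a single JSON value: treat as NDJSON or concatenated objects.
        decoder = json.JSONDecoder()
        documents: List[Dict[str, Any]] = []
        idx = 0
        while idx < len(payload):
            if payload[idx].isspace():
                idx += 1  # skip newlines and other whitespace between values
                continue
            document, idx = decoder.raw_decode(payload, idx)
            documents.append(document)
        return documents

    # A single JSON value: either an array of documents or one object.
    return parsed if isinstance(parsed, list) else [parsed]
```

With this sketch, a JSON array, a single object, NDJSON lines, and back-to-back objects all come out as a plain Python list, and a blank payload yields an empty list.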

Notes

  • This is an experimental Meilisearch feature (getTaskDocumentsRoute), noted in the
    docstrings.
  • The parser mirrors the behavior of the official meilisearch-js SDK for the same
    endpoint, for cross-SDK consistency.

Summary by CodeRabbit

  • New Features

    • Added ability to retrieve documents associated with a specific task (experimental feature).
    • Documents can be accessed via the task ID to view what was added or updated.
  • Tests

    • Added test coverage for task document retrieval and payload parsing.

Meilisearch v1.13 added `GET /tasks/{uid}/documents` to fetch the documents
associated with a task. Add `Client.get_task_documents` (and the underlying
`TaskHandler` method), backed by a streaming GET (`HttpRequests.get_stream`) and
a parser that normalizes the JSON array / NDJSON / concatenated-JSON payload the
endpoint can return.

Adds unit tests for the parser, a request-shape test for the method, and a
`get_task_documents_1` code sample.

Closes meilisearch#1221

coderabbitai Bot commented Jun 23, 2026

Walkthrough

Adds experimental support for the GET /tasks/{uid}/documents Meilisearch endpoint. A streaming HTTP method get_stream is added to HttpRequests. A new parse_task_documents utility normalizes multiple JSON response shapes. TaskHandler and Client expose get_task_documents(uid), backed by the streaming transport and parser. Tests and a YAML code sample are included.

Changes

get_task_documents feature

Layer / File(s) | Summary
Streaming transport and document parsing utility (meilisearch/_httprequests.py, meilisearch/_utils.py) | HttpRequests.get_stream issues a stream=True GET and maps Timeout, ConnectionError, and HTTPError to library-specific errors. parse_task_documents normalizes task-document payloads across JSON array, single-object, NDJSON, and concatenated-object formats, returning an empty list for blank input.
TaskHandler and Client public methods (meilisearch/task.py, meilisearch/client.py, .code-samples.meilisearch.yaml) | TaskHandler.get_task_documents calls http.get_stream("tasks/{uid}/documents") and feeds response.text through parse_task_documents. Client.get_task_documents delegates to TaskHandler with an experimental-feature docstring. The get_task_documents_1 YAML code sample is inserted after get_task_1.
Unit tests (tests/test_utils.py, tests/client/test_client_task_meilisearch.py) | Parametrized tests cover all parse_task_documents input shapes (array, single object, NDJSON, concatenated, empty). A mocked client test asserts the correct endpoint path and parsed output from a streamed response.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Client
    participant TaskHandler
    participant HttpRequests
    participant parse_task_documents

    Caller->>Client: get_task_documents(uid)
    Client->>TaskHandler: get_task_documents(uid)
    TaskHandler->>HttpRequests: get_stream("tasks/{uid}/documents")
    HttpRequests->>HttpRequests: requests.get(url, stream=True)
    HttpRequests-->>TaskHandler: Response object
    TaskHandler->>parse_task_documents: response.text
    parse_task_documents-->>TaskHandler: List[Dict[str, Any]]
    TaskHandler-->>Client: List[Dict[str, Any]]
    Client-->>Caller: List[Dict[str, Any]]

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐇 Hop hop, what's this delight?
A stream of docs, JSON bright!
We parse each shape — array or lone,
NDJSON lines, concatenated stone.
get_task_documents now in sight,
The rabbit ships new endpoints right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately and concisely summarizes the main feature added: a get_task_documents method to retrieve a task's documents.
Linked Issues check | ✅ Passed | All requirements from issue #1221 are met: API method added to retrieve task documents [#1221], test cases included for the new method [#1221], and code sample added under get_task_documents_1 key [#1221].
Out of Scope Changes check | ✅ Passed | All changes directly support the implementation of get_task_documents functionality as specified in issue #1221; no unrelated or out-of-scope changes detected.

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@meilisearch/_httprequests.py`:
- Around line 242-248: Add an exception handler for
`requests.exceptions.InvalidSchema` to the `get_stream` method's exception
handling block (after the ConnectionError handler) to match the pattern used in
`send_request` and `post_stream`. The handler should catch
`requests.exceptions.InvalidSchema` and raise `MeilisearchCommunicationError`
wrapping the error message, maintaining consistency across all HTTP request
methods in the SDK.

In `@meilisearch/_utils.py`:
- Around line 65-67: The splitlines() loop that processes the payload and splits
on _CONCATENATED_JSON regex pattern incorrectly handles valid JSON objects
containing "}{" within string values. Replace the naive regex-based splitting
logic (the for loops that iterate through payload.splitlines() and
_CONCATENATED_JSON.split(line)) with a JSON-aware parser that properly
understands JSON structure and correctly identifies object boundaries by
tracking quote context and brace nesting, rather than using simple string
pattern matching.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8e377971-df13-4605-b8a2-a5528e9b29db

📥 Commits

Reviewing files that changed from the base of the PR and between 1464823 and 4287bce.

📒 Files selected for processing (7)
  • .code-samples.meilisearch.yaml
  • meilisearch/_httprequests.py
  • meilisearch/_utils.py
  • meilisearch/client.py
  • meilisearch/task.py
  • tests/client/test_client_task_meilisearch.py
  • tests/test_utils.py

Comment on lines +242 to +248
        except requests.exceptions.Timeout as err:
            raise MeilisearchTimeoutError(str(err)) from err
        except requests.exceptions.ConnectionError as err:
            raise MeilisearchCommunicationError(str(err)) from err
        except requests.exceptions.HTTPError as err:
            raise MeilisearchApiError(str(err), response) from err

🩺 Stability & Availability | 🟠 Major

Add InvalidSchema exception handler to get_stream for consistency.

The get_stream method at lines 242-247 lacks the InvalidSchema handler that both send_request and post_stream have. As a result, a malformed base URL raises a raw requests.exceptions.InvalidSchema instead of a MeilisearchCommunicationError, so get_stream behaves differently from the SDK's other request methods.

Suggested fix
         except requests.exceptions.Timeout as err:
             raise MeilisearchTimeoutError(str(err)) from err
         except requests.exceptions.ConnectionError as err:
             raise MeilisearchCommunicationError(str(err)) from err
         except requests.exceptions.HTTPError as err:
             raise MeilisearchApiError(str(err), response) from err
+        except requests.exceptions.InvalidSchema as err:
+            if "://" not in self.config.url:
+                raise MeilisearchCommunicationError(
+                    f"""
+                    Invalid URL {self.config.url}, no scheme/protocol supplied.
+                    Did you mean https://{self.config.url}?
+                    """
+                ) from err
+
+            raise MeilisearchCommunicationError(str(err)) from err
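
The pattern in the suggested fix (translate a transport-level exception into an SDK-level one, and add a hint when the URL has no scheme) can be sketched without a network call. Everything in this sketch is hypothetical: `InvalidSchema` and `CommunicationError` are local stand-ins for `requests.exceptions.InvalidSchema` and `MeilisearchCommunicationError`, `fake_get` simulates the transport, and this `get_stream` only borrows the name of the method under review.

```python
class InvalidSchema(ValueError):
    """Stand-in for requests.exceptions.InvalidSchema (assumption)."""


class CommunicationError(Exception):
    """Stand-in for MeilisearchCommunicationError (assumption)."""


def fake_get(url: str) -> str:
    """Simulated transport: rejects URLs that lack an http(s) scheme."""
    if not url.startswith(("http://", "https://")):
        raise InvalidSchema(f"No connection adapters were found for {url!r}")
    return "ok"


def get_stream(base_url: str, path: str) -> str:
    """Wrap transport errors so callers only see the SDK-level exception."""
    try:
        return fake_get(f"{base_url}/{path}")
    except InvalidSchema as err:
        if "://" not in base_url:
            # No scheme at all: give the caller an actionable hint.
            raise CommunicationError(
                f"Invalid URL {base_url}, no scheme/protocol supplied. "
                f"Did you mean https://{base_url}?"
            ) from err
        # A scheme is present but unsupported: pass the message through.
        raise CommunicationError(str(err)) from err
```

Calling `get_stream("localhost:7700", "tasks/1/documents")` in this sketch raises `CommunicationError` with the "Did you mean" hint, and the original `InvalidSchema` stays available as `__cause__`.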

Comment thread meilisearch/_utils.py
Comment on lines +65 to +67
        for line in payload.splitlines():
            for chunk in _CONCATENATED_JSON.split(line):
                stripped = chunk.strip()

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use a JSON-aware concatenation parser instead of regex boundary splitting.

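Lines 65-67 can mis-split valid payloads when a string value inside a document contains "}{" (e.g., {"text":"a}{b"}{"id":2}): the regex also splits inside the string, and the resulting fragments fail to decode even though the payload itself is valid.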

Suggested fix
     except json.JSONDecodeError:
-        documents: List[Dict[str, Any]] = []
-        for line in payload.splitlines():
-            for chunk in _CONCATENATED_JSON.split(line):
-                stripped = chunk.strip()
-                if stripped:
-                    documents.append(json.loads(stripped))
+        decoder = json.JSONDecoder()
+        documents: List[Dict[str, Any]] = []
+        idx = 0
+        while idx < len(payload):
+            while idx < len(payload) and payload[idx].isspace():
+                idx += 1
+            if idx >= len(payload):
+                break
+            document, idx = decoder.raw_decode(payload, idx)
+            documents.append(document)
         return documents
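
The mis-split is easy to reproduce. In the sketch below, the pattern bound to `BOUNDARY` is only an assumption about what a boundary regex such as `_CONCATENATED_JSON` might look like (the PR's actual pattern is not shown in this thread), and `iter_documents` is a hypothetical generator built on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
import re
from typing import Any, Iterator

# Assumed boundary pattern: split wherever "}" is directly followed by "{".
BOUNDARY = re.compile(r"(?<=\})\s*(?=\{)")


def iter_documents(payload: str) -> Iterator[Any]:
    """Yield each JSON value in payload, letting the decoder find boundaries."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(payload):
        if payload[idx].isspace():
            idx += 1  # skip whitespace between values
            continue
        document, idx = decoder.raw_decode(payload, idx)
        yield document


payload = '{"text":"a}{b"}{"id":2}'

# The regex also splits inside the string value, producing invalid fragments.
chunks = BOUNDARY.split(payload)

# The decoder tracks string context, so only the real boundary is used.
documents = list(iter_documents(payload))
```

Here `chunks` holds three fragments, the first two of which are not valid JSON on their own, while `documents` holds the two original objects.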

Development

Successfully merging this pull request may close these issues.

[Meilisearch v1.13.0] Add method to get tasks documents

1 participant