UN-2836 [FEAT] Return full text contents of input file in API response #1904
pk-zipstack wants to merge 2 commits into main
Sequence Diagram
sequenceDiagram
participant Client
participant APIView as API View
participant Serializer
participant Helper as DeploymentHelper
participant Execution as ExecutionResult/DTO
Client->>APIView: POST/GET with include_extracted_text flag
APIView->>Serializer: validate request/query
Serializer-->>APIView: validated data (include_extracted_text)
APIView->>Helper: execute_workflow(..., include_extracted_text=flag)
Helper->>Execution: run/process execution
alt include_extracted_text == true
Helper->>Execution: promote_extracted_text()
Execution-->>Helper: response with top-level extracted_text
else
Helper->>Execution: ensure extracted_text removed from metadata
Execution-->>Helper: response without extracted_text
end
Helper-->>APIView: shaped execution response
APIView-->>Client: return API response
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
| Filename | Overview |
|---|---|
| backend/api_v2/constants.py | Adds INCLUDE_EXTRACTED_TEXT constant string to ApiExecution class; clean, minimal change. |
| backend/api_v2/serializers.py | Adds include_extracted_text BooleanField (default False) to both ExecutionRequestSerializer and ExecutionQuerySerializer with proper docstring. |
| backend/api_v2/api_deployment_views.py | Extracts include_extracted_text from validated data in both POST and GET handlers and passes it through to DeploymentHelper correctly. |
| backend/api_v2/deployment_helper.py | Conditionally preserves extracted_text in metadata and calls promote_extracted_text() in both execute_workflow() and process_completed_execution(); ordering relative to remove_inner_result_metadata() is correct. |
| backend/workflow_manager/workflow_v2/dto.py | Adds promote_extracted_text() to ExecutionResponse DTO — safely copies extracted_text from result[i].result.metadata to result[i] top-level with proper null/type checks. |
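
As a rough illustration of the `promote_extracted_text()` behavior summarized in the table, here is a minimal sketch on plain dicts. It is not the PR's code: the standalone function, the `file` and `output` keys, and the sample data are assumptions made for illustration. Only the copy from `result[i].result.metadata.extracted_text` to `result[i].extracted_text`, guarded by null/type checks, comes from the description above.

```python
from typing import Any


def promote_extracted_text(file_results: list[Any]) -> None:
    """Copy metadata["extracted_text"] to the top level of each file result, in place."""
    for item in file_results:
        if not isinstance(item, dict):
            continue  # skip malformed entries
        inner = item.get("result")
        if not isinstance(inner, dict):
            continue  # e.g. a file that carries no inner result
        metadata = inner.get("metadata")
        if not isinstance(metadata, dict):
            continue  # nothing to promote from
        text = metadata.get("extracted_text")
        if text is not None:
            # Copy, do not move: the metadata entry is left untouched here.
            item["extracted_text"] = text


# Hypothetical sample data: one normal file result and one without an inner result.
results = [
    {"file": "a.pdf", "result": {"output": {}, "metadata": {"extracted_text": "hello"}}},
    {"file": "b.pdf", "result": None},
]
promote_extracted_text(results)
```

Because the value is copied rather than moved, any later metadata cleanup can run unchanged without affecting the top-level field.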
Sequence Diagram
sequenceDiagram
participant C as Client
participant DE as DeploymentExecution
participant DH as DeploymentHelper
participant R as ExecutionResponse
rect rgb(220, 235, 255)
Note over C,R: POST (Sync) Path
C->>DE: POST .../execute?include_extracted_text=true
DE->>DH: execute_workflow(include_extracted_text=true)
alt enable_highlight is False
DH->>R: remove_result_metadata_keys([highlight_data])
Note over DH,R: extracted_text kept in metadata
end
DH->>R: promote_extracted_text()
Note over R: result[i][extracted_text] = result[i].result.metadata.extracted_text
DH->>R: remove_inner_result_metadata() if not include_metadata
Note over R: top-level extracted_text preserved
DE-->>C: 200 OK with extracted_text at top level
end
rect rgb(220, 255, 230)
Note over C,R: GET (Async Polling) Path
C->>DE: GET .../execute?include_extracted_text=true&execution_id=...
DE->>DH: get_execution_status(execution_id)
DH-->>DE: ExecutionResponse
alt status == COMPLETED
DE->>DH: process_completed_execution(include_extracted_text=true)
DH->>R: promote_extracted_text()
Note over R: result[i][extracted_text] = result[i].result.metadata.extracted_text
DH->>R: remove_inner_result_metadata() if not include_metadata
end
DE-->>C: 200 OK with extracted_text at top level
end
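
The ordering shown in the diagram (filter highlight keys, promote, then strip inner metadata) can be sketched on plain dicts as below. This is an illustrative approximation, not the code in `deployment_helper.py`: the function name `shape_response`, the dict layout, and the assumption that `extracted_text` is dropped from metadata only when highlighting is disabled and the flag is off are all hypothetical.

```python
from typing import Any


def shape_response(
    file_results: list[dict[str, Any]],
    *,
    enable_highlight: bool,
    include_extracted_text: bool,
    include_metadata: bool,
) -> list[dict[str, Any]]:
    """Apply highlight filtering, promotion, then metadata stripping, in that order."""
    for item in file_results:
        inner = item.get("result")
        if not isinstance(inner, dict):
            continue
        metadata = inner.get("metadata")
        if isinstance(metadata, dict):
            if not enable_highlight:
                metadata.pop("highlight_data", None)
                if not include_extracted_text:
                    metadata.pop("extracted_text", None)
            # Promotion must happen before the metadata block is dropped below.
            if include_extracted_text and metadata.get("extracted_text") is not None:
                item["extracted_text"] = metadata["extracted_text"]
        if not include_metadata:
            inner.pop("metadata", None)
    return file_results


def sample() -> list[dict[str, Any]]:
    """Build a fresh hypothetical file result for each scenario."""
    return [
        {
            "file": "a.pdf",
            "result": {
                "output": {"k": "v"},
                "metadata": {"extracted_text": "full text", "highlight_data": [1]},
            },
        }
    ]


only_text = shape_response(
    sample(), enable_highlight=False, include_extracted_text=True, include_metadata=False
)
default = shape_response(
    sample(), enable_highlight=False, include_extracted_text=False, include_metadata=False
)
both = shape_response(
    sample(), enable_highlight=False, include_extracted_text=True, include_metadata=True
)
```

In this sketch, swapping the promotion and the metadata stripping would lose the text whenever `include_metadata` is false, which is why the order matters.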
Reviews (3): Last reviewed commit: "Merge branch 'main' into UN-2836-include..."
Add `include_extracted_text` parameter to API deployment endpoints that returns the full extracted text of each input file at the top level of each file result, independent of `include_metadata` and the `ENABLE_HIGHLIGHT_API_DEPLOYMENT` configuration flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3461d01 to 20f097a
🧹 Nitpick comments (1)
backend/api_v2/deployment_helper.py (1)
482-487: Logic is correct; consider extracting shared post-processing into a helper.

The `extracted_text` handling logic here is identical to lines 277-282 in `execute_workflow()`. Both blocks share the same pattern for `enable_highlight` checking, `highlight_data` removal, and `extracted_text` conditional removal/promotion.

♻️ Optional: Extract shared logic into a private helper

    @staticmethod
    def _apply_response_post_processing(
        response: ExecutionResponse,
        organization: Any,
        include_extracted_text: bool,
        include_metadata: bool,
        include_metrics: bool,
    ) -> None:
        """Apply common post-processing to execution responses."""
        enable_highlight = False
        if ConfigurationRegistry.is_config_key_available(
            "ENABLE_HIGHLIGHT_API_DEPLOYMENT"
        ):
            enable_highlight = Configuration.get_value_by_organization(
                config_key="ENABLE_HIGHLIGHT_API_DEPLOYMENT",
                organization=organization,
            )
        if not enable_highlight:
            response.remove_result_metadata_keys(["highlight_data"])
            if not include_extracted_text:
                response.remove_result_metadata_keys(["extracted_text"])
        if include_extracted_text:
            response.promote_extracted_text()
        if include_metadata or include_metrics:
            DeploymentHelper._enrich_result_with_usage_metadata(response)
        if not include_metadata:
            response.remove_inner_result_metadata()
        if not include_metrics:
            response.remove_result_metrics()

This would reduce the duplicated logic in both `execute_workflow()` and `process_completed_execution()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/api_v2/deployment_helper.py` around lines 482 - 487, The duplicated post-processing logic for extracted_text/highlight_data in process_completed_execution() and execute_workflow() should be moved into a private helper to avoid duplication: add a static method (e.g., _apply_response_post_processing(response: ExecutionResponse, organization, include_extracted_text: bool, include_metadata: bool, include_metrics: bool)) in DeploymentHelper that encapsulates the ENABLE_HIGHLIGHT_API_DEPLOYMENT config check, removal of "highlight_data" and conditional removal/promotion of "extracted_text", and the existing metadata/metrics enrichment/removal logic (calling _enrich_result_with_usage_metadata, remove_inner_result_metadata, remove_result_metrics), then replace the duplicated blocks in execute_workflow() and process_completed_execution() with calls to this new helper.
📒 Files selected for processing (5)
- backend/api_v2/api_deployment_views.py
- backend/api_v2/constants.py
- backend/api_v2/deployment_helper.py
- backend/api_v2/serializers.py
- backend/workflow_manager/workflow_v2/dto.py
✅ Files skipped from review due to trivial changes (1)
- backend/api_v2/constants.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/workflow_manager/workflow_v2/dto.py
- backend/api_v2/serializers.py
- backend/api_v2/api_deployment_views.py
Test Results
Summary
- Runner Tests - Full Report
- SDK1 Tests - Full Report

Quality Gate Passed
What
- Adds an `include_extracted_text` boolean parameter (default `false`) to both sync (POST) and async polling (GET) API deployment endpoints
- When set, each file result carries the full extracted text at its top level as `"extracted_text": "..."`
- Works independently of `include_metadata` and the `ENABLE_HIGHLIGHT_API_DEPLOYMENT` configuration

Why
- `extracted_text` is only available inside metadata, and is gated behind both `include_metadata=true` and the enterprise `ENABLE_HIGHLIGHT_API_DEPLOYMENT` flag

How
- Adds `INCLUDE_EXTRACTED_TEXT` constant to `ApiExecution`
- Adds `include_extracted_text` field to `ExecutionRequestSerializer` (POST) and `ExecutionQuerySerializer` (GET)
- Adds `promote_extracted_text()` method to the `ExecutionResponse` DTO: copies `extracted_text` from `result[i].result.metadata` to `result[i].extracted_text`
- In both POST (`deployment_helper.py`) and GET (`api_deployment_views.py`) flows: when `include_extracted_text=true`, preserve `extracted_text` in metadata before highlight filtering, then promote it to the top level
- `include_metadata` / highlight filtering still runs, so metadata is cleaned as usual

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
- The new parameter defaults to `false`, so existing API behavior is unchanged. The promotion logic only runs when explicitly requested. No changes to the execution pipeline, caching, or metadata population.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions

Notes on Testing
- `include_extracted_text=true` returns `extracted_text` at file-result top level
- `include_extracted_text=false` (default) does not include `extracted_text`
- `include_extracted_text=true` returns `extracted_text` at file-result top level
- `include_extracted_text=true` works without `include_metadata=true`
- `include_extracted_text=true` works even when `ENABLE_HIGHLIGHT_API_DEPLOYMENT` is disabled
- With `include_extracted_text=true` and `include_metadata=true`, `extracted_text` appears at top level and in metadata

Screenshots
Checklist
I have read and understood the Contribution Guidelines.
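
From a caller's point of view, the testing notes above might be exercised with something like the following sketch. The URL, the `file` key, and both helper names are hypothetical placeholders, since the real deployment path and response envelope are not shown in this PR; only the `include_extracted_text` and `execution_id` query parameters and the top-level `extracted_text` field come from the description.

```python
from typing import Any
from urllib.parse import urlencode

# Hypothetical placeholder; the real deployment URL is not shown in this PR.
EXECUTE_URL = "https://example.invalid/execute"


def polling_url(execution_id: str, include_extracted_text: bool = False) -> str:
    """Build the async polling (GET) URL with the new flag as a query parameter."""
    query = {
        "execution_id": execution_id,
        "include_extracted_text": str(include_extracted_text).lower(),
    }
    return f"{EXECUTE_URL}?{urlencode(query)}"


def extracted_texts(response_body: dict[str, Any]) -> dict[str, str]:
    """Collect top-level extracted_text per file result; files without it are skipped."""
    texts: dict[str, str] = {}
    for item in response_body.get("result") or []:
        if isinstance(item, dict) and isinstance(item.get("extracted_text"), str):
            # The "file" key is an assumed identifier for each file result.
            texts[str(item.get("file", ""))] = item["extracted_text"]
    return texts


url = polling_url("abc-123", include_extracted_text=True)
# Assumed response shape: a "result" list with one entry per input file.
body = {"result": [{"file": "a.pdf", "extracted_text": "hello"}, {"file": "b.pdf"}]}
texts = extracted_texts(body)
```

Reading the field defensively keeps the caller working against responses produced with the default `include_extracted_text=false`, where the key is absent.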