feat: add text/image-to-3d-scene pipeline #340
Pull request overview
This PR introduces a new embodichain/gen_sim/prompt2scene module implementing a text/image-to-3D-scene pipeline, including LLM-backed workflows (LangGraph), prompt templating, service clients/managers for asset generation/segmentation, and a CLI entry point to run the end-to-end process.
Changes:
- Add LangGraph workflows for scene intake, relation extraction, unified-scene assembly, and downstream scene generation.
- Add agent-tool clients/managers (HTTP service wrappers + geometry/simulation utilities) used by the pipeline.
- Add prompt templating system + bundled YAML prompt templates, plus CLI/config scaffolding.
Reviewed changes
Copilot reviewed 130 out of 130 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| embodichain/gen_sim/prompt2scene/__init__.py | Package root (currently header-only) |
| embodichain/gen_sim/prompt2scene/.gitignore | Ignore local outputs/servers |
| embodichain/gen_sim/prompt2scene/cli/__init__.py | CLI package init |
| embodichain/gen_sim/prompt2scene/cli/start.py | CLI entry point |
| embodichain/gen_sim/prompt2scene/configs/client_config.json | Default agent-tool server endpoints |
| embodichain/gen_sim/prompt2scene/configs/llm_config.json | Default LLM config scaffold |
| embodichain/gen_sim/prompt2scene/llms/__init__.py | LLM helpers exports |
| embodichain/gen_sim/prompt2scene/llms/config.py | LLM config dataclass |
| embodichain/gen_sim/prompt2scene/llms/openai_compatible.py | OpenAI-compatible LangChain model builder + config loader |
| embodichain/gen_sim/prompt2scene/pipeline/__init__.py | Pipeline runner exports |
| embodichain/gen_sim/prompt2scene/pipeline/runner.py | End-to-end pipeline orchestration |
| embodichain/gen_sim/prompt2scene/prompts/__init__.py | Prompt rendering API |
| embodichain/gen_sim/prompt2scene/prompts/base.py | Prompt template loader/renderer |
| embodichain/gen_sim/prompt2scene/prompts/data/__init__.py | Bundled prompt data package init |
| embodichain/gen_sim/prompt2scene/prompts/data/image_relations.yaml | Image-relations prompt template |
| embodichain/gen_sim/prompt2scene/prompts/data/scene_intake.yaml | Scene-intake prompt template |
| embodichain/gen_sim/prompt2scene/prompts/data/text_relations.yaml | Text-relations prompt template |
| embodichain/gen_sim/prompt2scene/prompts/data/unified_scene_gen.yaml | Unified-scene-gen prompt template |
| embodichain/gen_sim/prompt2scene/utils/__init__.py | prompt2scene utility exports |
| embodichain/gen_sim/prompt2scene/utils/io.py | IO helpers (json/path/data-url) |
| embodichain/gen_sim/prompt2scene/utils/log.py | prompt2scene logging helpers |
| embodichain/gen_sim/prompt2scene/workflows/__init__.py | Workflow constants/exports |
| embodichain/gen_sim/prompt2scene/workflows/attempt_state.py | Retry/error TypedDict base |
| embodichain/gen_sim/prompt2scene/workflows/artifact_writer.py | Step artifact writing helpers |
| embodichain/gen_sim/prompt2scene/workflows/llm_output.py | Structured-model call wrappers |
| embodichain/gen_sim/prompt2scene/workflows/request.py | Input normalization + manifest |
| embodichain/gen_sim/prompt2scene/workflows/spatial.py | Spatial constant definitions |
| embodichain/gen_sim/prompt2scene/workflows/stage_errors.py | Error formatting helpers |
| embodichain/gen_sim/prompt2scene/workflows/image_relations/__init__.py | Image-relations workflow exports |
| embodichain/gen_sim/prompt2scene/workflows/image_relations/graph.py | Image-relations LangGraph |
| embodichain/gen_sim/prompt2scene/workflows/image_relations/nodes.py | Image-relations nodes |
| embodichain/gen_sim/prompt2scene/workflows/image_relations/prompts.py | Image-relations message builders |
| embodichain/gen_sim/prompt2scene/workflows/image_relations/schema.py | Image-relations output schema |
| embodichain/gen_sim/prompt2scene/workflows/image_relations/state.py | Image-relations state typing |
| embodichain/gen_sim/prompt2scene/workflows/image_relations/utils.py | Image-relations normalization utils |
| embodichain/gen_sim/prompt2scene/workflows/scene_intake/__init__.py | Scene-intake workflow exports |
| embodichain/gen_sim/prompt2scene/workflows/scene_intake/graph.py | Scene-intake LangGraph |
| embodichain/gen_sim/prompt2scene/workflows/scene_intake/nodes.py | Scene-intake nodes |
| embodichain/gen_sim/prompt2scene/workflows/scene_intake/prompts.py | Scene-intake message builders |
| embodichain/gen_sim/prompt2scene/workflows/scene_intake/schema.py | Scene-intake JSON schema |
| embodichain/gen_sim/prompt2scene/workflows/scene_intake/state.py | Scene-intake state typing |
| embodichain/gen_sim/prompt2scene/workflows/scene_intake/utils.py | Scene-intake normalization utils |
| embodichain/gen_sim/prompt2scene/workflows/text_relations/__init__.py | Text-relations workflow exports |
| embodichain/gen_sim/prompt2scene/workflows/text_relations/graph.py | Text-relations LangGraph |
| embodichain/gen_sim/prompt2scene/workflows/text_relations/nodes.py | Text-relations nodes |
| embodichain/gen_sim/prompt2scene/workflows/text_relations/prompts.py | Text-relations message builders |
| embodichain/gen_sim/prompt2scene/workflows/text_relations/schema.py | Text-relations JSON schema + dataclasses |
| embodichain/gen_sim/prompt2scene/workflows/text_relations/state.py | Text-relations state typing |
| embodichain/gen_sim/prompt2scene/workflows/text_relations/utils.py | Text-relations normalization utils |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene/__init__.py | Unified-scene workflow package init (currently empty exports) |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene/graph.py | Unified-scene LangGraph |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene/nodes.py | Unified-scene assembly node |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene/schema.py | Unified-scene dataclasses + manifest |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene/state.py | Unified-scene state typing |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene/utils.py | Unified-scene construction helpers |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/__init__.py | Unified-scene-gen workflow exports |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/graph.py | Unified-scene-gen LangGraph |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/nodes.py | Unified-scene-gen nodes |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/paths.py | Unified-scene-gen path helpers |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/prompts.py | Unified-scene-gen message builders |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/schema.py | Unified-scene-gen JSON schemas |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/scene_update.py | Unified-scene manifest updates |
| embodichain/gen_sim/prompt2scene/workflows/unified_scene_gen/state.py | Unified-scene-gen state typing |
| embodichain/gen_sim/prompt2scene/agent_tools/__init__.py | Agent-tools package init (currently missing header/__future__/__all__) |
| embodichain/gen_sim/prompt2scene/agent_tools/servers/__init__.py | External-servers package init |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/__init__.py | Client package exports |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/base.py | Shared HTTP client retry logic |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/common.py | HTTP parsing/validation helpers |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/config.py | Client config loader |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_generation_client/__init__.py | Image-generation client exports |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_generation_client/client.py | Z-Image client implementation |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_generation_client/parser.py | Z-Image response parser |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_generation_client/schemas.py | Z-Image client schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_segmentation_client/__init__.py | Image-segmentation client exports |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_segmentation_client/client.py | SAM3 client implementation |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_segmentation_client/parser.py | SAM3 response parser |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_segmentation_client/schemas.py | SAM3 client schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/image_segmentation_client/utils.py | SAM3 visualization/mask utils |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/geometry_generation_client/__init__.py | Geometry-gen client exports |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/geometry_generation_client/client.py | Geometry-gen client implementation |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/geometry_generation_client/parser.py | Geometry-gen response parser |
| embodichain/gen_sim/prompt2scene/agent_tools/clients/geometry_generation_client/schemas.py | Geometry-gen client schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/tools/__init__.py | Tools package init |
| embodichain/gen_sim/prompt2scene/agent_tools/tools/gym_export.py | Gym export tool |
| embodichain/gen_sim/prompt2scene/agent_tools/tools/image_scene_asset_generation.py | Image-scene asset generation tool |
| embodichain/gen_sim/prompt2scene/agent_tools/tools/table_fit_scene.py | Table fitting tools |
| embodichain/gen_sim/prompt2scene/agent_tools/tools/text_asset_generation.py | Text asset generation tool |
| embodichain/gen_sim/prompt2scene/agent_tools/tools/text_clutter_layout.py | Text clutter layout tool |
| embodichain/gen_sim/prompt2scene/agent_tools/tools/text_scene_metric_scale.py | Text-scene metric scaling tool |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/blender_rendering_manager/__init__.py | Blender rendering manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/blender_rendering_manager/manager.py | Blender rendering implementation |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/blender_rendering_manager/schemas.py | Blender rendering schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/geometry_generation_manager/__init__.py | Geometry-generation manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/geometry_generation_manager/manager.py | Geometry-generation orchestration |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/geometry_generation_manager/schemas.py | Geometry-generation schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/geometry_manager/__init__.py | Geometry manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/geometry_manager/manager.py | Mesh processing utilities |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/geometry_manager/schemas.py | Geometry manager schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/geometry_manager/scene_geometry.py | Scene geometry helpers |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_generation_manager/__init__.py | Image generation manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_generation_manager/manager.py | Image generation manager implementation |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_generation_manager/schemas.py | Image generation schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_scene_manager/__init__.py | Image scene manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_scene_manager/alignment.py | Image-scene alignment utilities |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_scene_manager/manifests.py | Image-scene manifest writers |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_scene_manager/prompts.py | Image-scene prompt builders |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_scene_manager/schemas.py | Image-scene schemas/constants |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_segmentation_manager/__init__.py | Image segmentation manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_segmentation_manager/manager.py | Segmentation domain operations |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/image_segmentation_manager/schemas.py | Segmentation manager schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/matplotlib_manager/__init__.py | Matplotlib manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/matplotlib_manager/manager.py | Visualization implementation |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/matplotlib_manager/schemas.py | Visualization schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/metric_scale_manager/__init__.py | Metric scale manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/metric_scale_manager/manager.py | Metric scale logic |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/metric_scale_manager/schemas.py | Metric scale schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/optimization_manager/__init__.py | Optimization helpers exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/optimization_manager/manager.py | Layout/packing utilities |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/simulation_manager/__init__.py | Simulation manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/simulation_manager/manager.py | Gravity-drop simulation wrapper |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/simulation_manager/schemas.py | Simulation manager schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/simready_manager/__init__.py | Simready manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/simready_manager/manager.py | Simready conversion logic |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/simready_manager/schemas.py | Simready schemas |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/table_clutter_fit_manager/__init__.py | Table/clutter fit manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/table_clutter_fit_manager/manager.py | Table-to-clutter fitting logic |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/text_layout_manager/__init__.py | Text layout manager exports |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/text_layout_manager/layout.py | Text layout placement logic |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/text_layout_manager/optimization.py | Text layout optimization |
| embodichain/gen_sim/prompt2scene/agent_tools/managers/text_layout_manager/settle.py | Physics settling helpers |
| @@ -0,0 +1 @@ | |||
| """Internal client + External server for agent tool calling.""" | |||
| # ---------------------------------------------------------------------------- | ||
| # Copyright (c) 2021-2026 DexForce Technology Co., Ltd. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| # ---------------------------------------------------------------------------- No newline at end of file |
| from __future__ import annotations | ||
|
|
||
| __all__: list[str] = [] |
| from importlib import resources | ||
| from pathlib import Path | ||
| from string import Template | ||
| from typing import Any, Mapping |
| def _get_prompt_path(self, prompt_name: str) -> Path: | ||
| if "/" in prompt_name or "\\" in prompt_name: | ||
| raise ValueError(f"Prompt name must be a file name: {prompt_name}") | ||
| return resources.files(self._package).joinpath(prompt_name) |
| sim.update(step=300) | ||
|
|
||
| final_pose = obj.get_local_pose(to_matrix=True)[0].detach().cpu() | ||
| sim._deferred_destroy() |
| { | ||
| "sam3_segmentation": { | ||
| "base_url": "http://192.168.3.23:5014", | ||
| "timeout_s": 1200, | ||
| "health_path": "/health", | ||
| "segment_single_object_path": "/predict" | ||
| }, | ||
| "sam3d_generation": { | ||
| "base_url": "http://10.7.7.32:5019", | ||
| "timeout_s": 1800, | ||
| "health_path": "/health", | ||
| "generate_multiple_objects_path": "/generate_multiple_objects", | ||
| "generate_single_object_path": "/generate_single_object" | ||
| }, | ||
| "zimage": { | ||
| "base_url": "http://192.168.3.23:5013", | ||
| "timeout_s": 120, | ||
| "health_path": "/health", | ||
| "generate_single_object_path": "/generate.png" | ||
| } | ||
| } |
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| # ---------------------------------------------------------------------------- | ||
| """External servers, ignored by git, for testing or demo purposes.""" No newline at end of file |
| missing = [ | ||
| name | ||
| for name, value in { | ||
| "api_key": api_key, | ||
| "model": model, | ||
| "base_url": base_url, | ||
| }.items() | ||
| if not value | ||
| ] | ||
| if missing: | ||
| raise ValueError(f"Missing required LLM config keys: {missing}") |
| kwargs: dict[str, Any] = { | ||
| "api_key": cfg.api_key, | ||
| "base_url": cfg.base_url, | ||
| "model": cfg.model, | ||
| "temperature": 0, | ||
| } |
…ct clutter when the input tabletop seems to be a complete one;
yuecideng
left a comment
Review notes from the Unified scene agent pipeline pass. Copilot's existing inline comments still look valid, so I avoided duplicating most of those and focused these comments on additional runtime/design blockers.
| f"step end unified_scene status=ok output={unified_scene_path}" | ||
| ) | ||
| log.log_info("step start unified_scene_gen") | ||
| run_unified_scene_gen( |
This returned state is ignored, and export_gym_config() runs unconditionally afterwards. Several generation nodes can return generation_status/table_fit_result as failed or skipped while still allowing control flow to continue, so the runner can fail late with missing GLBs/manifests or produce an invalid export. Please validate the returned unified-scene-gen state and only run gym export when the required table/object assets and table-fit manifest exist.
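One way to gate the export is sketched below. It assumes the returned state is a mapping that carries `generation_status` and a `table_fit_result` dict with a `status` key; the helper name `export_blockers` and the `required_assets` argument are hypothetical and not part of this PR.

```python
from pathlib import Path
from typing import Any, Iterable, Mapping


def export_blockers(
    state: Mapping[str, Any], required_assets: Iterable[Path]
) -> list[str]:
    """Return the reasons gym export must not run; an empty list means go."""
    blockers: list[str] = []

    # Assumed state keys: taken from the statuses named in this comment.
    generation_status = state.get("generation_status")
    if generation_status != "ok":
        blockers.append(f"generation_status is {generation_status!r}")

    table_fit_status = (state.get("table_fit_result") or {}).get("status")
    if table_fit_status != "ok":
        blockers.append(f"table_fit_result status is {table_fit_status!r}")

    # Every GLB/manifest the export reads must already exist on disk.
    for asset in required_assets:
        if not Path(asset).is_file():
            blockers.append(f"missing asset: {asset}")

    return blockers
```

The runner could then call this on the value returned by `run_unified_scene_gen(...)` and raise a single pipeline-level error listing every blocker instead of calling `export_gym_config()` unconditionally.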
| def to_manifest(self) -> dict[str, Any]: | ||
| """Convert the spatial record to JSON-safe data.""" | ||
| return { |
The unified scene manifest drops text table_constraints. Downstream generate_text_clutter_layout_node() reads spatial.table_constraints, but this serializer only emits anchor and relations, so explicit text prompts like an object being left/front/center on the table are lost before layout. Add table constraints to the unified schema/manifest, or make the text layout consume the per-object grid values created during unification.
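A minimal sketch of the first option, assuming the spatial record is a dataclass with `anchor` and `relations` fields. `SpatialRecord` and the shape of each constraint dict are hypothetical stand-ins for the real schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SpatialRecord:
    """Hypothetical stand-in for the unified-scene spatial record."""

    anchor: str
    relations: list[dict[str, Any]] = field(default_factory=list)
    # New field: explicit table placements parsed from the text prompt.
    table_constraints: list[dict[str, Any]] = field(default_factory=list)

    def to_manifest(self) -> dict[str, Any]:
        """Convert the spatial record to JSON-safe data."""
        return {
            "anchor": self.anchor,
            "relations": [dict(r) for r in self.relations],
            # Emitted so the text layout node can read spatial.table_constraints.
            "table_constraints": [dict(c) for c in self.table_constraints],
        }
```

Defaulting the new field to an empty list keeps image-driven scenes, which have no text constraints, serializing as before apart from the extra empty key.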
| result_path = paths.resolve_scene_result(state["unified_scene_result_path"]) | ||
| update_unified_scene(unified_scene, table_result, object_results, output_root) | ||
| write_json(result_path, unified_scene) |
update_unified_scene() and the step result are written even if table_result or any object_results have failed statuses. The node then reports generation_status: "ok", masking partial generation failures until later stages. Please aggregate table/object statuses here and return a failed generation status, or raise a structured error before mutating the unified scene as if generation succeeded.
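The "raise before mutating" option could look roughly like this. It assumes `table_result` and each entry of `object_results` are dicts with a `status` key; `GenerationFailedError` and `ensure_generation_ok` are hypothetical names.

```python
from typing import Any, Mapping, Sequence


class GenerationFailedError(RuntimeError):
    """Hypothetical structured error carrying every non-ok asset status."""

    def __init__(self, failures: Mapping[str, Any]) -> None:
        self.failures = dict(failures)
        super().__init__(f"asset generation failed: {self.failures}")


def ensure_generation_ok(
    table_result: Mapping[str, Any],
    object_results: Sequence[Mapping[str, Any]],
) -> None:
    """Raise if the table or any object did not finish with status 'ok'."""
    failures: dict[str, Any] = {}
    if table_result.get("status") != "ok":
        failures["table"] = table_result.get("status")
    for index, result in enumerate(object_results):
        if result.get("status") != "ok":
            failures[f"object[{index}]"] = result.get("status")
    if failures:
        raise GenerationFailedError(failures)
```

Calling this immediately before `update_unified_scene(...)` would leave the unified scene untouched on partial failure and give the runner one exception type to catch.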
| output_root=output_root, | ||
| output_dir=output_dir, | ||
| ) | ||
| WorkflowArtifactWriter(output_root, UNIFIED_SCENE_GEN_STEP).write_step_result( |
fit_text_scene_table() can return {"status": "failed"} or {"status": "skipped"} without a manifest_path, but this node still writes it as the final table_fit_to_clutter result. gym_export() later assumes manifest_path exists. Please gate downstream export on an ok table-fit status or surface this as a pipeline failure with a clear message.
| raw_to_simready_glb_matrix = asset_result.transform_matrix | ||
| except Exception: | ||
| status_parts.append(f"simready_failed: {traceback.format_exc()}") | ||
| item_status = "ok" if not status_parts else "; ".join(status_parts) |
When per-item transform/simready conversion fails, the item status records the failure but the outer status remains ok. That lets generate_image_assets_node() report success even though required meshes may be missing. Please propagate any per-table/per-object failed status into the workflow-level status before the runner proceeds.
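As an illustration of the propagation, the sketch below derives the outer status from the per-item statuses. It assumes each item is a dict with `name` and `status` keys, where `status` is `"ok"` or the joined failure string built above; `summarize_items` is a hypothetical helper.

```python
from typing import Any, Mapping, Sequence


def summarize_items(items: Sequence[Mapping[str, Any]]) -> dict[str, Any]:
    """Derive the workflow-level status from per-item statuses."""
    # Any item whose status is not exactly "ok" counts as a failure,
    # including "simready_failed: ..." strings.
    failed = {item["name"]: item["status"] for item in items if item["status"] != "ok"}
    return {
        "status": "ok" if not failed else "failed",
        "failed_items": failed,
    }
```

Keeping the per-item messages in `failed_items` preserves the traceback text for debugging while still letting `generate_image_assets_node()` report a non-ok status.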
| step_result = _read_json( | ||
| output_root / UNIFIED_SCENE_GEN_STEP / STEP_RESULT_FILENAME | ||
| ) | ||
| table_fit = step_result.get("table_fit_to_clutter") or {} |
This assumes table_fit_to_clutter.manifest_path is present. If table fitting was skipped/failed, table_fit.get("manifest_path", "") resolves to output_root, and _read_json() raises a confusing directory/file error. Please validate the table-fit status and manifest path explicitly before reading, and return a pipeline-level error if export preconditions are not met.
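A sketch of the explicit check, assuming `manifest_path` is stored relative to `output_root` as the current code implies. `ExportPreconditionError` and `load_table_fit_manifest` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Mapping


class ExportPreconditionError(RuntimeError):
    """Hypothetical pipeline-level error for unmet export preconditions."""


def load_table_fit_manifest(
    step_result: Mapping[str, Any], output_root: Path
) -> dict[str, Any]:
    """Read the table-fit manifest only after validating status and path."""
    table_fit = step_result.get("table_fit_to_clutter") or {}
    status = table_fit.get("status")
    if status != "ok":
        raise ExportPreconditionError(f"table fit did not succeed (status={status!r})")

    manifest_rel = table_fit.get("manifest_path")
    if not manifest_rel:
        # An empty value would otherwise resolve to output_root itself.
        raise ExportPreconditionError("table fit result has no manifest_path")

    manifest_path = output_root / manifest_rel
    if not manifest_path.is_file():
        raise ExportPreconditionError(f"table fit manifest not found: {manifest_path}")
    return json.loads(manifest_path.read_text(encoding="utf-8"))
```

Each failure mode gets its own message, so a skipped table fit no longer surfaces as a directory/file read error.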
| return client.session.post( | ||
| url, | ||
| data=request.to_form_data(), | ||
| files=[("image", (Path(request.image_path).name, _open_image_file(request.image_path)))] + mask_files, |
The mask files are closed in finally, but the source image file opened by _open_image_file(request.image_path) is not tracked or closed. Use contextlib.ExitStack or open the image handle before the POST and close it alongside the mask handles.
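The `ExitStack` variant might look like the sketch below. The `post` callable stands in for `client.session.post`, and the `"mask"` field name and the `post_with_files` helper are assumptions rather than the client's actual interface.

```python
from contextlib import ExitStack
from pathlib import Path
from typing import Any, Callable, Mapping, Optional, Sequence


def post_with_files(
    post: Callable[..., Any],
    url: str,
    image_path: str,
    mask_paths: Sequence[str],
    data: Optional[Mapping[str, str]] = None,
) -> Any:
    """POST an image plus masks, closing every handle when the call returns."""
    with ExitStack() as stack:
        # The image handle is registered on the same stack as the masks.
        image = stack.enter_context(open(image_path, "rb"))
        files = [("image", (Path(image_path).name, image))]
        for mask_path in mask_paths:
            mask = stack.enter_context(open(mask_path, "rb"))
            # "mask" is an assumed multipart field name.
            files.append(("mask", (Path(mask_path).name, mask)))
        return post(url, data=data, files=files)
```

Because all handles share one stack, they are closed on both the success and the exception path, which removes the need for a separate `finally` block.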
| ) -> dict[str, object]: | ||
| """Filter all segment groups with VLM and return an updated state patch.""" | ||
| segment_groups = [] | ||
| attempt_count = state["attempt_count"] + 1 |
The same attempt_count is shared across distinct VLM operations in the image-relations graph: initial segment filtering and spatial-layout extraction both increment it. Failures in the first phase reduce the retry budget for the second phase, which makes max_attempts harder to reason about and can prematurely stop spatial layout retries. Please split retry counters by stage or reset the counter before entering spatial layout.
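Splitting the counters by stage could be sketched as follows. The `attempts` mapping, the stage names, and both helpers are hypothetical; only `max_attempts` comes from the existing state.

```python
from typing import TypedDict


class AttemptState(TypedDict):
    """Hypothetical state slice: one retry counter per VLM stage."""

    attempts: dict[str, int]
    max_attempts: int


def record_attempt(state: AttemptState, stage: str) -> dict[str, dict[str, int]]:
    """Return a state patch that increments only the given stage's counter."""
    attempts = dict(state["attempts"])  # copy so the input state is not mutated
    attempts[stage] = attempts.get(stage, 0) + 1
    return {"attempts": attempts}


def can_retry(state: AttemptState, stage: str) -> bool:
    """A stage may retry while its own counter is below max_attempts."""
    return state["attempts"].get(stage, 0) < state["max_attempts"]
```

With this shape, exhausting the budget in segment filtering leaves the spatial-layout stage with its full `max_attempts`.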
Description
This PR adds a new prompt2scene module under embodichain/gen_sim/ that implements a text/image-to-3D-scene generation pipeline. The pipeline takes text descriptions or images as input and generates 3D simulation scenes, leveraging LLM-based workflows for scene understanding, geometry generation, and layout optimization.
Key components:
Type of change
Checklist
black . command to format the code base.
🤖 Generated with Claude Code