Conversation
Pull request overview
This PR implements a "jobs-centric" redesign of the drug discovery execution API. It introduces a composable class hierarchy (Execution base + QuoteMixin/SyncExecutableMixin/AsyncExecutableMixin mixins) and replaces the previous workflow-step pattern with standalone job classes (Docking, ABFE, PocketFinder). The old classes are preserved as docking_step.py/abfe_step.py for backward compatibility with the Complex class. The Results.get() API is also refactored to accept a raw filter_dict instead of named filter parameters.
Changes:
- New standalone Docking, ABFE, and PocketFinder job classes with composable execution mixins (Execution, AsyncExecutableMixin, etc.)
- Results.get() refactored to accept a filter_dict; a new _build_result_filter() helper handles construction of filter dicts
- Old docking.py and abfe.py WorkflowStep classes renamed to docking_step.py and abfe_step.py
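The composable hierarchy described above can be sketched as follows (a minimal sketch: method bodies and the placeholder values are stand-ins for the real platform calls; only the class names come from this PR):

```python
class Execution:
    """Base class holding read-only id/estimate/cost and status state."""

    def __init__(self):
        self._id = None
        self._estimate = None
        self._cost = None
        self.status = None

    @property
    def id(self):
        return self._id


class QuoteMixin:
    def quote(self) -> float:
        # Real implementation would request a price estimate from the platform.
        self._estimate = 0.0
        return self._estimate


class AsyncExecutableMixin:
    def start(self) -> str:
        # Real implementation would submit the job and store the execution ID.
        self._id = "ex-123"
        self.status = "Created"
        return self._id


class Docking(QuoteMixin, AsyncExecutableMixin, Execution):
    """A standalone job class composed from the mixins."""
```

A job class picks up behavior purely through the MRO; no mixin defines `__init__`, so `Execution.__init__` runs unchanged.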
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/drug_discovery/execution.py | New base class with read-only id/estimate/cost properties and status transition logic |
| src/drug_discovery/execution_mixins.py | New mixins: QuoteMixin, SyncExecutableMixin, AsyncExecutableMixin, JupyterVizMixin |
| src/drug_discovery/docking.py | New standalone Docking class replacing old WorkflowStep-based one |
| src/drug_discovery/abfe.py | New standalone ABFE class replacing old WorkflowStep-based one |
| src/drug_discovery/pocket_finder.py | New standalone PocketFinder class |
| src/drug_discovery/docking_step.py | Old Docking WorkflowStep class (kept for Complex compatibility) |
| src/drug_discovery/abfe_step.py | Old ABFE WorkflowStep class (kept for Complex compatibility) |
| src/drug_discovery/complex.py | Updated imports to use renamed _step.py files |
| src/platform/results.py | Refactored Results.get() to accept filter_dict; added _build_result_filter() helper |
| src/platform/constants.py | Added PlatformStatus literal type and ALLOWED_STATUS_TRANSITIONS dict; bumped PocketFinder version |
| src/drug_discovery/utils/__init__.py | Simplified _start_tool_run signature (removed output_dir_path, provider) |
| src/drug_discovery/__init__.py | Exports new PocketFinder, Docking, ABFE, Execution, PlatformStatus |
| src/jobs/__init__.py | New backward-compatibility shim re-exporting job classes |
| src/drug_discovery/constants.py | Added PocketFinder to valid_tools and tool_mapper |
| src/drug_discovery/structures/pocket.py | Added Pocket.from_id() classmethod and pocket_count/pocket_min_size args to from_result() |
| src/drug_discovery/structures/ligand.py | Added LigandSet.from_ids() classmethod |
| tests/test_jobs_base.py | New tests for Execution base class, mixins, and PocketFinder |
| tests/test_jobs_docking.py | New tests for standalone Docking class |
| tests/test_jobs_abfe.py | New tests for standalone ABFE class |
| tests/test_results.py | Updated tests for refactored Results.get() API |
| tests/test_functions_with_data_platform.py | Updated call site to use new filter_dict API |
| jobs-refactor-plan.md | Design document describing the refactor |
src/jobs/__init__.py (Outdated)

```python
from deeporigin.drug_discovery.abfe import ABFE
from deeporigin.drug_discovery.docking import Docking
from deeporigin.drug_discovery.execution import Execution, PlatformStatus
```
PlatformStatus is imported from deeporigin.drug_discovery.execution in both src/jobs/__init__.py (line 10) and src/drug_discovery/__init__.py (line 66), but PlatformStatus is not defined or re-exported in src/drug_discovery/execution.py — it is only defined in src/platform/constants.py. This will raise an ImportError at runtime whenever PlatformStatus is accessed via either of these import paths. execution.py needs to either re-export PlatformStatus with from deeporigin.platform.constants import PlatformStatus, or both call sites need to import directly from deeporigin.platform.constants.
Suggested change:

```diff
-from deeporigin.drug_discovery.execution import Execution, PlatformStatus
+from deeporigin.drug_discovery.execution import Execution
+from deeporigin.platform.constants import PlatformStatus
```
src/drug_discovery/abfe.py (Outdated)

```python
for key, val in output_files.items():
    if isinstance(val, dict) and "system_pdb_file_path" in val:
        pass
```
The elif isinstance(output_files, dict) branch at lines 147-150 is dead code — it enters the branch but only executes pass, discarding any dict-shaped output_files without extracting binding_xml or solvation_xml. This means if the API ever returns output_files as a dict (rather than a list), self.binding_xml_path and self.solvation_xml_path will remain None, causing quote() and start() to raise ValueError. Either extract the XML paths from the dict format or remove the branch entirely and document the expected format.
Suggested change:

```diff
-for key, val in output_files.items():
-    if isinstance(val, dict) and "system_pdb_file_path" in val:
-        pass
+def _extract_xml_paths(obj) -> None:
+    nonlocal binding_xml, solvation_xml
+    if isinstance(obj, dict):
+        for v in obj.values():
+            _extract_xml_paths(v)
+    elif isinstance(obj, list):
+        for v in obj:
+            _extract_xml_paths(v)
+    elif isinstance(obj, str):
+        if obj.endswith("bsm_system.xml"):
+            binding_xml = obj
+        elif obj.endswith("solvation.xml"):
+            solvation_xml = obj
+
+_extract_xml_paths(output_files)
```
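As a sanity check that the suggested recursive walk really recovers both paths from a dict-shaped payload, here is a standalone variant (module-level, returning the two paths instead of assigning nonlocals; the example file names and keys are invented):

```python
def extract_xml_paths(output_files):
    """Recursively collect the binding/solvation XML paths from any
    nesting of dicts and lists, mirroring the suggested fix."""
    binding_xml = None
    solvation_xml = None

    def _walk(obj):
        nonlocal binding_xml, solvation_xml
        if isinstance(obj, dict):
            for v in obj.values():
                _walk(v)
        elif isinstance(obj, list):
            for v in obj:
                _walk(v)
        elif isinstance(obj, str):
            if obj.endswith("bsm_system.xml"):
                binding_xml = obj
            elif obj.endswith("solvation.xml"):
                solvation_xml = obj

    _walk(output_files)
    return binding_xml, solvation_xml


# The dict shape that the dead branch currently discards is handled too:
dict_shaped = {
    "complex": {"system_pdb_file_path": "a.pdb", "xml": "run1/bsm_system.xml"},
    "solvent": ["run1/solvation.xml"],
}
print(extract_xml_paths(dict_shaped))  # → ('run1/bsm_system.xml', 'run1/solvation.xml')
```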
tests/test_jobs_docking.py (Outdated)

```python
def docking_with_smiles(protein, pocket, ligand_set, client):
    """Create a Docking instance using a second LigandSet (was SMILES path)."""
    return Docking(
        protein=protein,
        pocket=pocket,
        ligands=ligand_set,
```
The docking_with_smiles fixture (lines 70-77) is identical to docking_with_ligands — both pass ligands=ligand_set and do not use the smiles_list argument. The associated test_with_smiles_list test (line 99) claims to test "Docking constructed with smiles_list converts to a LigandSet", but it never exercises the smiles_list code path. The fixture should instead pass smiles_list=["CCO", "CCCO"] (without ligand IDs pre-assigned, since smiles_list conversion happens internally) to correctly cover that constructor branch.
Suggested change:

```diff
-def docking_with_smiles(protein, pocket, ligand_set, client):
-    """Create a Docking instance using a second LigandSet (was SMILES path)."""
-    return Docking(
-        protein=protein,
-        pocket=pocket,
-        ligands=ligand_set,
+def docking_with_smiles(protein, pocket, client):
+    """Create a Docking instance using a SMILES list."""
+    return Docking(
+        protein=protein,
+        pocket=pocket,
+        smiles_list=["CCO", "CCCO"],
```
```diff
-        metadata = {
-            "protein_file": protein_basename,
-            "protein_hash": self.parent.protein.to_hash(),
-        }
-
-        if batch_size is None and n_workers is None:
-            raise DeepOriginException(
-                "Either batch_size or n_workers must be specified."
-            ) from None
-        elif batch_size is not None and n_workers is not None:
-            print(
-                "Both batch_size and n_workers are specified. Using n_workers to determine batch_size..."
-            )
-
-        if n_workers is not None:
-            batch_size = math.ceil(len(self.parent.ligands) / n_workers)
-            print(f"Using a batch size of {batch_size}")
-
-        if pocket is not None:
-            box_size = float(2 * np.cbrt(pocket.volume))
-            box_size = [box_size, box_size, box_size]
-            pocket_center = pocket.get_center().tolist()
-
-        # Get all ligands and filter out already docked ones
-        all_ligands = list(self.parent.ligands)
-
-        df = self.get_jobs_df(pocket_center=pocket_center, box_size=box_size)
-
-        already_docked_smiles = set()
-
-        if not re_run:
-            for _, row in df.iterrows():
-                user_inputs = row.get("user_inputs", {})
-                # Handle both old format (smiles_list) and new format (ligands array)
-                if "smiles_list" in user_inputs:
-                    already_docked_smiles.update(user_inputs["smiles_list"])
-                elif "ligands" in user_inputs:
-                    already_docked_smiles.update(
-                        ligand.get("smiles", "") for ligand in user_inputs["ligands"]
-                    )
-
-        # Filter ligands to only include those not already docked
-        ligands_to_dock = [
-            ligand
-            for ligand in all_ligands
-            if ligand.smiles not in already_docked_smiles
-        ]
-
-        # Sort ligands by SMILES for consistent batching
-        ligands_to_dock = sorted(ligands_to_dock, key=lambda ligand: ligand.smiles)
-
-        job_ids = []
-
-        chunks = list(more_itertools.chunked(ligands_to_dock, batch_size))
-
-        def process_chunk(chunk):
-            params = {
-                "box_size": list(box_size),
-                "pocket_center": list(pocket_center),
-                "protein": {
-                    "file_path": self.parent.protein._remote_path,
-                },
-                "ligands": [
-                    {
-                        "smiles": ligand.smiles,
-                    }
-                    for ligand in chunk
-                ],
-            }
-
-            # Include protein ID if available
-            if self.parent.protein.id is not None:
-                params["protein"]["id"] = self.parent.protein.id
-
-            # Include ligand IDs if available
-            for i, ligand in enumerate(chunk):
-                if ligand.id is not None:
-                    params["ligands"][i]["id"] = ligand.id
-
-            # Include pocket_id if pocket is provided and has an ID
-            if pocket is not None and hasattr(pocket, "id") and pocket.id is not None:
-                params["pocket_id"] = pocket.id
-
-            execution_dto = utils._start_tool_run(
-                params=params,
-                metadata=metadata,
-                tool="Docking",
-                tool_version=self.tool_version,
-                client=self.parent.client,
-                output_dir_path=output_dir_path,
-                approve_amount=approve_amount,
-            )
-            return execution_dto
-
-        if len(ligands_to_dock) > 0:
-            if use_parallel:
-                with concurrent.futures.ThreadPoolExecutor() as executor:
-                    # Submit all chunks to the executor
-                    future_to_chunk = {
-                        executor.submit(process_chunk, chunk): chunk for chunk in chunks
-                    }
-
-                    # Process results with progress bar
-                    for future in tqdm(
-                        concurrent.futures.as_completed(future_to_chunk),
-                        total=len(chunks),
-                        desc="Starting docking jobs",
-                    ):
-                        execution_dto = future.result()
-                        if execution_dto is not None:
-                            job_ids.append(execution_dto)
-            else:
-                for chunk in tqdm(
-                    chunks, total=len(chunks), desc="Starting docking jobs"
-                ):
-                    execution_dto = process_chunk(chunk)
-                    if execution_dto is not None:
-                        job_ids.append(execution_dto)
-        else:
-            raise DeepOriginException(
-                title="Cannot run Docking: no new ligands to dock",
-                message="No new ligands to dock. All ligands have already been docked.",
-                fix=" If you want to re-run the docking, set <code>re_run=True</code>.",
-                level="warning",
-            ) from None
-
-        jobs = [
-            Job.from_dto(execution_dto, client=self.parent.client)
-            for execution_dto in job_ids
-        ]
-
-        self.jobs = JobList(jobs)
-
-        return self.jobs
+    def get_results(self) -> LigandSet | None:
+        """Retrieve docking results from the platform.
+
+        Downloads result SDF files and returns a ``LigandSet``.
+
+        Returns:
+            A ``LigandSet`` of docked poses, or ``None`` if no results yet.
+        """
+        response = self.client.results.get_poses(protein_id=self.protein.id)
+        records = response.get("data", [])
+        remote_paths: list[str] = []
+        for record in records:
+            data = record.get("data", {})
+            fp = data.get("file_path")
+            if fp:
+                remote_paths.append(fp)
+
+        if not remote_paths:
+            return None
+
+        local_paths = self.client.files.download_files(files=remote_paths, lazy=True)
+
+        result = LigandSet()
+        for path in local_paths:
+            result.ligands += LigandSet.from_sdf(path).ligands
+
+        return result
```
According to the design specification in jobs-refactor-plan.md (line 400), calling get_results() before start() (when there is no execution ID) should "raise an error". However, the current implementation of get_results() in Docking does not check whether self.id is None before querying the results API — it simply queries by protein_id and may return results from any previous docking run for that protein, not necessarily the one started by this instance. Compare this with ABFE.get_results(), which correctly raises ValueError("No execution has been started...") when self.id is None. Consider adding a similar guard here.
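The missing guard could mirror ABFE's behavior. A minimal illustrative sketch (the class name and message are stand-ins, not code from this PR):

```python
class DockingResultsGuard:
    """Illustrative sketch: refuse to fetch results before start()."""

    def __init__(self):
        self._id = None  # set by start() in the real class

    def get_results(self):
        if self._id is None:
            raise ValueError(
                "No execution has been started; call start() before get_results()."
            )
        # The real code would query the results API scoped to self._id here.
        return []
```

The check costs one line and turns a silently-wrong answer (poses from some earlier run) into an immediate, explainable error.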
src/drug_discovery/docking_step.py (Outdated)

```python
tool="Docking",
tool_version=self.tool_version,
client=self.parent.client,
output_dir_path=output_dir_path,
```
The _start_tool_run function signature was refactored to remove the output_dir_path and provider parameters. However, both docking_step.py (line 465) and abfe_step.py (line 388) still pass output_dir_path=output_dir_path as a keyword argument to _start_tool_run. Since @beartype is applied to _start_tool_run and the parameter no longer exists in the signature, calling Complex.docking.run(...) or Complex.abfe.run(...) will raise a TypeError at runtime, breaking the existing Complex-based workflow.
These callers need to be updated: instead of passing output_dir_path to _start_tool_run, they should either pass the outputs as a pre-built outputs dict or remove the parameter entirely (since the new _start_tool_run now accepts an outputs dict directly).
Suggested change:

```diff
-output_dir_path=output_dir_path,
```
src/drug_discovery/abfe_step.py (Outdated)

```python
execution_dto = utils._start_tool_run(
    metadata=metadata,
    params=params,
    tool="ABFE",
    tool_version=self.tool_version,
    client=self.parent.client,
    output_dir_path=output_dir_path,
```
Same issue as in docking_step.py: abfe_step.py still passes output_dir_path=output_dir_path (line 388) to _start_tool_run, which no longer accepts that parameter. This will raise a TypeError at runtime when calling Complex.abfe.run(...). The call needs to be updated to construct and pass an outputs dict directly instead.
Suggested change:

```diff
-execution_dto = utils._start_tool_run(
-    metadata=metadata,
-    params=params,
-    tool="ABFE",
-    tool_version=self.tool_version,
-    client=self.parent.client,
-    output_dir_path=output_dir_path,
+outputs = {"output_dir_path": output_dir_path}
+execution_dto = utils._start_tool_run(
+    metadata=metadata,
+    params=params,
+    tool="ABFE",
+    tool_version=self.tool_version,
+    client=self.parent.client,
+    outputs=outputs,
```
src/drug_discovery/docking.py (Outdated)

```diff
-                        "id": record.get("id"),
-                        "tool_id": record.get("tool_id"),
-                        "tool_version": record.get("tool_version"),
-                        "ligand_id": data.get("ligand_id"),
-                        "protein_id": data.get("protein_id"),
-                        "binding_energy": data.get("binding_energy"),
-                        "pose_score": data.get("pose_score"),
-                        "file_path": data.get("file_path"),
-                    }
-                )
-
-        return pd.DataFrame(rows)
-
-    @beartype
-    def get_jobs_df(
-        self,
-        *,
-        pocket_center: Optional[tuple[Number, Number, Number] | list[Number]] = None,
-        box_size: Optional[tuple[Number, Number, Number] | list[Number]] = None,
-    ):
-        """search for all jobs that match this protein and ligands in the Job DB, and return a dataframe of the results
-
-        Args:
-            pocket_center: Optional tuple of (x, y, z) coordinates to filter by pocket center
-            box_size: Optional tuple of (x, y, z) dimensions to filter by box size
-        """
-        # Use parent method with docking-specific parameters
-        df = super().get_jobs_df(
-            include_metadata=True,
-            include_inputs=True,
-            include_outputs=True,
-            status=list(NON_FAILED_STATES),
-        )
+        params = {
+            "box_size": box_size_list,
+            "pocket_center": pocket_center,
+            "protein": {
+                "id": self.protein.id,
+                "file_path": self.protein._remote_path,
+            },
+            "ligands": [
+                {
+                    "id": lig.id,
+                    "smiles": lig.smiles,
+                }
+                for lig in ligands
+            ],
+        }
+
+        if self.protein.id is not None:
+            params["protein"]["id"] = self.protein.id
+
+        for i, lig in enumerate(ligands):
+            if lig.id is not None:
+                params["ligands"][i]["id"] = lig.id
+
+        if hasattr(self.pocket, "id") and self.pocket.id is not None:
+            params["pocket_id"] = self.pocket.id
```
The _build_tool_inputs method contains redundant code. The protein ID is set twice: first in the params dict literal at line 291 ("id": self.protein.id), then again conditionally at lines 303-304. Similarly, ligand IDs are already included in the list comprehension at lines 294-299 ("id": lig.id), but then set again in the loop at lines 306-308. The constructor already enforces that protein, pocket, and all ligands have IDs, so the conditional checks at lines 303-304 and the re-assignment loop at lines 306-308 are unreachable dead code. These redundant sections should be removed.
Pull request overview
Copilot reviewed 29 out of 30 changed files in this pull request and generated 3 comments.
src/drug_discovery/docking.py (Outdated)

```diff
-        if len(df) == 0:
-            return df
+        if self.protein.id is not None:
+            params["protein"]["id"] = self.protein.id
+
+        for i, lig in enumerate(ligands):
+            if lig.id is not None:
+                params["ligands"][i]["id"] = lig.id
```
The _build_tool_inputs method contains redundant code that sets the same fields twice. protein.id is set in the initial params dict at line 291, then conditionally re-set at lines 303-304. Ligand IDs are set in the list comprehension at lines 294-299, then conditionally re-set in the loop at lines 306-308. Since __init__ already validates that protein.id, all ligand IDs, and pocket.id are not None, these duplicate assignments are always executed and the conditionals are always true. The duplicate assignments (lines 303-308) should be removed to avoid confusion.
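With the duplicates removed, the construction collapses to a single dict literal. A standalone sketch (plain dicts stand in for the Protein/Pocket/Ligand objects; the function name mirrors but is not the PR's method):

```python
def build_tool_inputs(protein, pocket, ligands, box_size_list, pocket_center):
    """Sketch of _build_tool_inputs with the duplicate ID assignments removed.
    Assumes the constructor has already validated that every ID is present."""
    return {
        "box_size": box_size_list,
        "pocket_center": pocket_center,
        "protein": {
            "id": protein["id"],
            "file_path": protein["file_path"],
        },
        # Each ligand's ID is set exactly once, in the comprehension.
        "ligands": [{"id": lig["id"], "smiles": lig["smiles"]} for lig in ligands],
        "pocket_id": pocket["id"],
    }
```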
```python
while True:
    response = self.search(
        "ligands",
        cursor=cursor,
        filter_dict=filter_dict,
        limit=limit,
        offset=offset,
        select=select,
        sort=sort,
    )
    all_data.extend(response.get("data", []))

    cursor = response.get("meta", {}).get("nextCursor")
    if not cursor:
        break
```
In the search_ligands pagination loop, the offset parameter is passed on every iteration including subsequent pages. When cursor-based pagination is active, using offset on subsequent page requests (where a cursor is already set) may cause incorrect pagination behavior — double-skipping records. The offset should only be applied on the first request and omitted once a cursor is available.
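The fix the comment asks for can be demonstrated against a toy backend (`paginate` and `fake_search` are illustrative stand-ins, not this codebase's API):

```python
def paginate(search, *, offset=0, limit=2):
    """Cursor pagination that applies `offset` only on the first request,
    as the review comment suggests. `search` stands in for self.search(...)."""
    all_data = []
    cursor = None
    first = True
    while True:
        page = search(cursor=cursor, offset=offset if first else None, limit=limit)
        all_data.extend(page["data"])
        first = False
        cursor = page.get("meta", {}).get("nextCursor")
        if not cursor:
            break
    return all_data


# Toy backend: 5 records, pages of 2, the cursor is just the next index.
RECORDS = list(range(5))

def fake_search(*, cursor, offset, limit):
    start = int(cursor) if cursor is not None else (offset or 0)
    chunk = RECORDS[start : start + limit]
    nxt = str(start + limit) if start + limit < len(RECORDS) else None
    return {"data": chunk, "meta": {"nextCursor": nxt}}

print(paginate(fake_search, offset=1))  # → [1, 2, 3, 4]: the offset is applied once
```

If `offset` were passed on every page as well, each follow-up request would skip records the cursor had already advanced past.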
src/drug_discovery/abfe.py (Outdated)

```python
ABFE_TOOL_KEY = "deeporigin.abfe-end-to-end"
ABFE_TOOL_VERSION = "0.2.19"
```
ABFE_TOOL_KEY and ABFE_TOOL_VERSION are defined as local constants in abfe.py rather than being imported from platform/constants.py like DOCKING_TOOL_KEY and DOCKING_TOOL_VERSION are in docking.py. The same ABFE tool key already exists in drug_discovery/constants.py (tool_mapper["ABFE"]). This duplication risks divergence if the key or version is ever updated in one place but not the other.
Pull request overview
Copilot reviewed 29 out of 30 changed files in this pull request and generated 8 comments.
src/drug_discovery/docking.py (Outdated)

```diff
-    for ligand in all_ligands
-    if ligand.smiles not in already_docked_smiles
-]
+response = self.client.results.get_poses(protein_id=self.protein.id)
```
In Docking.get_results, the method queries get_poses(protein_id=self.protein.id) which returns ALL pose records for that protein — not just those belonging to this specific execution (identified by self.id). If multiple docking runs have been performed on the same protein, this method will return results from all of them, not just from the current execution. The filter should also include compute_job_id=self.id to scope results to this specific execution.
Suggested change:

```diff
-response = self.client.results.get_poses(protein_id=self.protein.id)
+response = self.client.results.get_poses(
+    protein_id=self.protein.id,
+    compute_job_id=self.id,
+)
```
```python
instance = object.__new__(cls)

instance.client = client
instance._id = dto["executionId"]
instance._estimate = None
instance._cost = None
instance.tool_key = tool_info["key"]
instance.tool_version = tool_info["version"]

instance.status = dto.get("status")
instance.progress = dto.get("progressReport")
instance._execution_dto = dto
```
In AsyncExecutableMixin.from_id, the method uses object.__new__(cls) which bypasses __init__, so Execution.__init__ is never called. This means _id, _estimate, and _cost are set manually on instance (lines 160-162), but the mixin code itself sets instance._estimate = None and instance._cost = None rather than via the property-backed attributes. If a future subclass adds additional state initialized in __init__, from_id would silently skip that initialization. A comment documenting this responsibility or a pattern for subclasses to hook into would improve maintainability.
src/drug_discovery/docking.py (Outdated)

```python
    inputs["protein"]["id"],
    client=instance.client,
)
fut_ligands = executor.submit(
    LigandSet.from_ids,
    [lig["id"] for lig in inputs["ligands"]],
```
In Docking.from_id, the code accesses inputs["protein"]["id"] and inputs["ligands"] directly (without .get()), which will raise a KeyError with an unhelpful message if these fields are absent (e.g., for executions created by an older version of the code that used a different input schema). Consider using .get() with appropriate error messages, e.g., inputs.get("protein", {}).get("id") and inputs.get("ligands", []), similar to how pocket_id is handled.
```python
def test_terminal_state_blocks_further_transitions(self):
    """Once in Succeeded, no further transitions are allowed."""
    job = AsyncJob("x")
    job._set_status("Created")
    job._set_status("Running")
    job._set_status("Succeeded")
    with pytest.raises(ValueError, match="Invalid status transition"):
        job._set_status("Running")
```
In test_terminal_state_blocks_further_transitions, the test transitions from Created directly to Running (skipping Queued) at line 153. According to ALLOWED_STATUS_TRANSITIONS, "Created" allows transitioning to {"Queued", "Running", "Failed", "Cancelled"}, so Created → Running is valid. This is a correct test. However, the test docstring says "None -> Created -> Queued -> Running -> Succeeded" but the actual path is None -> Created -> Running -> Succeeded (skipping Queued). The docstring is misleading.
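The transition semantics under discussion can be sketched as a table-driven guard. The table below is an assumption, except that Created → {Queued, Running, Failed, Cancelled} is quoted in the comment and the Quoted behavior comes from the maintainer's reply later in the thread; the names mirror the PR's ALLOWED_STATUS_TRANSITIONS and `_set_status`:

```python
# Assumed transition table; only the Created and Quoted rows are
# attested in this review thread. Terminal states allow no transitions.
ALLOWED_STATUS_TRANSITIONS = {
    None: {"Created", "Quoted"},
    "Quoted": {"Running", "Queued"},
    "Created": {"Queued", "Running", "Failed", "Cancelled"},
    "Queued": {"Running", "Failed", "Cancelled"},
    "Running": {"Succeeded", "Failed", "Cancelled"},
    "Succeeded": set(),
    "Failed": set(),
    "Cancelled": set(),
}


class Job:
    def __init__(self):
        self.status = None

    def _set_status(self, new):
        if new not in ALLOWED_STATUS_TRANSITIONS.get(self.status, set()):
            raise ValueError(f"Invalid status transition: {self.status} -> {new}")
        self.status = new
```

Under this table the test's actual path (None → Created → Running → Succeeded) is valid because Queued may be skipped, which is exactly why only the docstring needs fixing.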
src/drug_discovery/abfe_step.py (Outdated)

```python
Objects instantiated here are meant to be used within the Complex class."""

"""tool version to use for ABFE"""
tool_version = "0.2.19"
```
The abfe_step.py file uses the hardcoded string "0.2.19" for tool_version (line 29) instead of using the newly added ABFE_TOOL_VERSION constant from src/platform/constants.py. This creates a maintenance risk as these two values could diverge. The new standalone abfe.py correctly uses ABFE_TOOL_VERSION, so abfe_step.py should be updated to match.
```python
pocket_data["id"] = record["id"]
remote_path = pocket_data["file_path"]
pocket_data["file_path"] = client.files.download_file(
    remote_path=remote_path,
```
In Pocket.from_id, the file is downloaded without lazy=True (line 383-385), whereas other download calls in the same module (e.g., _from_platform_record in ligand.py) use lazy=True. This inconsistency means the pocket file is eagerly downloaded even if it might not be immediately needed. Consider passing lazy=True here to be consistent with the rest of the codebase.
Suggested change:

```diff
 remote_path=remote_path,
+lazy=True,
```
src/platform/results.py (Outdated)

```python
if compute_job_id is not None:
    filter_dict["compute_job_id"] = compute_job_id
```
In _build_result_filter, the compute_job_id is added to the filter dict without an operator wrapper (filter_dict["compute_job_id"] = compute_job_id), while every other field uses {"eq": value}. In the mock server's _apply_search_filters, the else branch (results = [r for r in results if r.get(key) == value]) would handle the raw string comparison correctly; however, the real platform API may expect the same {"eq": ...} operator structure for consistency. The inconsistency between compute_job_id and all other fields (which use {"eq": ...}) is a potential source of bugs against the real API.
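A consistent builder would wrap every field, including compute_job_id, in the operator dict. An illustrative sketch (the field names follow the snippet above; the function name mirrors but is not the PR's helper):

```python
def build_result_filter(*, protein_id=None, ligand_id=None, compute_job_id=None):
    """Sketch of _build_result_filter with every field, including
    compute_job_id, wrapped in the same {"eq": ...} operator structure."""
    filter_dict = {}
    if protein_id is not None:
        filter_dict["protein_id"] = {"eq": protein_id}
    if ligand_id is not None:
        filter_dict["ligand_id"] = {"eq": ligand_id}
    if compute_job_id is not None:
        filter_dict["compute_job_id"] = {"eq": compute_job_id}
    return filter_dict
```

A uniform shape means server-side filter parsing only has to handle one case, and the mock server cannot mask a mismatch with the real API.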
src/drug_discovery/docking.py (Outdated)

```python
for ligand in ligands:
    if ligand.id is None:
        raise ValueError("Ligands must have an ID.")
```
In Docking.__init__, there is a check that requires all ligands to have an ID (if ligand.id is None: raise ValueError("Ligands must have an ID.")). However, the run() sync path uses the functions API and likely works fine with ligands that have no platform ID. This overly strict constraint prevents users from using the sync run() path for ligands that haven't been synced to the platform yet, which breaks the intended workflow described in the docstring ("Sync path: uses the functions API for small ligand sets").
Suggested change:

```diff
-for ligand in ligands:
-    if ligand.id is None:
-        raise ValueError("Ligands must have an ID.")
```
```python
        "Cannot cancel: no execution has been started (id is None)."
    )

cancellable = {"Created", "Queued", "Running"}
```
Is Quoted a non-cancellable state?
The only transition from Quoted is to Running (well, technically it can also go to Queued...).
Pull request overview
Copilot reviewed 29 out of 30 changed files in this pull request and generated 5 comments.
```python
@classmethod
def from_id(cls, id: str, *, client: DeepOriginClient | None = None) -> Self:
    """Construct an instance from an existing platform execution ID.

    Creates a bare instance via ``object.__new__`` (bypassing
    ``__init__``), fetches the execution DTO, and populates the
    common execution fields. Subclasses should call
    ``super().from_id()`` then rehydrate domain-specific fields
    from ``instance._execution_dto["userInputs"]``.

    Args:
        id: Platform execution ID.
        client: Optional API client. Uses the default if not provided.

    Returns:
        A partially-hydrated instance with common fields populated.
    """
    if client is None:
        client = DeepOriginClient.get()

    dto = client.executions.get_execution(execution_id=id)
    tool_info = dto["tool"]

    instance = object.__new__(cls)

    instance.client = client
    instance._id = dto["executionId"]
    instance._estimate = None
    instance._cost = None
    instance.tool_key = tool_info["key"]
    instance.tool_version = tool_info["version"]

    instance.status = dto.get("status")
    instance.progress = dto.get("progressReport")
    instance._execution_dto = dto

    quotation = dto.get("quotationResult", {})
    successful = quotation.get("successfulQuotations", [])
    if successful:
        price = successful[0].get("priceTotal")
        if price is not None:
            instance._estimate = float(price)
        if instance.status == "Succeeded" and price is not None:
            instance._cost = float(price)

    return instance
```
In AsyncExecutableMixin.from_id, object.__new__(cls) bypasses __init__, so none of the initialization in AsyncExecutableMixin.__init__ or Execution.__init__ runs. The method compensates by manually setting client, _id, _estimate, _cost, and _execution_dto on the bare instance, which covers the common fields. However, any subclass __init__ state beyond these (e.g., _protein, _pocket, _ligands for Docking) is entirely absent from the bare instance, and the code relies on subclass overrides of from_id to set them. This pattern is fragile: if a new subclass overrides from_id incorrectly or forgets a field, the instance will silently have missing attributes, with no AttributeError at construction time.
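One way to make this less fragile is a rehydration hook that subclasses must implement, so a forgotten override fails loudly at construction time. A minimal sketch (the `_rehydrate` name and the stubbed DTO fetch are inventions for illustration, not code from this PR):

```python
class Execution:
    @classmethod
    def from_id(cls, id, *, client=None):
        instance = object.__new__(cls)  # bypasses __init__, as in the PR
        instance._id = id
        instance._estimate = None
        instance._cost = None
        # The real code would fetch the DTO from the platform; stubbed here.
        dto = {"userInputs": {}}
        instance._execution_dto = dto
        instance._rehydrate(dto)  # subclasses MUST implement this hook
        return instance

    def _rehydrate(self, dto):
        raise NotImplementedError(
            f"{type(self).__name__} must implement _rehydrate() for from_id()."
        )


class Docking(Execution):
    def _rehydrate(self, dto):
        # Domain-specific state is restored here, never scattered in from_id.
        self._inputs = dto.get("userInputs", {})
```

With this shape, a subclass that forgets the hook raises NotImplementedError immediately instead of producing a half-initialized object.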
```python
pocket_id = inputs.get("pocket_id")
if pocket_id is None:
    raise ValueError(
        "Missing 'pocket_id' in execution userInputs; "
        "this execution may have been created without an associated pocket."
    )
```
In Docking.from_id, the userInputs of the execution DTO from the tools API (via client.executions.get_execution) has the structure {"box_size": ..., "ligands": [...], "pocket_id": ..., "protein": {...}} as set by the mock server ("userInputs": body.get("inputs", {})). The code correctly accesses inputs["protein"]["id"], inputs["ligands"], and inputs.get("pocket_id"). However, if a docking job was submitted with the old docking_step.py format (which uses smiles_list and doesn't include pocket_id in userInputs), this code will raise a ValueError on line 331 saying pocket_id is missing. The error message on lines 332-334 acknowledges this case, which is good, but users with old async docking jobs can never reconstruct them via from_id. Consider documenting this limitation explicitly in the docstring.
```python
def test_cancel_wrong_state_raises(self):
    """cancel() from a non-cancellable state raises ValueError."""
    job = AsyncJob("x")
    job._id = "some-id"
    job.status = "Succeeded"
    with pytest.raises(ValueError, match="Cannot cancel"):
        job.cancel()
```
The test_cancel_wrong_state_raises test (line 224) directly sets job.status = "Succeeded" to simulate a terminal state for testing cancel(). However, if status were enforced as read-only in the future (or via the _set_status guard), this direct assignment would silently bypass validation. This is fine as a test implementation detail now, but it's inconsistent with how the production code uses _set_status. Using job._set_status("Created"); job._set_status("Running"); job._set_status("Succeeded") would be more robust and consistent with the other status transition tests in the same test class.
```python
all_data: list[dict[str, Any]] = []
cursor: str | None = None
is_first_page = True

while True:
    response = self.search(
        "ligands",
        cursor=cursor,
        filter_dict=filter_dict,
        limit=limit,
        offset=offset if is_first_page else None,
        select=select,
        sort=sort,
    )
    all_data.extend(response.get("data", []))
    is_first_page = False

    cursor = response.get("meta", {}).get("nextCursor")
    if not cursor:
        break

response["data"] = all_data
return response
```
In search_ligands, if the loop executes at least one iteration, response holds the last page response and response["data"] is then replaced with all_data (line 272). However, if no pages are returned (empty result, meaning response.get("data", []) is empty on the first call and nextCursor is absent), response still holds the last response from self.search(...), and response["data"] is set to all_data (an empty list). This works correctly. But response is referenced on line 272 outside the while loop body, meaning it's set from the last iteration — if the while True body never executes, Python would raise UnboundLocalError. Practically the loop always executes at least once (it's while True), so this is not a runtime risk, but it would be clearer to initialize response = {} before the loop.
```python
def prepare(
    self,
    *,
    ligands: list[Ligand] | LigandSet,
    re_run: bool,
) -> list[Ligand]:
    """Helper method to determine which ligands need to be run based on already run jobs and re_run flag."""
    if isinstance(ligands, LigandSet):
        ligands = ligands.ligands

    if re_run:
        # we're re-running, so we need to re-run all ligands
        return ligands

    jobs = JobList.list(
        client=self.parent.client,
    )
    df = jobs.filter(
        tool_key=tool_mapper["ABFE"],
        status=["Succeeded", "Running", "Queued", "Created"],
        require_metadata=True,
    ).to_dataframe(
        include_metadata=True,
        resolve_user_names=False,
        client=self.parent.client,
    )

    # Build set of ligand hashes that have already been run
    if len(df) > 0:
        ligand_hashes_already_run = {
            ligand_hash
            for ligand_hash in df["metadata"].apply(
                lambda d: d.get("ligand_hash") if isinstance(d, dict) else None
            )
            if isinstance(ligand_hash, str) and ligand_hash
        }
    else:
        ligand_hashes_already_run = set()

    # no re-run, remove already run ligands
    ligands_to_run = [
        ligand
        for ligand in ligands
        if ligand.to_hash() not in ligand_hashes_already_run
    ]
    return ligands_to_run

@beartype
def check_dt(self):
    """Validate that every "dt" in params is numeric and within allowed bounds.

    Traverses the nested parameters dictionary and validates that each
    occurrence of a key named "dt" has a numeric value within the
    inclusive range [min_dt, max_dt]. If any non-numeric or out-of-range
    values are found, an error is raised listing all offending paths.

    Raises:
        DeepOriginException: If any "dt" value is non-numeric or outside
            the allowed range.
    """
    min_dt = 0.001
    max_dt = 0.004

    def is_number(value) -> bool:
        return isinstance(value, (int, float))

    def find_dt_violations(obj, path: list[str]) -> list[str]:
        """Return list of JSON-like paths with invalid dt values."""
        violations: list[str] = []
        if isinstance(obj, dict):
            for key, value in obj.items():
                next_path = path + [str(key)]
                if key == "dt" and (
                    not is_number(value) or not (min_dt <= float(value) <= max_dt)
                ):
                    violations.append(".".join(next_path))
                # Recurse into nested structures
                if isinstance(value, (dict, list, tuple)):
                    violations.extend(find_dt_violations(value, next_path))
        elif isinstance(obj, (list, tuple)):
            for idx, value in enumerate(obj):
                next_path = path + [str(idx)]
                if isinstance(value, (dict, list, tuple)):
                    violations.extend(find_dt_violations(value, next_path))
        return violations

    violations = find_dt_violations(self._params, ["_params"])
    if violations:
        paths = ", ".join(violations)
        raise DeepOriginException(
            f"Found invalid dt values; must be numeric and within range [{min_dt}, {max_dt}]. Offending paths: {paths}"
        ) from None
```

```python
def prepare(
    self,
    *,
    padding: float = 1.0,
    retain_waters: bool = False,
    add_H_atoms: bool = False,
    protonate_protein: bool = False,
) -> Protein:
    """Run system preparation for the protein-ligand pair.

    This step produces the solvation and binding XML files required
    by ``start()``. The resulting prepared system PDB is returned as
    a ``Protein`` object and can be visualised.

    Args:
        padding: Padding around the system box in nm. Defaults to 1.0.
        retain_waters: Keep water molecules from the crystal structure.
        add_H_atoms: Add hydrogen atoms to the ligand.
        protonate_protein: Protonate the protein.

    Returns:
        A ``Protein`` built from the prepared system PDB.

    Raises:
        ValueError: If the ligand is charged (ABFE does not support charged ligands).
    """
    raise NotImplementedError("ABFE preparation is not supported yet.")
```
In ABFE.prepare(), the method declares parameters (padding, retain_waters, add_H_atoms, protonate_protein) and documents that it returns a Protein, but the body unconditionally raises NotImplementedError. The declared return type -> Protein is misleading since the function never returns. Either keep -> Protein but make the docstring state that raising NotImplementedError is the sole current behavior (and drop the misleading parameter and return docs), or annotate the return type as NoReturn.
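The second option, sketched as a minimal stand-alone version (the surrounding ABFE class is reduced to a stub here; only the annotation and docstring shape are the point):

```python
from typing import NoReturn

class ABFE:
    def prepare(
        self,
        *,
        padding: float = 1.0,
        retain_waters: bool = False,
        add_H_atoms: bool = False,
        protonate_protein: bool = False,
    ) -> NoReturn:
        """System preparation for the protein-ligand pair.

        Not implemented yet; the parameters are reserved for the
        future implementation.

        Raises:
            NotImplementedError: Always, until preparation is supported.
        """
        raise NotImplementedError("ABFE preparation is not supported yet.")
```

With `NoReturn`, type checkers flag any code that tries to use the "result" of `prepare()`, which matches the actual behavior until the feature lands.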