-
Notifications
You must be signed in to change notification settings - Fork 248
feat(pathfinder): add CTK root canary probe for non-standard-path libs #1595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
cpcloud
wants to merge
11
commits into
NVIDIA:main
Choose a base branch
from
cpcloud:nvvm-discovery
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
11 commits
Select commit
Hold shift + click to select a range
8ab72a9
feat(pathfinder): add CTK root canary probe for non-standard-path libs
cpcloud 30bc53d
style(pathfinder): update copyright header date in test file
cpcloud 718cfa7
refactor(pathfinder): use pytest-mock instead of unittest.mock in tests
cpcloud b2b7817
chore: fix typing
cpcloud 52a9cca
fix(pathfinder): make CTK root discovery tests platform-aware
cpcloud ce7da17
chore: style
cpcloud 038a4cb
fix(pathfinder): normalize paths from _find_lib_dir_using_anchor_point
cpcloud 65997b9
refactor(pathfinder): isolate CTK canary probe in subprocess
cpcloud a1d2ebc
fix(pathfinder): satisfy pre-commit typing for canary probe
cpcloud a4ea4b7
fix(pathfinder): use spawn isolation for CTK canary probing
cpcloud fbdfc5a
fix(pathfinder): satisfy pre-commit for spawned runner utilities
cpcloud File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
40 changes: 40 additions & 0 deletions
40
cuda_pathfinder/cuda/pathfinder/_dynamic_libs/canary_probe_subprocess.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| #!/usr/bin/env python | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| import json | ||
| import sys | ||
|
|
||
| from cuda.pathfinder._dynamic_libs.load_dl_common import LoadedDL | ||
| from cuda.pathfinder._utils.platform_aware import IS_WINDOWS | ||
|
|
||
| if IS_WINDOWS: | ||
| from cuda.pathfinder._dynamic_libs.load_dl_windows import load_with_system_search | ||
| else: | ||
| from cuda.pathfinder._dynamic_libs.load_dl_linux import load_with_system_search | ||
|
|
||
|
|
||
| def _probe_canary_abs_path(libname: str) -> str | None: | ||
| loaded: LoadedDL | None = load_with_system_search(libname) | ||
| if loaded is None: | ||
| return None | ||
| abs_path = loaded.abs_path | ||
| if not isinstance(abs_path, str): | ||
| return None | ||
| return abs_path | ||
|
|
||
|
|
||
| def probe_canary_abs_path_and_print_json(libname: str) -> None: | ||
| print(json.dumps(_probe_canary_abs_path(libname))) # noqa: T201 | ||
|
|
||
|
|
||
| def main(argv: list[str] | None = None) -> int: | ||
| args = sys.argv[1:] if argv is None else argv | ||
| if len(args) != 1: | ||
| return 2 | ||
| probe_canary_abs_path_and_print_json(args[0]) | ||
| return 0 | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| raise SystemExit(main()) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
133 changes: 133 additions & 0 deletions
133
cuda_pathfinder/cuda/pathfinder/_utils/spawned_process_runner.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| import multiprocessing | ||
| import queue # for Empty | ||
| import sys | ||
| import traceback | ||
| from collections.abc import Callable, Sequence | ||
| from dataclasses import dataclass | ||
| from io import StringIO | ||
| from typing import Any | ||
|
|
||
| PROCESS_KILLED = -9 | ||
| PROCESS_NO_RESULT = -999 | ||
|
|
||
|
|
||
| # Similar to https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess | ||
| # (args, check_returncode() are intentionally not supported here.) | ||
| @dataclass | ||
| class CompletedProcess: | ||
| returncode: int | ||
| stdout: str | ||
| stderr: str | ||
|
|
||
|
|
||
| class ChildProcessWrapper: | ||
| def __init__( | ||
| self, | ||
| result_queue: Any, | ||
| target: Callable[..., None], | ||
| args: Sequence[Any] | None, | ||
| kwargs: dict[str, Any] | None, | ||
| ) -> None: | ||
| self.target = target | ||
| self.args = () if args is None else args | ||
| self.kwargs = {} if kwargs is None else kwargs | ||
| self.result_queue = result_queue | ||
|
|
||
| def __call__(self) -> None: | ||
| # Capture stdout/stderr | ||
| old_stdout = sys.stdout | ||
| old_stderr = sys.stderr | ||
| sys.stdout = StringIO() | ||
| sys.stderr = StringIO() | ||
|
|
||
| try: | ||
| self.target(*self.args, **self.kwargs) | ||
| returncode = 0 | ||
| except SystemExit as e: # Handle sys.exit() | ||
| returncode = e.code if isinstance(e.code, int) else 0 | ||
| except BaseException: | ||
| traceback.print_exc() | ||
| returncode = 1 | ||
| finally: | ||
| # Collect outputs and restore streams | ||
| stdout = sys.stdout.getvalue() | ||
| stderr = sys.stderr.getvalue() | ||
| sys.stdout = old_stdout | ||
| sys.stderr = old_stderr | ||
| try: # noqa: SIM105 | ||
| self.result_queue.put((returncode, stdout, stderr)) | ||
| except Exception: # noqa: S110 | ||
| # If the queue is broken (e.g., parent gone), best effort logging | ||
| pass | ||
|
|
||
|
|
||
| def run_in_spawned_child_process( | ||
| target: Callable[..., None], | ||
| *, | ||
| args: Sequence[Any] | None = None, | ||
| kwargs: dict[str, Any] | None = None, | ||
| timeout: float | None = None, | ||
| rethrow: bool = False, | ||
| ) -> CompletedProcess: | ||
| """Run `target` in a spawned child process, capturing stdout/stderr. | ||
|
|
||
| The provided `target` must be defined at the top level of a module, and must | ||
| be importable in the spawned child process. Lambdas, closures, or interactively | ||
| defined functions (e.g., in Jupyter notebooks) will not work. | ||
|
|
||
| If `rethrow=True` and the child process exits with a nonzero code, | ||
| raises ChildProcessError with the captured stderr. | ||
| """ | ||
| ctx = multiprocessing.get_context("spawn") | ||
| result_queue = ctx.Queue() | ||
| process = ctx.Process(target=ChildProcessWrapper(result_queue, target, args, kwargs)) | ||
| process.start() | ||
|
|
||
| try: | ||
| process.join(timeout) | ||
| if process.is_alive(): | ||
| process.terminate() | ||
| process.join() | ||
| result = CompletedProcess( | ||
| returncode=PROCESS_KILLED, | ||
| stdout="", | ||
| stderr=f"Process timed out after {timeout} seconds and was terminated.", | ||
| ) | ||
| else: | ||
| try: | ||
| returncode, stdout, stderr = result_queue.get(timeout=1.0) | ||
| except (queue.Empty, EOFError): | ||
| result = CompletedProcess( | ||
| returncode=PROCESS_NO_RESULT, | ||
| stdout="", | ||
| stderr="Process exited or crashed before returning results.", | ||
| ) | ||
| else: | ||
| result = CompletedProcess( | ||
| returncode=returncode, | ||
| stdout=stdout, | ||
| stderr=stderr, | ||
| ) | ||
|
|
||
| if rethrow and result.returncode != 0: | ||
| raise ChildProcessError( | ||
| f"Child process exited with code {result.returncode}.\n" | ||
| "--- stderr-from-child-process ---\n" | ||
| f"{result.stderr}" | ||
| "<end-of-stderr-from-child-process>\n" | ||
| ) | ||
|
|
||
| return result | ||
|
|
||
| finally: | ||
| try: | ||
| result_queue.close() | ||
| result_queue.join_thread() | ||
| except Exception: # noqa: S110 | ||
| pass | ||
| if process.is_alive(): | ||
| process.kill() | ||
| process.join() |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I have been trapped in other fires and was unable to provide feedbacks timely 😢
Loading
cudartin a subprocess is safer than loading it in the main process. That said, my recollection from the Feb-05 meeting was that we'd use other anchor points such as nvJitLink (basically, #1038). Does this mean we changed our mind a bit and decided to use cudart instead?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvjitlink is fine to start with as well. I'm honestly not sure why one would be preferable over another. That said, as long as we're not choosing something super niche, it doesn't seem like it's worth spending too much time on and can be changed in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Phillip. I do not have any objection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main idea behind 1038 is that the pivot library has two roles: 1. provide an anchor point to find other libraries from (nvvm is the only case that needs it I think), 2. more importantly actually, limit the scope of future searches (via an object that remembers the pivot library).
This PR doesn't have the concept of a search scope, it's essentially only a trick to find nvvm independently, like any other independently found library.
I was a bit surprised when Phillip sent this PR, but I think it's useful, because I believe that we'll have the independent and scoped searches side by side indefinitely. This PR will make 1038 less important, mainly for safety / internal consistency, and to guide users to a consistent setup via helpful error messages.
Re cudart vs nvjitlink:
For the scoped search, the choice of the pivot library is up to the user.
For the independent search (in an isolated process), cudart is the better choice, because it's much smaller than nvjitlink:
740 vs 96964 KiB