Skip to content

fix: skip non-regular files in scan_compiled_extensions#1002

Closed
mikedep333 wants to merge 1 commit intopython-wheel-build:mainfrom
mikedep333:fix-scan-symlink-dirs
Closed

fix: skip non-regular files in scan_compiled_extensions#1002
mikedep333 wants to merge 1 commit intopython-wheel-build:mainfrom
mikedep333:fix-scan-symlink-dirs

Conversation

@mikedep333
Copy link
Copy Markdown
Member

Fixes:

Pull Request Description

What

pathlib.Path.walk() lists symlinks to directories as filenames rather than dirnames. This causes scan_compiled_extensions to attempt to open() them, resulting in IsADirectoryError. Add an is_file() check to skip non-regular files before attempting to read their contents.

Why

#1001

pathlib.Path.walk() lists symlinks to directories as filenames rather
than dirnames. This causes scan_compiled_extensions to attempt to
open() them, resulting in IsADirectoryError. Add an is_file() check
to skip non-regular files before attempting to read their contents.

Closes: python-wheel-build#1001
Signed-off-by: Mike DePaulo <mikedep333@redhat.com>
Co-Authored-By: Claude Opus 4.6 <claude@anthropic.com>
@mikedep333 mikedep333 requested a review from a team as a code owner March 31, 2026 21:03
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

The scan_compiled_extensions function in src/fromager/sources.py now includes a guard clause that skips entries where the constructed filepath is not a regular file. This prevents operations like accessing file metadata (suffix) or reading magic headers on non-file filesystem entries such as directories or symlinks.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding a file check to skip non-regular files in the scan_compiled_extensions function.
Description check ✅ Passed The description is directly related to the changeset, explaining the problem (symlinks to directories causing IsADirectoryError) and the solution (adding an is_file() check).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/fromager/sources.py (1)

862-863: Add a regression test for symlinked directories.

This fix is solid, but current tests (see tests/test_sources.py:304-313) only cover regular files, so this bug can regress silently.

✅ Suggested test addition
diff --git a/tests/test_sources.py b/tests/test_sources.py
@@
 def test_scan_compiled_extensions(
     filename: str, content: bytes, hit: bool, tmp_path: pathlib.Path
 ) -> None:
@@
     else:
         assert matches == []
+
+
+def test_scan_compiled_extensions_skips_symlinked_directory(tmp_path: pathlib.Path) -> None:
+    target_dir = tmp_path / "real_dir"
+    target_dir.mkdir()
+    (target_dir / "module.so").write_bytes(b"\x7fELFdummy")
+
+    link_path = tmp_path / "linked_dir"
+    try:
+        link_path.symlink_to(target_dir, target_is_directory=True)
+    except (OSError, NotImplementedError):
+        pytest.skip("Symlinks are not supported in this test environment")
+
+    matches = sources.scan_compiled_extensions(tmp_path)
+    # Main assertion: no crash; and we only report real files reachable by walk logic.
+    assert isinstance(matches, list)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fromager/sources.py` around lines 862 - 863, Tests currently only cover
regular files so the check using filepath.is_file() in src/fromager/sources.py
can regress for symlinked directories; add a regression test in
tests/test_sources.py (near the existing test at lines ~304-313) that creates a
symlinked directory pointing to a real directory with files and asserts the same
behavior as for regular directories/files (i.e., the code should follow the
symlink and include files under the target); ensure the test sets up and tears
down the symlink and uses the same helper functions used by other tests to
verify that the module handling (the code path involving filepath.is_file())
correctly processes symlinked directories.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/fromager/sources.py`:
- Around line 862-863: Tests currently only cover regular files so the check
using filepath.is_file() in src/fromager/sources.py can regress for symlinked
directories; add a regression test in tests/test_sources.py (near the existing
test at lines ~304-313) that creates a symlinked directory pointing to a real
directory with files and asserts the same behavior as for regular
directories/files (i.e., the code should follow the symlink and include files
under the target); ensure the test sets up and tears down the symlink and uses
the same helper functions used by other tests to verify that the module handling
(the code path involving filepath.is_file()) correctly processes symlinked
directories.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7b534e36-8dce-4c1d-a71e-5d49025019be

📥 Commits

Reviewing files that changed from the base of the PR and between aec9c9c and 89efea9.

📒 Files selected for processing (1)
  • src/fromager/sources.py

Comment thread src/fromager/sources.py
Comment on lines +862 to +863
if not filepath.is_file():
continue
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_file use a syscall. I have written the code in a way that reduces the amount of syscalls and I/O activity.

Can you move right before filepath.open and add a comment why it is needed?

Copy link
Copy Markdown
Member Author

@mikedep333 mikedep333 Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this commit to improve vs #1004 , which was merged

@tiran tiran added bug Something isn't working important Important task labels Apr 1, 2026
@mikedep333
Copy link
Copy Markdown
Member Author

Improved in #1011

@mikedep333 mikedep333 closed this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working important Important task

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants