Skip to content

[mypyc] Fix missing cross-group header deps in incremental builds#21490

Merged
p-sawicki merged 6 commits into
python:masterfrom
VaggelisD:fix-mypyc-incremental-cross-group-header-deps
May 20, 2026
Merged

[mypyc] Fix missing cross-group header deps in incremental builds#21490
p-sawicki merged 6 commits into
python:masterfrom
VaggelisD:fix-mypyc-incremental-cross-group-header-deps

Conversation

@georgesittas
Copy link
Copy Markdown
Contributor

@georgesittas georgesittas commented May 15, 2026

Problem

In separate=True mode, each group's __native_internal_<mod>.h reaches into a sibling group's export-table header via an angled include:

// __native_internal_caller.h
#include <other_group/__native_other.h>

The consumer's .o file bakes in byte offsets from that struct at C compile time. If the sibling group's struct layout shifts (e.g. a class is inserted earlier in the source, shifting all subsequent offsets), the consumer's .o must be recompiled, otherwise it silently resolves offsets to the wrong object at runtime.

Two bugs in the old get_header_deps combined to hide this dep from setuptools' Extension.depends:

  1. No transitive walk. The cross-group include lives inside __native_internal_<mod>.h, not in the .c file itself. The old code only scanned the .c file's direct includes and never opened the resolved headers.

  2. Single-pass resolution. Deps were resolved per-group before sibling groups had written their headers to disk, so the cross-group header didn't yet exist even if we had looked.

Fix

  • _extract_includes / _INCLUDE_RE: distinguishes #include "foo" (quoted) from #include <foo> (angled), matching the preprocessor's own search rules — quoted form searches the includer's directory first; angled form searches -I paths only.
  • resolve_cfile_deps: transitive walker that opens each resolved header and follows its #include directives, bounded by a visited set. Headers that don't exist under target_dir (lib-rt headers like <Python.h>) are dropped since they never change between builds.
  • Two-pass loop in mypyc_build: first pass writes all groups' files to disk; second pass resolves deps so every group's headers are present when cross-group includes are looked up.

Impact on non-incremental builds

None. Extension.depends is only consulted by setuptools when a .o already exists. Cold builds compile everything unconditionally.

@georgesittas
Copy link
Copy Markdown
Contributor Author

georgesittas commented May 15, 2026

Hey folks,

I just saw #21488 and it's fully understandable. I support this stance and there's no problem with closing both this PR and #21481, given they were mostly written by Claude, as I'm still trying to get a grasp on the codebase. Up to you.

Some context: my colleague @VaggelisD has recently contributed patches related to incremental compilation. We're now trying to implement a caching layer for CI jobs that leverage this feature and have already stumbled upon a couple of bugs, which these two PRs aim to fix.

Given incremental compilation more-or-less didn't work until Vaggelis' patches were merged (AFAICT), and also that these two PRs are relatively small, I figured it'd be ok to hand them over to you and let you decide whether you want to spend cycles to review them and, if they're reasonable, get them in.

In any case, thank you all for the amazing work on this project. Its impact on sqlglot's performance so far has been incredible.

EDIT: if you prefer, I can also open issues for these bugs. The examples reproducing them are quite small.

@ilevkivskyi
Copy link
Copy Markdown
Member

@georgesittas Thanks, no worries! I will leave this up to @JukkaL and/or @p-sawicki

Copy link
Copy Markdown
Collaborator

@p-sawicki p-sawicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to add a run test similar to the one in #21481 to test the issue with the incorrect offsets into the exports table?

Comment thread mypyc/test/test_misc.py Outdated
Comment thread mypyc/build.py Outdated
Comment thread mypyc/build.py Outdated
@georgesittas
Copy link
Copy Markdown
Contributor Author

Appreciate the review @p-sawicki, I'll clean this up shortly.

@georgesittas georgesittas requested a review from p-sawicki May 19, 2026 15:03
@p-sawicki
Copy link
Copy Markdown
Collaborator

thanks for the changes, i don't have any more comments about the code. did you look into creating a run test?

would it be possible to add a run test similar to the one in #21481 to test the issue with the incorrect offsets into the exports table?

if this can't be reproduced with the test infra then we can merge as is. but i think it should be possible to create a mutlimodule test that crashes without the changes in this PR when compiled in separate mode and works with the changes.

@georgesittas
Copy link
Copy Markdown
Contributor Author

thanks for the changes, i don't have any more comments about the code. did you look into creating a run test?

would it be possible to add a run test similar to the one in #21481 to test the issue with the incorrect offsets into the exports table?

if this can't be reproduced with the test infra then we can merge as is. but i think it should be possible to create a mutlimodule test that crashes without the changes in this PR when compiled in separate mode and works with the changes.

Yep, I noticed earlier that I didn't push this change and I'm doing it now.

georgesittas and others added 5 commits May 20, 2026 15:06
In separate=True mode, each compiled group writes a set of C files and
headers under the build directory. The C files for one group (the
"consumer") reach into another group's export-table header via an
angled include inside their own __native_internal_<mod>.h:

    __native_internal_caller.h:
        #include <other_group/__native_other.h>

The consumer's .o file bakes in byte offsets from that struct at C
compile time. If the other group's struct layout shifts (e.g. a new
class is inserted earlier in the source), the consumer's .o must be
recompiled to pick up the new layout — otherwise it silently resolves
offsets to the wrong class or function at runtime.

The old get_header_deps used a regex that collapsed both #include "..."
and #include <...> into plain names, then resolved them all relative to
target_dir. Two bugs combined to hide the cross-group dep:

1. The resolver only looked at the .c file's direct includes and never
   opened the resolved headers to follow their own #include directives.
   The angled cross-group include lives inside __native_internal_<mod>.h,
   not in the .c file itself, so it was never seen.

2. The deps were resolved in a single pass per group, before sibling
   groups had written their headers to disk, so the cross-group header
   file didn't yet exist even if the resolver had looked for it.

The fix introduces:

* _INCLUDE_RE / _extract_includes: a regex that distinguishes the two
  include forms and returns (is_angled, name) pairs, matching the C
  preprocessor's own search rules (#include "foo" searches the
  includer's directory first; #include <foo> searches -I paths only).

* resolve_cfile_deps: a transitive walker that opens each resolved
  header and follows its own #include directives, bounded by a visited
  set. Only headers that exist under target_dir (or, for quoted form,
  the includer's directory) are included; lib-rt headers like <Python.h>
  are dropped because they never change between builds.

* A two-pass loop in mypyc_build: the first pass writes all groups'
  C files to disk; the second pass resolves deps so that every group's
  headers are available when we look for cross-group includes.

After the fix, editing a dep module in a way that shifts its
export_table struct correctly triggers a recompile of the consumer's
.o, preventing silent runtime corruption.
…ental builds

In separate=True mode, when a group is fully cached (generate_c returns
empty ctext for all its files), get_header_deps was called with empty
content, returning no includes.  Extension.depends therefore stayed empty
for that group, so cross-group export-table header changes — caused by
inserting a new class that shifts struct offsets — never triggered a
recompile of the cached consumer's .o.  The stale .o then baked in the
old offsets, silently resolving them to wrong classes at runtime.

Fix: when ctext is empty and the .c file exists on disk (written by the
previous build), read it before calling get_header_deps so the dep
resolver can walk the transitive header chain and include cross-group
headers in Extension.depends.

Also adds a test that directly demonstrates the before/after: calling
get_header_deps with empty content produces no deps, while calling it
with the on-disk content surfaces the cross-group header.
This patches `fudge_dir_mtimes` to skip linker outputs (.so/.pyd/.o).
The existing -1 fudge on every step pushed `.so` mtimes back, which
combined with `write_file`'s +1 mtime bump on `.c` files made
`newer_group` always trigger a rebuild regardless of
`Extension.depends`, masking this class of bugs in the test framework.
@georgesittas georgesittas force-pushed the fix-mypyc-incremental-cross-group-header-deps branch 2 times, most recently from 28fa63b to 9cb3320 Compare May 20, 2026 12:33
@georgesittas georgesittas force-pushed the fix-mypyc-incremental-cross-group-header-deps branch from 9cb3320 to ca6b291 Compare May 20, 2026 12:35
Comment thread mypyc/test/testutil.py
Comment on lines 234 to 245
def fudge_dir_mtimes(dir: str, delta: int) -> None:
# Skip linker outputs. Pushing them back combines with write_file's
# +1 sec bump on .c files to make .c always newer than .so, forcing
# an unconditional rebuild that would mask Extension.depends bugs.
# See setuptools/_distutils/command/build_ext.py:`build_extension`.
for dirpath, _, filenames in os.walk(dir):
for name in filenames:
if name.endswith((".so", ".pyd", ".o", ".obj")):
continue
path = os.path.join(dirpath, name)
new_mtime = os.stat(path).st_mtime + delta
os.utime(path, times=(new_mtime, new_mtime))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this change, the test framework masks the Extension.depends bug. The interaction is:

  1. In step 1: write_file bumps .c mtimes forward by 1 second after a write
  2. Between steps: fudge_dir_mtimes(WORKDIR, -1) shifted every file under build/ back by 1 second

This meant that at the start of step 2, every .c would be strictly newer than its .so. Setuptools' build_extension checks newer_group(sources + ext.depends, ext_path, 'newer'), so if a .c is newer than the corresponding .so, it returns True and triggers a recompilation, regardless of whether Extension.depends is correct.

So this fudge worked as a third "rebuild trigger" sitting on top of setuptools' real (a) source-changed and (b) declared-dependency-changed checks. This resulted in hiding the bug in the dependency tracking. Skipping linker outputs in the fudge leaves the real setuptools incremental decision intact, which is what the test needs.

If this change feels too broad, we can perhaps scope it to just the test I added. For example, we could define a marker that tests carry (e.g., # preserve_artifact_mtimes) and in run_case_inner we'd parse the marker from testcase.input once before the incremental loop and pass a flag through to a wrapper around fudge_dir_mtimes:

  skip_artifacts = re.search(r"^# preserve_artifact_mtimes\b", text, re.MULTILINE)
  ...
  fudge_dir_mtimes(WORKDIR, -1, skip_artifacts=skip_artifacts)

Happy to refactor if needed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the issue is a little different. if i understand correctly the discrepancy between the test and real compilation comes from the fact that native.py gets recompiled in step 2 instead of being fully cached.

that shouldn't be because of the mtime bump in write_file because native.py stays the same in both steps so there's no update of its C files. but if the initial mtime of its C files is X and the mtime of the object files is X+d, the fudge between step 1 and 2 shifts the object files to X+d-1 so they seem older than the C files and get recompiled if d < 1.

so the issue is that we fudge the mtimes of artifacts from files that don't change between steps. i'm not sure if we can easily fudge the mtimes of just the artifacts generated from specific python files though.

skipping it for all object files might be the best solution, though i wonder what's left to fudge after skipping them? maybe we could just remove the fudge_dir_mtimes function altogether?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank you for the detailed analysis. This makes sense. I'll see if we can remove fudge_dir_mtimes altogether as you suggested and get back to you soon.

Copy link
Copy Markdown
Contributor Author

@georgesittas georgesittas May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried removing fudge_dir_mtimes entirely. Tests are flaky under xdist but they flake the same way on the unmodified baseline (4 timeouts) and on the targeted-skip version (7). Based on Claude's analysis, the intermittent failures were due to PyInit being CPU-starved when 14 xdist workers are all running clang, not anything fudge-related, but it couldn't rule it out.

Click this to see Claude's analysis, based on git history

Nov 2019 (PR #7887, sully): fudge_dir_mtimes(-1) introduced together with incremental-compilation support for mypyc. Comment from that commit:

To make sure that any new changes get picked up as being new by distutils, shift the mtime of all of the generated artifacts back by a second.

The problem it solved: mtime has 1-second resolution on many filesystems (ext4, HFS+). A test could run step 1 and step 2 within the same wall-clock second, leaving step 2's outputs at the same mtime as step 1's. setuptools' newer_group would then see "not newer" and skip rebuilds that should fire.

Sept 2022 (PR #13593, Jukka): added time.sleep(1.0) on Linux only, because the fudge alone wasn't sufficient on Ubuntu CI. Comment said "Temporary workaround" to issue #13572.

Nov 2023 (PR #16520, Levkivskyi): generalized the sleep to all platforms after macOS started flaking too. Quote from that PR description:

contrary to the comment, my Linux Mint works perfectly (i.e. test passed 20 times even after I removed the sleep()). So it is not really Mac vs Linux issue.

So the situation today is:

  • Original intent of fudge: avoid same-second collisions between step 1 outputs and step 2 inputs, on filesystems with 1-second mtime resolution.
  • Sleep was added later as additional safety when fudge alone wasn't reliable.
  • With sleep + fudge in place, there are ~2 seconds of headroom between step 1's .c writes and step 2's writes (after accounting for write_file's +1 bump). Without fudge, only ~1 second, which is exactly at the edge of resolution and could race on slow filesystems.

So the fudge isn't strictly necessary for correctness given the sleep, but it adds a second of safety margin. Removing the fudge removes that margin and could re-introduce flakes that the layered approach has been masking for years.

Given the above, I'm leaning towards keeping fudge_dir_mtimes since it doesn't hurt keeping it around.

skipping it for all object files might be the best solution, though i wonder what's left to fudge after skipping them?

I believe .c and .h files will still be fudged. Traced the files visited during the walk for a single test and got build/setup.py, build/__native_other_a.c, build/__native_other_a.h, etc. Does this cover your question or did you mean something else?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok thanks for trying that. i realize now that my analysis wasn't correct because i missed that write_file actually does bump the mtime of the file on its first creation. this makes the mtimes of the C files created for native.py X+1 initially. and this means that the object files created from those C files might have mtimes in range [X, X+1] because those aren't bumped.

so the change in fudge_dir_mtimes counteracts that and brings the mtimes of C files before those of the object files. we could also change write_file to skip the bump when creating new files but i think the effect would be identical so it's fine to keep as is.

@p-sawicki p-sawicki merged commit ab8e4bf into python:master May 20, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants