[mypyc] Fix missing cross-group header deps in incremental builds#21490
Conversation
|
Hey folks, I just saw #21488 and it's fully understandable. I support this stance and there's no problem with closing both this PR and #21481, given they were mostly written by Claude, as I'm still trying to get a grasp on the codebase. Up to you. Some context: my colleague @VaggelisD has recently contributed patches related to incremental compilation. We're now trying to implement a caching layer for CI jobs that leverage this feature and have already stumbled upon a couple of bugs, which these two PRs aim to fix. Given incremental compilation more-or-less didn't work until Vaggelis' patches were merged (AFAICT), and also that these two PRs are relatively small, I figured it'd be ok to hand them over to you and let you decide whether you want to spend cycles to review them and, if they're reasonable, get them in. In any case, thank you all for the amazing work on this project. Its impact on sqlglot's performance so far has been incredible. EDIT: if you prefer, I can also open issues for these bugs. The examples reproducing them are quite small. |
|
@georgesittas Thanks, no worries! I will leave this up to @JukkaL and/or @p-sawicki |
|
Appreciate the review @p-sawicki, I'll clean this up shortly. |
|
thanks for the changes, i don't have any more comments about the code. did you look into creating a run test?
if this can't be reproduced with the test infra then we can merge as is. but i think it should be possible to create a mutlimodule test that crashes without the changes in this PR when compiled in separate mode and works with the changes. |
Yep, I noticed earlier that I didn't push this change and I'm doing it now. |
In separate=True mode, each compiled group writes a set of C files and
headers under the build directory. The C files for one group (the
"consumer") reach into another group's export-table header via an
angled include inside their own __native_internal_<mod>.h:
__native_internal_caller.h:
#include <other_group/__native_other.h>
The consumer's .o file bakes in byte offsets from that struct at C
compile time. If the other group's struct layout shifts (e.g. a new
class is inserted earlier in the source), the consumer's .o must be
recompiled to pick up the new layout — otherwise it silently resolves
offsets to the wrong class or function at runtime.
The old get_header_deps used a regex that collapsed both #include "..."
and #include <...> into plain names, then resolved them all relative to
target_dir. Two bugs combined to hide the cross-group dep:
1. The resolver only looked at the .c file's direct includes and never
opened the resolved headers to follow their own #include directives.
The angled cross-group include lives inside __native_internal_<mod>.h,
not in the .c file itself, so it was never seen.
2. The deps were resolved in a single pass per group, before sibling
groups had written their headers to disk, so the cross-group header
file didn't yet exist even if the resolver had looked for it.
The fix introduces:
* _INCLUDE_RE / _extract_includes: a regex that distinguishes the two
include forms and returns (is_angled, name) pairs, matching the C
preprocessor's own search rules (#include "foo" searches the
includer's directory first; #include <foo> searches -I paths only).
* resolve_cfile_deps: a transitive walker that opens each resolved
header and follows its own #include directives, bounded by a visited
set. Only headers that exist under target_dir (or, for quoted form,
the includer's directory) are included; lib-rt headers like <Python.h>
are dropped because they never change between builds.
* A two-pass loop in mypyc_build: the first pass writes all groups'
C files to disk; the second pass resolves deps so that every group's
headers are available when we look for cross-group includes.
After the fix, editing a dep module in a way that shifts its
export_table struct correctly triggers a recompile of the consumer's
.o, preventing silent runtime corruption.
…ental builds In separate=True mode, when a group is fully cached (generate_c returns empty ctext for all its files), get_header_deps was called with empty content, returning no includes. Extension.depends therefore stayed empty for that group, so cross-group export-table header changes — caused by inserting a new class that shifts struct offsets — never triggered a recompile of the cached consumer's .o. The stale .o then baked in the old offsets, silently resolving them to wrong classes at runtime. Fix: when ctext is empty and the .c file exists on disk (written by the previous build), read it before calling get_header_deps so the dep resolver can walk the transitive header chain and include cross-group headers in Extension.depends. Also adds a test that directly demonstrates the before/after: calling get_header_deps with empty content produces no deps, while calling it with the on-disk content surfaces the cross-group header.
for more information, see https://pre-commit.ci
This patches `fudge_dir_mtimes` to skip linker outputs (.so/.pyd/.o). The existing -1 fudge on every step pushed `.so` mtimes back, which combined with `write_file`'s +1 mtime bump on `.c` files made `newer_group` always trigger a rebuild regardless of `Extension.depends`, masking this class of bugs in the test framework.
28fa63b to
9cb3320
Compare
9cb3320 to
ca6b291
Compare
| def fudge_dir_mtimes(dir: str, delta: int) -> None: | ||
| # Skip linker outputs. Pushing them back combines with write_file's | ||
| # +1 sec bump on .c files to make .c always newer than .so, forcing | ||
| # an unconditional rebuild that would mask Extension.depends bugs. | ||
| # See setuptools/_distutils/command/build_ext.py:`build_extension`. | ||
| for dirpath, _, filenames in os.walk(dir): | ||
| for name in filenames: | ||
| if name.endswith((".so", ".pyd", ".o", ".obj")): | ||
| continue | ||
| path = os.path.join(dirpath, name) | ||
| new_mtime = os.stat(path).st_mtime + delta | ||
| os.utime(path, times=(new_mtime, new_mtime)) |
There was a problem hiding this comment.
Without this change, the test framework masks the Extension.depends bug. The interaction is:
- In step 1:
write_filebumps .c mtimes forward by 1 second after a write - Between steps:
fudge_dir_mtimes(WORKDIR, -1)shifted every file underbuild/back by 1 second
This meant that at the start of step 2, every .c would be strictly newer than its .so. Setuptools' build_extension checks newer_group(sources + ext.depends, ext_path, 'newer'), so if a .c is newer than the corresponding .so, it returns True and triggers a recompilation, regardless of whether Extension.depends is correct.
So this fudge worked as a third "rebuild trigger" sitting on top of setuptools' real (a) source-changed and (b) declared-dependency-changed checks. This resulted in hiding the bug in the dependency tracking. Skipping linker outputs in the fudge leaves the real setuptools incremental decision intact, which is what the test needs.
If this change feels too broad, we can perhaps scope it to just the test I added. For example, we could define a marker that tests carry (e.g., # preserve_artifact_mtimes) and in run_case_inner we'd parse the marker from testcase.input once before the incremental loop and pass a flag through to a wrapper around fudge_dir_mtimes:
skip_artifacts = re.search(r"^# preserve_artifact_mtimes\b", text, re.MULTILINE)
...
fudge_dir_mtimes(WORKDIR, -1, skip_artifacts=skip_artifacts)Happy to refactor if needed.
There was a problem hiding this comment.
i think the issue is a little different. if i understand correctly the discrepancy between the test and real compilation comes from the fact that native.py gets recompiled in step 2 instead of being fully cached.
that shouldn't be because of the mtime bump in write_file because native.py stays the same in both steps so there's no update of its C files. but if the initial mtime of its C files is X and the mtime of the object files is X+d, the fudge between step 1 and 2 shifts the object files to X+d-1 so they seem older than the C files and get recompiled if d < 1.
so the issue is that we fudge the mtimes of artifacts from files that don't change between steps. i'm not sure if we can easily fudge the mtimes of just the artifacts generated from specific python files though.
skipping it for all object files might be the best solution, though i wonder what's left to fudge after skipping them? maybe we could just remove the fudge_dir_mtimes function altogether?
There was a problem hiding this comment.
I see, thank you for the detailed analysis. This makes sense. I'll see if we can remove fudge_dir_mtimes altogether as you suggested and get back to you soon.
There was a problem hiding this comment.
Ok, I tried removing fudge_dir_mtimes entirely. Tests are flaky under xdist but they flake the same way on the unmodified baseline (4 timeouts) and on the targeted-skip version (7). Based on Claude's analysis, the intermittent failures were due to PyInit being CPU-starved when 14 xdist workers are all running clang, not anything fudge-related, but it couldn't rule it out.
Click this to see Claude's analysis, based on git history
Nov 2019 (PR #7887, sully): fudge_dir_mtimes(-1) introduced together with incremental-compilation support for mypyc. Comment from that commit:
To make sure that any new changes get picked up as being new by distutils, shift the mtime of all of the generated artifacts back by a second.
The problem it solved: mtime has 1-second resolution on many filesystems (ext4, HFS+). A test could run step 1 and step 2 within the same wall-clock second, leaving step 2's outputs at the same mtime as step 1's. setuptools' newer_group would then see "not newer" and skip rebuilds that should fire.
Sept 2022 (PR #13593, Jukka): added time.sleep(1.0) on Linux only, because the fudge alone wasn't sufficient on Ubuntu CI. Comment said "Temporary workaround" to issue #13572.
Nov 2023 (PR #16520, Levkivskyi): generalized the sleep to all platforms after macOS started flaking too. Quote from that PR description:
contrary to the comment, my Linux Mint works perfectly (i.e. test passed 20 times even after I removed the sleep()). So it is not really Mac vs Linux issue.
So the situation today is:
- Original intent of fudge: avoid same-second collisions between step 1 outputs and step 2 inputs, on filesystems with 1-second mtime resolution.
- Sleep was added later as additional safety when fudge alone wasn't reliable.
- With sleep + fudge in place, there are ~2 seconds of headroom between step 1's .c writes and step 2's writes (after accounting for write_file's +1 bump). Without fudge, only ~1 second, which is exactly at the edge of resolution and could race on slow filesystems.
So the fudge isn't strictly necessary for correctness given the sleep, but it adds a second of safety margin. Removing the fudge removes that margin and could re-introduce flakes that the layered approach has been masking for years.
Given the above, I'm leaning towards keeping fudge_dir_mtimes since it doesn't hurt keeping it around.
skipping it for all object files might be the best solution, though i wonder what's left to fudge after skipping them?
I believe .c and .h files will still be fudged. Traced the files visited during the walk for a single test and got build/setup.py, build/__native_other_a.c, build/__native_other_a.h, etc. Does this cover your question or did you mean something else?
There was a problem hiding this comment.
ok thanks for trying that. i realize now that my analysis wasn't correct because i missed that write_file actually does bump the mtime of the file on its first creation. this makes the mtimes of the C files created for native.py X+1 initially. and this means that the object files created from those C files might have mtimes in range [X, X+1] because those aren't bumped.
so the change in fudge_dir_mtimes counteracts that and brings the mtimes of C files before those of the object files. we could also change write_file to skip the bump when creating new files but i think the effect would be identical so it's fine to keep as is.
Problem
In
separate=Truemode, each group's__native_internal_<mod>.hreaches into a sibling group's export-table header via an angled include:The consumer's
.ofile bakes in byte offsets from that struct at C compile time. If the sibling group's struct layout shifts (e.g. a class is inserted earlier in the source, shifting all subsequent offsets), the consumer's.omust be recompiled, otherwise it silently resolves offsets to the wrong object at runtime.Two bugs in the old
get_header_depscombined to hide this dep from setuptools'Extension.depends:No transitive walk. The cross-group include lives inside
__native_internal_<mod>.h, not in the.cfile itself. The old code only scanned the.cfile's direct includes and never opened the resolved headers.Single-pass resolution. Deps were resolved per-group before sibling groups had written their headers to disk, so the cross-group header didn't yet exist even if we had looked.
Fix
_extract_includes/_INCLUDE_RE: distinguishes#include "foo"(quoted) from#include <foo>(angled), matching the preprocessor's own search rules — quoted form searches the includer's directory first; angled form searches-Ipaths only.resolve_cfile_deps: transitive walker that opens each resolved header and follows its#includedirectives, bounded by a visited set. Headers that don't exist undertarget_dir(lib-rt headers like<Python.h>) are dropped since they never change between builds.mypyc_build: first pass writes all groups' files to disk; second pass resolves deps so every group's headers are present when cross-group includes are looked up.Impact on non-incremental builds
None.
Extension.dependsis only consulted by setuptools when a.oalready exists. Cold builds compile everything unconditionally.