Fixes CLOUDAI-15: Updated copyright check #811

Merged
podkidyshev merged 2 commits into main from ipod/copyright-verify-3
Feb 17, 2026

Conversation

@podkidyshev
Contributor

Summary

Implements a new copyright check with the following requirements:

  • copyright years must cover every year in which a file was changed, and only those years
  • the minimum year in a file's existing copyright header is treated as ground truth
    • this is because git log --follow sometimes behaves unpredictably

Test Plan

Additional Notes

@coderabbitai
Contributor

coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

This PR updates copyright year metadata from 2024–2025/2024–2026 to 2025–2026 across the codebase and enhances the copyright header validation framework in tests with new utility functions for year-range formatting and git-based year extraction.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Files - Copyright Update: conf/common/test_scenario/ucc_generator_test.toml, conf/experimental/test/deepep_low_latency.toml | Updated copyright year headers from 2024–2025 to 2025–2026. |
| Source Files - Copyright Update: src/cloudai/_core/base_reporter.py, src/cloudai/core.py, src/cloudai/models/scenario.py, src/cloudai/registration.py, src/cloudai/workloads/ai_dynamo/kubernetes_json_gen_strategy.py, src/cloudai/workloads/deepep/deepep.py, src/cloudai/workloads/deepep/report_generation_strategy.py, src/cloudai/workloads/deepep/slurm_command_gen_strategy.py, src/cloudai/workloads/nccl_test/prediction_report_generator.py | Updated copyright year headers from 2024–2025/2024–2026 to 2025–2026. |
| Test Files - Copyright Update: tests/test_git_repo_installer.py, tests/workloads/megatron_run/test_report_gen_strategy.py, tests/workloads/nccl_test/test_json_gen_strategy_kubernetes.py, tests/workloads/nccl_test/test_prediction_report_generator.py | Updated copyright year headers from 2024–2025/2024–2026 to 2025–2026. |
| Copyright Header Test Framework: tests/test_check_copyright_headers.py | Added year-range formatting utilities (_format_years_to_ranges, get_commit_years_from_git, get_commit_years_from_file, prepare_copyright_with_year); introduced _assert_copyright_in_file validation helper; added CURRENT_YEAR constant; refactored test logic to use new helper functions; expanded test coverage for range formatting. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hops through history with fresh-bound dates,
Copyright years now align with time's great gates,
New helpers born to validate with care,
Year ranges formatted everywhere!
A spring refresh for code so fair! 🌱

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: updating copyright headers across multiple files to reflect the new copyright year validation logic. |
| Description check | ✅ Passed | The description clearly relates to the changeset, explaining the copyright check implementation requirements and how it handles file modification years. |

@podkidyshev podkidyshev marked this pull request as ready for review February 17, 2026 13:12
@greptile-apps
Contributor

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR implements a more robust copyright year checking system that validates copyright headers against actual git commit history. The new approach:

  • Uses git log --follow to track all years when a file was modified, including across renames
  • Treats the minimum year from the existing copyright header as ground truth (to handle unpredictable git log --follow behavior)
  • Formats consecutive years as ranges (e.g., "2024-2026") and non-consecutive years with commas (e.g., "2024, 2026")
  • Includes the current year for new files or files with uncommitted changes

The PR updates 16 files to correct their copyright years based on this new validation logic. Most changes remove 2024 from files that were actually created/modified starting in 2025, correcting previously inaccurate copyright headers.
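
For illustration, the range formatting described above could look roughly like the sketch below. It is not the PR's `_format_years_to_ranges`: the name `format_years_to_ranges`, the hyphen and comma separators (copied from the examples above), and the sort-and-deduplicate step are assumptions.

```python
from typing import Iterable


def format_years_to_ranges(years: Iterable[int]) -> str:
    """Collapse years into a string such as '2024-2026' or '2024, 2026'."""
    # Assumption: normalize defensively so unsorted or duplicated input still works.
    ordered = sorted(set(years))
    if not ordered:
        raise ValueError("at least one year is required")

    def render(start: int, end: int) -> str:
        # A single year stays as-is; a run of consecutive years becomes "start-end".
        return str(start) if start == end else f"{start}-{end}"

    parts: list[str] = []
    range_start = range_end = ordered[0]
    for year in ordered[1:]:
        if year == range_end + 1:
            # Still inside a run of consecutive years: extend the current range.
            range_end = year
            continue
        # Gap found: close the current range and start a new one.
        parts.append(render(range_start, range_end))
        range_start = range_end = year
    parts.append(render(range_start, range_end))
    return ", ".join(parts)
```

Under these assumptions, `format_years_to_ranges([2024, 2025, 2026])` gives `"2024-2026"` and `format_years_to_ranges([2024, 2026])` gives `"2024, 2026"`.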

Confidence Score: 5/5

  • This PR is safe to merge - it improves copyright header accuracy and adds comprehensive test coverage
  • The changes are purely administrative (copyright headers) with a well-tested validation system. The new logic is thoroughly tested with unit tests for the helper functions and parametrized tests for all source files. Verified that copyright year updates match actual git history.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| tests/test_check_copyright_headers.py | Implements new copyright check logic that uses git history to validate copyright years, with file minimum year as ground truth |
| src/cloudai/models/scenario.py | Copyright year updated from 2024-2026 to 2025-2026, removing incorrect 2024 year from file history |
| src/cloudai/registration.py | Copyright year updated from 2024-2026 to 2025-2026, removing incorrect 2024 year from file history |
| tests/workloads/megatron_run/test_report_gen_strategy.py | Copyright year updated from 2025-2026 to 2026, matching file created in 2026 |

Flowchart

flowchart TD
    A[Start: File Copyright Check] --> B[Read copyright header from file]
    B --> C[Parse years from header<br/>get_commit_years_from_file]
    C --> D[Get min_actual_year from parsed years]
    D --> E[Get commit years from git<br/>git log --follow]
    E --> F{Git history found?}
    F -->|No| G[Use CURRENT_YEAR]
    F -->|Yes| H[Parse unique years from git]
    H --> I{File has uncommitted changes?}
    I -->|Yes| J[Add CURRENT_YEAR if not present]
    I -->|No| K[Keep git years as-is]
    J --> L[Filter: years >= min_actual_year]
    K --> L
    G --> L
    L --> M[Format years to ranges<br/>_format_years_to_ranges]
    M --> N[Generate expected copyright line]
    N --> O{Actual == Expected?}
    O -->|Yes| P[Pass]
    O -->|No| Q[Fail: Copyright year mismatch]

Last reviewed commit: 14c4d06
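
The year selection in the flowchart can be sketched as a pure function. This is a hedged reading of the diagram, not the code in tests/test_check_copyright_headers.py: `expected_copyright_years` and its parameters are hypothetical names, and the git and file lookups are replaced by plain arguments so the logic can be followed in isolation.

```python
from datetime import date
from typing import Sequence

# Stand-in for the CURRENT_YEAR constant mentioned in the walkthrough.
CURRENT_YEAR = date.today().year


def expected_copyright_years(
    header_years: Sequence[int],
    git_years: Sequence[int],
    has_uncommitted_changes: bool,
    current_year: int = CURRENT_YEAR,
) -> list[int]:
    """Return the years the header is expected to list, following the flowchart."""
    # The smallest year already present in the header is treated as ground truth.
    min_actual_year = min(header_years)

    if not git_years:
        # No git history found: fall back to the current year.
        years = {current_year}
    else:
        # Unique commit years, plus the current year for uncommitted changes.
        years = set(git_years)
        if has_uncommitted_changes:
            years.add(current_year)

    # Drop anything older than the header's own minimum year.
    return sorted(year for year in years if year >= min_actual_year)
```

In this sketch, git years 2024 to 2026 filtered by a header minimum of 2025 give `[2025, 2026]`.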

Contributor

@greptile-apps greptile-apps bot left a comment

16 files reviewed, no comments

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_check_copyright_headers.py (1)

25-42: 🧹 Nitpick | 🔵 Trivial

The "2024" in HEADER (line 26) is unused but could mislead readers.

Since only HEADER_TAIL (lines 2+) is compared, and line 1 is validated dynamically against git-derived years, the 2024 in the template is never actually checked. Consider adding a brief comment noting this, or replacing it with a placeholder like YYYY to signal it's not compared.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_check_copyright_headers.py` around lines 25 - 42, The literal
"2024" in HEADER is misleading because tests only compare HEADER_TAIL
(HEADER_LINES[2:]) and the first-year line is validated dynamically from git;
update the HEADER template to make this clear by either replacing the hardcoded
year with a placeholder like "YYYY" or adding a brief inline comment near
HEADER/HEADER_LINES noting that the first line is validated dynamically and not
compared, so readers won't assume the literal year is used by the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_check_copyright_headers.py`:
- Around line 94-121: In get_commit_years_from_git, check the result of the
subprocess.run call stored in status (i.e. status.returncode) and emit a warning
or log a message when git status fails (non-zero) before relying on
status.stdout; include context (path_str) and status.stderr in the log to aid
debugging so that the CURRENT_YEAR append logic isn't silently skipped when git
status errored.
- Around line 124-138: get_commit_years_from_file returns years in the original
parse order which can make using actual_years[0] as the minimum incorrect;
update the implementation so it returns a sorted list (e.g., sorted(years)) or
ensure the consumer computes the min (in _assert_copyright_in_file where
actual_years[0] is used) by replacing that usage with min(actual_years) — modify
either get_commit_years_from_file or the call site to guarantee the earliest
year is correctly determined.
- Around line 156-164: The filtering step can produce an empty expected_years
which later causes _format_years_to_ranges (via prepare_copyright_with_year) to
raise a confusing ValueError; update the logic after calling
get_commit_years_from_git(file) to detect when expected_years is empty and
handle it explicitly — either by setting expected_years = [min_actual_year]
(fallback to the header's min year) or by raising an AssertionError/ValueError
with a clear message referencing get_commit_years_from_git and the file header;
adjust the block around get_commit_years_from_file(actual_copyright_lines[1]) /
get_commit_years_from_git(file) to implement this guard.
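
As a rough illustration of the first inline comment, a `git status` check that warns on failure might look like the sketch below. The helper name `has_uncommitted_changes`, the injectable `run` parameter, and the choice of `warnings.warn` are assumptions made for the example; only the `git status --porcelain -- <path>` invocation comes from the review thread.

```python
import subprocess
import warnings
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run_git(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # check=False so the caller can inspect returncode instead of catching exceptions.
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


def has_uncommitted_changes(path_str: str, run: Runner = _run_git) -> bool:
    """Report whether git sees pending changes for path_str, warning if git fails."""
    status = run(["git", "status", "--porcelain", "--", path_str])
    if status.returncode != 0:
        # Surface the failure with context rather than silently treating it as "clean".
        warnings.warn(
            f"git status failed for {path_str} "
            f"(exit code {status.returncode}): {status.stderr.strip()}",
            RuntimeWarning,
            stacklevel=2,
        )
        return False
    # Porcelain output is empty when the path has no pending changes.
    return bool(status.stdout.strip())
```

Passing the runner as a parameter keeps the sketch testable without a real repository; raising instead of warning would be an equally plausible choice.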

Contributor

@coderabbitai coderabbitai bot left a comment

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/test_check_copyright_headers.py`:
- Around line 113-120: The git status subprocess call assigns its result to
status but never checks status.returncode, so failures are silently ignored and
CURRENT_YEAR may not be appended to years; update the code around the
subprocess.run invocation (the call that sets status) to check status.returncode
and, if non-zero, emit a warning log (including status.returncode and
status.stderr) before continuing, so callers can debug why status.stdout is
empty while leaving the existing logic that appends CURRENT_YEAR to years when
appropriate.
- Around line 124-138: get_commit_years_from_file currently returns years in the
parsed order which can be non-monotonic; change it so the returned list is
sorted (e.g., return sorted(years)) so callers that rely on the first element
being the minimum (e.g., actual_years[0]) are correct; update the function
get_commit_years_from_file to sort the years before returning (or alternatively
adjust call sites to use min(actual_years) where used).
- Around line 156-164: The filter on expected_years can yield an empty list
(when min_actual_year is newer than all git years) causing
_format_years_to_ranges([]) to raise a confusing ValueError; modify the logic
around get_commit_years_from_git and the min_actual_year handling so that if
expected_years becomes empty after filtering you fall back to using
[min_actual_year] (or append min_actual_year to expected_years) before calling
prepare_copyright_with_year, ensuring functions get_commit_years_from_file,
get_commit_years_from_git, prepare_copyright_with_year and
_format_years_to_ranges are fed a non-empty list.
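
A minimal sketch of the last two points, assuming the fallback option rather than raising an error: the minimum is computed with `min()` instead of relying on list order, and an empty filter result falls back to the header's minimum year. `filter_expected_years` is a hypothetical helper, not a function from the PR.

```python
from typing import Sequence


def filter_expected_years(git_years: Sequence[int], header_years: Sequence[int]) -> list[int]:
    """Keep git years at or after the header's minimum year, never returning an empty list."""
    if not header_years:
        raise ValueError("the copyright header contains no years")

    # min() does not depend on the order in which the header years were parsed.
    min_actual_year = min(header_years)
    expected = sorted({year for year in git_years if year >= min_actual_year})

    # Assumed fallback: if every git year predates the header, use the header's minimum.
    return expected or [min_actual_year]
```

With git years `[2024, 2025]` and a header that lists only 2026, this returns `[2026]` rather than an empty list.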

Contributor

@greptile-apps greptile-apps bot left a comment

16 files reviewed, no comments

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_check_copyright_headers.py`:
- Around line 147-168: Add an inline comment in the _assert_copyright_in_file
function near the computation of min_actual_year (where actual_years and
expected_years are derived and expected_years is filtered by min_actual_year)
that explicitly documents the trade-off: we trust the file's own minimum year as
ground truth because git --follow is unreliable, which means the test will
silently accept reductions of a file's start year (e.g., dropping 2024 → 2025)
and thus cannot detect accidental inflation/removal of earlier years; mention
this is intentional and point maintainers to get_commit_years_from_file and
get_commit_years_from_git for context.
- Around line 106-110: When handling git failures in the block that checks
res.returncode (the one that allows 128 = no commits match), add a debug-level
log of res.stderr whenever res.returncode == 128 before you fall through to
computing lines/years; keep the current behavior of defaulting to [CURRENT_YEAR]
for empty lines. Reference the existing variables res, res.returncode, and
res.stderr and ensure the log call runs only for the 128 branch so other
non-zero return codes still raise the RuntimeError; leave the lines and years
logic (lines = [...], years = ...) unchanged.
- Around line 124-138: The get_commit_years_from_file function is fragile
because it relies on brittle str.replace calls and will produce unclear
ValueError on malformed lines; update get_commit_years_from_file to use a
regular expression that extracts the years segment (or individual
year/year-range tokens) from the line, validate the regex match and raise a
clear ValueError if the expected pattern is not found, then parse tokens
(splitting on commas, handling ranges like "YYYY-YYYY") into ints and return the
sorted list of years. Ensure you reference get_commit_years_from_file when
locating the code to change.
- Around line 166-167: Update the two assertions that check
actual_copyright_lines[0] and [1] to include the file path in their failure
messages (same style as the following assertion that uses {file}); specifically
modify the assertion messages for the checks against HEADER_LINES[0] and
expected_years_line so they incorporate the {file} identifier and provide
context, referencing the variables actual_copyright_lines, HEADER_LINES,
expected_years_line and the {file} placeholder.
- Line 154: Update the failing assertion message to include the current file
path so CI failures are easier to debug: in the test that uses
actual_copyright_lines and HEADER_LINES (look for the assertion using these
symbols), change the message string to include the file variable (e.g., include
file or path) so the assertion reads something like "Copyright is missing or
incomplete for {file}".
- Around line 50-73: _format_years_to_ranges assumes callers pass sorted, unique
years but doesn't enforce it; update the start of the function to defensively
normalize the input by converting years into a sorted unique sequence (e.g.,
years = sorted(set(years))) so duplicates and unsorted inputs are handled
correctly before the existing range-building logic runs, preserving the current
empty-list ValueError behavior and using the normalized list for
range_start/range_end and the loop.
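
The regex-based parsing suggested for get_commit_years_from_file could be sketched as follows. The header wording is an assumption: the pattern expects a line such as `# Copyright (c) 2024-2026 ...`, and `parse_copyright_years` is a hypothetical name, so the expression would need adjusting to the repository's actual header text.

```python
import re

# One token is either a single year ("2026") or a range ("2024-2026").
_YEAR_TOKEN = r"\d{4}(?:\s*-\s*\d{4})?"
# Assumed header shape: "Copyright (c) <token>[, <token>...]".
_YEARS_RE = re.compile(rf"Copyright \(c\) (?P<years>{_YEAR_TOKEN}(?:\s*,\s*{_YEAR_TOKEN})*)")


def parse_copyright_years(line: str) -> list[int]:
    """Extract a sorted list of years from a copyright line, expanding ranges."""
    match = _YEARS_RE.search(line)
    if match is None:
        # Fail with a clear message instead of an opaque int() conversion error.
        raise ValueError(f"no copyright years found in line: {line!r}")

    years: set[int] = set()
    for token in match.group("years").split(","):
        start, _, end = token.strip().partition("-")
        first = int(start)
        last = int(end) if end else first
        if last < first:
            raise ValueError(f"descending year range in line: {line!r}")
        years.update(range(first, last + 1))
    return sorted(years)
```

Returning a sorted list means a caller that reads the first element as the minimum year gets the right value regardless of the order written in the header.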

---

Duplicate comments:
In `@tests/test_check_copyright_headers.py`:
- Around line 94-121: The git status call in get_commit_years_from_git currently
ignores non-zero exit codes; add a guard like the git log block: after calling
subprocess.run for ["git", "status", "--porcelain", "--", path_str] check if
status.returncode != 0 and raise a RuntimeError (including path_str and
status.stderr) so failures (not being in a repo, corrupt index, etc.) are
surfaced instead of silently degrading the computed years; keep the existing
logic that appends CURRENT_YEAR only when status.stdout indicates changes.

@podkidyshev podkidyshev merged commit a616d59 into main Feb 17, 2026
5 checks passed
@podkidyshev podkidyshev deleted the ipod/copyright-verify-3 branch February 17, 2026 15:02