Add RAT license validation #47
Open
janhoy wants to merge 11 commits into
- Add .rat-excludes with exclusion patterns for config files, binaries,
docs, caches, empty __init__.py namespace markers, and IDE files
- Add `make rat` target that auto-downloads the RAT 0.18 JAR to
~/.cache/apache-rat/ on first use (requires curl + java)
- Add scripts/release-checks.sh (was missing despite being referenced
in the Makefile); currently runs the RAT audit
- Update `make release-checks` to invoke scripts/release-checks.sh
- Add license headers to 62 Python source files that were missing them:
  - Files with first commit before fork point get the OSB-lineage header
    (SPDX + "Modifications by Apache Solr" + OpenSearch Contributors)
  - Empty __init__.py namespace markers are excluded from RAT instead
- Document `make rat` and `make release-checks` in DEVELOPER_GUIDE.md
- Makefile: replace deprecated RAT flags -E/-d with --input-exclude-file/-- (deprecated since RAT 0.17)
- Makefile: verify SHA-512 checksum of downloaded RAT JAR
- scripts/release-checks.sh: harden with set -euo pipefail and cd to project root via dirname guard
- DEVELOPER_GUIDE.md: replace X.Y.Z+1 placeholder with concrete version example (0.9.2 / 0.9.3)
Apache's .sha512 file contains a bare hex hash with no filename, not the `<hash> <file>` format shasum -c expects. Fix by:
- Downloading the tarball to a temp file first
- Constructing the shasum -c input inline via echo
- Extracting the JAR from the verified tarball
- Cleaning up the temp tarball and .sha512 file
The 62 pre-fork files added in the RAT commit had only the short OSB snippet as a placeholder. Replace with a header that:
- Credits original OpenSearch Contributors authorship
- Notes the license header was absent in the original source
- Includes the full ASF Apache-2.0 boilerplate

Files that already carried the full Elasticsearch/ASF header are untouched.
- Replace curl+shasum shell gymnastics with scripts/download-rat.py: uses urllib.request + hashlib.sha512 + tarfile, no curl dependency
- Move JAR cache from ~/.cache/apache-rat/ to ~/.solr-orbit/cache/apache-rat/ to keep all project runtime state under one directory
- Add java availability check in the rat target with a clear error message
- Update DEVELOPER_GUIDE.md to reflect the new cache path and Python requirement
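
For reviewers who want a mental model of the verify-then-extract step, here is a minimal sketch using only `hashlib` and `tarfile`. It is not the contents of scripts/download-rat.py; the function names `verify_sha512` and `extract_jar` are hypothetical, and the sketch assumes the tarball is already in memory and that the JAR is the first regular member ending in `.jar`.

```python
import hashlib
import io
import tarfile


def verify_sha512(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the SHA-512 digest of data differs from expected_hex."""
    actual = hashlib.sha512(data).hexdigest()
    if actual != expected_hex.strip().lower():
        raise ValueError(f"SHA-512 mismatch: expected {expected_hex}, got {actual}")


def extract_jar(tarball: bytes, jar_suffix: str = ".jar") -> tuple[str, bytes]:
    """Return (member name, contents) of the first regular file ending in jar_suffix.

    Hypothetical helper: it reads the member into memory instead of calling
    extractall(), so no path from the archive is ever written to disk.
    """
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(jar_suffix):
                return member.name, tar.extractfile(member).read()
    raise FileNotFoundError(f"no {jar_suffix} member found in tarball")
```

Verifying the whole tarball before opening it means an archive that fails the check is never parsed.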
Replace the overly broad **/__init__.py glob with specific patterns covering only the 0-byte package markers. Non-empty __init__.py files carry a full license header and will now be checked by RAT as intended.
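
One way to produce the specific patterns is to list every 0-byte `__init__.py` under the project root. The sketch below is an illustration only: `empty_init_files` is a hypothetical helper, and whether its output can be pasted into .rat-excludes unchanged depends on the pattern syntax RAT expects.

```python
from pathlib import Path


def empty_init_files(root: Path) -> list[str]:
    """Return sorted POSIX-style paths (relative to root) of 0-byte __init__.py files."""
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("__init__.py")
        if path.is_file() and path.stat().st_size == 0
    )


if __name__ == "__main__":
    # Print one candidate exclusion per line for manual review.
    for relative_path in empty_init_files(Path(".")):
        print(relative_path)
```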
…s 404
downloads.apache.org only hosts the current release; older pinned versions move to archive.apache.org. Add download_with_fallback() that retries on the archive mirror so 'make rat' keeps working after a new RAT release.
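
The retry behaviour described here could look roughly like the sketch below. Only the name `download_with_fallback` comes from this PR; its signature, the injectable `fetch` parameter, and the `_http_get` helper are assumptions made so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def _http_get(url: str) -> bytes:
    """Fetch url and return the response body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = _http_get,
) -> bytes:
    """Try each URL in order, moving to the next one only on HTTP 404.

    Any other error is raised immediately, because a 500 or a network
    failure does not mean the file has moved to the archive.
    """
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except urllib.error.HTTPError as exc:
            if exc.code != 404:
                raise
            last_error = exc
    if last_error is None:
        raise ValueError("no URLs given")
    raise last_error
```

A caller would pass the primary URL first and the archive URL second, so the archive is contacted only when the primary mirror no longer hosts the pinned version.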
Apache .sha512 files are not bare hex strings; they use either GNU
coreutils format ("<hex> <filename>") or BSD format
("SHA512 (<file>) = <hex>"). Extract the hex token before comparing
so SHA-512 verification does not always fail on a fresh machine.
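
A minimal sketch of extracting the hex token, assuming the digest always appears as one unbroken run of 128 hex characters. `extract_sha512_hex` is a hypothetical name, and the sketch does not handle digests split into space-separated groups.

```python
import re

# A SHA-512 digest is 64 bytes, i.e. exactly 128 hex characters.
_HEX_SHA512 = re.compile(r"\b[0-9a-fA-F]{128}\b")


def extract_sha512_hex(text: str) -> str:
    """Return the lowercase SHA-512 hex digest found in a checksum file's text.

    Accepts a bare digest, GNU coreutils format ("<hex> <filename>"),
    and BSD format ("SHA512 (<file>) = <hex>").
    """
    match = _HEX_SHA512.search(text)
    if match is None:
        raise ValueError("no 128-character hex digest found")
    return match.group(0).lower()
```

Searching for the digest itself, instead of splitting on whitespace, keeps one code path for all three layouts.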
…able
With set -euo pipefail, assigning $1 / $2 when the script is called
without arguments aborts before the -z guard can print the usage message.
Use ${1:-} / ${2:-} so the shell treats missing args as empty strings
and lets the existing -z check handle the error path.
File is redundant — LICENSE and NOTICE already cover dependency attribution per ASF policy. Also had a duplicate row for solr-orbit.
Ruff (F541) flagged the f-string in download_with_fallback(); drop the f prefix since the string contains no interpolation.
janhoy commented May 31, 2026
#
# Originally developed by OpenSearch Contributors; licensed under the Apache License, Version 2.0.
# License header was absent in the original source; added when adopted into Apache Solr Orbit.
# Modified by Apache Solr contributors; see git log for details.
Please help decide which license header to use for Python files that had no header in the OpenSearch Benchmark project, like this one. I suggest this header: the normal Apache header, with a three-line notice on top saying the file was originally authored by OSB and then modified by Solr.
I felt it was overkill to add the full "The OpenSearch project requires contributors..." text and the Rally material. These three lines should be enough to tell people where the file came from, and the git history shows the rest.
Closes #8. Adds Apache RAT license audit infrastructure so we can satisfy ASF release policy.
Changes
- `make rat`: downloads the RAT 0.18 JAR to `~/.solr-orbit/cache/apache-rat/` via `scripts/download-rat.py` (pure Python, no `curl`), verifies SHA-512, then runs the audit. Errors if `java` is not on `PATH`.
- `.rat-excludes`: excludes files that legitimately carry no header (bytecode, config/data files, test fixtures, binaries, docs, CI metadata).
- `make release-checks`: calls `scripts/release-checks.sh`; runs RAT as the first pre-flight step.
- `DEVELOPER_GUIDE.md`: new Release section documenting the above.

The header added to `.py` files that had no previous header is the normal Apache header preceded by the three-line origin notice quoted in the review comment above.
How to review
- Run `make rat`: it should complete with 0 unapproved licenses.
- Check `.rat-excludes` for any patterns that seem too broad.
- Spot-check a few files (e.g. `solrorbit/aggregator.py`, `solrorbit/builder/launchers/docker_launcher.py`) to confirm the header is correct.
- Review `scripts/download-rat.py` for the download + verify logic.