Use a short SPDX license header for LLM-centered files#1489

Open
Dev-iL wants to merge 1 commit into apache:main from SummitSG-LLC:2602/spdx

Dev-iL (Collaborator) commented Feb 22, 2026

Following the approach from apache/airflow#62073 and apache/airflow#62145, files intended for LLM/agent consumption (not distributed in releases) now use a minimal SPDX license identifier instead of the full Apache 2.0 header, which saves LLM tokens.

See also:
https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53

Changes

  • Mark the .github folder as export-ignore.
  • Add short and long license templates.
  • Add pre-commit hooks to ensure the right license header exists in every file.
  • Add missing license headers to two PR templates.
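
As a rough illustration of the short-versus-long header rule described above, here is a minimal Python sketch. It is not the hook configuration from this PR: the glob patterns, the function names, and the marker string used to detect the full header are assumptions made for the example. Only the SPDX line `SPDX-License-Identifier: Apache-2.0` is the standard identifier.

```python
from fnmatch import fnmatchcase

# Standard SPDX identifier line for the Apache License 2.0.
SPDX_LINE = "SPDX-License-Identifier: Apache-2.0"
# Assumed marker: opening words of the full ASF source header.
LONG_MARKER = "Licensed to the Apache Software Foundation (ASF)"

# Hypothetical patterns. The PR does not list which files count as
# "LLM-centered"; these are placeholders for illustration only.
SHORT_HEADER_GLOBS = (".github/*", "AGENTS.md")


def expected_header(path: str) -> str:
    """Return 'short' for paths matching the LLM-centered globs, else 'long'."""
    if any(fnmatchcase(path, pattern) for pattern in SHORT_HEADER_GLOBS):
        return "short"
    return "long"


def has_expected_header(path: str, text: str, max_lines: int = 20) -> bool:
    """Check that the first lines of `text` carry the header expected for `path`."""
    head = "\n".join(text.splitlines()[:max_lines])
    marker = SPDX_LINE if expected_header(path) == "short" else LONG_MARKER
    return marker in head
```

Under these assumptions, a file under `.github/` passes with a single SPDX comment line, while a regular source file still needs the full header.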

How I tested this

  • Hooks pass locally.

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Comment on lines -2 to -5
# Original work Copyright 2017 Palantir Technologies, Inc. #
# Original work licensed under the MIT License. #
# See ThirdPartyNotices.txt in the project root for license information. #
# All modifications Copyright (c) Open Law Library. All rights reserved. #

Collaborator Author

This file seems like a generic conftest; not sure why it had all of the above.

Comment on lines -2 to -3
# Copyright(c) Open Law Library. All rights reserved. #
# See ThirdPartyNotices.txt in the project root for additional notices. #

Dev-iL (Collaborator, Author), Feb 22, 2026

Removing this might be incorrect in this case. Is this here because the code was vendored in from pygls?

Contributor

Yeah, I don't recall. So maybe revert?

Dev-iL (Collaborator, Author), Feb 25, 2026

Can this be moved to a NOTICE file if the code in question is licensed under ALv2 too? CC: @potiuk

Member

Yes. It should be placed in the NOTICE file; see https://infra.apache.org/licensing-howto.html

Contributor

why delete?

Contributor

why delete?

Comment on lines -26 to -28
- name: Check for missing Apache 2 license headers
  run: python3 scripts/check_license_headers.py
  working-directory: ${{ github.workspace }}

Contributor

why remove?

Dev-iL (Collaborator, Author), Feb 24, 2026

This answers all similar comments:

After the proposed change, license headers are enforced by pre-commit hooks. This approach has several benefits:

  1. Contributors can tell there's an issue before getting to CI.
  2. Coverage isn't lost, since the hooks should run on CI anyway as part of static checks.
  3. No need to maintain license enforcement scripts.
  4. The hooks were more thorough and detected missing licenses that the CI check missed.

The main downside is that it's somewhat harder to customize if a specific file requires special treatment.
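
To make the mechanism concrete: pre-commit passes the staged file names to a hook's entry point as arguments and treats a non-zero exit code as a failure. The sketch below shows that shape for a license check. It is only an illustration, not the hook used in this PR; `missing_header`, `main`, and the marker strings are hypothetical names chosen for the example.

```python
import sys
from pathlib import Path

# Either marker is accepted in this sketch (assumed strings, see lead-in).
SPDX_LINE = "SPDX-License-Identifier: Apache-2.0"
LONG_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def missing_header(paths, max_lines=20):
    """Return the paths whose first lines contain neither accepted marker."""
    failures = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        head = "\n".join(text.splitlines()[:max_lines])
        if SPDX_LINE not in head and LONG_MARKER not in head:
            failures.append(path)
    return failures


def main(argv):
    """Hook-style entry point: report offending files, return an exit code."""
    failures = missing_header(argv)
    for path in failures:
        print(f"missing license header: {path}", file=sys.stderr)
    return 1 if failures else 0


# A real hook script would end with: sys.exit(main(sys.argv[1:]))
```

Because the same entry point runs locally and in CI, a contributor sees the failure at commit time rather than after pushing, which is the first benefit listed above.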

3 participants