feat(Sources): Generate Synthetic Git Repo #17

Open
Tonisal-byte wants to merge 16 commits into microsoft:main from Tonisal-byte:asalinas/synthetic-git-generator

Conversation

@Tonisal-byte

These changes enable the automatic generation of a synthetic Git repository that merges a component’s upstream Git history with additional commits layered from the Azure Linux configuration repository. A commit in the Azure Linux configuration repository that pertains to a component is expected to carry an explicit Affects: <component-name> marker in its message; only commits with that marker are applied to the generated commit history for that component.

This pull request refactors and enhances the source preparation logic for components, focusing on overlay application and synthetic git history generation. The changes improve modularity, clarity, and reliability of overlay handling, and introduce better support for preserving git history when overlays are applied. Additionally, the pull request updates dependencies and adapts tests to the new interfaces.

Core logic refactoring and enhancements:

  • Refactored the overlay application process in sourceprep.go by splitting it into smaller, focused methods: overlays are now collected and applied in a defined order, and synthetic git history is generated in a dedicated step. Overlay application is now decoupled from git history generation, ensuring overlays are always applied, even if no git repository is present. [1] [2] [3]
  • Introduced preservation of the upstream .git directory when overlays are applied, enabling synthetic history generation for release numbering and delta builds.
  • Removed the large and complex postProcessSources method, replacing it with modular helpers for overlay collection, application, and spec path resolution. [1] [2]

Test and interface updates:

  • Updated tests and mocks to match the new FetchComponent interface, which now accepts variadic options to support features like .git directory preservation. [1] [2] [3] [4]

Copilot AI review requested due to automatic review settings March 19, 2026 18:06
Copilot AI left a comment

Pull request overview

This PR refactors component source preparation to support generating a synthetic Git history by preserving upstream .git directories when overlays are applied, and by deriving synthetic commits from project-repo commits annotated with Affects: <component-name>.

Changes:

  • Added functional options to FetchComponent / GetComponent to optionally preserve upstream .git directories.
  • Refactored overlay application into ordered collection + application, followed by optional synthetic history generation.
  • Introduced synthistory.go (and tests) to find Affects: commits and create synthetic commits in the upstream repo.
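
The functional-options change listed above could look roughly like the following. This is a minimal sketch, not the PR's code: `FetchComponentOption` is named in the review summary, but `fetchOptions`, `WithPreserveGitDir`, `resolveFetchOptions`, and the stand-in `fetchComponent` function are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// fetchOptions holds the resolved settings for one fetch call.
// Hypothetical type; the PR's real struct may differ.
type fetchOptions struct {
	preserveGitDir bool
}

// FetchComponentOption mutates fetchOptions; callers pass zero or more.
type FetchComponentOption func(*fetchOptions)

// WithPreserveGitDir (hypothetical name) asks the provider to keep the upstream .git directory.
func WithPreserveGitDir() FetchComponentOption {
	return func(o *fetchOptions) { o.preserveGitDir = true }
}

// resolveFetchOptions applies the options in order on top of the zero-value defaults.
func resolveFetchOptions(opts ...FetchComponentOption) fetchOptions {
	var resolved fetchOptions
	for _, opt := range opts {
		if opt != nil {
			opt(&resolved)
		}
	}
	return resolved
}

// fetchComponent stands in for a provider method; it only reports its decision.
func fetchComponent(name string, opts ...FetchComponentOption) string {
	if resolveFetchOptions(opts...).preserveGitDir {
		return name + ": keep .git"
	}
	return name + ": strip .git"
}

func main() {
	fmt.Println(fetchComponent("curl"))                       // existing call sites stay valid
	fmt.Println(fetchComponent("curl", WithPreserveGitDir())) // opt in to .git preservation
}
```

A variadic option parameter keeps existing callers compiling unchanged, which is presumably why mocks and tests only needed signature updates.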

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| internal/providers/sourceproviders/sourceproviders_test/sourcemanager_mocks.go | Updates gomock stubs for the new variadic FetchComponent options. |
| internal/providers/sourceproviders/sourcemanager.go | Adds FetchComponentOption + wires options through SourceManager and providers. |
| internal/providers/sourceproviders/rpmcontentsprovider.go | Adapts RPM provider to the new GetComponent(...opts) signature (opts ignored). |
| internal/providers/sourceproviders/fedorasourceprovider.go | Implements .git preservation behavior based on resolved fetch options. |
| internal/projectconfig/configfile.go | Exposes config file path/dir via accessors needed for repo discovery. |
| internal/app/azldev/core/sources/synthistory.go | Adds synthetic history discovery/creation logic (Affects parsing, repo dirty check, commit creation). |
| internal/app/azldev/core/sources/synthistory_test.go | Adds unit tests for affects discovery, dirty detection, and synthetic commit creation. |
| internal/app/azldev/core/sources/sourceprep.go | Refactors overlay application flow and integrates synthetic history generation. |
| internal/app/azldev/core/sources/sourceprep_test.go | Updates tests/mocks for new FetchComponent(...opts) call shape. |
| internal/app/azldev/core/componentbuilder/componentbuilder_test.go | Updates builder tests for new variadic FetchComponent signature. |
| go.mod | Adds go-git dependencies and bumps indirect deps (otel/proto/protobuf/etc.). |
| go.sum | Records updated module sums for added/bumped dependencies. |

@ddstreet
ddstreet commented Mar 20, 2026

todo: Also, after this is merged, we'll need to work on getting the release value correct - for %autorelease (and %autochangelog) pkgs it should 'just work', but for pkgs that still manually set the release number, this will have to increment it during building of the dist-git repo commits (and add changelog entries)

And, we'll need to settle on how to handle intermediate/uncommitted package release values (i.e. if using the --include-latest and/or --include-uncommitted params as mentioned above)

@ddstreet
ddstreet commented Mar 20, 2026

thought: we'll need some way to integrate this functionality with packages where the Release: value is complex (and manually managed); maybe something like Affects-manual-release: <PKG> where we create a dist-git commit, but rely on the toml to actually perform the release value change (although we probably could still add a changelog entry?)

@ddstreet
> maybe something like Affects-manual-release:

or maybe Affects(norelease): or something like that, where keyword(s) inside the () could be used as flags to tweak our specific behavior
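
To make the `Affects(<flags>):` idea concrete, here is a minimal parsing sketch. The syntax is only a proposal in this thread, so everything here is an assumption: the comma-separated flag list, the one-marker-per-line layout, and the names `affectsLine`, `affectsTrailer`, and `parseAffects` are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// affectsLine matches lines such as "Affects: curl" or "Affects(norelease): curl".
// The "(flags)" part is the proposed, not yet implemented, extension.
var affectsLine = regexp.MustCompile(`(?m)^Affects(?:\(([^)]*)\))?:[ \t]*(\S+)[ \t]*$`)

// affectsTrailer is one parsed marker: the component and any flags in parentheses.
type affectsTrailer struct {
	Component string
	Flags     []string
}

// parseAffects returns every Affects marker found in a commit message, in order.
func parseAffects(message string) []affectsTrailer {
	var out []affectsTrailer
	for _, m := range affectsLine.FindAllStringSubmatch(message, -1) {
		t := affectsTrailer{Component: m[2]}
		for _, f := range strings.Split(m[1], ",") {
			if f = strings.TrimSpace(f); f != "" {
				t.Flags = append(t.Flags, f)
			}
		}
		out = append(out, t)
	}
	return out
}

func main() {
	msg := "Fix build flags\n\nAffects(norelease): bash\nAffects: curl\n"
	for _, t := range parseAffects(msg) {
		fmt.Printf("%s %v\n", t.Component, t.Flags)
	}
}
```

Keeping the flags inside the parentheses means a plain `Affects:` line parses the same way as before, with an empty flag list.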

Copilot AI review requested due to automatic review settings March 20, 2026 22:38
Copilot AI left a comment

Pull request overview

This PR refactors component source preparation to support generating a synthetic git history that layers “Affects:” commits from the project repo on top of upstream sources, and adds a --no-git escape hatch for workflows that run outside a git checkout.

Changes:

  • Introduces synthetic history utilities (FindAffectsCommits, CommitSyntheticHistory) and integrates them into source preparation.
  • Extends SourceManager / upstream providers to optionally preserve the upstream .git directory via functional options.
  • Adds --no-git to build/diff/prepare-sources commands and updates scenario artifacts + generated CLI docs.

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| scenario/internal/buildtest/buildtest.go | Scenario helper updated to pass --no-git during component build. |
| scenario/snapshots/TestMCPServerMode_1.snap.json | Snapshot updated for the new no-git flag. |
| internal/providers/sourceproviders/sourceproviders_test/sourcemanager_mocks.go | Mock updated for variadic FetchComponentOption. |
| internal/providers/sourceproviders/sourcemanager.go | Adds FetchComponentOption (e.g., preserve .git) and threads options through fetch paths. |
| internal/providers/sourceproviders/rpmcontentsprovider.go | Accepts fetch options (ignored) to satisfy interface. |
| internal/providers/sourceproviders/fedorasourceprovider.go | Preserves upstream .git directory when requested. |
| internal/projectconfig/configfile.go | Exposes config file source path/dir accessors used by synthetic history logic. |
| internal/app/azldev/core/sources/synthistory.go | New synthetic history implementation and project-repo discovery helpers. |
| internal/app/azldev/core/sources/synthistory_test.go | Tests for synthetic history helpers. |
| internal/app/azldev/core/sources/sourceprep_test.go | Updates mocks/signatures to match new interfaces. |
| internal/app/azldev/core/sources/sourceprep.go | Refactors overlay flow and integrates synthetic history generation + --no-git option. |
| internal/app/azldev/core/componentbuilder/componentbuilder_test.go | Updates mock FetchComponent signature usage. |
| internal/app/azldev/cmds/component/preparesources.go | Adds --no-git flag wiring into the source preparer. |
| internal/app/azldev/cmds/component/diffsources.go | Adds --no-git flag wiring into the source preparer. |
| internal/app/azldev/cmds/component/build.go | Adds --no-git flag wiring into the source preparer. |
| go.mod / go.sum | Adds go-git/go-billy and updates transitive dependencies. |
| docs/user/reference/cli/azldev_component_prepare-sources.md | Generated CLI docs updated for --no-git. |
| docs/user/reference/cli/azldev_component_diff-sources.md | Generated CLI docs updated for --no-git. |
| docs/user/reference/cli/azldev_component_build.md | Generated CLI docs updated for --no-git. |

@Tonisal-byte
Author

Focusing this PR on just generating dist-git repos correctly. I will create a follow-up PR that addresses the scenarios in which we would like to include dev changes with --includes, as well as release bumping for non-%autorelease specs.

Copilot AI review requested due to automatic review settings March 20, 2026 23:22
Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.

Author: &object.Signature{
	Name:  "azldev",
	Email: "azldev@microsoft.com",
	When:  time.Now().UTC(),
Copilot AI Mar 20, 2026

initSourcesRepo uses time.Now().UTC() for the initial commit timestamp. Because git commit hashes include timestamps, this makes the synthetic repo history non-deterministic across runs, which can undermine reproducibility and any downstream logic that keys off commit IDs. Consider using a deterministic timestamp (e.g., Unix epoch, or a timestamp derived from the upstream commit / first overlay commit) for the initial synthetic commit.

Suggested change
- 	When: time.Now().UTC(),
+ 	// Use a deterministic timestamp so the initial synthetic commit hash is reproducible.
+ 	When: time.Unix(0, 0).UTC(),

Comment on lines 102 to 113
  	sourceManager, err := sourceproviders.NewSourceManager(env, distro)
  	if err != nil {
  		return nil, fmt.Errorf("failed to create source manager:\n%w", err)
  	}

- 	preparer, err := sources.NewPreparer(sourceManager, env.FS(), env, env)
+ 	var preparerOpts []sources.PreparerOption
+ 	if options.NoGitRepo {
+ 		preparerOpts = append(preparerOpts, sources.WithNoGitRepo())
+ 	}
+
+ 	preparer, err := sources.NewPreparer(sourceManager, env.FS(), env, env, preparerOpts...)
  	if err != nil {
Copilot AI Mar 20, 2026

--no-git / NoGitRepo is wired into sources.NewPreparer(...), but DiffSources never calls trySyntheticHistory (it only fetches sources and applies overlays). As a result, this flag currently has no behavioral effect for component diff-sources, which is confusing and makes the CLI surface area larger than necessary. Either remove the flag from diff-sources, or have diff-sources exercise the same synthetic-history path (with appropriate .git diff exclusions).

Comment on lines +44 to +47
// FindAffectsCommits walks the git log from HEAD and returns metadata for all commits
// whose message contains "Affects: <componentName>". Results are sorted chronologically
// (oldest first).
func FindAffectsCommits(repo *gogit.Repository, componentName string) ([]CommitMetadata, error) {
Copilot AI Mar 20, 2026

This PR introduces a new user workflow where project commits must include an Affects: <component-name> marker to be included in synthetic history. There doesn’t appear to be user-guide documentation explaining this convention (what it does, exact matching rules, and examples). Please add a short section under docs/user/ (likely a how-to or explanation page related to overlays/source preparation) describing how to use the Affects: marker and how it interacts with synthetic history generation.

var matches []CommitMetadata

re := regexp.MustCompile(affectsRegexPattern + regexp.QuoteMeta(componentName) + `\b`)
Copilot AI Mar 20, 2026

FindAffectsCommits will match Affects: <component> as a prefix when the component name is followed by a non-word character (e.g. searching for curl matches a commit with Affects: curl-minimal because \b treats - as a boundary). This can cause commits intended for one component to be incorrectly applied to another. Consider requiring an exact component-name match (e.g., end-of-line or whitespace delimiter) rather than a word-boundary match.

Suggested change
- re := regexp.MustCompile(affectsRegexPattern + regexp.QuoteMeta(componentName) + `\b`)
+ re := regexp.MustCompile(affectsRegexPattern + regexp.QuoteMeta(componentName) + `(?:$|\s|[,;])`)
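
The prefix-match problem is easy to reproduce in isolation. In this sketch `affectsPrefix` is an assumed stand-in for the PR's `affectsRegexPattern`, whose actual value is not shown in this thread, and `matches` is a hypothetical helper. The prefix enables multi-line mode with `(?m)`; without that flag, Go's `$` matches only at the end of the whole text, so the suggested suffix would rely on `\s` alone to recognize a marker that is followed by further lines.

```go
package main

import (
	"fmt"
	"regexp"
)

// affectsPrefix is an assumed stand-in for the PR's affectsRegexPattern.
const affectsPrefix = `(?m)^Affects:[ \t]*`

// matches reports whether message contains an Affects marker for component,
// using the given suffix to decide where the component name must end.
func matches(suffix, component, message string) bool {
	re := regexp.MustCompile(affectsPrefix + regexp.QuoteMeta(component) + suffix)
	return re.MatchString(message)
}

func main() {
	msg := "Bump minimal build\n\nAffects: curl-minimal\n"

	// \b sees a boundary between "l" and "-", so "curl" wrongly matches.
	fmt.Println(matches(`\b`, "curl", msg)) // true

	// The stricter delimiter rejects the prefix and still accepts the full name.
	fmt.Println(matches(`(?:$|\s|[,;])`, "curl", msg))         // false
	fmt.Println(matches(`(?:$|\s|[,;])`, "curl-minimal", msg)) // true
}
```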

Comment on lines +44 to +56
// FindAffectsCommits walks the git log from HEAD and returns metadata for all commits
// whose message contains "Affects: <componentName>". Results are sorted chronologically
// (oldest first).
func FindAffectsCommits(repo *gogit.Repository, componentName string) ([]CommitMetadata, error) {
	head, err := repo.Head()
	if err != nil {
		return nil, fmt.Errorf("failed to get HEAD reference:\n%w", err)
	}

	commitIter, err := repo.Log(&gogit.LogOptions{From: head.Hash()})
	if err != nil {
		return nil, fmt.Errorf("failed to iterate commit log:\n%w", err)
	}
Copilot AI Mar 20, 2026

FindAffectsCommits walks the full project git history every time sources are prepared for a component. In multi-component builds, this becomes O(components × project-commits) work and can get expensive on large configuration repos. Consider caching the parsed "Affects" commit list per project repo (or pre-indexing once per run) and reusing it across components.
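
One way to read the pre-indexing suggestion: scan the project log once, group commits by the component named in each Affects line, and let every component look up its own slice. The sketch below uses a plain slice instead of a go-git iterator so that it stays self-contained; the `commit` type, `affectsLine`, and `buildAffectsIndex` are hypothetical names, and the one-marker-per-line format is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// commit is a simplified stand-in for the PR's CommitMetadata.
type commit struct {
	Hash    string
	Message string
}

// affectsLine captures the component name from a line like "Affects: curl".
var affectsLine = regexp.MustCompile(`(?m)^Affects:[ \t]*(\S+)[ \t]*$`)

// buildAffectsIndex scans the log once and groups commits by component.
// The input is newest first, as a git log walk yields it; each result
// slice is chronological (oldest first).
func buildAffectsIndex(log []commit) map[string][]commit {
	index := make(map[string][]commit)
	for i := len(log) - 1; i >= 0; i-- { // walk backwards to get oldest first
		for _, m := range affectsLine.FindAllStringSubmatch(log[i].Message, -1) {
			index[m[1]] = append(index[m[1]], log[i])
		}
	}
	return index
}

func main() {
	log := []commit{
		{Hash: "c3", Message: "Patch CVE\n\nAffects: curl"},
		{Hash: "c2", Message: "Tweak spec\n\nAffects: bash"},
		{Hash: "c1", Message: "Add overlay\n\nAffects: curl"},
	}
	index := buildAffectsIndex(log)
	for _, c := range index["curl"] {
		fmt.Println(c.Hash)
	}
}
```

Built once per run, the index turns the per-component cost into a map lookup, so total work is one pass over the project history rather than one pass per component.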

Comment on lines +168 to +170
// applyOverlaysToSources writes the macros file and then applies all overlays and
// records synthetic git history.
func (p *sourcePreparerImpl) applyOverlaysToSources(
Copilot AI Mar 20, 2026

The comment on applyOverlaysToSources says it "records synthetic git history", but this helper currently only writes the macros file and applies overlays; synthetic history is generated separately in PrepareSources via trySyntheticHistory. Update the comment (or move the history generation here) so callers don’t assume history is being created.

3 participants