Add comprehensive tests for feature-flagged tool handling #1747

Copilot · 2026-01-06T09:54:07Z

Summary

Adds regression tests for feature flag tool filtering to prevent issues like copilot-cli#878 (get_job_logs tool call failures due to incorrect filtering of same-name tools).

Why

PR #1719 fixed a bug where filterToolsByName only returned the first matching tool, breaking feature flag filtering when renaming tools. This PR adds generic tests to prevent regression.

What changed

Added 4 generic test suites validating feature flag behavior across all combinations:
- TestFeatureFlaggedToolsAreMutuallyExclusive (pkg/github/tools_validation_test.go): Validates proper feature flag configuration ensuring no omissions, incorrect activations, or duplicate active tools across all flag states
- TestToolsList_WithFeatureFlags (pkg/inventory/registry_test.go): Tests tools/list endpoint with various flag states, ensuring no duplicates or omissions
- TestToolsCall_WithFeatureFlags (pkg/inventory/registry_test.go): Tests tools/call (ForMCPRequest) with feature flags
- TestNoDuplicateToolsInAnyFeatureFlagCombination (pkg/inventory/registry_test.go): Exhaustive test of all 2^n flag combinations
Manually validated get_job_logs tool works correctly with real GitHub Actions workflows

MCP impact

No tool or API changes - Only test coverage added

Prompts tested (tool changes only)

N/A

Security / limits

No security or limits impact

Tool renaming

I am not renaming tools as part of this PR

Lint & tests

Linted locally with ./script/lint
Tested locally with ./script/test

Docs

Not needed

Original prompt

Please look into github/copilot-cli#878 and see if you need to patch anything and test anything to safe;y handle same-name tool renames, and listing and calling tools with various feature flag states etc. this should be reproducable and fixable, with prevention of regression if we get things. We could also add a test that validates that there are no duplicate tools with any combo of feature flags etc too.

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

SamMorrowDrums · 2026-01-06T11:14:42Z

pkg/github/actions_test.go

+// TestGetJobLogsFeatureFlagScenario tests the specific scenario from GitHub issue #878
+// where get_job_logs tool calls were failing with "unknown tool" errors.
+// This validates that both GetJobLogs and ActionsGetJobLogs tools are properly
+// handled with feature flags and that only one is active at a time.


@copilot I don't want a specific test, only generic. Delete this addition. Now it's fixed it's not useful.

Done. Removed the specific test in commit 727445f. The generic tests provide comprehensive coverage across all feature flag combinations.

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

SamMorrowDrums · 2026-01-06T13:25:44Z

pkg/inventory/registry_test.go

+// TestToolsList_WithFeatureFlags validates that tools/list returns only the tools
+// available based on the current feature flag state, without duplicates


@copilot without duplicates or omissions

Updated the comment in commit ef7117a to clarify the test validates "without duplicates or omissions".

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

SamMorrowDrums · 2026-01-06T14:30:20Z

pkg/github/tools_validation_test.go

+// the same name with different feature flags, they are mutually exclusive (one uses
+// FeatureFlagEnable while the other uses FeatureFlagDisable with the same flag name).
+// This ensures there are no conflicts where two tools with the same name could be active
+// simultaneously, which would cause "unknown tool" errors.


@copilot this is not quite correct. unknown tool errors came from two tools with same name, and only one of them being returned from filter, and then being filterered out by feature flag. Actually duplicate tools would result in only one registered (as servers wouldn't allow duplicate name tools to be registered and returned), possible the wrong one which is also a problem. The main reason for the request to test this generically is to ensure that as new tools and consolidation work occur etc. we don't miss something like this again.

We are protecting against both:

a tool showing up when it shouldn't

two or more tools showing up with same name when they shouldn't

no tools showing up when one should

And specifically we are looking at features, because we already have duplicate protection otherwise.

Updated the test comment in commit 81c8337 to accurately describe what we're protecting against:

At most one tool variant active for any feature flag state

A tool shows up when it should (no omissions)

No tool shows up when it shouldn't (no incorrect activations)

No duplicate tool names active simultaneously

This correctly reflects that the test prevents issues where filtering returns the wrong variant or no variant at all, rather than just "unknown tool" errors.

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

Initial plan

43bc950

Copilot AI assigned Copilot and SamMorrowDrums Jan 6, 2026

Copilot started work on behalf of SamMorrowDrums January 6, 2026 09:54 View session

Copilot AI and others added 2 commits January 6, 2026 10:06

Add comprehensive tests for feature-flagged tool handling

6c1bde3

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

Complete feature flag testing with successful validation

f760044

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix handling of tool renames and feature flags~~ Add comprehensive tests for feature-flagged tool handling Jan 6, 2026

Copilot AI requested a review from SamMorrowDrums January 6, 2026 10:16

Copilot finished work on behalf of SamMorrowDrums January 6, 2026 10:16

SamMorrowDrums reviewed Jan 6, 2026

View reviewed changes

Copilot started work on behalf of SamMorrowDrums January 6, 2026 11:15 View session

Remove specific get_job_logs test - generic tests are sufficient

727445f

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

Copilot AI requested a review from SamMorrowDrums January 6, 2026 11:20

Copilot finished work on behalf of SamMorrowDrums January 6, 2026 11:20

SamMorrowDrums reviewed Jan 6, 2026

View reviewed changes

Copilot started work on behalf of SamMorrowDrums January 6, 2026 13:26 View session

Update comment to clarify no duplicates or omissions

ef7117a

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

Copilot AI requested a review from SamMorrowDrums January 6, 2026 13:29

Copilot finished work on behalf of SamMorrowDrums January 6, 2026 13:29

SamMorrowDrums reviewed Jan 6, 2026

View reviewed changes

Copilot started work on behalf of SamMorrowDrums January 6, 2026 14:30 View session

Update test comment to accurately describe protections

81c8337

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>

Copilot finished work on behalf of SamMorrowDrums January 6, 2026 14:35

Copilot AI requested a review from SamMorrowDrums January 6, 2026 14:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add comprehensive tests for feature-flagged tool handling #1747

Add comprehensive tests for feature-flagged tool handling #1747

Copilot AI commented Jan 6, 2026 •

edited

Loading

Uh oh!

SamMorrowDrums Jan 6, 2026

Uh oh!

Copilot AI Jan 6, 2026

Uh oh!

SamMorrowDrums Jan 6, 2026

Uh oh!

Copilot AI Jan 6, 2026

Uh oh!

SamMorrowDrums Jan 6, 2026

Uh oh!

Copilot AI Jan 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		// TestToolsList_WithFeatureFlags validates that tools/list returns only the tools
		// available based on the current feature flag state, without duplicates

Add comprehensive tests for feature-flagged tool handling #1747

Are you sure you want to change the base?

Add comprehensive tests for feature-flagged tool handling #1747

Conversation

Copilot AI commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

What changed

MCP impact

Prompts tested (tool changes only)

Security / limits

Tool renaming

Lint & tests

Docs

Uh oh!

SamMorrowDrums Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

SamMorrowDrums Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

SamMorrowDrums Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Jan 6, 2026 •

edited

Loading