Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Jan 6, 2026

Summary

Adds regression tests for feature flag tool filtering to prevent issues like copilot-cli#878 (get_job_logs tool call failures due to incorrect filtering of same-name tools).

Why

PR #1719 fixed a bug where filterToolsByName only returned the first matching tool, breaking feature flag filtering when renaming tools. This PR adds generic tests to prevent regression.

What changed

  • Added 4 generic test suites validating feature flag behavior across all combinations:
    • TestFeatureFlaggedToolsAreMutuallyExclusive (pkg/github/tools_validation_test.go): Validates proper feature flag configuration ensuring no omissions, incorrect activations, or duplicate active tools across all flag states
    • TestToolsList_WithFeatureFlags (pkg/inventory/registry_test.go): Tests tools/list endpoint with various flag states, ensuring no duplicates or omissions
    • TestToolsCall_WithFeatureFlags (pkg/inventory/registry_test.go): Tests tools/call (ForMCPRequest) with feature flags
    • TestNoDuplicateToolsInAnyFeatureFlagCombination (pkg/inventory/registry_test.go): Exhaustive test of all 2^n flag combinations
  • Manually validated get_job_logs tool works correctly with real GitHub Actions workflows

MCP impact

  • No tool or API changes - Only test coverage added

Prompts tested (tool changes only)

N/A

Security / limits

  • No security or limits impact

Tool renaming

  • I am not renaming tools as part of this PR

Lint & tests

  • Linted locally with ./script/lint
  • Tested locally with ./script/test

Docs

  • Not needed
Original prompt

Please look into github/copilot-cli#878 and see if you need to patch anything and test anything to safe;y handle same-name tool renames, and listing and calling tools with various feature flag states etc. this should be reproducable and fixable, with prevention of regression if we get things. We could also add a test that validates that there are no duplicate tools with any combo of feature flags etc too.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 2 commits January 6, 2026 10:06
Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix handling of tool renames and feature flags Add comprehensive tests for feature-flagged tool handling Jan 6, 2026
Copilot AI requested a review from SamMorrowDrums January 6, 2026 10:16
Comment on lines 2304 to 2307
// TestGetJobLogsFeatureFlagScenario tests the specific scenario from GitHub issue #878
// where get_job_logs tool calls were failing with "unknown tool" errors.
// This validates that both GetJobLogs and ActionsGetJobLogs tools are properly
// handled with feature flags and that only one is active at a time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot I don't want a specific test, only generic. Delete this addition. Now it's fixed it's not useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Removed the specific test in commit 727445f. The generic tests provide comprehensive coverage across all feature flag combinations.

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
Comment on lines 1692 to 1693
// TestToolsList_WithFeatureFlags validates that tools/list returns only the tools
// available based on the current feature flag state, without duplicates
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot without duplicates or omissions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment in commit ef7117a to clarify the test validates "without duplicates or omissions".

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
// the same name with different feature flags, they are mutually exclusive (one uses
// FeatureFlagEnable while the other uses FeatureFlagDisable with the same flag name).
// This ensures there are no conflicts where two tools with the same name could be active
// simultaneously, which would cause "unknown tool" errors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot this is not quite correct. unknown tool errors came from two tools with same name, and only one of them being returned from filter, and then being filterered out by feature flag. Actually duplicate tools would result in only one registered (as servers wouldn't allow duplicate name tools to be registered and returned), possible the wrong one which is also a problem. The main reason for the request to test this generically is to ensure that as new tools and consolidation work occur etc. we don't miss something like this again.

We are protecting against both:

  • a tool showing up when it shouldn't
  • two or more tools showing up with same name when they shouldn't
  • no tools showing up when one should

And specifically we are looking at features, because we already have duplicate protection otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the test comment in commit 81c8337 to accurately describe what we're protecting against:

  • At most one tool variant active for any feature flag state
  • A tool shows up when it should (no omissions)
  • No tool shows up when it shouldn't (no incorrect activations)
  • No duplicate tool names active simultaneously

This correctly reflects that the test prevents issues where filtering returns the wrong variant or no variant at all, rather than just "unknown tool" errors.

Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants