-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Add comprehensive tests for feature-flagged tool handling #1747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
pkg/github/actions_test.go
Outdated
| // TestGetJobLogsFeatureFlagScenario tests the specific scenario from GitHub issue #878 | ||
| // where get_job_logs tool calls were failing with "unknown tool" errors. | ||
| // This validates that both GetJobLogs and ActionsGetJobLogs tools are properly | ||
| // handled with feature flags and that only one is active at a time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot I don't want a specific test, only generic. Delete this addition. Now it's fixed it's not useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Removed the specific test in commit 727445f. The generic tests provide comprehensive coverage across all feature flag combinations.
Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
pkg/inventory/registry_test.go
Outdated
| // TestToolsList_WithFeatureFlags validates that tools/list returns only the tools | ||
| // available based on the current feature flag state, without duplicates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot without duplicates or omissions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment in commit ef7117a to clarify the test validates "without duplicates or omissions".
Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
pkg/github/tools_validation_test.go
Outdated
| // the same name with different feature flags, they are mutually exclusive (one uses | ||
| // FeatureFlagEnable while the other uses FeatureFlagDisable with the same flag name). | ||
| // This ensures there are no conflicts where two tools with the same name could be active | ||
| // simultaneously, which would cause "unknown tool" errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot this is not quite correct. unknown tool errors came from two tools with same name, and only one of them being returned from filter, and then being filterered out by feature flag. Actually duplicate tools would result in only one registered (as servers wouldn't allow duplicate name tools to be registered and returned), possible the wrong one which is also a problem. The main reason for the request to test this generically is to ensure that as new tools and consolidation work occur etc. we don't miss something like this again.
We are protecting against both:
- a tool showing up when it shouldn't
- two or more tools showing up with same name when they shouldn't
- no tools showing up when one should
And specifically we are looking at features, because we already have duplicate protection otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the test comment in commit 81c8337 to accurately describe what we're protecting against:
- At most one tool variant active for any feature flag state
- A tool shows up when it should (no omissions)
- No tool shows up when it shouldn't (no incorrect activations)
- No duplicate tool names active simultaneously
This correctly reflects that the test prevents issues where filtering returns the wrong variant or no variant at all, rather than just "unknown tool" errors.
Co-authored-by: SamMorrowDrums <4811358+SamMorrowDrums@users.noreply.github.com>
Summary
Adds regression tests for feature flag tool filtering to prevent issues like copilot-cli#878 (get_job_logs tool call failures due to incorrect filtering of same-name tools).
Why
PR #1719 fixed a bug where
filterToolsByNameonly returned the first matching tool, breaking feature flag filtering when renaming tools. This PR adds generic tests to prevent regression.What changed
TestFeatureFlaggedToolsAreMutuallyExclusive(pkg/github/tools_validation_test.go): Validates proper feature flag configuration ensuring no omissions, incorrect activations, or duplicate active tools across all flag statesTestToolsList_WithFeatureFlags(pkg/inventory/registry_test.go): Tests tools/list endpoint with various flag states, ensuring no duplicates or omissionsTestToolsCall_WithFeatureFlags(pkg/inventory/registry_test.go): Tests tools/call (ForMCPRequest) with feature flagsTestNoDuplicateToolsInAnyFeatureFlagCombination(pkg/inventory/registry_test.go): Exhaustive test of all 2^n flag combinationsMCP impact
Prompts tested (tool changes only)
N/A
Security / limits
Tool renaming
Lint & tests
./script/lint./script/testDocs
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.