Skip to content

fix(linear-summarization): Remove interaction term check to enable features with colons#183

Open
tonywu1999 wants to merge 1 commit intodevelfrom
fix-linear-summary
Open

fix(linear-summarization): Remove interaction term check to enable features with colons#183
tonywu1999 wants to merge 1 commit intodevelfrom
fix-linear-summary

Conversation

@tonywu1999
Copy link
Contributor

@tonywu1999 tonywu1999 commented Feb 18, 2026

Motivation and Context

There looked to be problems w.r.t. features containing colons in them. Notably, here:

NumericVector find_colon = grep(":", coef_names);
NumericVector just_runs = setdiff(find_runs, find_colon);
NumericVector just_features = setdiff(find_features, find_colon);

I believe this was written to find interaction terms (e.g., RUN:FEATURE), and then setdiff/intersect against find_runs and find_features is used to distinguish "just runs", "just features", and "run:feature interactions".

But if a Run ID or Feature ID contains a colon (like UniMod:7), those indices end up in find_colon, and then get incorrectly excluded from just_features or just_runs via setdiff. So legitimate main-effect terms get treated as if they were interaction terms and dropped.

Looking at the model formulas, there are no interaction terms anywhere in this code. All models use only main effects, so the simplest fix is to just remove find_colon entirely and clean up the functions that use it:

Changes

Please provide a detailed bullet point list of your changes.

  • Remove all references to finding indices to colons in the linear summarization function.

Testing

  • Verified datasets with features containing only colons were processed properly

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

📝 Walkthrough

Walkthrough

Removed the find_colon parameter from get_features() and get_run() functions in src/linear_summary.cpp. These functions now directly index features and runs without separate colon-derived filtering logic. Updated all call sites in make_contrast_run_quant() to match the new function signatures.

Changes

Cohort / File(s) Summary
Function signature updates
src/linear_summary.cpp
Removed find_colon parameter from get_features() and get_run() functions. Updated implementations to derive temporary feature/run vectors directly from coef_names[find_features] and coef_names[find_runs] respectively. Updated call sites in make_contrast_run_quant() to invoke these functions with the new signatures (5 parameters for get_features(), 5 parameters for get_run()). Simplified feature/run filtering logic by eliminating separate colon-based indexing.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Away with the colon, that shadowed the call,
Direct indexing now brightens it all!
Simpler signatures, cleaner and lean,
The finest refactoring I've seen—
Features flow pure, no filters to hide,
A rabbit hops happy with logic as guide! 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: removing an interaction term check (the find_colon filtering logic) to enable features with colons in their names.
Description check ✅ Passed The pull request description addresses the motivation, provides specific changes, includes testing notes, but the template checklist items are unchecked.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-linear-summary

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

Failed to generate code suggestions for PR

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/linear_summary.cpp (1)

29-40: ⚠️ Potential issue | 🟠 Major

Fix is correct for colon-named features, but may double-count genuine FEATURE:RUN interaction terms

The removal of find_colon-based exclusion correctly resolves the bug where feature coefficients whose names contain a colon (e.g., "FEATUREK.ABCDE:FGHIJ.R") were incorrectly filtered out.

However, find_features = grep("FEATURE", coef_names) can also match genuine FEATURE:RUN interaction-term coefficients (e.g., "FEATUREpeptide1:RUNrun1"), since those names contain the substring "FEATURE". Such terms are simultaneously present in find_runs as well and are already handled by get_feature_run() via intersect(find_runs, find_features). Including them here too causes double/triple-counting in the combined contrast vector.

Additionally, if interaction terms inflate temp_feature.length() beyond counts.ncol() - 1 (the length returned by get_feature_props()), the feature.attr("names") = temp_feature assignment becomes a length-mismatch that Rcpp will propagate as an error or silent corruption.

A safer approach is to compute pure (non-interaction) feature/run index sets in make_contrast_run_quant before calling these functions, so get_feature_run() remains the sole owner of interaction terms:

💡 Proposed fix in make_contrast_run_quant
     NumericVector find_features = grep("FEATURE", coef_names);
     NumericVector find_runs = grep("RUN", coef_names);
     NumericVector find_ref = grep("ref", coef_names);
+    // Terms matching BOTH patterns are FEATURE:RUN interaction terms;
+    // keep them exclusively in get_feature_run() to avoid double-counting.
+    NumericVector find_interaction = intersect(find_features, find_runs);
+    NumericVector find_pure_features = setdiff(find_features, find_interaction);
+    NumericVector find_pure_runs    = setdiff(find_runs,     find_interaction);
 
     NumericVector intercept = get_intercept(coef_names);
-    NumericVector features = get_features(coef_names, find_features,
+    NumericVector features = get_features(coef_names, find_pure_features,
                                           contrast_matrix, counts, input);
-    NumericVector runs = get_run(coef_names, find_runs,
+    NumericVector runs = get_run(coef_names, find_pure_runs,
                                  contrast_matrix, is_labeled, n_runs);

get_feature_run() already receives the original find_runs/find_features and uses intersect() internally, so interaction terms continue to be handled there without any change to its call site.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/linear_summary.cpp` around lines 29 - 40, get_features currently uses
find_features derived via grep("FEATURE", coef_names) which also matches
FEATURE:RUN interaction terms, causing double-counting and potential name-length
mismatches; fix by computing pure (non-interaction) feature and run index sets
in make_contrast_run_quant (i.e., remove any coef_names containing ':' from the
feature set before passing to get_features) so get_feature_run remains the sole
handler of interactions, then call get_features with that filtered find_features
(or alternatively have get_features filter out entries with ':' itself) and
ensure the returned temp_feature length matches counts.ncol()-1 before assigning
names to avoid Rcpp length-mismatch errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/linear_summary.cpp`:
- Around line 29-40: get_features currently uses find_features derived via
grep("FEATURE", coef_names) which also matches FEATURE:RUN interaction terms,
causing double-counting and potential name-length mismatches; fix by computing
pure (non-interaction) feature and run index sets in make_contrast_run_quant
(i.e., remove any coef_names containing ':' from the feature set before passing
to get_features) so get_feature_run remains the sole handler of interactions,
then call get_features with that filtered find_features (or alternatively have
get_features filter out entries with ':' itself) and ensure the returned
temp_feature length matches counts.ncol()-1 before assigning names to avoid Rcpp
length-mismatch errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments