fix(linear-summarization): Remove interaction term check to enable features with colons by tonywu1999 · Pull Request #183 · Vitek-Lab/MSstats

tonywu1999 · 2026-02-18T18:34:12Z

Motivation and Context

There looked to be problems w.r.t. features containing colons in them. Notably, here:

NumericVector find_colon = grep(":", coef_names);
NumericVector just_runs = setdiff(find_runs, find_colon);
NumericVector just_features = setdiff(find_features, find_colon);

I believe this was written to find interaction terms (e.g., RUN:FEATURE), and then setdiff/intersect against find_runs and find_features is used to distinguish "just runs", "just features", and "run:feature interactions".

But if a Run ID or Feature ID contains a colon (like UniMod:7), those indices end up in find_colon, and then get incorrectly excluded from just_features or just_runs via setdiff. So legitimate main-effect terms get treated as if they were interaction terms and dropped.

Looking at the model formulas, there are no interaction terms anywhere in this code. All models use only main effects, so the simplest fix is to just remove find_colon entirely and clean up the functions that use it:

Changes

Please provide a detailed bullet point list of your changes.

Remove all references to finding indices to colons in the linear summarization function.

Testing

Verified datasets with features containing only colons were processed properly

Checklist Before Requesting a Review

I have read the MSstats contributing guidelines
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules
I have run the devtools::document() command after my changes and committed the added files

…atures with colons

coderabbitai · 2026-02-18T18:34:31Z

📝 Walkthrough

Walkthrough

Removed the find_colon parameter from get_features() and get_run() functions in src/linear_summary.cpp. These functions now directly index features and runs without separate colon-derived filtering logic. Updated all call sites in make_contrast_run_quant() to match the new function signatures.

Changes

Cohort / File(s)	Summary
Function signature updates `src/linear_summary.cpp`	Removed `find_colon` parameter from `get_features()` and `get_run()` functions. Updated implementations to derive temporary feature/run vectors directly from `coef_names[find_features]` and `coef_names[find_runs]` respectively. Updated call sites in `make_contrast_run_quant()` to invoke these functions with the new signatures (5 parameters for `get_features()`, 5 parameters for `get_run()`). Simplified feature/run filtering logic by eliminating separate colon-based indexing.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Away with the colon, that shadowed the call,
Direct indexing now brightens it all!
Simpler signatures, cleaner and lean,
The finest refactoring I've seen—
Features flow pure, no filters to hide,
A rabbit hops happy with logic as guide! 🐇

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the main change: removing an interaction term check (the find_colon filtering logic) to enable features with colons in their names.
Description check	✅ Passed	The pull request description addresses the motivation, provides specific changes, includes testing notes, but the template checklist items are unchecked.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix-linear-summary

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-02-18T18:35:01Z

Failed to generate code suggestions for PR

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/linear_summary.cpp (1)
29-40: ⚠️ Potential issue | 🟠 Major

Fix is correct for colon-named features, but may double-count genuine FEATURE:RUN interaction terms

The removal of find_colon-based exclusion correctly resolves the bug where feature coefficients whose names contain a colon (e.g., "FEATUREK.ABCDE:FGHIJ.R") were incorrectly filtered out.

However, find_features = grep("FEATURE", coef_names) can also match genuine FEATURE:RUN interaction-term coefficients (e.g., "FEATUREpeptide1:RUNrun1"), since those names contain the substring "FEATURE". Such terms are simultaneously present in find_runs as well and are already handled by get_feature_run() via intersect(find_runs, find_features). Including them here too causes double/triple-counting in the combined contrast vector.

Additionally, if interaction terms inflate temp_feature.length() beyond counts.ncol() - 1 (the length returned by get_feature_props()), the feature.attr("names") = temp_feature assignment becomes a length-mismatch that Rcpp will propagate as an error or silent corruption.

A safer approach is to compute pure (non-interaction) feature/run index sets in make_contrast_run_quant before calling these functions, so get_feature_run() remains the sole owner of interaction terms:
💡 Proposed fix in make_contrast_run_quant
     NumericVector find_features = grep("FEATURE", coef_names);
     NumericVector find_runs = grep("RUN", coef_names);
     NumericVector find_ref = grep("ref", coef_names);
+    // Terms matching BOTH patterns are FEATURE:RUN interaction terms;
+    // keep them exclusively in get_feature_run() to avoid double-counting.
+    NumericVector find_interaction = intersect(find_features, find_runs);
+    NumericVector find_pure_features = setdiff(find_features, find_interaction);
+    NumericVector find_pure_runs    = setdiff(find_runs,     find_interaction);
 
     NumericVector intercept = get_intercept(coef_names);
-    NumericVector features = get_features(coef_names, find_features,
+    NumericVector features = get_features(coef_names, find_pure_features,
                                           contrast_matrix, counts, input);
-    NumericVector runs = get_run(coef_names, find_runs,
+    NumericVector runs = get_run(coef_names, find_pure_runs,
                                  contrast_matrix, is_labeled, n_runs);
get_feature_run() already receives the original find_runs/find_features and uses intersect() internally, so interaction terms continue to be handled there without any change to its call site.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/linear_summary.cpp` around lines 29 - 40, get_features currently uses
find_features derived via grep("FEATURE", coef_names) which also matches
FEATURE:RUN interaction terms, causing double-counting and potential name-length
mismatches; fix by computing pure (non-interaction) feature and run index sets
in make_contrast_run_quant (i.e., remove any coef_names containing ':' from the
feature set before passing to get_features) so get_feature_run remains the sole
handler of interactions, then call get_features with that filtered find_features
(or alternatively have get_features filter out entries with ':' itself) and
ensure the returned temp_feature length matches counts.ncol()-1 before assigning
names to avoid Rcpp length-mismatch errors.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/linear_summary.cpp`:
- Around line 29-40: get_features currently uses find_features derived via
grep("FEATURE", coef_names) which also matches FEATURE:RUN interaction terms,
causing double-counting and potential name-length mismatches; fix by computing
pure (non-interaction) feature and run index sets in make_contrast_run_quant
(i.e., remove any coef_names containing ':' from the feature set before passing
to get_features) so get_feature_run remains the sole handler of interactions,
then call get_features with that filtered find_features (or alternatively have
get_features filter out entries with ':' itself) and ensure the returned
temp_feature length matches counts.ncol()-1 before assigning names to avoid Rcpp
length-mismatch errors.

fix(linear-summarization): Remove interaction term check to enable fe…

136d3a2

…atures with colons

coderabbitai bot reviewed Feb 18, 2026

View reviewed changes

tonywu1999 requested a review from devonjkohler February 18, 2026 21:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(linear-summarization): Remove interaction term check to enable features with colons#183

fix(linear-summarization): Remove interaction term check to enable features with colons#183
tonywu1999 wants to merge 1 commit intodevelfrom
fix-linear-summary

tonywu1999 commented Feb 18, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 18, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions bot commented Feb 18, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

tonywu1999 commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Changes

Testing

Checklist Before Requesting a Review

Uh oh!

coderabbitai bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions bot commented Feb 18, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

tonywu1999 commented Feb 18, 2026 •

edited

Loading

coderabbitai bot commented Feb 18, 2026 •

edited

Loading