Skip to content

refactor(diann): move q-value filtering to MSstatsClean for DIANN#116

Merged
tonywu1999 merged 5 commits intodevelfrom
refactor-diann
Feb 6, 2026
Merged

refactor(diann): move q-value filtering to MSstatsClean for DIANN#116
tonywu1999 merged 5 commits intodevelfrom
refactor-diann

Conversation

@tonywu1999
Copy link
Contributor

@tonywu1999 tonywu1999 commented Feb 6, 2026

Motivation and Context

See Vitek-Lab/MSstatsBig#10: q-value filtering is a process that is do-able with chunking because it's a simple filter that doesn't rely on other rows.

Changes

  • Move Q-value filtering to MSstatsClean so that the filtering can occur in MSstatsBig clean DIANN chunking function.

Testing

Added unit tests for the clean function.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

Summary by CodeRabbit

  • New Features

    • Added three configurable q-value cutoff parameters (defaults 0.01) for DIANN imports, with filtering applied consistently during the cleaning step and behavior adjusted when match-between-runs (MBR) is used.
    • Improved informational logging around filtering decisions.
  • Documentation

    • Expanded docs to describe the new q-value cutoff parameters and their MBR-dependent effects.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 6, 2026

Warning

Rate limit exceeded

@tonywu1999 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 12 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

The PR adds three q-value cutoff parameters (global_qvalue_cutoff, qvalue_cutoff, pg_qvalue_cutoff) to the DIANN cleaning pipeline, surfaces them through DIANNtoMSstatsFormat to MSstatsClean and .cleanRawDIANN, and shifts conditional, MBR-aware filtering and logging into the cleaning functions.

Changes

Cohort / File(s) Summary
Metadata
DESCRIPTION
Updated RoxygenNote from 7.3.2 to 7.3.3.
DIANN Cleaning Core
R/clean_DIANN.R
Added global_qvalue_cutoff, qvalue_cutoff, pg_qvalue_cutoff to .cleanRawDIANN and extended .cleanDIANNCleanAndFilterData. Introduced MBR-aware conditional filtering on different q-value columns and added logging/MSstats logging hooks.
Data Conversion
R/converters_DIANNtoMSstatsFormat.R
Removed inline post-import filtering and logging; now forwards the three cutoff parameters to MSstatsClean instead of filtering in the converter.
Function Documentation
man/MSstatsClean.Rd, man/dot-cleanRawDIANN.Rd, man/DIANNtoMSstatsFormat.Rd
Updated signatures and documentation to include global_qvalue_cutoff, qvalue_cutoff, and pg_qvalue_cutoff (defaults 0.01) and described MBR-dependent behavior; minor doc text edits for pg_qvalue_cutoff.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DIANNtoMSstatsFormat
    participant MSstatsClean
    participant cleanRawDIANN
    participant FilterLogic

    User->>DIANNtoMSstatsFormat: call with q-value cutoffs
    DIANNtoMSstatsFormat->>MSstatsClean: pass cutoff parameters
    MSstatsClean->>cleanRawDIANN: invoke with cutoffs
    cleanRawDIANN->>FilterLogic: apply conditional filtering (MBR-aware)
    FilterLogic->>FilterLogic: always apply global q-value filter
    alt MBR enabled
        FilterLogic->>FilterLogic: apply LibPGQValue & LibQValue filters
    else MBR disabled
        FilterLogic->>FilterLogic: apply GlobalPGQValue & GlobalQValue filters
    end
    FilterLogic->>cleanRawDIANN: return filtered data
    cleanRawDIANN->>MSstatsClean: cleaned data
    MSstatsClean->>DIANNtoMSstatsFormat: formatted result
    DIANNtoMSstatsFormat->>User: return clean data
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • mstaniak

Poem

🐰 Whiskers twitch with filtering glee,
Q-values trim the data tree,
MBR paths split with careful art,
Cutoffs hop into every part,
Cleaned and ready — bunny heart!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: moving q-value filtering logic to the MSstatsClean function for DIANN processing.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed Pull request description follows the required template structure with Motivation and Context, Changes, and Testing sections completed. Checklist items are marked as done.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor-diann

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@man/MSstatsClean.Rd`:
- Around line 135-147: The documentation string for pg_qvalue_cutoff contains a
stray sentence "Run should be the same as filename." copied from the roxygen
block; remove that sentence from the roxygen comment in
R/converters_DIANNtoMSstatsFormat.R (the block that documents pg_qvalue_cutoff)
so the generated man/MSstatsClean.Rd no longer includes the artifact, and
regenerate the documentation (run devtools::document or equivalent) to update
MSstatsClean.Rd.

In `@R/clean_DIANN.R`:
- Around line 162-166: The log message is misleading: the code filters dn_input
on the run-level column QValue (dn_input[QValue < global_qvalue_cutoff, ]) but
the message and variable name imply "Global Q Value"; update the messages sent
via getOption("MSstatsLog") and getOption("MSstatsMsg") to explicitly state that
the filter is on QValue (run-specific precursor q-value) and include the
global_qvalue_cutoff value for clarity, referencing the dn_input data.table and
the QValue column used in the filter.

In `@R/converters_DIANNtoMSstatsFormat.R`:
- Around line 73-77: The call to MSstatsConvert::MSstatsClean is passing the
three q-value cutoffs positionally which is fragile; update the call in
converters_DIANNtoMSstatsFormat.R so that the cutoff arguments are passed by
name (e.g., global_qvalue_cutoff=..., qvalue_cutoff=..., pg_qvalue_cutoff=...)
alongside the existing MBR and quantificationColumn arguments to ensure correct
binding when MSstatsClean (S4 generic) dispatches.
🧹 Nitpick comments (1)
R/clean_DIANN.R (1)

167-193: Logging is emitted after the data is already filtered — consider reordering for clarity, and remove commented-out code.

Two observations:

  1. In both the MBR and non-MBR branches, the informational log messages about which filters are applied are emitted after the filtering has already happened (e.g., lines 173-178 come after lines 171-172). Moving the log messages before the actual filter operations would make the log output more intuitive for debugging.

  2. Lines 179 and 192 contain commented-out logging code (# getOption("MSstatsLog")("INFO", "\n")). These should be removed to keep the code clean.

♻️ Suggested reorder + cleanup for MBR branch (similar for non-MBR)
     if (MBR) {
         msg = '** MBR was used to analyze the data. Now setting names and filtering'
         msg_1_mbr = paste0('-- LibPGQValue < ', pg_qvalue_cutoff)
         msg_2_mbr = paste0('-- LibQValue < ', qvalue_cutoff)
-        dn_input = dn_input[LibPGQValue < pg_qvalue_cutoff, ]
-        dn_input = dn_input[LibQValue < qvalue_cutoff, ]
         getOption("MSstatsLog")("INFO", msg)
         getOption("MSstatsMsg")("INFO", msg)
         getOption("MSstatsLog")("INFO", msg_1_mbr)
         getOption("MSstatsMsg")("INFO", msg_1_mbr)
         getOption("MSstatsLog")("INFO", msg_2_mbr)
         getOption("MSstatsMsg")("INFO", msg_2_mbr)
-        # getOption("MSstatsLog")("INFO", "\n")
-    } else{
+        dn_input = dn_input[LibPGQValue < pg_qvalue_cutoff, ]
+        dn_input = dn_input[LibQValue < qvalue_cutoff, ]
+    } else {
         msg = '** MBR was not used to analyze the data. Now setting names and filtering'
         msg_1 = paste0('-- Filtering on GlobalPGQValue < ', pg_qvalue_cutoff)
         msg_2 = paste0('-- Filtering on GlobalQValue < ', qvalue_cutoff)
-        dn_input = dn_input[GlobalPGQValue < pg_qvalue_cutoff, ]
-        dn_input = dn_input[GlobalQValue < qvalue_cutoff, ]
         getOption("MSstatsLog")("INFO", msg)
         getOption("MSstatsMsg")("INFO", msg)
         getOption("MSstatsLog")("INFO", msg_1)
         getOption("MSstatsMsg")("INFO", msg_1)
         getOption("MSstatsLog")("INFO", msg_2)
         getOption("MSstatsMsg")("INFO", msg_2)
-        # getOption("MSstatsLog")("INFO", "\n")
+        dn_input = dn_input[GlobalPGQValue < pg_qvalue_cutoff, ]
+        dn_input = dn_input[GlobalQValue < qvalue_cutoff, ]
     }

@tonywu1999 tonywu1999 merged commit 26b856e into devel Feb 6, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant