refactor(diann): move q-value filtering to MSstatsClean for DIANN#116
refactor(diann): move q-value filtering to MSstatsClean for DIANN#116tonywu1999 merged 5 commits intodevelfrom
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughThe PR adds three q-value cutoff parameters (global_qvalue_cutoff, qvalue_cutoff, pg_qvalue_cutoff) to the DIANN cleaning pipeline, surfaces them through DIANNtoMSstatsFormat to MSstatsClean and .cleanRawDIANN, and shifts conditional, MBR-aware filtering and logging into the cleaning functions. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant DIANNtoMSstatsFormat
participant MSstatsClean
participant cleanRawDIANN
participant FilterLogic
User->>DIANNtoMSstatsFormat: call with q-value cutoffs
DIANNtoMSstatsFormat->>MSstatsClean: pass cutoff parameters
MSstatsClean->>cleanRawDIANN: invoke with cutoffs
cleanRawDIANN->>FilterLogic: apply conditional filtering (MBR-aware)
FilterLogic->>FilterLogic: always apply global q-value filter
alt MBR enabled
FilterLogic->>FilterLogic: apply LibPGQValue & LibQValue filters
else MBR disabled
FilterLogic->>FilterLogic: apply GlobalPGQValue & GlobalQValue filters
end
FilterLogic->>cleanRawDIANN: return filtered data
cleanRawDIANN->>MSstatsClean: cleaned data
MSstatsClean->>DIANNtoMSstatsFormat: formatted result
DIANNtoMSstatsFormat->>User: return clean data
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@man/MSstatsClean.Rd`:
- Around line 135-147: The documentation string for pg_qvalue_cutoff contains a
stray sentence "Run should be the same as filename." copied from the roxygen
block; remove that sentence from the roxygen comment in
R/converters_DIANNtoMSstatsFormat.R (the block that documents pg_qvalue_cutoff)
so the generated man/MSstatsClean.Rd no longer includes the artifact, and
regenerate the documentation (run devtools::document or equivalent) to update
MSstatsClean.Rd.
In `@R/clean_DIANN.R`:
- Around line 162-166: The log message is misleading: the code filters dn_input
on the run-level column QValue (dn_input[QValue < global_qvalue_cutoff, ]) but
the message and variable name imply "Global Q Value"; update the messages sent
via getOption("MSstatsLog") and getOption("MSstatsMsg") to explicitly state that
the filter is on QValue (run-specific precursor q-value) and include the
global_qvalue_cutoff value for clarity, referencing the dn_input data.table and
the QValue column used in the filter.
In `@R/converters_DIANNtoMSstatsFormat.R`:
- Around line 73-77: The call to MSstatsConvert::MSstatsClean is passing the
three q-value cutoffs positionally which is fragile; update the call in
converters_DIANNtoMSstatsFormat.R so that the cutoff arguments are passed by
name (e.g., global_qvalue_cutoff=..., qvalue_cutoff=..., pg_qvalue_cutoff=...)
alongside the existing MBR and quantificationColumn arguments to ensure correct
binding when MSstatsClean (S4 generic) dispatches.
🧹 Nitpick comments (1)
R/clean_DIANN.R (1)
167-193: Logging is emitted after the data is already filtered — consider reordering for clarity, and remove commented-out code.Two observations:
In both the MBR and non-MBR branches, the informational log messages about which filters are applied are emitted after the filtering has already happened (e.g., lines 173-178 come after lines 171-172). Moving the log messages before the actual filter operations would make the log output more intuitive for debugging.
Lines 179 and 192 contain commented-out logging code (
# getOption("MSstatsLog")("INFO", "\n")). These should be removed to keep the code clean.♻️ Suggested reorder + cleanup for MBR branch (similar for non-MBR)
if (MBR) { msg = '** MBR was used to analyze the data. Now setting names and filtering' msg_1_mbr = paste0('-- LibPGQValue < ', pg_qvalue_cutoff) msg_2_mbr = paste0('-- LibQValue < ', qvalue_cutoff) - dn_input = dn_input[LibPGQValue < pg_qvalue_cutoff, ] - dn_input = dn_input[LibQValue < qvalue_cutoff, ] getOption("MSstatsLog")("INFO", msg) getOption("MSstatsMsg")("INFO", msg) getOption("MSstatsLog")("INFO", msg_1_mbr) getOption("MSstatsMsg")("INFO", msg_1_mbr) getOption("MSstatsLog")("INFO", msg_2_mbr) getOption("MSstatsMsg")("INFO", msg_2_mbr) - # getOption("MSstatsLog")("INFO", "\n") - } else{ + dn_input = dn_input[LibPGQValue < pg_qvalue_cutoff, ] + dn_input = dn_input[LibQValue < qvalue_cutoff, ] + } else { msg = '** MBR was not used to analyze the data. Now setting names and filtering' msg_1 = paste0('-- Filtering on GlobalPGQValue < ', pg_qvalue_cutoff) msg_2 = paste0('-- Filtering on GlobalQValue < ', qvalue_cutoff) - dn_input = dn_input[GlobalPGQValue < pg_qvalue_cutoff, ] - dn_input = dn_input[GlobalQValue < qvalue_cutoff, ] getOption("MSstatsLog")("INFO", msg) getOption("MSstatsMsg")("INFO", msg) getOption("MSstatsLog")("INFO", msg_1) getOption("MSstatsMsg")("INFO", msg_1) getOption("MSstatsLog")("INFO", msg_2) getOption("MSstatsMsg")("INFO", msg_2) - # getOption("MSstatsLog")("INFO", "\n") + dn_input = dn_input[GlobalPGQValue < pg_qvalue_cutoff, ] + dn_input = dn_input[GlobalQValue < qvalue_cutoff, ] }
Motivation and Context
See Vitek-Lab/MSstatsBig#10: q-value filtering is a process that is do-able with chunking because it's a simple filter that doesn't rely on other rows.
Changes
Testing
Added unit tests for the clean function.
Checklist Before Requesting a Review
Summary by CodeRabbit
New Features
Documentation