feat(submitting): add posting strategies #2973
base: main
Conversation
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @julienrbrt's task in 3m 5s

Comprehensive Code Review Complete ✅

I've completed a thorough review of the DA submission batching strategies implementation. This is an excellent enhancement that adds much-needed flexibility for optimizing DA submission costs and latency. The code quality is high overall, with good test coverage and clear separation of concerns. Below are my detailed findings, organized by priority and category.

✅ Strengths
🔴 Critical Issues

1. Potential Division by Zero in Batching Strategies

Location: threshold := int(float64(maxBlobSize) * s.sizeThreshold)

If maxBlobSize is 0 or negative, the threshold degenerates to zero and the utilization math downstream divides by zero. Recommendation:

```go
if maxBlobSize <= 0 {
	return false // or return an error/log warning
}
threshold := int(float64(maxBlobSize) * s.sizeThreshold)
```

Impact: Medium - Could cause unexpected behavior with misconfigured max blob sizes

2. Max Blob Size Mismatch Risk

Location:

```go
const DefaultMaxBlobSize = 7 * 1024 * 1024 // 7MB fallback blob size limit
```

Concerns: the hard-coded fallback can drift from the DA layer's actual blob size limit.

Recommendation: prefer the limit reported by the DA layer at runtime, keeping the constant only as a fallback, as sketched below.
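A minimal sketch of that recommendation, assuming a hypothetical BlobSizer interface on the DA client (the PR itself only defines the constant; names here are illustrative):

```go
package submitter

import (
	"context"
	"log"
)

const DefaultMaxBlobSize = 7 * 1024 * 1024 // 7MB fallback, as in the PR

// BlobSizer is a hypothetical slice of a DA client interface.
type BlobSizer interface {
	MaxBlobSize(ctx context.Context) (uint64, error)
}

// effectiveMaxBlobSize prefers the DA layer's advertised limit and falls
// back to the compile-time constant when the query fails or reports zero.
func effectiveMaxBlobSize(ctx context.Context, da BlobSizer) uint64 {
	size, err := da.MaxBlobSize(ctx)
	if err != nil || size == 0 {
		log.Printf("falling back to DefaultMaxBlobSize: %v", err)
		return DefaultMaxBlobSize
	}
	return size
}
```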
🟡 High Priority Issues

3. Concurrent Goroutine Leak Risk in Submission Loop

Location: The submission loop spawns goroutines guarded only by a TryLock:

```go
if s.headerSubmissionMtx.TryLock() {
	go func() {
		defer s.headerSubmissionMtx.Unlock()
		// ... submission logic
	}()
}
```

Concerns: the goroutines are untracked and never observe shutdown, so they can outlive the submitter.

Recommendation:

```go
if s.headerSubmissionMtx.TryLock() {
	s.wg.Add(1) // Track the goroutine
	go func() {
		defer s.wg.Done()
		defer s.headerSubmissionMtx.Unlock()
		// Check context before proceeding
		select {
		case <-s.ctx.Done():
			return
		default:
		}
		// ... submission logic
	}()
}
```

4. Marshalled Cache Memory Growth

Location: The marshalled cache uses an unbounded sync.Map:

```go
// Marshalling cache to avoid redundant marshalling
marshalledCache sync.Map // key: uint64 (height), value: []byte
```

Concerns: entries are only useful until their height is DA-included, but nothing evicts them, so the map keeps growing while submissions are stalled (see the eviction sketch below).

Impact: Medium - Could cause memory pressure during DA outages
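One possible eviction sketch, under the assumption that entries become dead once their height is DA-included; the helper name is illustrative, not from the PR:

```go
package submitter

import "sync"

// evictMarshalledUpTo drops cached marshalled bytes for every height at or
// below the last DA-included height, bounding the cache during DA outages.
func evictMarshalledUpTo(cache *sync.Map, includedHeight uint64) {
	cache.Range(func(key, _ any) bool {
		if h, ok := key.(uint64); ok && h <= includedHeight {
			cache.Delete(key)
		}
		return true // keep iterating over the remaining entries
	})
}
```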
5. Error Handling in Size Estimation

Location: While the marshalling is now cached, the code retrieving from cache doesn't handle the case where marshalling failed during caching:

```go
// Calculate total size (excluding signature)
totalSize := 0
for _, marshalled := range marshalledHeaders {
	totalSize += len(marshalled)
}
```

If fetching the pending headers fails, the caller already returns early:

```go
headers, marshalledHeaders, err := s.cache.GetPendingHeaders(s.ctx)
if err != nil {
	s.logger.Error().Err(err).Msg("failed to get pending headers for batching decision")
	return
}
```

Recommendation: This is actually handled correctly - returning early if marshalling fails. No changes needed, but consider adding a metric to track marshalling failures.

🟠 Medium Priority Issues

6. Race Condition in Time Tracking

Location:

```go
lastSubmitNanos := s.lastHeaderSubmit.Load()
timeSinceLastSubmit := time.Since(time.Unix(0, lastSubmitNanos))
```

There's a tiny window where a concurrent submission can update the timestamp between the Load and the comparison.

Impact: Very Low - Worst case is a batch submits slightly earlier/later than optimal

7. Configuration Validation Missing

Location:

```go
case "immediate":
	return &ImmediateStrategy{}, nil
```

Issues: batching parameters set alongside the immediate strategy are silently ignored, with no warning to the user.

Recommendation:

```go
case "immediate":
	if cfg.BatchSizeThreshold != 0.8 || cfg.BatchMaxDelay.Duration != 0 || cfg.BatchMinItems != 1 {
		// Log warning that these params are ignored for immediate strategy
	}
	return &ImmediateStrategy{}, nil
```

8. Default BatchMaxDelay of 0 Could Be Confusing

Location:

```go
BatchMaxDelay: DurationWrapper{0}, // 0 means use DA BlockTime
```

The "0 means use DA BlockTime" convention is only documented in the comment and in the strategy constructors. Users might expect 0 to mean "no delay" instead.

Recommendation: surface the convention where users will see it, e.g. in the config docs and in a single resolution helper, as sketched below.
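A sketch of such a helper; the field names are assumptions taken from the review text, not verified against the PR:

```go
package submitter

import "time"

// Config mirrors the fields the review mentions; the names are assumptions.
type Config struct {
	BatchMaxDelay time.Duration // 0 means "use DA block time"
	DABlockTime   time.Duration
}

// resolveMaxDelay makes the zero-value convention explicit in one place:
// a zero BatchMaxDelay falls back to the DA block time, not to "no delay".
func resolveMaxDelay(cfg Config) time.Duration {
	if cfg.BatchMaxDelay == 0 {
		return cfg.DABlockTime
	}
	return cfg.BatchMaxDelay
}
```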
🟢 Low Priority / Suggestions

9. Test Coverage for Edge Cases

The test coverage is good (81.89%), but a few additional test cases would be valuable.

10. Code Documentation

Some complex logic could benefit from additional comments:

Location:

```go
checkInterval := max(s.config.DA.BlockTime.Duration/4, 100*time.Millisecond)
```

Recommendation: Explain why BlockTime/4 was chosen and why 100ms is the minimum

11. Metrics Enhancements

Consider adding metrics for better observability; at minimum, the marshalling-failure counter mentioned under issue 5, sketched below.
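For illustration, the marshalling-failure counter could look like this; Prometheus is an assumption, the repo may use a different metrics library:

```go
package submitter

import "github.com/prometheus/client_golang/prometheus"

// marshalFailures counts headers/data that fail to marshal during size
// estimation, the metric suggested under issue 5.
var marshalFailures = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "da_submitter_marshal_failures_total",
		Help: "Items that failed to marshal during DA size estimation.",
	},
	[]string{"kind"}, // "header" or "data"
)

func init() {
	prometheus.MustRegister(marshalFailures)
}
```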
12. Performance: Unnecessary Time Operations

Location:

```go
s.lastHeaderSubmit.Store(time.Now().UnixNano())
```
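A sketch addressing issues 6 and 12 together, assuming lastHeaderSubmit is an atomic.Int64 as the excerpts suggest: capture time.Now() once and use CompareAndSwap so the timestamp never moves backwards under concurrency.

```go
package submitter

import (
	"sync/atomic"
	"time"
)

// recordSubmit advances last (nanoseconds since the epoch) to "now" without
// ever moving it backwards, closing the tiny Load/Store window from issue 6
// while also capturing time.Now() just once, per issue 12.
func recordSubmit(last *atomic.Int64) {
	now := time.Now().UnixNano()
	for {
		prev := last.Load()
		if prev >= now || last.CompareAndSwap(prev, now) {
			return
		}
	}
}
```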
📋 Code Quality & Style

Positive Observations:

Minor Style Suggestions:

🏗️ Architecture & Design

The overall architecture is solid:

Suggestion for Future Enhancement: Consider making strategies pluggable at runtime without restart, possibly via a strategy registry pattern.

🔐 Security Considerations

No significant security issues identified. The code properly:

📊 Performance Considerations

The marshalling cache optimization is excellent and addresses the main performance concern. Additional considerations:

✅ Summary & Recommendation

This is a high-quality PR that significantly enhances the DA submission system. The batching strategies are well-designed, properly tested, and will provide meaningful cost savings and flexibility.

Recommended Actions Before Merge:

Nice-to-Have (can be addressed in follow-up PRs):

📝 Existing Review Comments Status

Reviewing the previous comments:

✅ Gemini's double-fetching concern - Resolved by caching marshalled bytes

Overall Assessment: Approve with minor recommendations

The implementation is production-ready with the understanding that the critical issues listed above should be addressed soon (either before merge or in immediate follow-up). The code demonstrates strong engineering practices and will provide real value to users optimizing their DA costs.
Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the Data Availability (DA) layer submission process by introducing a robust and configurable batching system. Instead of immediate submission, the system can now intelligently decide when to send data based on various strategies, such as accumulating a certain amount of data, waiting for a specific time interval, or an adaptive combination of both. This enhancement aims to improve the efficiency and cost-effectiveness of data submissions, particularly for DA layers like Celestia, by better utilizing blob capacity and reducing transaction overhead. The maximum blob size has also been updated to reflect current DA layer capabilities.

Highlights
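To make the strategy shapes concrete, here is an illustrative sketch of an interface the four strategies could share. Only ImmediateStrategy's name is confirmed by the review excerpts; the interface and its parameters are assumptions, not the PR's actual API.

```go
package submitter

import "time"

// BatchStrategy sketches the common shape the four strategies could share.
type BatchStrategy interface {
	// ShouldSubmit reports whether the pending batch should be posted now.
	ShouldSubmit(pendingBytes, pendingItems, maxBlobSize int, sinceLast time.Duration) bool
}

// ImmediateStrategy posts as soon as anything is pending.
type ImmediateStrategy struct{}

func (ImmediateStrategy) ShouldSubmit(_, pendingItems, _ int, _ time.Duration) bool {
	return pendingItems > 0
}
```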
Code Review
This pull request introduces configurable batching strategies for DA submissions, a valuable feature for optimizing costs and latency. The implementation is well-structured, adding immediate, size, time, and adaptive strategies, along with comprehensive tests. My review focuses on improving the efficiency and robustness of the new logic in the submitter, particularly around size estimation and data fetching.
```go
// Wait if current utilization is below minimum threshold
// Use epsilon for floating point comparison
const epsilon = 0.001
currentUtilization := float64(currentSize) / float64(maxBlobSize)
```
There's a potential for division by zero if maxBlobSize is 0. In Go, floating-point division by zero results in +Inf or NaN rather than a panic, but this can lead to unexpected behavior in the comparison that follows. It would be safer to add a guard against this, similar to the pattern used in calculateBatchMetrics.
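A minimal guard sketch following that suggestion; the real fix would live inside the strategy's decision method, so the function here is purely illustrative:

```go
package submitter

// utilization guards the division flagged above: with maxBlobSize <= 0 it
// reports "not computable" instead of yielding +Inf or NaN.
func utilization(currentSize, maxBlobSize int) (float64, bool) {
	if maxBlobSize <= 0 {
		return 0, false
	}
	return float64(currentSize) / float64(maxBlobSize), true
}
```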
```go
data, err := h.MarshalBinary()
if err == nil {
	totalSize += len(data)
}
```
Errors from h.MarshalBinary() are silently ignored. This could lead to an inaccurate totalSize, causing the batching strategy to make a suboptimal decision (e.g., delaying a submission). It's better to log these errors for visibility.
Additionally, this size estimation logic is duplicated for data submission. Consider extracting it into a shared helper function to improve maintainability.
```go
data, err := h.MarshalBinary()
if err != nil {
	s.logger.Warn().Err(err).Msg("failed to marshal header for size estimation")
	continue
}
totalSize += len(data)
```

The data submission path has the same pattern:

```go
data, err := d.MarshalBinary()
if err == nil {
	totalSize += len(data)
}
```
Similar to the header submission logic, errors from d.MarshalBinary() are silently ignored here. This can lead to inaccurate size estimations and suboptimal batching. It's better to log these errors for improved diagnostics and robustness.
```go
data, err := d.MarshalBinary()
if err != nil {
	s.logger.Warn().Err(err).Msg("failed to marshal data for size estimation")
	continue
}
totalSize += len(data)
```
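Extracting the duplicated loops into one helper, as the comment suggests, might look like the sketch below. The generic BinaryMarshaler constraint and the stdlib logger are assumptions made to keep the example self-contained; the repo's own logger would be used in practice.

```go
package submitter

import (
	"encoding"
	"log"
)

// estimateSize folds the duplicated header/data loops into one helper. It
// logs and skips items that fail to marshal, so the batching decision sees
// an honest (if slightly low) total.
func estimateSize[T encoding.BinaryMarshaler](items []T, kind string) int {
	total := 0
	for _, it := range items {
		b, err := it.MarshalBinary()
		if err != nil {
			log.Printf("failed to marshal %s for size estimation: %v", kind, err)
			continue
		}
		total += len(b)
	}
	return total
}
```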
Codecov Report

❌ Patch coverage: additional details and impacted files below.

```
@@           Coverage Diff           @@
##             main    #2973   +/-   ##
==========================================
+ Coverage   58.77%   59.22%   +0.44%
==========================================
  Files         101      102       +1
  Lines        9685     9826     +141
==========================================
+ Hits         5692     5819     +127
- Misses       3381     3390       +9
- Partials      612      617       +5
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
tac0turtle
left a comment
left two comments, i think we should clean up the marshaling flow so we don't marshal data and then never use it. it will end up in wasted cpu cycles. not a large issue in our case but we should strive to reduce cpu cycles
```go
	return nil, fmt.Errorf("failed to sign envelope: %w", err)
}
// Create the envelope and marshal it
envelope, err := header.MarshalDAEnvelope(envelopeSignature)
```
we can open an issue for this as we are marshaling the header twice in this flow: once in the cache (this pr, previously elsewhere) and again in the envelope, but we sign over the encoded data. if we made this a function that takes in the encoded header then it removes the passing of headers on top of marshaled data.
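As a rough illustration of that refactor: a self-contained sketch where the sign callback and marshalEnvelope are hypothetical stand-ins for the PR's real signing and MarshalDAEnvelope helpers.

```go
package submitter

import "fmt"

// buildDAEnvelope accepts the already-encoded header so the envelope path
// reuses the cached bytes instead of marshalling the header a second time.
func buildDAEnvelope(encodedHeader []byte, sign func([]byte) ([]byte, error)) ([]byte, error) {
	sig, err := sign(encodedHeader) // we already sign over the encoded bytes
	if err != nil {
		return nil, fmt.Errorf("failed to sign envelope: %w", err)
	}
	return marshalEnvelope(encodedHeader, sig)
}

// marshalEnvelope is a placeholder for the PR's MarshalDAEnvelope; here it
// just length-prefixes the payload and appends the signature.
func marshalEnvelope(payload, sig []byte) ([]byte, error) {
	out := make([]byte, 0, 4+len(payload)+len(sig))
	out = append(out, byte(len(payload)>>24), byte(len(payload)>>16),
		byte(len(payload)>>8), byte(len(payload)))
	out = append(out, payload...)
	return append(out, sig...), nil
}
```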
tac0turtle
left a comment
nice job
checking e2e, will fix!
Force-pushed from 7d7300b to ef3cd60
Closes: #2890
Add submission strategies.