Add debug-github-ci and debug-jenkins-ci skills#76

Draft
neubig wants to merge 3 commits into main from add-ci-debug-skills

@neubig neubig (Contributor) commented Feb 26, 2026

Summary

Adds two CI-debugging extensions:

  • debug-github-ci for GitHub Actions failures
  • debug-jenkins-ci for Jenkins build failures

This branch now also adds unit tests for the pure log-processing and formatting helpers, as requested in previous reviews.

Details

What these extensions provide

GitHub Actions

  • skill guidance for interactive CI debugging in OpenHands
  • a composite GitHub Action that can analyze a failed workflow run automatically
  • context-aware log truncation that prioritizes error regions before falling back to head/tail truncation

Jenkins

  • skill guidance for interactive Jenkins debugging
  • an agent script that fetches build metadata, stages, and console output
  • the same context-aware truncation approach for large logs
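The context-aware truncation described in both lists could look roughly like the sketch below. It is not the code from this PR: the function names only mirror the helpers named later in this description, and the error pattern, the `max_lines` budget, and the `context` window are assumptions chosen for illustration.

```python
import re

# Hypothetical default pattern; the PR's actual patterns are not shown here.
ERROR_PATTERN = re.compile(r"error|failed|exception|traceback", re.IGNORECASE)


def find_error_context(lines, context=2):
    """Return sorted indices of lines within `context` lines of an error match."""
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return sorted(keep)


def truncate_logs(text, max_lines=20, context=2):
    """Keep error regions if they fit the budget; otherwise keep head and tail."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text

    keep = find_error_context(lines, context)
    if keep and len(keep) <= max_lines:
        out = []
        prev = -1
        for i in keep:
            if i != prev + 1:
                # Mark every gap so the reader knows lines were dropped.
                out.append(f"... [{i - prev - 1} lines omitted] ...")
            out.append(lines[i])
            prev = i
        if prev != len(lines) - 1:
            out.append(f"... [{len(lines) - 1 - prev} lines omitted] ...")
        return "\n".join(out)

    # Fallback: no error region found (or too many), so keep head and tail.
    half = max(1, max_lines // 2)
    omitted = len(lines) - 2 * half
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(lines[:half] + [marker] + lines[-half:])
```

In this sketch the omission markers are not counted against `max_lines`, so the output can be slightly longer than the budget; a stricter version would reserve room for them.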

Review-driven follow-up changes on this branch

  • add pytest coverage for the pure helper functions called out in review:
    • GitHub: _find_error_context, _truncate_logs, format_failed_jobs
    • Jenkins: _find_error_context, _truncate_logs, format_duration, format_timestamp, and prompt.format_prompt
  • compile regex patterns once per scan instead of recompiling in the hot loop
  • skip invalid custom regex patterns with a warning instead of crashing the scan
  • make the agent-script modules importable for unit tests even when the full OpenHands runtime is not installed locally
  • document the pluginRoot marketplace resolution and why the plugin directories contain skill symlinks
  • document the Ubuntu runner assumption next to GitHub CLI installation in the action
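As an illustration of the two regex items in the list above (compile once per scan, skip invalid custom patterns with a warning), here is a minimal sketch. The names `compile_patterns` and `scan` are hypothetical and do not come from this PR, and the case-insensitive flag is an assumption.

```python
import re
import warnings


def compile_patterns(patterns):
    """Compile each pattern once; skip invalid ones with a warning."""
    compiled = []
    for raw in patterns:
        try:
            compiled.append(re.compile(raw, re.IGNORECASE))
        except re.error as exc:
            # A bad custom pattern should not abort the whole scan.
            warnings.warn(f"Skipping invalid pattern {raw!r}: {exc}")
    return compiled


def scan(lines, patterns):
    """Return indices of lines that match any of the valid patterns."""
    compiled = compile_patterns(patterns)  # once per scan, outside the loop
    return [
        i
        for i, line in enumerate(lines)
        if any(p.search(line) for p in compiled)
    ]
```

Calling `scan(["ok", "Error: x"], ["error", "("])` would warn about the unbalanced `(` and still report the line matched by `error`.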

Testing

$ python3 -m pytest \
    tests/test_debug_github_ci_agent_script.py \
    tests/test_debug_jenkins_ci_helpers.py -q
............
12 passed in 0.04s

Evidence

Verification link: View conversation

Cannot be fully verified end-to-end in the current environment:

  • What I tried: reviewed the plugin/skill wiring, added the requested pure-function tests, and ran those tests locally.
  • Resource needed: a real failing GitHub Actions run plus a reachable Jenkins instance with credentials so the full automation paths can be exercised end-to-end.
  • Reason: the primary user-facing behavior here is autonomous debugging against live CI systems. I do not yet have genuine live-run evidence for the GitHub Action and Jenkins integrations themselves in this environment.
  • Manual verification steps:
    1. Trigger the GitHub workflow against a known failed Actions run and confirm the agent posts a useful diagnosis.
    2. Run the Jenkins script against a failed Jenkins build with valid credentials.
    3. Confirm both flows fetch logs, summarize failed jobs/stages, and preserve relevant error context in truncated logs.

Checklist

  • Pure helper tests added for GitHub and Jenkins scripts
  • Regex hot-loop recompilation fixed in both scripts
  • Review-requested documentation clarifications added
  • Live GitHub Actions / Jenkins end-to-end evidence gathered
  • All review threads resolved

@neubig neubig marked this pull request as ready for review March 1, 2026 13:17
@all-hands-bot all-hands-bot left a comment

Taste Rating: 🔴 Needs Improvement

Core Issue: These are plugins with executable code being registered as skills in the marketplace, with broken source paths. The marketplace registration fundamentally conflicts with the actual directory structure.

Verdict: ❌ Needs rework - Path resolution is broken, architecture needs clarification.

Key Insight: You cannot register plugins as skills. Pick one: either register the SKILL.md files as skills (correct paths), or register these as plugins with their automation capabilities. The current approach breaks both.

@neubig neubig marked this pull request as draft March 1, 2026 13:28
@neubig neubig marked this pull request as ready for review March 2, 2026 03:40
@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable - Works but has unresolved design issues

Core Pattern: Previous review feedback was "resolved" with explanations rather than fixes. The code works, but fundamental concerns remain unaddressed.

Verdict: ✅ Worth merging with caveats - The automation is functional and useful, but carries technical debt from day one.

Key Insight: You built plugins with executable automation but registered them as skills in the marketplace. This identity crisis creates confusion about what these artifacts actually are and how they should be used.

@neubig neubig marked this pull request as draft March 2, 2026 12:21
@neubig neubig marked this pull request as ready for review March 2, 2026 12:36
@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable

Core Pattern: Solves a real problem (CI debugging) with straightforward implementation. Previous feedback addressed most technical issues. Remaining concerns are operational - cost controls, rate limiting, and failure modes.

Verdict: ✅ Worth merging with operational caveats - Code quality is solid after refactoring, but production usage needs monitoring.

Key Insight: Every CI failure will trigger an LLM agent that costs money and makes API calls. No safeguards for rate limiting, cost caps, or recursive failures.

@neubig neubig marked this pull request as draft March 2, 2026 12:46
@neubig neubig marked this pull request as ready for review March 2, 2026 13:00
@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable - Works but has unresolved issues

Core Issue: The code is reasonably well-structured after previous refactorings, but marketplace paths are still broken, there are no tests, and some failure modes remain unhandled.

Verdict: ❌ Needs fixes - Fix the marketplace paths and add basic test coverage before merging.

@neubig neubig marked this pull request as draft March 2, 2026 13:09
@neubig neubig marked this pull request as ready for review March 2, 2026 13:16
@all-hands-bot all-hands-bot left a comment

Taste Rating: 🟡 Acceptable - Works but misses fundamentals

Core Pattern: Solves a real problem (CI debugging) with mostly clean code, but ships 2000+ lines with zero tests for functions that have edge cases and can break silently.

Verdict: ❌ Needs fixes - Add tests for pure functions and fix regex compilation issue before merging.

Key Insight: Previous reviews spent energy on architecture bikeshedding (paths, dual-mode, recursion) but missed the fundamental issue: you cannot verify correctness of log truncation, pattern matching, and formatting without tests. "We'll rely on integration testing" is a lazy excuse when you have pure functions with clear edge cases.

@neubig neubig marked this pull request as draft March 3, 2026 13:50
Squashed commit from add-ci-debug-skills branch with conflict resolution.
Includes all changes from the original PR with main branch merged.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig force-pushed the add-ci-debug-skills branch from 256241f to b617dbd Compare March 3, 2026 13:52
@neubig neubig marked this pull request as ready for review March 3, 2026 13:53
@all-hands-bot all-hands-bot left a comment

Taste Rating: 🔴 Needs Improvement

Core Issue: Three unresolved critical issues remain from previous reviews: regex compilation in hot loops (performance), zero tests for 2000+ lines (correctness), and symlink architecture confusion (maintainability).

Verdict: ❌ Needs rework before merge - Fix regex compilation and add tests for pure functions.

Key Insight: You solved the hard problems (recursion guards, configurable patterns, input validation) but left trivial performance bugs and skipped testing entirely. The regex issue has a 3-line fix; the testing objection ("ROI is low") doesn't hold for pure functions with no API dependencies.

@neubig neubig marked this pull request as draft March 7, 2026 19:24
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>