Add git_analytics dashboard generator and generated artifacts#77
Add git_analytics dashboard generator and generated artifacts#77Ritik574-coder wants to merge 1 commit into
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
📝 WalkthroughWalkthroughThis PR introduces a complete Git merge analytics system. A Python script ( ChangesGit Merge Analytics Pipeline
Sequence DiagramsequenceDiagram
participant GitLog as Git Log
participant Analyzer as analyze_git_activity.py
participant CSVWriter as write_csv
participant HTMLWriter as write_html
participant MDWriter as write_markdown
participant Outputs as Output Files
GitLog->>Analyzer: git log --merges (pipe-delimited)
Analyzer->>Analyzer: parse_lines(log_output)
Analyzer->>CSVWriter: parsed rows
CSVWriter->>Outputs: merge_pr_activity.csv
Analyzer->>HTMLWriter: parsed rows
HTMLWriter->>Outputs: merge_pr_activity.html
Analyzer->>MDWriter: parsed rows
MDWriter->>Outputs: merge_pr_summary.md
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
Failed to generate code suggestions for PR |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a88527e7f7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| function renderTable(rows) {{ | ||
| const tbody = document.querySelector('#prTable tbody'); | ||
| tbody.innerHTML = rows.map(r => `<tr><td>${{r.date}}</td><td>${{r.pr_number ?? ''}}</td><td>${{r.author}}</td><td>${{r.source_branch}}</td><td><code>${{r.commit.slice(0,8)}}</code></td><td>${{r.subject}}</td></tr>`).join(''); |
There was a problem hiding this comment.
Escape merge fields before writing table HTML
The dashboard renders commit-derived strings (subject, author, source_branch) directly into innerHTML, so a merge commit message containing HTML/JS (for example from a PR title in a public repo) will execute when someone opens the generated report. Because these fields come from git log and are attacker-controllable in collaborative repositories, this introduces a stored XSS vector in the artifact.
Useful? React with 👍 / 👎.
|
|
||
|
|
||
| def run_git_log() -> list[str]: | ||
| cmd = ["git", "log", "--merges", "--date=short", f"--pretty=format:{LOG_FORMAT}"] |
There was a problem hiding this comment.
Limit merge analytics to first-parent history
Using git log --merges without --first-parent walks all parents, so merges that happened on side/topic branches are included in the dataset and can distort totals and trends for the main integration branch. In repos where teams regularly merge upstream into feature branches, this will inflate merge counts and make the monthly/weekday metrics report activity that was never merged directly into the target branch.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
git_analytics/analyze_git_activity.py (1)
31-55: ⚡ Quick winAdd parser-focused unit tests for merge-subject variants
This parser drives all dashboard metrics; a small format drift can silently skew analytics. Please add tests for canonical PR merges, non-PR merges, branch names with slashes, and subject strings containing HTML/script-like content.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@git_analytics/analyze_git_activity.py` around lines 31 - 55, Add unit tests for parse_lines to cover canonical PR merge subjects, non-PR merges, branch names containing slashes, and subjects with HTML/script-like content so format drift doesn't break metrics; craft input lines using the same "|" split format and validate parse_lines returns rows with correct commit, date, subject, pr_number (None or int), is_pr_merge (0/1), author, source_branch (preserving slashes), and derived fields month and weekday; include tests referencing PR_RE and FROM_RE behavior by asserting both matched and unmatched cases and edge-case subjects to ensure no exceptions or incorrect parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@git_analytics/analyze_git_activity.py`:
- Around line 138-153: The renderTable function currently builds HTML via
tbody.innerHTML using template interpolation of git-derived fields (e.g.,
r.subject, r.author, r.source_branch, r.commit, r.pr_number) which allows
script/HTML injection; fix it by avoiding direct innerHTML construction: either
HTML-escape/encode those fields before interpolation or (preferably) create DOM
elements with document.createElement and set textContent for each cell so values
are treated as text, then append rows to the tbody; update renderTable to use
this safe rendering approach and ensure the same escaping is applied wherever
tableData or subject is used in inline JS.
---
Nitpick comments:
In `@git_analytics/analyze_git_activity.py`:
- Around line 31-55: Add unit tests for parse_lines to cover canonical PR merge
subjects, non-PR merges, branch names containing slashes, and subjects with
HTML/script-like content so format drift doesn't break metrics; craft input
lines using the same "|" split format and validate parse_lines returns rows with
correct commit, date, subject, pr_number (None or int), is_pr_merge (0/1),
author, source_branch (preserving slashes), and derived fields month and
weekday; include tests referencing PR_RE and FROM_RE behavior by asserting both
matched and unmatched cases and edge-case subjects to ensure no exceptions or
incorrect parsing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 746f32cc-8ba1-4660-b80f-cc03465cf338
⛔ Files ignored due to path filters (1)
git_analytics/merge_pr_activity.csvis excluded by!**/*.csv
📒 Files selected for processing (3)
git_analytics/analyze_git_activity.pygit_analytics/merge_pr_activity.htmlgit_analytics/merge_pr_summary.md
| const tableData = {json.dumps(table_rows)}; | ||
|
|
||
| const commonLayout = {{paper_bgcolor:'#151b31', plot_bgcolor:'#151b31', font:{{color:'#e9eefc'}}}}; | ||
| Plotly.newPlot('monthChart', [ | ||
| {{x: months, y: monthMerge, name:'All merges', type:'scatter', mode:'lines+markers'}}, | ||
| {{x: months, y: monthPR, name:'PR merges', type:'scatter', mode:'lines+markers'}} | ||
| ], {{...commonLayout, title:'Monthly Pull/Merge Trend', xaxis:{{title:'Month'}}, yaxis:{{title:'Count'}}}}); | ||
|
|
||
| Plotly.newPlot('typeChart', [{{type:'pie', labels:['PR merges','Other merges'], values:[{total_pr_merges},{total_merges-total_pr_merges}], hole:0.48}}], {{...commonLayout, title:'Merge Type Split'}}); | ||
| Plotly.newPlot('weekdayChart', [{{x:weekday, y:weekdayVals, type:'bar', marker:{{color:'#53a7ff'}}}}], {{...commonLayout, title:'Merge Activity by Weekday', yaxis:{{title:'Count'}}}}); | ||
| Plotly.newPlot('authorChart', [{{x:authorVals, y:authorNames, type:'bar', orientation:'h', marker:{{color:'#7bd88f'}}}}], {{...commonLayout, title:'Top PR Sources (from merge text)', xaxis:{{title:'Merge count'}}}}); | ||
|
|
||
| function renderTable(rows) {{ | ||
| const tbody = document.querySelector('#prTable tbody'); | ||
| tbody.innerHTML = rows.map(r => `<tr><td>${{r.date}}</td><td>${{r.pr_number ?? ''}}</td><td>${{r.author}}</td><td>${{r.source_branch}}</td><td><code>${{r.commit.slice(0,8)}}</code></td><td>${{r.subject}}</td></tr>`).join(''); | ||
| }} |
There was a problem hiding this comment.
Prevent script/HTML injection from git metadata before rendering
Git-derived fields (notably subject) are embedded in inline JS and then rendered with innerHTML. A crafted merge subject can break out of <script> or execute in the table context.
🔐 Suggested hardening patch
@@
- table_rows = rows[:100]
+ table_rows = rows[:100]
+ safe_table_json = json.dumps(table_rows).replace("</", "<\\/")
@@
-const tableData = {json.dumps(table_rows)};
+const tableData = {safe_table_json};
@@
function renderTable(rows) {{
const tbody = document.querySelector('`#prTable` tbody');
- tbody.innerHTML = rows.map(r => `<tr><td>${{r.date}}</td><td>${{r.pr_number ?? ''}}</td><td>${{r.author}}</td><td>${{r.source_branch}}</td><td><code>${{r.commit.slice(0,8)}}</code></td><td>${{r.subject}}</td></tr>`).join('');
+ tbody.textContent = '';
+ for (const r of rows) {{
+ const tr = document.createElement('tr');
+ const values = [r.date, r.pr_number ?? '', r.author, r.source_branch, r.commit.slice(0, 8), r.subject];
+ values.forEach((value, idx) => {{
+ const td = document.createElement('td');
+ if (idx === 4) {{
+ const code = document.createElement('code');
+ code.textContent = String(value);
+ td.appendChild(code);
+ }} else {{
+ td.textContent = String(value);
+ }}
+ tr.appendChild(td);
+ }});
+ tbody.appendChild(tr);
+ }}
}}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@git_analytics/analyze_git_activity.py` around lines 138 - 153, The
renderTable function currently builds HTML via tbody.innerHTML using template
interpolation of git-derived fields (e.g., r.subject, r.author, r.source_branch,
r.commit, r.pr_number) which allows script/HTML injection; fix it by avoiding
direct innerHTML construction: either HTML-escape/encode those fields before
interpolation or (preferably) create DOM elements with document.createElement
and set textContent for each cell so values are treated as text, then append
rows to the tbody; update renderTable to use this safe rendering approach and
ensure the same escaping is applied wherever tableData or subject is used in
inline JS.
Motivation
CSV,HTML,MD) to summarize PR merges, authors, weekdays and monthly trends.Description
git_analytics/analyze_git_activity.pywhich runsgit log --merges, parses commits with regexes for PR numbers and source info, and builds structured rows with fields likecommit,date,month,weekday,pr_number,is_pr_merge,author, andsource_branch.merge_pr_activity.csv, an interactivemerge_pr_activity.htmldashboard (Plotly charts, searchable top-100 table), and a summarymerge_pr_summary.mdwith KPIs and PR range.git_analytics/merge_pr_activity.csv,git_analytics/merge_pr_activity.html, andgit_analytics/merge_pr_summary.mdreflecting parsed data.subprocess.runto invokegit,csv/jsonfor serialization, andcollections.Counterfor aggregations used in visualizations.Testing
Codex Task
Summary by CodeRabbit