Skip to content

Add git_analytics dashboard generator and generated artifacts#77

Open
Ritik574-coder wants to merge 1 commit into
mainfrom
codex/create-visualization-for-git-merge-and-pull-requests
Open

Add git_analytics dashboard generator and generated artifacts#77
Ritik574-coder wants to merge 1 commit into
mainfrom
codex/create-visualization-for-git-merge-and-pull-requests

Conversation

@Ritik574-coder

@Ritik574-coder Ritik574-coder commented May 25, 2026

Copy link
Copy Markdown
Owner

Motivation

  • Provide an automated way to extract merge/PR activity from Git history and present it as a professional interactive dashboard for trend analysis and reporting.
  • Produce consumable artifacts (CSV, HTML, MD) to summarize PR merges, authors, weekdays and monthly trends.

Description

  • Add git_analytics/analyze_git_activity.py which runs git log --merges, parses commits with regexes for PR numbers and source info, and builds structured rows with fields like commit, date, month, weekday, pr_number, is_pr_merge, author, and source_branch.
  • Implement writers that emit merge_pr_activity.csv, an interactive merge_pr_activity.html dashboard (Plotly charts, searchable top-100 table), and a summary merge_pr_summary.md with KPIs and PR range.
  • Include generated example artifacts in the repo: git_analytics/merge_pr_activity.csv, git_analytics/merge_pr_activity.html, and git_analytics/merge_pr_summary.md reflecting parsed data.
  • Use subprocess.run to invoke git, csv/json for serialization, and collections.Counter for aggregations used in visualizations.

Testing

  • No automated tests were added or executed as part of this change.

Codex Task

Summary by CodeRabbit

  • New Features
    • Added comprehensive Git merge analytics tool that analyzes repository history and generates insights.
    • Interactive dashboard featuring multiple visualizations: monthly trends, merge-type distribution, weekday patterns, and top contributor activity.
    • Searchable and filterable table displaying recent merge and PR records.
    • Summary reports available in CSV, HTML, and Markdown formats.

Review Change Stack

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR introduces a complete Git merge analytics system. A Python script (analyze_git_activity.py) extracts merge commits from Git history, parses PR metadata and timestamps, and generates three outputs: a CSV file, an interactive HTML dashboard with Plotly visualizations and filtering, and a Markdown summary of key metrics.

Changes

Git Merge Analytics Pipeline

Layer / File(s) Summary
Data extraction and parsing
git_analytics/analyze_git_activity.py
Defines output paths and regex patterns for PR detection, implements run_git_log() to execute git log with compact format, and parse_lines() to extract PR numbers, commit hashes, dates, subjects, authors, source branches, and derived time fields (month, weekday).
Multi-format output generation
git_analytics/analyze_git_activity.py
Implements write_csv() for raw merge records, write_html() that embeds Plotly charts for monthly trends, merge-type distribution, weekday patterns, and top authors with client-side table filtering for recent records, plus write_markdown() that computes and formats totals, PR ranges, peak month, and top source author.
Pipeline orchestration and entry point
git_analytics/analyze_git_activity.py
main() function orchestrates the full pipeline (git log → parse → write all outputs), with __main__ entry point to run automatically when executed as a script.
Generated dashboard and summary artifacts
git_analytics/merge_pr_activity.html, git_analytics/merge_pr_summary.md
HTML dashboard renders styled KPI cards and interactive Plotly charts driven by embedded data, plus searchable recent-records table with client-side text filtering; Markdown summary provides concise headline metrics and links to the full dashboard.

Sequence Diagram

sequenceDiagram
  participant GitLog as Git Log
  participant Analyzer as analyze_git_activity.py
  participant CSVWriter as write_csv
  participant HTMLWriter as write_html
  participant MDWriter as write_markdown
  participant Outputs as Output Files

  GitLog->>Analyzer: git log --merges (pipe-delimited)
  Analyzer->>Analyzer: parse_lines(log_output)
  Analyzer->>CSVWriter: parsed rows
  CSVWriter->>Outputs: merge_pr_activity.csv
  Analyzer->>HTMLWriter: parsed rows
  HTMLWriter->>Outputs: merge_pr_activity.html
  Analyzer->>MDWriter: parsed rows
  MDWriter->>Outputs: merge_pr_summary.md
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A script hops through the Git history deep,
Counting merges and PRs to keep,
Charts bloom bright in HTML's gleam,
While the markdown dreams its summary dream,
Analytics dance from terminal to screen! 📊

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the primary change: adding a git analytics dashboard generator script and its generated output artifacts (CSV, HTML, MD).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/create-visualization-for-git-merge-and-pull-requests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown

Failed to generate code suggestions for PR

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a88527e7f7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


function renderTable(rows) {{
const tbody = document.querySelector('#prTable tbody');
tbody.innerHTML = rows.map(r => `<tr><td>${{r.date}}</td><td>${{r.pr_number ?? ''}}</td><td>${{r.author}}</td><td>${{r.source_branch}}</td><td><code>${{r.commit.slice(0,8)}}</code></td><td>${{r.subject}}</td></tr>`).join('');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Escape merge fields before writing table HTML

The dashboard renders commit-derived strings (subject, author, source_branch) directly into innerHTML, so a merge commit message containing HTML/JS (for example from a PR title in a public repo) will execute when someone opens the generated report. Because these fields come from git log and are attacker-controllable in collaborative repositories, this introduces a stored XSS vector in the artifact.

Useful? React with 👍 / 👎.



def run_git_log() -> list[str]:
cmd = ["git", "log", "--merges", "--date=short", f"--pretty=format:{LOG_FORMAT}"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit merge analytics to first-parent history

Using git log --merges without --first-parent walks all parents, so merges that happened on side/topic branches are included in the dataset and can distort totals and trends for the main integration branch. In repos where teams regularly merge upstream into feature branches, this will inflate merge counts and make the monthly/weekday metrics report activity that was never merged directly into the target branch.

Useful? React with 👍 / 👎.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
git_analytics/analyze_git_activity.py (1)

31-55: ⚡ Quick win

Add parser-focused unit tests for merge-subject variants

This parser drives all dashboard metrics; a small format drift can silently skew analytics. Please add tests for canonical PR merges, non-PR merges, branch names with slashes, and subject strings containing HTML/script-like content.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@git_analytics/analyze_git_activity.py` around lines 31 - 55, Add unit tests
for parse_lines to cover canonical PR merge subjects, non-PR merges, branch
names containing slashes, and subjects with HTML/script-like content so format
drift doesn't break metrics; craft input lines using the same "|" split format
and validate parse_lines returns rows with correct commit, date, subject,
pr_number (None or int), is_pr_merge (0/1), author, source_branch (preserving
slashes), and derived fields month and weekday; include tests referencing PR_RE
and FROM_RE behavior by asserting both matched and unmatched cases and edge-case
subjects to ensure no exceptions or incorrect parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@git_analytics/analyze_git_activity.py`:
- Around line 138-153: The renderTable function currently builds HTML via
tbody.innerHTML using template interpolation of git-derived fields (e.g.,
r.subject, r.author, r.source_branch, r.commit, r.pr_number) which allows
script/HTML injection; fix it by avoiding direct innerHTML construction: either
HTML-escape/encode those fields before interpolation or (preferably) create DOM
elements with document.createElement and set textContent for each cell so values
are treated as text, then append rows to the tbody; update renderTable to use
this safe rendering approach and ensure the same escaping is applied wherever
tableData or subject is used in inline JS.

---

Nitpick comments:
In `@git_analytics/analyze_git_activity.py`:
- Around line 31-55: Add unit tests for parse_lines to cover canonical PR merge
subjects, non-PR merges, branch names containing slashes, and subjects with
HTML/script-like content so format drift doesn't break metrics; craft input
lines using the same "|" split format and validate parse_lines returns rows with
correct commit, date, subject, pr_number (None or int), is_pr_merge (0/1),
author, source_branch (preserving slashes), and derived fields month and
weekday; include tests referencing PR_RE and FROM_RE behavior by asserting both
matched and unmatched cases and edge-case subjects to ensure no exceptions or
incorrect parsing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 746f32cc-8ba1-4660-b80f-cc03465cf338

📥 Commits

Reviewing files that changed from the base of the PR and between b8f962e and a88527e.

⛔ Files ignored due to path filters (1)
  • git_analytics/merge_pr_activity.csv is excluded by !**/*.csv
📒 Files selected for processing (3)
  • git_analytics/analyze_git_activity.py
  • git_analytics/merge_pr_activity.html
  • git_analytics/merge_pr_summary.md

Comment on lines +138 to +153
const tableData = {json.dumps(table_rows)};

const commonLayout = {{paper_bgcolor:'#151b31', plot_bgcolor:'#151b31', font:{{color:'#e9eefc'}}}};
Plotly.newPlot('monthChart', [
{{x: months, y: monthMerge, name:'All merges', type:'scatter', mode:'lines+markers'}},
{{x: months, y: monthPR, name:'PR merges', type:'scatter', mode:'lines+markers'}}
], {{...commonLayout, title:'Monthly Pull/Merge Trend', xaxis:{{title:'Month'}}, yaxis:{{title:'Count'}}}});

Plotly.newPlot('typeChart', [{{type:'pie', labels:['PR merges','Other merges'], values:[{total_pr_merges},{total_merges-total_pr_merges}], hole:0.48}}], {{...commonLayout, title:'Merge Type Split'}});
Plotly.newPlot('weekdayChart', [{{x:weekday, y:weekdayVals, type:'bar', marker:{{color:'#53a7ff'}}}}], {{...commonLayout, title:'Merge Activity by Weekday', yaxis:{{title:'Count'}}}});
Plotly.newPlot('authorChart', [{{x:authorVals, y:authorNames, type:'bar', orientation:'h', marker:{{color:'#7bd88f'}}}}], {{...commonLayout, title:'Top PR Sources (from merge text)', xaxis:{{title:'Merge count'}}}});

function renderTable(rows) {{
const tbody = document.querySelector('#prTable tbody');
tbody.innerHTML = rows.map(r => `<tr><td>${{r.date}}</td><td>${{r.pr_number ?? ''}}</td><td>${{r.author}}</td><td>${{r.source_branch}}</td><td><code>${{r.commit.slice(0,8)}}</code></td><td>${{r.subject}}</td></tr>`).join('');
}}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent script/HTML injection from git metadata before rendering

Git-derived fields (notably subject) are embedded in inline JS and then rendered with innerHTML. A crafted merge subject can break out of <script> or execute in the table context.

🔐 Suggested hardening patch
@@
-    table_rows = rows[:100]
+    table_rows = rows[:100]
+    safe_table_json = json.dumps(table_rows).replace("</", "<\\/")
@@
-const tableData = {json.dumps(table_rows)};
+const tableData = {safe_table_json};
@@
 function renderTable(rows) {{
   const tbody = document.querySelector('`#prTable` tbody');
-  tbody.innerHTML = rows.map(r => `<tr><td>${{r.date}}</td><td>${{r.pr_number ?? ''}}</td><td>${{r.author}}</td><td>${{r.source_branch}}</td><td><code>${{r.commit.slice(0,8)}}</code></td><td>${{r.subject}}</td></tr>`).join('');
+  tbody.textContent = '';
+  for (const r of rows) {{
+    const tr = document.createElement('tr');
+    const values = [r.date, r.pr_number ?? '', r.author, r.source_branch, r.commit.slice(0, 8), r.subject];
+    values.forEach((value, idx) => {{
+      const td = document.createElement('td');
+      if (idx === 4) {{
+        const code = document.createElement('code');
+        code.textContent = String(value);
+        td.appendChild(code);
+      }} else {{
+        td.textContent = String(value);
+      }}
+      tr.appendChild(td);
+    }});
+    tbody.appendChild(tr);
+  }}
 }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@git_analytics/analyze_git_activity.py` around lines 138 - 153, The
renderTable function currently builds HTML via tbody.innerHTML using template
interpolation of git-derived fields (e.g., r.subject, r.author, r.source_branch,
r.commit, r.pr_number) which allows script/HTML injection; fix it by avoiding
direct innerHTML construction: either HTML-escape/encode those fields before
interpolation or (preferably) create DOM elements with document.createElement
and set textContent for each cell so values are treated as text, then append
rows to the tbody; update renderTable to use this safe rendering approach and
ensure the same escaping is applied wherever tableData or subject is used in
inline JS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant