feat: bench suite — employees (MariaDB/4M rows) + lahman (SQLite/700k rows)#1718
Open
dimitri wants to merge 12 commits into
write-summary-json was using pr-str which produces Clojure EDN, not valid JSON. Switch to clojure.data.json/write-str (new dep: org.clojure/data.json 2.5.0) so --summary foo.json produces JSON that downstream tools (report scripts, CI aggregators) can parse directly. Add -S FILE as alias for --summary FILE, matching v3 pgloader's CLI.
…0k rows)
Adds clojure/tests/bench/ — a dedicated benchmark suite that runs each
dataset 3 times against v4 and v3 and produces a timing comparison table.
Layout
------
Dockerfile          Custom MariaDB 11 image; fetches employees tarball
                    (35 MB) at build time, pre-seeds via init-employees.sh
init-employees.sh   Strips SOURCE commands (MySQL CLI-only) from employees.sql
                    and runs DDL + per-dump-file loading directly
docker-compose.yml  mariadb (bench source) + postgres (target) + test-runner
employees.load      pgloader LOAD DATABASE mariadb→postgres, workers=4
lahman.load         pgloader LOAD DATABASE sqlite→postgres
Makefile            3-run timing loop per target; augments each JSON summary
                    with os-wall-ms (date +%s%3N before/after pgloader)
report.py           Reads v4 JSON (grand-total.total-nanos) and v3 JSON
                    (root SECS key) plus os-wall-ms; prints comparison table
                    and writes Markdown to $GITHUB_STEP_SUMMARY when set
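As a rough illustration of the report.py entry above, the two summary shapes could be read as sketched below. This is not the actual report.py: the function names are hypothetical, the sample values are made up, and placing os-wall-ms at the top level of the summary is an assumption. Only the key names grand-total.total-nanos, SECS and os-wall-ms come from the description above.

```python
import json


def pgloader_seconds(summary: dict, version: str) -> float:
    """Return the pgloader-reported total time in seconds."""
    if version == "v4":
        # v4 reports nanoseconds under grand-total.total-nanos
        return summary["grand-total"]["total-nanos"] / 1e9
    # v3 reports seconds in a root-level SECS key
    return float(summary["SECS"])


def os_wall_seconds(summary: dict) -> float:
    """Return the OS wall time in seconds."""
    # Assumption: os-wall-ms is injected at the top level of the summary.
    return summary["os-wall-ms"] / 1000.0


# Made-up sample summaries, for illustration only.
v4 = json.loads('{"grand-total": {"total-nanos": 2500000000}, "os-wall-ms": 3100}')
v3 = json.loads('{"SECS": 4.2, "os-wall-ms": 4900}')

print(pgloader_seconds(v4, "v4"), pgloader_seconds(v3, "v3"))
print(os_wall_seconds(v4), os_wall_seconds(v3))
```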
CI additions
------------
build-bench-source  Builds + caches the MariaDB image (keyed on Dockerfile
                    and init-employees.sh; skipped when unchanged)
bench matrix        employees×{v4,v3} + lahman×{v4,v3}, RUNS=3 each
bench-report        Aggregates timing JSONs, prints table, writes step summary
publish-dev         Now requires bench jobs to pass before publishing
Lahman SQLite (66 MB, jknecht/baseball-archive-sqlite 2022) is fetched by
'make lahman.sqlite' and cached in CI with actions/cache keyed on the
release tag. Not committed to git.
make -C tests bench now delegates to tests/bench/Makefile. Also adds employees, employees-v3, lahman, lahman-v3, bench-report, bench-down as individual pass-throughs.
Three bugs found during local trial run and fixed:
1. date +%s%3N (macOS): BSD date appends literal N; switch to
   perl -MTime::HiRes to get ms epoch on both Linux and macOS.
2. Perl inline define block: single $ in Makefile define blocks is expanded
   by make as $(var) (empty string). Extracted the JSON injection to
   inject-ms.pl so all Perl variables are unambiguous.
3. Missing GRANT: MariaDB Docker creates the pgloader user via
   MARIADB_USER/PASSWORD but grants it no database access. Added
   GRANT ALL PRIVILEGES ON employees.* TO 'pgloader'@'%' at the end of
   init-employees.sh.
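The suite does the timing and JSON injection in the Makefile with Perl (inject-ms.pl). The same idea is sketched below in Python purely for illustration: time one invocation, then add os-wall-ms to the summary file that run produced. The function names are hypothetical, the top-level placement of os-wall-ms is an assumption, and the timed command is a stand-in for pgloader.

```python
import json
import os
import subprocess
import sys
import tempfile
import time


def run_timed(cmd: list) -> int:
    """Run cmd and return the elapsed wall-clock time in whole milliseconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return int((time.monotonic() - start) * 1000)


def inject_os_wall_ms(summary_path: str, elapsed_ms: int) -> None:
    """Add os-wall-ms to an existing JSON summary file, keeping other keys."""
    with open(summary_path) as f:
        summary = json.load(f)
    summary["os-wall-ms"] = elapsed_ms  # assumed to live at the top level
    with open(summary_path, "w") as f:
        json.dump(summary, f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "summary.json")
    with open(path, "w") as f:
        json.dump({"grand-total": {"total-nanos": 1}}, f)  # made-up summary

    # Stand-in for a pgloader run: a Python process that exits immediately.
    elapsed = run_timed([sys.executable, "-c", "pass"])
    inject_os_wall_ms(path, elapsed)

    with open(path) as f:
        result = json.load(f)

print(result)
```

Using a monotonic clock for the elapsed time avoids the platform differences of date +%s%3N, though it measures a duration rather than an epoch timestamp.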
Previously 'make -C tests bench' delegated to 'make -C bench bench' which
ran pgloader directly on the host. That means the Docker hostnames
mariadb/postgres can't resolve, date +%s%3N fails on macOS, python3 isn't
available, and summary files are lost when the container exits.
The fix mirrors every other integration suite:
make -C tests bench
→ docker compose -f bench/docker-compose.yml run --rm test-runner
(starts mariadb + postgres, waits for healthchecks,
then runs 'make -C /suite bench' inside the test-runner)
→ python3 bench/report.py bench/summaries (host, after container exits)
→ teardown
Key changes:
- tests/Makefile bench target uses docker compose run (not make -C bench)
- bench/lahman.sqlite is downloaded as a prerequisite before docker starts
- SUMMARY_DIR defaults to $(BENCH_DIR)summaries = /suite/summaries inside
the container, which maps to bench/summaries/ on the host via .:/suite
- bench target in bench/Makefile drops 'report' (no python3 in container)
- bench/summaries/ and bench/lahman.sqlite added to .gitignore
…clear summaries
Three fixes:
1. report.py parse_v3: v3 JSON DATA is a list of groups where each group
is a list of per-table dicts (concurrent batches), not a flat list.
Some groups can also be JSON null (e.g. SQLite loads with no DATA).
Fixed by flattening and filtering Nones before summing ROWS.
2. report.py median row now includes the median row count column.
3. tests/Makefile bench target:
- clears bench/summaries/ before each run so stale files from a
previous run (different RUNS count or old stub lahman.sqlite) don't
pollute the report
- propagates RUNS variable into the container via
'make -C /suite bench RUNS=$(RUNS)'
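The parse_v3 fix in item 1 can be sketched roughly as follows. This is a minimal illustration of flattening and filtering, not the code in report.py: the function name and the sample data are hypothetical, and only the DATA and ROWS keys come from the description above.

```python
def v3_total_rows(summary: dict) -> int:
    """Sum ROWS over every per-table dict in a v3 summary."""
    # DATA is a list of groups; a missing or null DATA counts as no groups.
    groups = summary.get("DATA") or []
    # Each group is a list of per-table dicts, or None (JSON null).
    tables = [
        table
        for group in groups
        if group is not None
        for table in group
        if table is not None
    ]
    return sum(table["ROWS"] for table in tables)


# Made-up sample: two populated groups and one null group.
sample = {"DATA": [[{"ROWS": 10}, {"ROWS": 5}], None, [{"ROWS": 7}]]}
print(v3_total_rows(sample))  # 22
```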
New layout:
run │ step │ employees v3 │ employees v4 │ v3÷v4 │ lahman v3 │ lahman v4 │ v3÷v4
1 │ pgloader │ x.xxxs │ x.xxxs │ x.xx× │ x.xxxs │ x.xxxs │ x.xx×
1 │ COPY wall │ — │ x.xxxs │ — │ — │ x.xxxs │ —
1 │ OS wall │ x.xxxs │ x.xxxs │ x.xx× │ x.xxxs │ x.xxxs │ x.xx×
...
med │ pgloader │ ...
Column width adapts to the widest suite+version header. ─ separators
after each run block. v3 COPY wall is always — (not reported by v3).
…--quiet runs
- parse_v3: extract COPY wall time from POSTLOAD 'COPY Threads Completion'
  entry (equivalent to v4's 'COPY Wall-Clock Time' post-phase entry)
- build_table: v3÷v4 ratio is always pgloader-time based (total time
  reported by pgloader), shown only on the pgloader row; COPY wall and
  OS wall rows show — in the ratio column
- Makefile: use --quiet for both v4 (java -jar ... --quiet) and v3
  (pgloader --quiet) so log I/O does not inflate bench timings
logback.xml had an explicit '<logger name="pgloader" level="DEBUG"/>' that
pinned every pgloader.* logger to DEBUG regardless of --quiet or any
programmatic level change: set-log-level! only updated the root logger.
Two-part fix:
1. Remove the explicit DEBUG override from logback.xml so the pgloader
   logger inherits from root (INFO by default, matching the root setting).
2. Harden set-log-level! to always clear the pgloader named-logger level
   (setLevel null → inherit from root) so any future logback.xml pin
   cannot defeat --quiet again.
Default output is now INFO-only (was DEBUG) without any flags.
The COPY wall comparison is the most meaningful signal (pure transfer time, no connection setup or index overhead). OS wall stays — as it is indicative only.
## New benchmark: Divvy Bikeshare trips (CSV)
- 3 summer months 2023 (June/July/August): ≈ 2.2 M rows, ≈ 450 MB CSV
- Uses pgloader's filename-pattern feature:
FROM ALL FILENAMES MATCHING ~/\d{6}-divvy-tripdata\.csv$/ IN DIRECTORY ...
- Data fetched on host via `make divvy-data` (curl + unzip), then
bind-mounted read-only into the test-runner at /work/divvy/
- Same CI caching pattern as lahman (actions/cache keyed on month range)
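To see which files the MATCHING pattern above would pick up, the same regular expression can be tried in Python. This sketch assumes pgloader applies the pattern as an unanchored search, like re.search, and the file names are examples chosen for illustration rather than a listing of the actual data directory.

```python
import re

# Same pattern as in the load file, without pgloader's ~/.../ delimiters.
PATTERN = re.compile(r"\d{6}-divvy-tripdata\.csv$")

# Example file names (illustrative only).
names = [
    "202306-divvy-tripdata.csv",
    "202307-divvy-tripdata.csv",
    "202306-divvy-tripdata.csv.zip",  # rejected: does not end in .csv
    "README.txt",
]

matching = [name for name in names if PATTERN.search(name)]
print(matching)
```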
## Report rewritten to match PR-description format
- Columns: dataset │ ver │ run │ pgloader │ COPY wall │ OS wall │ rows │ bytes │ MB/s
- Rows grouped by (dataset, version); v3÷v4 ratio row per dataset
- bytes from grand-total.bytes (v4) / root BYTES (v3)
- rows from grand-total.rows (v4) / sum of DATA[*].ROWS (v3)
- MB/s = bytes / (1 MiB) / copy_wall_s (shown per run and median)
- Suites auto-detected from summary file names (no hard-coded list)
- GITHUB_STEP_SUMMARY: appended as markdown code block from the CI step
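The MB/s column can be illustrated with the short sketch below. The run values are made up, the function name is hypothetical, and taking the median of the per-run MB/s values is an assumption about how report.py aggregates.

```python
from statistics import median

MIB = 1024 * 1024  # the formula divides by 1 MiB, so "MB/s" is MiB per second


def throughput_mb_s(nbytes: int, copy_wall_s: float) -> float:
    """MB/s = bytes / (1 MiB) / copy_wall_s."""
    return nbytes / MIB / copy_wall_s


# Made-up runs: (bytes, COPY wall seconds); 471859200 bytes is 450 MiB.
runs = [(471859200, 9.0), (471859200, 10.0), (471859200, 7.5)]

per_run = [throughput_mb_s(nbytes, secs) for nbytes, secs in runs]
print(per_run)          # [50.0, 45.0, 60.0]
print(median(per_run))  # 50.0
```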
## CI fixes (bench jobs were all failing with exit code 2)
1. make command: was `make $target` (wrong CWD in container);
fixed to `make -C /suite $target`
2. artifact path: was /tmp/pgloader-bench/ (non-existent);
fixed to clojure/tests/bench/summaries/
3. step summary: removed invalid ${{ github.step_summary }} env override;
now wraps report output in a markdown code block via tee + redirect
report.py: ratio was v3÷v4 (> 1 = v4 faster), flip to v4÷v3 (< 1 = v4 faster, > 1 = v4 slower) so the direction reads naturally when comparing v4 against v3.
csv.clj: GlobCSVSource.read-rows was wrapping each file's lazy sequence in (vec ...), forcing the entire CSV file into memory before the prefetch pipeline could drain it. For large multi-file loads (e.g. 3 × 150 MB Divvy CSVs) this blows the default JVM heap. Drop the vec to keep rows lazy: the prefetch reader loop already handles lazy seqs via first/rest.
bench/Makefile: add -Xmx2g to PGLOADER_V4 as a belt-and-suspenders safety net for the bench runs.
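The csv.clj change is Clojure-specific, but the eager-versus-lazy difference can be shown with a Python analogue. The sketch below is not the pgloader code and its function names are hypothetical. It chains per-file row generators so that a file is only read once the consumer reaches it. The eager equivalent, `[list(read_rows(h)) for h in handles]`, would load every file in full first.

```python
import csv
import io
from itertools import chain


def read_rows(handle):
    """Yield rows one at a time from a single CSV source."""
    yield from csv.reader(handle)


def read_all_lazy(handles):
    """Chain rows across files without materializing any file in memory."""
    # chain.from_iterable advances to the next file only when the
    # previous one is exhausted.
    return chain.from_iterable(read_rows(h) for h in handles)


# In-memory stand-ins for CSV files on disk.
files = [io.StringIO("a,1\nb,2\n"), io.StringIO("c,3\n")]

rows = read_all_lazy(files)
first = next(rows)                 # reads only the first row of the first file
second_untouched = files[1].tell() == 0
remaining = list(rows)             # drains the rest on demand

print(first, second_untouched, remaining)
```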
What
Adds clojure/tests/bench/, a timing benchmark suite that runs each dataset 3 times against both v4 and v3 and reports a side-by-side comparison table.
Also fixes two pre-requisite gaps in v4's summary output:
- --summary now emits real JSON (clojure.data.json) instead of Clojure EDN (pr-str), so downstream scripts can json.load() it directly
- -S FILE added as alias for --summary FILE, matching v3's CLI
Datasets
- employees (MariaDB, 4M rows)
- lahman (SQLite, 700k rows; fetched by make lahman.sqlite, cached in CI)
How timing works
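Each Makefile target runs pgloader N times (default 3). Around each invocation, the Makefile records the epoch time in milliseconds before and after the run and adds the difference to that run's JSON summary as os-wall-ms.
report.py reads all JSON files, extracts pgloader-reported time (grand-total.total-nanos) and OS wall time (os-wall-ms), computes medians and the v3÷v4 ratio, and prints a comparison table.
In GitHub Actions the table is also written to the job's step summary as Markdown.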
CI additions
- build-bench-source: builds and caches the custom MariaDB+employees image (keyed on Dockerfile + init-employees.sh; skipped when unchanged)
- bench matrix: 4 jobs, employees×{v4,v3} + lahman×{v4,v3}, RUNS=3 each; uploads per-run timing JSON as artifacts
- bench-report: aggregates all 4 artifact sets and emits the comparison table
- publish-dev: now waits for bench jobs to pass before publishing