Summary
Add two improvements to the static analytics site generator to reduce bot noise in analytics reports:
- Suspicious page path filter: regex-based filter in fetch.py that removes malformed/bot page paths from the pageviews detail table (broken markdown links, CMS probes, asset requests, etc.)
- Engaged sessions metric: queries GA4's engagedSessions metric alongside sessions and displays it in the stats card as "Engaged Sessions" instead of "User Sessions"
Details
Suspicious page path filter
Removes paths like:
- /](https://...): broken markdown links
- //checkout/: e-commerce probes
- /help@lists...: email-as-path
- /robots.txt, /favicon-32x32.png: asset requests
- /docs/, /docs-EN/: CMS probes
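As a rough illustration of how the categories above could be matched, here is a minimal sketch. The pattern is an assumption written for this description and may differ from the actual SUSPICIOUS_PAGE_PATH_RE in fetch.py. The helper name filter_page_paths and the "pagePath" row key are likewise hypothetical.

```python
import re

# Assumed pattern for illustration only; the real SUSPICIOUS_PAGE_PATH_RE
# in fetch.py may use different alternatives.
SUSPICIOUS_PAGE_PATH_RE = re.compile(
    r"""
    \]\(                            # broken markdown link, e.g. /](https://...)
    | ^//                           # doubled leading slash, e.g. //checkout/
    | @                             # email address used as a path
    | \.(?:txt|png|ico|xml)$        # asset requests such as /robots.txt
    | ^/docs(?:-[A-Za-z]{2})?/?$    # CMS probes such as /docs/ or /docs-EN/
    """,
    re.VERBOSE,
)


def filter_page_paths(rows):
    """Return only the rows whose page path does not look suspicious.

    Each row is assumed to be a dict with a "pagePath" key (hypothetical shape).
    """
    return [
        row for row in rows
        if not SUSPICIOUS_PAGE_PATH_RE.search(row["pagePath"])
    ]


rows = [
    {"pagePath": "/", "views": 120},
    {"pagePath": "//checkout/", "views": 3},
    {"pagePath": "/robots.txt", "views": 9},
    {"pagePath": "/data/overview", "views": 40},
]
kept = filter_page_paths(rows)
print([row["pagePath"] for row in kept])
```

Filtering with search() rather than match() lets one pattern cover both anchored cases (leading //) and markers that can appear anywhere in the path (such as @).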
Engaged sessions
GA4's engagedSessions metric counts only sessions that lasted longer than 10 seconds, included 2 or more page views, or triggered a conversion (key event). Because most bot drive-bys meet none of these conditions, it gives a more representative session count than raw sessions.
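The sketch below shows one way both metrics could be requested together and read back. It assumes the JSON request/response shape of the GA4 Data API runReport method. The function names are hypothetical and the sample response values are made up for illustration; this is not the code in fetch.py.

```python
METRIC_SESSIONS = "sessions"
METRIC_ENGAGED_SESSIONS = "engagedSessions"  # GA4 API name of the metric


def build_sessions_request(start_date, end_date):
    """Build a runReport request body asking for both session metrics."""
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": [
            {"name": METRIC_SESSIONS},
            {"name": METRIC_ENGAGED_SESSIONS},
        ],
    }


def parse_session_totals(response):
    """Map metric names to integer totals from a dimensionless report.

    With no dimensions requested, the report has at most one row, whose
    metricValues are in the same order as metricHeaders.
    """
    names = [header["name"] for header in response.get("metricHeaders", [])]
    rows = response.get("rows", [])
    if not rows:
        return {name: 0 for name in names}
    values = rows[0]["metricValues"]
    return {name: int(value["value"]) for name, value in zip(names, values)}


# Made-up response, for illustration only.
sample_response = {
    "metricHeaders": [{"name": "sessions"}, {"name": "engagedSessions"}],
    "rows": [{"metricValues": [{"value": "1200"}, {"value": "450"}]}],
}
totals = parse_session_totals(sample_response)
print(totals)
```

Requesting both metrics in one report keeps the two numbers on the same date range, so the stats card can show engaged sessions while the raw session count stays available for comparison.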
Files changed
- analytics/static_site/fetch.py: add SUSPICIOUS_PAGE_PATH_RE, METRIC_ENGAGED_SESSIONS, filter logic
- analytics/static_site/export.py: export engaged_sessions in meta.json
- analytics/static_site/template/index.html: display engaged sessions in stats card
Note
These changes affect all sites using the shared analytics package (AnVIL Portal, LungMAP, HCA Explorer, etc.)