
Commit b47e64c

Authored by LukasWallrich, Richard Dushime (richarddushime), and claude
Add working link checker workflow using lychee (#705)
* fix: compatibility

* Add working link checker workflow using lychee

  Replace the broken filiph/linkcheck workflow with lychee, which crawls the live forrt.org site weekly and creates a GitHub issue listing any broken links found. Includes .lychee.toml config with exclusions for common false positives (LinkedIn, Twitter/X, doi.org, web.archive.org).

* Fix lychee action path: lycheeverse/lychee-action

* Crawl full site via sitemap instead of single URL

  Lychee doesn't support recursive crawling, so fetch all page URLs from forrt.org/sitemap.xml and check links on every page.

* Use portable grep/sed for sitemap URL extraction

  Replace grep -oP (Perl regex) with grep -o + sed for broader shell compatibility.

* Check all links (internal + external) using build artifact

  Download the latest deploy artifact instead of crawling the live site. Lychee scans the local HTML files and checks every link it finds, both internal and external. This catches broken outbound links that the sitemap-only approach missed.

* Replace deprecated --base with --base-url

* Fix malformed author fields causing broken URLs

  - Remove email addresses from author fields in educators-corner posts (Sarah von Grebmer, Rachel Heyard)
  - Fix YAML syntax in Berit Barthelmes author profile (stray 'Name' prefix)

* Exclude internal forrt.org links and bot-blocking publishers

  Internal links resolve to remote fetches via --base-url, causing thousands of false 404s for assets. Exclude forrt.org since those are already local files. Also exclude Sage, T&F, and APA, which block automated requests with 403s.

* Accept 403 globally instead of excluding publishers

  Academic publishers (Sage, T&F, APA, etc.) return 403 for all automated requests, valid and invalid URLs alike. Accept 403 as non-broken so these links are still checked but don't produce false positives.

* Replace publisher DOI URLs with doi.org and flag remaining ones

  - Convert 488 publisher-specific DOI URLs to canonical https://doi.org/ format across 11 content files (glossary excluded as auto-generated)
  - Strip session-specific casa_token query params from all URLs
  - Remove doi.org from the lychee exclusion list (it returns proper 404s for invalid DOIs, unlike publishers that block all bot requests)
  - Add workflow step to flag remaining publisher DOI URLs in the link checker issue report

* Broaden publisher URL detection to include ScienceDirect, JSTOR, etc.

  Flag any direct publisher URL (not just those with visible DOIs) so contributors know to look up and use the doi.org format. Added ScienceDirect, JSTOR, LWW, and Royal Society to the pattern.

* Collapse 403 errors into a separate section in issue report

  Remove 403 from accepted status codes so they appear in lychee output, then post-process to move them into a collapsed <details> block. This keeps the main report focused on actionable errors while still surfacing bot-blocked URLs for reference.

* Deduplicate errors and shorten issue report

  Lychee reports the same broken URL once per page it appears on, making the issue body exceed GitHub's 65KB limit. Post-process to show each broken URL only once, with shortened output. Also moves per-page headers out in favour of a flat deduplicated list.

* Truncate 403 and publisher URL lists to fit GitHub issue body limit

  GitHub limits issue bodies to 65KB. Cap 403 and publisher URL lists at 100 entries each with a count of remaining items.

* Show page locations for broken links, uncollapse publisher section

  - Track which page(s) each broken URL appears on so they can be found
  - Keep publisher URL section open (not collapsed) as last section
  - 403s still collapsed and capped at 100

* Compact publisher URL output: show file:line + URL only

  The full grep line content from reversals.md made the issue body exceed 65KB. Use grep -o to extract just the URL, with file:line prefix, and deduplicate.

---------

Co-authored-by: Richard Dushime <rdm@rdm.local>
Co-authored-by: Richard Dushime <45734838+richarddushime@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
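The DOI rewrite described in the commit message (convert publisher-specific URLs to canonical doi.org form, strip casa_token query params) can be sketched as a small transformation. This is illustrative only: the regexes below are assumptions for common publisher URL shapes, not the exact patterns used in this commit.

```python
import re

def canonicalize_doi_url(url: str) -> str:
    """Rewrite a publisher DOI URL to https://doi.org/{DOI}; strip casa_token params."""
    # Drop session-specific casa_token query parameters first
    url = re.sub(r"([?&])casa_token=[^&]*&?", r"\1", url).rstrip("?&")
    # Pull the DOI out of common publisher URL shapes, e.g.
    # https://journals.sagepub.com/doi/pdf/10.1177/1948550616673876
    m = re.search(r"/doi/(?:pdf/|full/|abs/)?(10\.\d{4,9}/[^?#]+)", url)
    if m:
        return "https://doi.org/" + m.group(1)
    return url  # no visible DOI: leave for manual lookup via search.crossref.org

# prints https://doi.org/10.1177/1948550616673876
print(canonicalize_doi_url(
    "http://journals.sagepub.com/doi/pdf/10.1177/1948550616673876"))
```

URLs without a visible DOI (e.g. ScienceDirect `pii` links) pass through unchanged, which is why the workflow flags them separately for manual lookup.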
1 parent faeca75 commit b47e64c

24 files changed

Lines changed: 577 additions & 349 deletions


.github/workflows/link-check.yaml

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
name: Link Checker

# =======================
# Website Link Validation
# =======================
# Purpose: Downloads the latest built site and checks all links (internal + external)
# Triggers: Weekly on Mondays at 01:30 UTC or manual dispatch
# Reports: Creates a GitHub issue with label "link-check" when broken links are found
# Config: See .lychee.toml for exclusion patterns and request settings

on:
  schedule:
    # Runs at 01:30 UTC every Monday
    - cron: '30 1 * * 1'
  workflow_dispatch:

permissions:
  contents: read
  issues: write
  actions: read

concurrency:
  group: link-check-${{ github.ref }}
  cancel-in-progress: true

jobs:
  link-check:
    name: Check Links
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Download latest build artifact
        uses: dawidd6/action-download-artifact@07ab29fd4a977ae4d2b275087cf67563dfdf0295
        with:
          workflow: deploy.yaml
          name: forrt-website-.*
          name_is_regexp: true
          path: /tmp/site
          github_token: ${{ secrets.GITHUB_TOKEN }}
          search_artifacts: true
          if_no_artifact_found: fail

      - name: Run lychee link checker
        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          args: "--config .lychee.toml --base-url https://forrt.org /tmp/site"
          output: /tmp/lychee/out.md
          fail: false
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Process lychee output
        if: steps.lychee.outputs.exit_code != 0
        run: |
          python3 << 'PYEOF'
          import re

          with open("/tmp/lychee/out.md") as f:
              content = f.read()

          lines = content.split("\n")

          # Keep the summary table (everything before "## Errors per input")
          summary_lines = []
          error_lines = []
          in_errors = False
          for line in lines:
              if line.startswith("## Errors per input"):
                  in_errors = True
                  continue
              if in_errors:
                  error_lines.append(line)
              else:
                  summary_lines.append(line)

          # Parse errors, tracking which pages each URL appears on
          # url -> {"status": str, "pages": [str]}
          main_errors = {}       # non-403
          forbidden_errors = {}  # 403
          current_page = ""

          for line in error_lines:
              # Track section headers (### Errors in /tmp/site/.../index.html)
              page_match = re.match(r"^### Errors in /tmp/site/[^/]+/(.+)", line)
              if page_match:
                  # Convert file path to URL path
                  path = page_match.group(1)
                  path = re.sub(r"/index\.html$", "/", path)
                  current_page = f"/{path}"
                  continue

              m = re.match(r"^\* \[(\w+)\] <([^>]+)>", line)
              if not m:
                  continue
              status, url = m.group(1), m.group(2)

              target = forbidden_errors if status == "403" else main_errors
              if url not in target:
                  target[url] = {"status": status, "pages": []}
              if current_page and current_page not in target[url]["pages"]:
                  target[url]["pages"].append(current_page)

          # Build output
          output = "\n".join(summary_lines).rstrip()
          output += "\n\n## Broken links\n\n"

          if main_errors:
              for url, info in main_errors.items():
                  pages = info["pages"]
                  page_str = f" (in {pages[0]})" if len(pages) == 1 else f" (in {len(pages)} pages)"
                  output += f"* [{info['status']}] <{url}>{page_str}\n"
          else:
              output += "No broken links found (excluding 403s).\n"

          if forbidden_errors:
              output += f"\n<details>\n<summary>403 Forbidden ({len(forbidden_errors)} URLs — likely bot-blocking, not broken)</summary>\n\n"
              output += "These sites block automated requests. The links may still be valid.\n"
              output += "Showing first 100 — see workflow logs for full list.\n\n"
              for i, (url, info) in enumerate(forbidden_errors.items()):
                  if i >= 100:
                      output += f"\n*... and {len(forbidden_errors) - 100} more*"
                      break
                  output += f"* <{url}>\n"
              output += "\n</details>\n"

          with open("/tmp/lychee/out.md", "w") as f:
              f.write(output)
          PYEOF

      - name: Find publisher URLs that should use doi.org
        id: doi-check
        run: |
          # Search source markdown files (excluding glossary, which is auto-generated)
          # for direct publisher URLs that should use https://doi.org/ instead.
          PUBLISHERS='(journals\.sagepub|tandfonline|psycnet\.apa|onlinelibrary\.wiley|link\.springer|academic\.oup|sciencedirect|jstor\.org|journals\.lww|royalsocietypublishing)\.(com|org)'
          # Extract just file:line and the URL itself (not full line content)
          MATCHES=$(grep -rno --include='*.md' -E \
            "https?://[^ )\"']*(${PUBLISHERS})/[^ )\"']*(doi/|article|fulltext)[^ )\"']*" \
            content/ --exclude-dir=content/glossary | sort -u || true)
          if [ -n "$MATCHES" ]; then
            COUNT=$(echo "$MATCHES" | wc -l)
            {
              echo ""
              echo "## Publisher URLs that should use doi.org ($COUNT found)"
              echo ""
              echo "The following links point directly to publisher websites instead of using"
              echo "\`https://doi.org/{DOI}\` format. Publishers often block automated requests,"
              echo "making these URLs uncheckable. Please replace them with doi.org links."
              echo "If the DOI is not visible in the URL, look it up on https://search.crossref.org"
              echo ""
              echo '```'
              echo "$MATCHES"
              echo '```'
            } >> /tmp/lychee/out.md
            echo "found=true" >> "$GITHUB_OUTPUT"
          else
            echo "found=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Create issue from lychee output
        if: steps.lychee.outputs.exit_code != 0 || steps.doi-check.outputs.found == 'true'
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: "Link Checker Report"
          content-filepath: /tmp/lychee/out.md
          labels: link-check
          token: ${{ secrets.GITHUB_TOKEN }}
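The post-processing step above keys on two regexes: one for lychee's per-page section headers and one for its `* [status] <url>` error bullets. A minimal reproduction of the deduplication logic on a toy report (the sample input below is invented for illustration; it assumes lychee's markdown output format, as the workflow script does):

```python
import re

# Toy "Errors per input" section: the same 404 appears on two pages,
# plus one 403 from a bot-blocking publisher.
sample = """\
### Errors in /tmp/site/forrt-website-123/about/index.html
* [404] <https://example.org/gone>
### Errors in /tmp/site/forrt-website-123/blog/index.html
* [404] <https://example.org/gone>
* [403] <https://journals.sagepub.com/doi/10.1177/x>
"""

main_errors, forbidden_errors = {}, {}
current_page = ""
for line in sample.splitlines():
    page = re.match(r"^### Errors in /tmp/site/[^/]+/(.+)", line)
    if page:
        # Convert file path to URL path: about/index.html -> /about/
        current_page = "/" + re.sub(r"/index\.html$", "/", page.group(1))
        continue
    m = re.match(r"^\* \[(\w+)\] <([^>]+)>", line)
    if not m:
        continue
    status, url = m.groups()
    target = forbidden_errors if status == "403" else main_errors
    info = target.setdefault(url, {"status": status, "pages": []})
    if current_page not in info["pages"]:
        info["pages"].append(current_page)

print(main_errors)       # one entry for the 404, with both pages tracked
print(forbidden_errors)  # the 403, kept separate for the collapsed section
```

Each URL ends up reported once with its page list, which is what keeps the issue body under GitHub's 65KB limit.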

.lychee.toml

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Lychee link checker configuration
# https://lychee.cli.rs/usage/config/
#
# Used by the link-check GitHub Actions workflow.
# Checks all links (internal + external) in the built HTML files.

# ---------------------
# Exclusions
# ---------------------
# Patterns to exclude from link checking (common false positives)
exclude = [
  # Internal links — already verified as local files in the build artifact
  "forrt\\.org",

  # Placeholder / example domains
  "example\\.com",
  "localhost",
  "127\\.0\\.0\\.1",

  # Social media sites that block automated requests
  "linkedin\\.com",
  "twitter\\.com",
  "x\\.com",

  # Web Archive — often slow or flaky
  "web\\.archive\\.org",

  # GitHub edit links with templated paths
  "github\\.com/.*/edit/",
]

# ---------------------
# Request settings
# ---------------------
# Accept 2xx/3xx and 429 (rate limiting)
# Note: 403 is NOT accepted — those are separated into a collapsed section
# by the workflow's post-processing step, since many publishers block bots.
accept = ["100..=399", "429"]

# Timeout per request in seconds
timeout = 30

# Maximum number of retries per link
max_retries = 3

# Maximum concurrent requests
max_concurrency = 16

# Do not show progress bar (cleaner CI output)
no_progress = true
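The entries in `exclude` are regular expressions matched against each URL, which is why the dots are escaped. A quick way to sanity-check a pattern before committing it, sketched with Python's `re` module rather than lychee's own regex engine (close enough for patterns this simple):

```python
import re

# Subset of the exclusion patterns from .lychee.toml
exclude = [
    r"forrt\.org",
    r"example\.com",
    r"linkedin\.com",
    r"web\.archive\.org",
    r"github\.com/.*/edit/",
]

def is_excluded(url: str) -> bool:
    # A link is excluded if any pattern matches anywhere in the URL
    return any(re.search(p, url) for p in exclude)

print(is_excluded("https://forrt.org/about/"))                 # True: internal link
print(is_excluded("https://doi.org/10.1177/1948550616673876")) # False: still checked
```

Note that doi.org is deliberately absent from the list: it returns proper 404s for invalid DOIs, so those links stay checkable.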

content/authors/berit-t-barthelmes-msc/_index.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ name: "Berit T. Barthelmes, M.Sc."
  
  # Username (this should match the folder name)
  authors:
- - Name "Berit T. Barthelmes, M.Sc."
+ - "Berit T. Barthelmes, M.Sc."
  
  # Is this the primary user of the site?
  superuser: false

content/clusters/cluster3.md

Lines changed: 1 addition & 1 deletion
@@ -164,7 +164,7 @@ PsuTeachR's [Data Skills for Reproducible Science](https://psyteachr.github.io/m
  
  ***Includes tools such as statcheck.io, GRIM, and SPRITE***
  
- * Brown, N. J., & Heathers, J. A. (2016). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 1948550616673876. http://journals.sagepub.com/doi/pdf/10.1177/1948550616673876
+ * Brown, N. J., & Heathers, J. A. (2016). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 1948550616673876. https://doi.org/10.1177/1948550616673876
  
  * Nuijten, M. B., Van Assen, M. A. L. M., Hartgerink, C. H. J., Epskamp, S., & Wicherts, J. M. (2017). The validity of the tool “statcheck” in discovering statistical reporting inconsistencies. Preprint retrieved from https://psyarxiv.com/tcxaj/.
  
content/educators-corner/004-Teaching-why-how-replication/index.md

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ As another example, when I illustrate the smallest-effect-size-of-interest workf
  ![Notes](fig9.webp "Notes")
  
  
- One last issue that comes out of the simulations is the number of assumptions that one must make in the process of doing a simulation study. This includes both statistical assumptions, such as the size of the standard deviation of the outcome measure, and non-statistical assumptions, such as the length of time it takes for a typical participate in the study (a fact that is necessary to accurately estimate the number of participants who can participate in a lab-based study, for example). I argue that pilot studies are useful for developing good values for these assumptions. Pilot studies are _not_ useful for directly estimating the value of the target effect size itself ([Albers & Lakens, 2018](https://www.sciencedirect.com/science/article/pii/S002210311630230X?casa_token=OETt_Sm5VFEAAAAA:-9rK8QScds9e0A1siznusvdtvl0-yC2WpBVWe7ztdGkZ8eVILbyqWMC5WmcsAxHWp6X7X7voPeA)); in any case it is better to power to a smallest effect size of interest than the expected effect size.
+ One last issue that comes out of the simulations is the number of assumptions that one must make in the process of doing a simulation study. This includes both statistical assumptions, such as the size of the standard deviation of the outcome measure, and non-statistical assumptions, such as the length of time it takes for a typical participate in the study (a fact that is necessary to accurately estimate the number of participants who can participate in a lab-based study, for example). I argue that pilot studies are useful for developing good values for these assumptions. Pilot studies are _not_ useful for directly estimating the value of the target effect size itself ([Albers & Lakens, 2018](https://www.sciencedirect.com/science/article/pii/S002210311630230X)); in any case it is better to power to a smallest effect size of interest than the expected effect size.
  
  ![Workflow 2](fig10.webp "Workflow 2")

content/educators-corner/006-qualitative-OS-practices/index.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ We thus gathered people interested in qualitative open science research in educa
  ![Criteria for Reporting Qualitative Studies](Fig1.webp "Criteria for Reporting Qualitative Studies")
  
  
- 3. **Open Materials.** Thanks to the internet, researchers have websites and repositories where they can upload the tools for others to access. In qualitative research, this might mean interview protocols, memos, coding notebooks, tools (such as Nvivo or R packages), or even the data itself. This provides a sort of audit trail so others can verify the results of the research. There is no all or nothing here; open materials, much like the rest of these open science practices, exist along a spectrum. Not only what researchers share is on a spectrum; researchers can also dictate who may access the open materials. Perhaps it’s the entire public, but it could just be people who want to verify findings (i.e., dissertation committees, participants, reviewers). Below you can see how [Bowman and Keene (2018)](https://www.tandfonline.com/doi/pdf/10.1080/08824096.2018.1513273) described open science practices as a layered onion with the innermost layer being the most transparent. However, no matter what or to whom materials are shared, researchers must include their plan within their consent procedures and IRB protocols to not violate any ethical boundaries.
+ 3. **Open Materials.** Thanks to the internet, researchers have websites and repositories where they can upload the tools for others to access. In qualitative research, this might mean interview protocols, memos, coding notebooks, tools (such as Nvivo or R packages), or even the data itself. This provides a sort of audit trail so others can verify the results of the research. There is no all or nothing here; open materials, much like the rest of these open science practices, exist along a spectrum. Not only what researchers share is on a spectrum; researchers can also dictate who may access the open materials. Perhaps it’s the entire public, but it could just be people who want to verify findings (i.e., dissertation committees, participants, reviewers). Below you can see how [Bowman and Keene (2018)](https://doi.org/10.1080/08824096.2018.1513273) described open science practices as a layered onion with the innermost layer being the most transparent. However, no matter what or to whom materials are shared, researchers must include their plan within their consent procedures and IRB protocols to not violate any ethical boundaries.
  
  
  ![Conceptual Onion of Open Science Practices](Fig2.webp "Conceptual Onion of Open Science Practices")

content/educators-corner/010-Neurodiversity/index.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ People with disabilities are more likely to be excluded from the academic workfo
  
  Many gatekeepers determine whether an individual is neurodivergent and these processes are driven by individuals who are neurotypical. As a result, referral time for these services vary widely, from 4 weeks to 201 weeks within the UK (Lloyd, 2019) and if a person does not fit the criteria, the individual can be ignored and may not receive the much-needed help that they require. This can lead to poor self-esteem, unemployment (e.g. around 22% in autistic people are in any type of employment; see Figure 2 in [fact sheet](https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/disability/articles/outcomesfordisabledpeopleintheuk/2020)). As a result, neurodivergent individuals may blame themselves for the difficulties they encounter, as opposed to the barriers that society has placed on them.
  
- Despite this, people of different neurodivergent conditions or families of the people with the conditions have begun meeting and talking to each other about their experiences and one common shared experience is a history of misinterpretation and mistreatment by the dominant neurotypical cultures and its institutions such as academia. As a result of centuries of oppression of disabled people worldwide and a hyper-normalised environment, in addition to seeing the disproportionate impact of the coronavirus pandemic on disabled students (see this amazing [paper ](https://link.springer.com/article/10.1007/s10639-021-10559-3)by Dr Joanna Zawadka), many neurodivergent and disabled staff feel discouraged in an environment that should aim to support them. They do not feel like they belong, their differences are seen as an impairment and their voice does not seem to matter. They do not see themselves represented in psychological science, academia, business, teaching or elsewhere.
+ Despite this, people of different neurodivergent conditions or families of the people with the conditions have begun meeting and talking to each other about their experiences and one common shared experience is a history of misinterpretation and mistreatment by the dominant neurotypical cultures and its institutions such as academia. As a result of centuries of oppression of disabled people worldwide and a hyper-normalised environment, in addition to seeing the disproportionate impact of the coronavirus pandemic on disabled students (see this amazing [paper ](https://doi.org/10.1007/s10639-021-10559-3)by Dr Joanna Zawadka), many neurodivergent and disabled staff feel discouraged in an environment that should aim to support them. They do not feel like they belong, their differences are seen as an impairment and their voice does not seem to matter. They do not see themselves represented in psychological science, academia, business, teaching or elsewhere.
  
  We are a group of early-career neurotypical and neurodivergent researchers that are a part of the Framework of Open Reproducible Research and Training (FORRT) community, aiming to make academia and the open scholarship community more open to neurodiversity. Everyone, no matter what they identify with, is welcome in this group. We aim to discuss how open scholarship can be intersected with the neurodiversity movement, and emphasise how differences should be highlighted and accepted, whilst supporting the idea of accessibility. Our neurodiversity team is a group that currently consists of individuals that have autism, dyspraxia/DCD, speech-language differences, ADHD, dyslexia, or are neurotypical allies. If you have these or other neurominorities and wish to be part of the team, you are more than welcome to join!
  