GH-145000: Add a tool to check removed HTML IDs #145001

Open
encukou wants to merge 5 commits into python:main from encukou:check-removed-anchors

@encukou encukou commented Feb 19, 2026

With this, make -CDoc html-ids will create a html-ids.json.gz file containing IDs in the built docs.
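
For readers unfamiliar with the approach, here is a minimal sketch of how IDs could be collected from built HTML using the standard library's `html.parser`. The class name `IDGatherer` mirrors the one visible in the diff, but its body here is a guess and not the code in this PR; `get_ids_from_text` is a hypothetical helper added only for illustration.

```python
from html.parser import HTMLParser


class IDGatherer(HTMLParser):
    """Collect the value of every `id` attribute seen while parsing.

    Assumption: this body is illustrative; the PR's real class may differ.
    """

    def __init__(self, ids):
        super().__init__()
        self.ids = ids

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == 'id' and value:
                self.ids.add(value)


def get_ids_from_text(html_text):
    # Hypothetical helper: parse a string instead of a file on disk.
    ids = set()
    gatherer = IDGatherer(ids)
    gatherer.feed(html_text)
    gatherer.close()
    return ids


sample = '<h1 id="top">Title</h1><p>text</p><a id="anchor" href="#top">x</a>'
found = get_ids_from_text(sample)
```

Comparing such a set against a stored baseline is then enough to tell which anchors disappeared between two builds.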


📚 Documentation preview 📚: https://cpython-previews--145001.org.readthedocs.build/

def get_ids_from_file(path):
    ids = set()
    gatherer = IDGatherer(ids)
    with path.open() as file:
Member

Encoding?

Member Author

@encukou encukou Feb 20, 2026

UTF-8, as per PEP-686 :)
I'll set it explicitly to ease backporting. Thanks for the catch!
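
A sketch of what setting the encoding explicitly could look like. On Python versions where UTF-8 mode is not the default, `open()` without `encoding` falls back to the locale encoding, which is why the explicit argument helps backports. The helper name `read_text_utf8` and the sample file are assumptions made for this example, not code from the PR.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path):
    # Passing encoding explicitly makes the result independent of the
    # locale, regardless of whether UTF-8 mode is enabled.
    with path.open(encoding='utf-8') as file:
        return file.read()


with tempfile.TemporaryDirectory() as tmpdir:
    page = Path(tmpdir) / 'page.html'
    # Non-ASCII content so that a wrong encoding would be noticeable.
    page.write_text('<p id="café">x</p>', encoding='utf-8')
    text = read_text_utf8(page)
```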

Comment on lines 63 to 64
ids = future.result()
ids_by_page[str(relative_path)] = future.result()
Member

future.result() is called twice.
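
One possible shape of the fix is to reuse the `ids` variable that already holds the result. A second `future.result()` call returns the same value once the future is done, so it is redundant and not a correctness problem. The executor type, `fake_get_ids`, and the sample paths below are stand-ins assumed for this sketch; the PR's surrounding code is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath


def fake_get_ids(path):
    # Stand-in for the real per-file ID extraction.
    return {f'{path.stem}-id'}


paths = [PurePosixPath('library/os.html'), PurePosixPath('library/sys.html')]
ids_by_page = {}

with ThreadPoolExecutor() as executor:
    futures = {path: executor.submit(fake_get_ids, path) for path in paths}
    for relative_path, future in futures.items():
        # Fetch the result once and reuse the local variable.
        ids = future.result()
        ids_by_page[str(relative_path)] = ids
```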


if args.command == 'check':
    with gzip.open(args.baseline_file) as zfile:
        baseline = json.load(zfile)['ids_by_page']
Member

json.load() requires a text file, gzip.open() returns a binary file.

Member Author

That's not true.
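
A quick way to check this point: since Python 3.6, `json.load()` accepts both text and binary files, because it passes `fp.read()` to `json.loads()`, which accepts `bytes`. The sketch below uses an in-memory buffer and made-up sample data instead of the PR's baseline file.

```python
import gzip
import io
import json

# Write a small gzip-compressed JSON document into memory.
buffer = io.BytesIO()
with gzip.open(buffer, 'wt', encoding='utf-8') as zfile:
    json.dump({'ids_by_page': {'index.html': ['top']}}, zfile)

# Re-open in the default binary mode ('rb') and load it directly.
buffer.seek(0)
with gzip.open(buffer) as zfile:
    baseline = json.load(zfile)['ids_by_page']
```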

checked = json.load(zfile)['ids_by_page']
excluded = set()
if args.exclude_file:
    with open(args.exclude_file) as file:
Member

Encoding?

ids = gather_ids(args.htmldir, verbose_print=verbose_print)
if args.outfile is None:
    args.outfile = args.htmldir / 'html-ids.json.gz'
with gzip.open(args.outfile, 'wt') as zfile:
Member

Encoding?
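
For the write side, `gzip.open()` in text mode takes an `encoding` argument just like the built-in `open()`. A minimal sketch, assuming a temporary output path and sample data that are not part of the PR; `ensure_ascii=False` is set here only so that the chosen encoding visibly affects the bytes written.

```python
import gzip
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    outfile = Path(tmpdir) / 'html-ids.json.gz'
    # Text mode ('wt') with an explicit encoding, independent of the locale.
    with gzip.open(outfile, 'wt', encoding='utf-8') as zfile:
        json.dump(
            {'ids_by_page': {'index.html': ['naïve-id']}},
            zfile,
            ensure_ascii=False,
        )
    # Decompress the raw bytes to confirm they are UTF-8 encoded.
    raw = gzip.decompress(outfile.read_bytes())
```

With `json.dump()`'s default `ensure_ascii=True` the output contains only ASCII characters, so the explicit encoding is mostly a matter of consistency in that case.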
