fix(cli): pin UTF-8 encoding on init-options and .extensionignore I/O #2686
Open
Quratulain-bilal wants to merge 1 commit into
``Path.read_text`` / ``Path.write_text`` default to the system locale
codec, which is cp1252 / gb2312 / cp932 on Windows. Two user-facing
file paths in spec-kit were calling them without an explicit
``encoding=`` argument:
- ``src/specify_cli/__init__.py:400,412`` —
``save_init_options`` / ``load_init_options`` for
``.specify/init-options.json``. A peer machine with a different
default locale (or a UTF-8 Unix CI runner reading a file written on
a cp1252 Windows host) cannot decode the file, raising
``UnicodeDecodeError``. ``UnicodeDecodeError`` is a subclass of
``ValueError`` — not ``OSError`` / ``json.JSONDecodeError`` — so
the existing fall-back ``except`` tuple in ``load_init_options``
also misses it and the error propagates raw to the CLI.
- ``src/specify_cli/extensions.py:764`` — ``.extensionignore``
pattern reader. The very next line already normalises
backslashes "so Windows-authored files work", proving the codebase
expects Windows authors to write this file. Multibyte UTF-8
patterns (Chinese filenames, accented directory names) silently
mojibake when the host locale is not UTF-8, so the patterns fail
to match and unintended files are shipped with the extension.
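The exception-hierarchy point in the first bullet can be checked in isolation. The sketch below is not the spec-kit code: ``load_with_old_except`` is a hypothetical stand-in, and the ``(OSError, json.JSONDecodeError)`` tuple is assumed from the description above.

```python
import json


def load_with_old_except(raw: bytes) -> dict:
    """Hypothetical stand-in for the pre-fix loader: decode, parse, fall back to {}."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (OSError, json.JSONDecodeError):  # assumed shape of the old tuple
        return {}


# "café" encoded as cp1252: the lone 0xE9 byte is not valid UTF-8 here.
cp1252_bytes = '{"project": "café"}'.encode("cp1252")

# Malformed JSON is covered by the tuple and falls back to {}.
malformed_result = load_with_old_except(b"{not json")

# A decode failure is not covered: UnicodeDecodeError derives from ValueError,
# and neither OSError nor json.JSONDecodeError is one of its base classes.
try:
    load_with_old_except(cp1252_bytes)
    escaped = False
except UnicodeDecodeError:
    escaped = True
```

``json.JSONDecodeError`` is itself a ``ValueError`` subclass, but it is a sibling of ``UnicodeDecodeError``, not a parent, so catching it does not help here.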
The sibling integration-catalog reader at
``src/specify_cli/integrations/catalog.py:150,156,193,202,374``
already pins ``encoding="utf-8"`` everywhere. PR github#2280 fixed the
symmetric PowerShell-template BOM bug. This change brings the two
remaining drifted paths in line with that precedent.
Regression tests:
- ``tests/test_presets.py::TestInitOptions`` — parametrized non-ASCII
round-trip (CJK, Latin-1, Greek, emoji) plus a corrupted-file case
that asserts the existing "fall back to {}" contract still holds
when a peer file contains bytes invalid as UTF-8.
- ``tests/test_extensions.py::TestExtensionIgnore`` — Japanese
(``ドキュメント/``) and Latin-1 (``café/``) ignore patterns
correctly exclude their directories during install.
What
``Path.read_text()`` / ``Path.write_text()`` default to the system locale codec, which is cp1252 / gb2312 / cp932 on Windows. Two user-facing file paths in spec-kit were calling them without an explicit ``encoding=`` argument:
- ``src/specify_cli/__init__.py:400, 412``: ``save_init_options`` / ``load_init_options`` for ``.specify/init-options.json``. A peer machine with a different default locale (or a UTF-8 Unix CI runner reading a file written on a cp1252 Windows host) cannot decode the file, raising ``UnicodeDecodeError``. ``UnicodeDecodeError`` is a subclass of ``ValueError``, not ``OSError`` / ``json.JSONDecodeError``, so the existing fall-back ``except`` tuple in ``load_init_options`` also misses it and the error propagates raw to the CLI.
- ``src/specify_cli/extensions.py:764``: ``.extensionignore`` pattern reader. The very next line already normalises backslashes "so Windows-authored files work", proving the codebase expects Windows authors to write this file. Multibyte UTF-8 patterns (Chinese filenames, accented directory names) silently mojibake when the host locale is not UTF-8, so the patterns fail to match and unintended files are shipped with the extension.
Reproducer
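A minimal sketch of both failure modes, runnable on any host. It simulates the Windows locale by passing ``encoding="cp1252"`` explicitly instead of relying on the machine's default, and it assumes the JSON file contains raw non-ASCII characters (``ensure_ascii=False``); the payload and the function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def cross_locale_read_fails() -> bool:
    """Write JSON as a cp1252 host would, then read it as a UTF-8 host would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "init-options.json"
        # Assumption: raw non-ASCII is written (ensure_ascii=False), so "é"
        # lands on disk as the single byte 0xE9.
        payload = json.dumps({"project": "café"}, ensure_ascii=False)
        path.write_text(payload, encoding="cp1252")
        try:
            path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            return True
        return False


def pattern_as_seen_by_cp1252_host(pattern: str) -> str:
    """Decode a UTF-8 .extensionignore pattern with the wrong codec."""
    return pattern.encode("utf-8").decode("cp1252")


decode_failed = cross_locale_read_fails()
mojibake = pattern_as_seen_by_cp1252_host("café/")
```

The second case raises nothing: the pattern is read as ``cafÃ©/``, which no longer equals the real directory name ``café/``, so the ignore rule never matches.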
Why this matters
- ``src/specify_cli/integrations/catalog.py:150,156,193,202,374`` already pins ``encoding="utf-8"`` everywhere. PR "fix(powershell): ensure UTF-8 templates are written without BOM" #2280 (f684305) fixed the symmetric PowerShell-template BOM bug. The two paths in this PR were the remaining drifted ones.
- ``init-options.json`` is meant to be a portable record of how a project was scaffolded: a peer cloning the repo on a different OS / locale must be able to read it. Today they can't if the original author's project name (or any future field) contains non-ASCII.
- ``.extensionignore`` already explicitly supports Windows authors (see line 766). UTF-8 patterns are part of that same contract.
The change
- ``src/specify_cli/__init__.py``: pin ``encoding="utf-8"`` on both ``write_text`` and ``read_text``; extend the existing ``except`` tuple in ``load_init_options`` to include ``UnicodeDecodeError`` so a peer file written in a non-UTF-8 codec falls back to ``{}`` per the existing contract instead of crashing.
- ``src/specify_cli/extensions.py``: pin ``encoding="utf-8"`` on the ``.extensionignore`` reader.
Tests
- ``tests/test_presets.py::TestInitOptions``: parametrized non-ASCII round-trip (CJK / Latin-1 / Greek / emoji) plus a ``0xe9``-byte corrupted-file fallback test.
- ``tests/test_extensions.py::TestExtensionIgnore``: Japanese (``ドキュメント/``) and Latin-1 (``café/``) ignore patterns correctly exclude their directories during install.
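The shape of the ``init-options`` change and of the round-trip and fallback tests can be sketched as follows. This is an illustration under assumptions, not the actual diff: the function signatures (taking the file path directly), the ``ensure_ascii=False`` setting, and the sample payload are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_init_options(path: Path, options: dict[str, Any]) -> None:
    """Write options as JSON with the codec pinned to UTF-8."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(options, ensure_ascii=False, indent=2), encoding="utf-8")


def load_init_options(path: Path) -> dict[str, Any]:
    """Read options as UTF-8; fall back to {} on any read, decode, or parse error."""
    if not path.exists():
        return {}
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError, UnicodeDecodeError):
        return {}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".specify" / "init-options.json"

    # Non-ASCII round-trip: independent of the host locale.
    save_init_options(path, {"project": "ドキュメント"})
    round_tripped = load_init_options(path)

    # Corrupted-file case: a lone 0xE9 byte is invalid UTF-8, so the loader
    # falls back to {} instead of raising.
    path.write_bytes(b'{"project": "caf\xe9"}')
    fallback = load_init_options(path)
```

Catching ``UnicodeDecodeError`` by name, not the broader ``ValueError``, keeps the fallback limited to the one new failure the pinned codec can introduce.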
Scope
Intentionally narrow: no behaviour change for ASCII content (UTF-8 is a superset). Only non-ASCII content that previously
round-tripped accidentally (when host locale happened to be UTF-8) or silently mojibaked (when it wasn't) now
round-trips reliably across all hosts.
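The "no behaviour change for ASCII content" claim can be spot-checked with a short sketch. The sample pattern is hypothetical and deliberately limited to letters, ``/``, ``*`` and ``.``.

```python
# An ASCII-only ignore pattern should produce identical bytes under UTF-8
# and under the Windows code pages named in this PR.
ascii_pattern = "docs/build/*.log"
codecs_to_check = ["utf-8", "cp1252", "gb2312", "cp932"]

encoded = {name: ascii_pattern.encode(name) for name in codecs_to_check}
all_identical = len(set(encoded.values())) == 1
```

Because the bytes are the same, a file holding only such content reads back identically whether or not the codec is pinned.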