Fix yaml and json loaders #298

Merged
masklinn merged 1 commit into ua-parser:master from masklinn:loaders-fix
Mar 28, 2026
@masklinn
Contributor

This is mostly an issue on Windows, but technically it affects every OS. I know about this issue so there's no excuse, although I guess this means there are no users on Windows, or at least none using the YAML loader on Windows.

The (well known, at least by me) problem is that `open` in text mode (the default) gets the encoding it uses from `locale.getencoding()`, which is pretty much always the wrong thing to do, but it is what it is (see footnote 1).
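A quick way to see which encoding `open` picks on a given machine is to open a scratch file without `encoding=` and inspect the resulting file object. This is only a sketch; the variable name `default_encoding` is illustrative, and the result depends on the platform, the locale and the Python version.

```python
import codecs
import tempfile

# Open a scratch file in text mode without passing encoding=,
# then ask the file object which encoding it ended up with.
with tempfile.TemporaryFile("w") as f:
    default_encoding = f.encoding

# Normalise the codec name for display (spelling varies between platforms).
print(codecs.lookup(default_encoding).name)
```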

On most unices this is innocuous because they generally have UTF-8 set as the locale encoding, but on Windows it's generally going to fuck you up because it likely has a stupid ANSI code page set (only if you're really lucky has someone set 65001, the UTF-8 code page). Any non-ASCII content can then break file reading, as the code page will either not be able to decode it or will misread it.
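Both failure modes can be reproduced on any OS by passing the encoding explicitly. The sketch below assumes cp1252 as a stand-in for a typical Windows ANSI code page; the helper names and sample strings are made up for illustration.

```python
import os
import tempfile


def write_utf8(directory, name, text):
    """Write text to disk as UTF-8 bytes and return the file path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return path


def read_text(path, encoding):
    """Read a file in text mode with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    misread_path = write_utf8(tmp, "misread.txt", "café")
    broken_path = write_utf8(tmp, "broken.txt", "Ángel")

    # "é" is the bytes C3 A9 in UTF-8; cp1252 decodes each byte as its own
    # character, so the read succeeds but returns the wrong text.
    misread = read_text(misread_path, "cp1252")

    # "Á" is the bytes C3 81 in UTF-8, and 0x81 is unassigned in Python's
    # cp1252 codec, so this read fails outright.
    try:
        read_text(broken_path, "cp1252")
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

print(repr(misread), decode_failed)
```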

Both `json` and PyYAML are perfectly happy reading binary files, in which case they'll apply UTF-8 decoding and go on their merry way, so change the loaders to open their files in binary mode.
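As a rough sketch of that approach (not the actual diff in this PR), the loaders can open their input with `"rb"` and hand the binary stream straight to the parser. The function names `load_json` and `load_yaml` and the sample data are hypothetical; `load_yaml` assumes PyYAML is installed and is not exercised here.

```python
import json
import os
import tempfile


def load_json(path):
    # Binary mode: json.load decodes the bytes itself,
    # independently of the locale encoding.
    with open(path, "rb") as f:
        return json.load(f)


def load_yaml(path):
    import yaml  # PyYAML, assumed to be installed

    # PyYAML also accepts a binary stream and decodes it itself.
    with open(path, "rb") as f:
        return yaml.safe_load(f)


sample = {"family": "Ángel"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    # Store the non-ASCII text as raw UTF-8 rather than \u escapes.
    with open(path, "wb") as f:
        f.write(json.dumps(sample, ensure_ascii=False).encode("utf-8"))
    loaded = load_json(path)

print(loaded == sample)
```

Passing `encoding="utf-8"` to `open` in text mode would be another way to avoid the locale lookup; reading bytes simply leaves the decoding to the parser.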

Footnotes

  1. it's finally being fixed in 3.15 by PEP 686 https://docs.python.org/3.15/whatsnew/3.15.html#whatsnew315-utf8-default

@masklinn masklinn enabled auto-merge (rebase) March 28, 2026 17:13
@masklinn masklinn merged commit 52b7c6c into ua-parser:master Mar 28, 2026
30 checks passed
@masklinn masklinn deleted the loaders-fix branch March 28, 2026 19:24