Summary
The bash tool in Copilot CLI spawns shells with LANG="" and LC_CTYPE="C". This causes all non-ASCII characters (Chinese, Japanese, Korean, accented Latin, emoji, etc.) to be silently stripped from command strings. File paths containing such characters become unresolvable, and file content written via heredoc or echo loses all multibyte characters.
Environment
- Copilot CLI version: 1.0.49
- OS: macOS (Darwin)
- Shell: bash (spawned by Copilot CLI)
Steps to reproduce
- Open Copilot CLI
- Run any bash command involving non-ASCII characters:
Expected: 41 42 43 e4bda0 e5a5bd 44 45 46 0a (UTF-8 bytes preserved)
Actual: 41 42 43 44 45 46 0a — the Chinese characters are gone entirely.
- Attempt to write a file to a path containing non-ASCII directory names:
echo "test" > "/path/to/中文目录/file.txt"
Expected: File created successfully.
Actual: Command hangs indefinitely. The shell appears to attempt resolving a corrupted path (with multibyte characters stripped) and never returns.
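The exact command used for the byte check in the first step is not shown above. As a hedged sketch, one way to run an equivalent check is to dump a test string's bytes with od. The helper name hex_bytes and the test string ABC你好DEF (inferred from the expected byte sequence) are assumptions, not part of the original report.

```shell
# Hypothetical helper (not from the report): print the bytes of a string,
# followed by a newline, as space-separated hex pairs.
hex_bytes() {
  # od pads its columns; unquoted word splitting collapses that padding.
  set -- $(printf '%s\n' "$1" | od -An -v -tx1)
  echo "$*"
}

# Assumed test string, inferred from the expected byte sequence.
hex_bytes 'ABC你好DEF'
```

If the multibyte characters reach the shell intact, the output includes the six bytes e4 bd a0 e5 a5 bd between 43 and 44. If they are stripped before the command runs, only the ASCII bytes remain.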
Root cause
Running locale inside the bash tool confirms:
LANG=""
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
With LC_CTYPE=C, the shell treats all bytes above 0x7F as invalid and discards them during command-line processing. This affects:
- String literals in commands (echo, printf, heredoc content)
- File path arguments containing non-ASCII characters
- Variable assignments with multibyte content
The same operations succeed when run through Python (which manages its own UTF-8 encoding independent of the shell locale), confirming the filesystem and disk are fine.
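A quick way to confirm which codeset a spawned shell is using is to ask locale charmap. The following is a minimal sketch, with the function name is_utf8_locale assumed purely for illustration. In locale output, values shown in double quotes are implied by other variables rather than set explicitly, so the listing above suggests that none of the LC_* categories is being forced individually.

```shell
# Hypothetical helper: succeed only if the effective codeset is UTF-8.
is_utf8_locale() {
  case "$(locale charmap 2>/dev/null)" in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

if is_utf8_locale; then
  echo "codeset is UTF-8"
else
  echo "codeset is not UTF-8: $(locale charmap 2>/dev/null)"
fi
```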
Suggested fix
Set LANG=C.UTF-8 (or en_US.UTF-8) when spawning bash subprocesses. This preserves the benefits of a minimal locale (deterministic sort order, no locale-dependent formatting surprises) while correctly handling multibyte characters:
env LANG=C.UTF-8 /bin/bash --norc --noprofile -c "..."
Using --norc --noprofile already avoids loading user shell plugins. Adding a UTF-8 locale on top shouldn't introduce any slowness or side effects.
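Because the set of installed locales varies by platform (C.UTF-8 is not present everywhere), a spawner could probe locale -a before choosing one. The sketch below uses the hypothetical helper names pick_utf8_locale and run_in_utf8_bash. It is not Copilot CLI's actual spawning code, and it assumes /bin/bash exists.

```shell
# Hypothetical helper: print the first preferred UTF-8 locale that
# `locale -a` lists as installed, or "C" if none of them is found.
pick_utf8_locale() {
  # Normalize names so "C.utf8" (glibc spelling) and "C.UTF-8" compare equal.
  installed=$(locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
  for candidate in C.UTF-8 en_US.UTF-8; do
    wanted=$(printf '%s\n' "$candidate" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    if printf '%s\n' "$installed" | grep -Fqx "$wanted"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf 'C\n'
}

# Hypothetical wrapper: run one command string in a clean bash with that locale.
run_in_utf8_bash() {
  env LANG="$(pick_utf8_locale)" /bin/bash --norc --noprofile -c "$1"
}
```

For example, run_in_utf8_bash 'locale charmap' would be expected to report UTF-8 whenever one of the candidate locales is installed. This holds only if LC_ALL and LC_CTYPE are not explicitly set in the parent environment, since both take precedence over LANG.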
Impact
This bug makes Copilot CLI effectively unusable for any task involving non-ASCII file paths or content, which is common for users working in CJK languages and for anyone whose project happens to have non-ASCII directory names. The failure mode is particularly bad: commands silently produce corrupted output or hang forever, with no error message indicating the cause.