
Bash tool drops non-ASCII characters due to LC_CTYPE=C in shell environment #3601

@404hub

Summary

The bash tool in Copilot CLI spawns shells with LANG="" and LC_CTYPE="C". This causes all non-ASCII characters (Chinese, Japanese, Korean, accented Latin, emoji, etc.) to be silently stripped from command strings. File paths containing such characters become unresolvable, and file content written via heredoc or echo loses all multibyte characters.

Environment

  • Copilot CLI version: 1.0.49
  • OS: macOS (Darwin)
  • Shell: bash (spawned by Copilot CLI)

Steps to reproduce

  1. Open Copilot CLI
  2. Run any bash command involving non-ASCII characters:
echo "ABC你好DEF" | xxd

Expected: 41 42 43 e4 bd a0 e5 a5 bd 44 45 46 0a (UTF-8 bytes preserved)

Actual: 41 42 43 44 45 46 0a — the Chinese characters are gone entirely.

  3. Attempt to write a file to a path containing non-ASCII directory names:
echo "test" > "/path/to/中文目录/file.txt"

Expected: File created successfully.

Actual: Command hangs indefinitely. The shell appears to attempt resolving a corrupted path (with multibyte characters stripped) and never returns.
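
The first reproduction can also be checked without reading a hex dump, by counting bytes. This is only a sketch: `byte_len` is a hypothetical helper name, not something Copilot CLI provides, and it assumes the script itself is saved as UTF-8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of bytes in the first argument.
byte_len() {
  # wc -c counts bytes (not characters); tr strips the padding some
  # implementations of wc put around the number.
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# 6 ASCII bytes plus 2 CJK characters at 3 bytes each: 12 bytes when intact.
byte_len "ABC你好DEF"
echo
```

If the multibyte characters are stripped as described above, the count would be expected to drop to 6.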

Root cause

Running locale inside the bash tool confirms:

LANG=""
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

With LC_CTYPE=C, the shell treats all bytes above 0x7F as invalid and discards them during command-line processing. This affects:

  • String literals in commands (echo, printf, heredoc content)
  • File path arguments containing non-ASCII characters
  • Variable assignments with multibyte content

The same operations succeed when run through Python (which manages its own UTF-8 encoding independent of the shell locale), confirming the filesystem and disk are fine.
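
For a quicker check than reading the full locale output, `locale charmap` reports the character encoding implied by the current LC_CTYPE. A minimal sketch follows; `is_utf8_charmap` is a hypothetical helper introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given charmap name denotes UTF-8.
is_utf8_charmap() {
  case "$1" in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# `locale charmap` prints the codeset of the active LC_CTYPE setting.
# If the locale utility is missing, treat the result as unknown.
current="$(locale charmap 2>/dev/null || true)"
if is_utf8_charmap "$current"; then
  echo "UTF-8 locale active: $current"
else
  echo "non-UTF-8 locale active: ${current:-unknown}"
fi
```

With the settings shown above, this would be expected to take the non-UTF-8 branch, since the C locale does not use a UTF-8 charmap.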

Suggested fix

Set LANG=C.UTF-8 (or en_US.UTF-8) when spawning bash subprocesses. This preserves the benefits of a minimal locale (deterministic sort order, no locale-dependent formatting surprises) while correctly handling multibyte characters:

env LANG=C.UTF-8 /bin/bash --norc --noprofile -c "..."

Using --norc --noprofile already avoids loading user shell plugins. Adding a UTF-8 locale on top shouldn't introduce any slowness or side effects.
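
Since not every system ships a C.UTF-8 locale, the spawner could probe `locale -a` and fall back to en_US.UTF-8. The sketch below is an assumption about how that might look, not how Copilot CLI actually spawns shells. `pick_utf8_locale` is a hypothetical helper, and the two spellings it accepts (`UTF-8` and `utf8`) cover the forms `locale -a` commonly prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read locale names on stdin (one per line, as printed
# by `locale -a`) and print the preferred UTF-8 locale that is present.
# Preference order: C.UTF-8 first, then en_US.UTF-8.
pick_utf8_locale() {
  local line found_c="" found_en=""
  while IFS= read -r line; do
    case "$line" in
      C.UTF-8|C.utf8) found_c="$line" ;;
      en_US.UTF-8|en_US.utf8) found_en="$line" ;;
    esac
  done
  if [ -n "$found_c" ]; then
    printf '%s\n' "$found_c"
  elif [ -n "$found_en" ]; then
    printf '%s\n' "$found_en"
  else
    return 1
  fi
}

# Keep the inherited LANG (possibly empty) if neither candidate is installed.
chosen="$(locale -a 2>/dev/null | pick_utf8_locale || true)"
# Show which LANG the child shell would actually see.
env LANG="${chosen:-${LANG:-}}" /bin/bash --norc --noprofile -c 'printf "%s\n" "$LANG"'
```

Note that an LC_ALL or LC_CTYPE variable explicitly set in the parent environment would still override LANG, so a real fix might need to set or clear those as well.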

Impact

This bug makes Copilot CLI effectively unusable for any task involving non-English file paths or content — which is common for users working in CJK languages, or anyone whose project happens to have non-ASCII directory names. The failure mode is particularly bad: commands silently produce corrupted output or hang forever, with no error message indicating the cause.

Labels: area:tools (Built-in tools: file editing, shell, search, LSP, git, and tool call behavior)
