diff --git a/src/input-format.md b/src/input-format.md
index 3e35cba1ee..c1daabfd0a 100644
--- a/src/input-format.md
+++ b/src/input-format.md
@@ -42,21 +42,33 @@ r[input.shebang]
 ## Shebang removal
 
 r[input.shebang.intro]
-If the remaining sequence begins with the characters `#!`, the characters up to and including the first `U+000A` (LF) are removed from the sequence.
+A *shebang* is an optional line that is typically used in Unix-like systems to specify an interpreter for executing the file.
 
-For example, the first line of the following file would be ignored:
+> [!EXAMPLE]
+>
+> ```rust,ignore
+> #!/usr/bin/env rustx
+>
+> fn main() {
+>     println!("Hello!");
+> }
+> ```
 
-
-```rust,ignore
-#!/usr/bin/env rustx
+r[input.shebang.syntax]
 
-fn main() {
-    println!("Hello!");
-}
+```grammar,lexer
+@root SHEBANG ->
+    `#!` !((WHITESPACE | LINE_COMMENT | BLOCK_COMMENT)* `[`)
+    ~LF* (LF | EOF)
 ```
 
-r[input.shebang.inner-attribute]
-As an exception, if the `#!` characters are followed (ignoring intervening [comments] or [whitespace]) by a `[` token, nothing is removed. This prevents an [inner attribute] at the start of a source file being removed.
+The shebang starts with the characters `#!`. However, if these characters are followed by `[` (ignoring any intervening [comments] or [whitespace]), the line is not considered a shebang to avoid ambiguity with an [inner attribute]. The shebang continues to and including the first `U+000A` (LF), or to EOF if there is no line ending.
+
+r[input.shebang.position]
+The shebang may appear immediately at the start of the file or after the optional [byte order mark].
+
+r[input.shebang.removal]
+The shebang is removed from the input sequence and is ignored.
 
 r[input.tokenization]
 ## Tokenization
diff --git a/src/whitespace.md b/src/whitespace.md
index 236680f74d..7e16c51d41 100644
--- a/src/whitespace.md
+++ b/src/whitespace.md
@@ -3,7 +3,7 @@ r[lex.whitespace]
 
 r[whitespace.syntax]
 ```grammar,lexer
-@root WHITESPACE ->
+WHITESPACE ->
       U+0009 // Horizontal tab, `'\t'`
     | U+000A // Line feed, `'\n'`
     | U+000B // Vertical tab