gh-144759: Fix undefined behavior from NULL pointer arithmetic in lexer #144788

Open
raminfp wants to merge 3 commits into python:main from raminfp:fix-lexer-ub-null-pointer-arithmetic

@raminfp raminfp commented Feb 13, 2026

Fix undefined behavior in _PyLexer_remember_fstring_buffers and _PyLexer_restore_fstring_buffers caused by performing pointer arithmetic on NULL pointers (NULL - tok->buf).

When tok_mode_stack[0] is initialized, the start and multi_line_start fields are not explicitly set and remain NULL (from PyMem_Calloc). Later, when the lexer buffer is reallocated, the remember/restore functions perform NULL - valid_pointer and valid_pointer + negative_offset, both of which are undefined behavior in C.

The fix adds explicit NULL checks: store -1 as a sentinel offset when the pointer is NULL, and restore NULL when the offset is negative.
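The sentinel approach described above can be sketched in isolation as follows. This is a minimal illustration, not the actual patch: `mode_sketch`, `remember_offset`, `restore_pointer`, and `grow_buffer` are hypothetical names, and the real tokenizer structures track more fields than the single pointer shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one tokenizer mode entry (not the CPython struct). */
typedef struct {
    char *start;             /* points into the buffer, or NULL if never set */
    ptrdiff_t start_offset;  /* saved offset, or -1 when start was NULL */
} mode_sketch;

/* Save the pointer as an offset from buf, without ever subtracting from NULL. */
static void remember_offset(mode_sketch *m, const char *buf)
{
    assert(m != NULL && buf != NULL);
    m->start_offset = (m->start == NULL) ? -1 : m->start - buf;
}

/* Rebuild the pointer against the (possibly moved) buffer. */
static void restore_pointer(mode_sketch *m, char *buf)
{
    assert(m != NULL && buf != NULL);
    m->start = (m->start_offset < 0) ? NULL : buf + m->start_offset;
}

/* Grow the buffer while keeping m->start consistent.
 * Returns the new buffer, or NULL if realloc fails (old buffer stays valid). */
static char *grow_buffer(mode_sketch *m, char *buf, size_t new_size)
{
    remember_offset(m, buf);
    char *new_buf = realloc(buf, new_size);
    if (new_buf == NULL) {
        return NULL;
    }
    restore_pointer(m, new_buf);
    return new_buf;
}
```

With this shape, a NULL `start` survives a reallocation as NULL, while a non-NULL `start` keeps its offset into the new buffer. Using -1 as the sentinel works because a valid offset into the buffer is never negative.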

Detected with --with-undefined-behavior-sanitizer:

Parser/lexer/buffer.c:30:32: runtime error: pointer index expression with base 0x50300007f130 overflowed to 0xfffffddfffc38020
Parser/lexer/buffer.c:31:43: runtime error: pointer index expression with base 0x50300007f130 overflowed to 0xfffffddfffc38020

Fixes #144759

python-cla-bot bot commented Feb 13, 2026

All commit authors signed the Contributor License Agreement.

…in lexer

Guard against NULL pointer arithmetic in `_PyLexer_remember_fstring_buffers`
and `_PyLexer_restore_fstring_buffers`. When `start` or `multi_line_start`
is NULL (uninitialized in tok_mode_stack[0]), performing `NULL - tok->buf`
is undefined behavior. Add explicit NULL checks to store -1 as a sentinel
and restore NULL accordingly.
@raminfp raminfp force-pushed the fix-lexer-ub-null-pointer-arithmetic branch from 588d391 to 0b18bc0 Compare February 13, 2026 16:10
…tions

Replace :c:func: references with double-backtick markup since these
are internal functions without documentation entries.
@eendebakpt
Contributor

@raminfp Could you add a regression test? I suspect Lib/test/test_repl.py is the right location, but I am not sure.

And please avoid force pushes to the PR so we preserve history.

…exer

Add test_lexer_buffer_realloc_with_null_start to test_repl.py that
exercises the code path where the lexer buffer is reallocated while
tok_mode_stack[0] has NULL start/multi_line_start pointers. This
triggers _PyLexer_remember_fstring_buffers and verifies the NULL
checks prevent undefined behavior.
Development

Successfully merging this pull request may close these issues.

Uninitialized start and multi_line_start Causing Undefined Behavior - Pointer overflow

2 participants