make relative-date regex digit quantifiers possessive#1335

Merged
serhii73 merged 3 commits into
scrapinghub:masterfrom
alhudz:relative-regex-possessive-digits
Jun 9, 2026
@alhudz alhudz commented Jun 6, 2026

Contributor

The relative-date matchers are built from \d+[.,]?\d* and run across the whole input: PATTERN in freshness_date_parser via findall/sub, and the split_relative regex in dictionary.py via split. On a long run of digits the two adjacent digit quantifiers backtrack without ever changing the match, so cost grows superlinearly.

Repro: dateparser.parse('9' * 3200), or any long digit run reaching the relative parser.
Cause: the expression is tried at every start position, and at each one the engine retries every way of dividing the digit run between \d+ and \d*. That backtracking is quadratic in split and near-cubic in the freshness findall.
Fix: make the digit quantifiers possessive (\d++/\d*+). A digit run can only be followed by [.,] or the literal unit text, so giving digits back can never complete a match.

PATTERN.findall('9' * 3200) goes from ~23s to ~0.02s; match results are unchanged and the existing suite passes.

@AdrianAtZyte AdrianAtZyte mentioned this pull request Jun 8, 2026

codecov Bot commented Jun 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.11%. Comparing base (081d251) to head (3529261).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1335      +/-   ##
==========================================
- Coverage   97.11%   97.11%   -0.01%     
==========================================
  Files         235      235              
  Lines        2912     2909       -3     
==========================================
- Hits         2828     2825       -3     
  Misses         84       84              

@AdrianAtZyte AdrianAtZyte requested a review from serhii73 June 8, 2026 10:33

serhii73 commented Jun 9, 2026

Collaborator

Thanks!

@serhii73 serhii73 merged commit 98b9c32 into scrapinghub:master Jun 9, 2026
15 checks passed
@serhii73 serhii73 mentioned this pull request Jun 15, 2026