max_crawl_limit behavior different than javascript version #1765

@jaemingo-hhh

Description

I have two crawlers, one in TypeScript and one in Python.

In the Python version, I have noticed that the crawl limit is only reached when requests_finished >= max_requests_per_crawl. For example, I set max_requests_per_crawl to 1000, and the crawl finished with these stats:

[crawlee.crawlers._playwright._playwright_crawler] INFO  Final request statistics:
┌───────────────────────────────┬───────────────────┐
│ requests_finished             │ 310               │
│ requests_failed               │ 6030              │
│ retry_histogram               │ [312, 0, 0, 6028] │
│ request_avg_failed_duration   │ 231.3ms           │
│ request_avg_finished_duration │ 2.01s             │
│ requests_finished_per_minute  │ 32                │
│ requests_failed_per_minute    │ 625               │
│ request_total_duration        │ 33min 39.3s       │
│ requests_total                │ 6340              │
│ crawler_runtime               │ 9min 38.6s        │
└───────────────────────────────┴───────────────────┘

However, in the JavaScript one, the crawl limit is triggered when requestsFinished + requestsFailed >= maxRequestsPerCrawl:

INFO  SessionAwareCrawler: Crawler reached the maxRequestsPerCrawl limit of 1000 requests and will shut down soon. Requests that are in progress will be allowed to finish.
INFO  SessionAwareCrawler: Earlier, the crawler reached the maxRequestsPerCrawl limit of 1000 requests and all requests that were in progress at that time have now finished. In total, the crawler processed 1000 requests and will shut down.
INFO  SessionAwareCrawler: Final request statistics: {"requestsFinished":298,"requestsFailed":702,"retryHistogram":[1000],"requestAvgFailedDurationMillis":362,"requestAvgFinishedDurationMillis":1804,"requestsFinishedPerMinute":154,"requestsFailedPerMinute":363,"requestTotalDurationMillis":791397,"requestsTotal":1000,"crawlerRuntimeMillis":115970}

Looking at the code base, that seems to be the case. Is this intentional?
Python:

def _stop_if_max_requests_count_exceeded(self) -> None:
    """Call `stop` when the maximum number of requests to crawl has been reached."""
    if self._max_requests_per_crawl is None:
        return
    if self._statistics.state.requests_finished >= self._max_requests_per_crawl:
        self.stop(
            reason=f'The crawler has reached its limit of {self._max_requests_per_crawl} requests per crawl. '
        )

Javascript:
https://github.com/apify/crawlee/blob/c3a4b3b0d5be63f1f7a779ff43560ab2b426f3bb/packages/basic-crawler/src/internals/basic-crawler.ts#L812

I noticed the docs use the same wording as well:
https://crawlee.dev/js/api/playwright-crawler/interface/PlaywrightCrawlerOptions#maxRequestsPerCrawl
https://crawlee.dev/python/api/class/PlaywrightCrawlerOptions#max_requests_per_crawl

Labels: t-tooling (issues with this label are in the ownership of the tooling team)