
Move ML recap analysis to heavy-processing Celery worker#4571

Merged
patrick91 merged 6 commits into main from feat/celery-recap-analysis
Feb 9, 2026

Conversation

@patrick91
Member

Summary

  • Offload compute_similar_talks and compute_topic_clusters to the heavy_processing Celery queue to prevent OOM in Gunicorn web workers
  • Admin view now checks the cache first, dispatches a Celery task on a miss, and returns {"status": "processing"} for the frontend to poll (see the sketch after this list)
  • Frontend JS polls every 3s (up to ~2 min timeout) until results are ready
  • Follows the same pattern as video_uploads/admin.py (apply_async(queue="heavy_processing") + check_pending_heavy_processing_work.delay())
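
A minimal sketch of the cache-first flow, assuming hypothetical names (recap_analysis_view, compute_recap_analysis, the key format); the real admin code also hashes submission content into the key:

```python
from django.core.cache import cache
from django.http import JsonResponse

from reviews.tasks import compute_recap_analysis  # hypothetical task name
from video_uploads.tasks import check_pending_heavy_processing_work  # assumed path


def recap_analysis_view(request, conference_id):
    cache_key = f"recap:{conference_id}"  # real key is content-hashed
    cached = cache.get(cache_key)
    if cached is not None:
        return JsonResponse(cached)
    # Cache miss: push the ML work to the heavy-processing queue and
    # return immediately so the Gunicorn worker stays lightweight.
    compute_recap_analysis.apply_async(
        args=[conference_id, cache_key],
        queue="heavy_processing",
    )
    check_pending_heavy_processing_work.delay()
    return JsonResponse({"status": "processing"})
```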

Test plan

  • All 11 recap tests pass (reviews/tests/test_recap.py)
  • Verify cache-hit path returns results immediately
  • Verify cache-miss dispatches task and returns {"status": "processing"}
  • Verify recompute flag bypasses cache and dispatches task
  • Verify frontend polling renders results when task completes

Offload compute_similar_talks and compute_topic_clusters to the
heavy_processing Celery queue to prevent OOM in Gunicorn web workers.
The admin view now checks cache first, dispatches a Celery task on miss,
and returns {"status": "processing"} for the frontend to poll.
@vercel

vercel bot commented Feb 9, 2026

The latest updates on your projects:

pycon: Ready (Preview), updated Feb 9, 2026 6:36pm UTC

@claude
Contributor

claude bot commented Feb 9, 2026

Summary

This PR offloads ML-based recap analysis (compute_similar_talks and compute_topic_clusters) to a Celery heavy-processing queue to prevent OOM in Gunicorn web workers. The admin view now checks cache first, dispatches a Celery task on miss, and returns {"status": "processing"} for the frontend to poll. Comprehensive test coverage (11 tests) validates cache hits, task dispatch, lock management, and error handling.

Issues Found

Security: MD5 Usage for Cache Keys

backend/reviews/cache_keys.py:18

Using MD5 for cache keys is not a security issue here, since the hash is not used for cryptographic purposes, but MD5 is flagged by security scanners. Consider using hashlib.sha256() instead for better compatibility with security policies.
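
A hypothetical sketch of the suggested swap; the real helper lives in backend/reviews/cache_keys.py and its signature may differ:

```python
import hashlib


def get_cache_key(prefix: str, payload: str) -> str:
    # sha256 avoids the security-scanner warnings that md5 triggers,
    # at negligible cost for cache-key purposes.
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"
```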

Potential Race Condition in Lock Management

backend/reviews/admin.py:478-488

There is a subtle race condition between checking the stale lock and acquiring it: two threads could both dispatch tasks after one of them deletes a stale lock. This is minor, since the lock timeout (300s) and the polling mechanism will eventually resolve it, but consider a compare-and-delete operation if your cache backend supports one.
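
Django's cache API has no compare-and-delete primitive, so a sketch like this assumes direct access to a Redis client (names are illustrative):

```python
import redis

r = redis.Redis()  # assumes the cache backend is Redis and reachable here

# Delete the lock only if it still holds the stale task id we observed,
# closing the gap between the staleness check and the delete.
COMPARE_AND_DELETE = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""


def delete_if_still_stale(lock_key: str, observed_task_id: str) -> bool:
    script = r.register_script(COMPARE_AND_DELETE)
    return bool(script(keys=[lock_key], args=[observed_task_id]))
```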

Error Handling: Missing Database Connectivity Check

backend/reviews/tasks.py:22-29

The task catches Conference.DoesNotExist, but if the database is temporarily unavailable, it will raise an unhandled exception. Consider adding a broader exception handler to handle database connectivity issues gracefully.
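
An illustrative shape for the broader handler; ERROR_CACHE_TTL and the 2-minute value come from this PR's later commits, while the task and import paths are assumptions:

```python
from celery import shared_task
from django.core.cache import cache
from django.db import OperationalError

from conferences.models import Conference  # assumed import path

ERROR_CACHE_TTL = 120  # seconds; the PR caches error state for 2 minutes


@shared_task
def compute_recap_analysis(conference_id, cache_key):
    try:
        conference = Conference.objects.get(id=conference_id)
        # ... run compute_similar_talks / compute_topic_clusters on the
        # conference's accepted submissions, then cache the result ...
    except Conference.DoesNotExist:
        return
    except OperationalError:
        # Database temporarily unavailable: cache an error state so the
        # frontend shows the failure instead of polling to timeout.
        cache.set(cache_key, {"status": "error"}, timeout=ERROR_CACHE_TTL)
        raise
```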

Architecture: Cache Key Calculation in Request Handler

backend/reviews/admin.py:456

The view converts the entire queryset to a list just to compute the cache key. For conferences with many submissions, this loads all submission data into memory in the web worker, which contradicts the goal of keeping web workers lightweight. Consider computing a simpler cache key based only on conference ID and submission count/last-modified timestamp.
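
A sketch of that suggestion, assuming the submission model has a modified timestamp and reusing the PR's get_accepted_submissions() helper (import path is a guess):

```python
from django.db.models import Count, Max

from reviews.queries import get_accepted_submissions  # actual module may differ


def get_lightweight_cache_key(conference):
    # One aggregate query instead of materializing every submission.
    stats = get_accepted_submissions(conference).aggregate(
        n=Count("id"), latest=Max("modified")
    )
    return f"recap:{conference.id}:{stats['n']}:{stats['latest']}"
```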

Testing: Missing Test for Duplicate Simultaneous Requests

The test suite covers stale locks and active locks, but does not test what happens when multiple simultaneous requests arrive at exactly the same time before any lock is set. Consider adding a concurrency test that simulates this scenario.
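
One possible shape for such a test, assuming pytest-mock, a thread-safe cache backend, and placeholder URL/task paths:

```python
import threading


def test_simultaneous_requests_dispatch_once(admin_client, mocker):
    dispatched = mocker.patch(
        "reviews.admin.compute_recap_analysis.apply_async"  # hypothetical path
    )
    barrier = threading.Barrier(2)

    def hit():
        barrier.wait()  # release both requests at the same instant
        admin_client.get("/admin/reviews/recap-analysis/")  # placeholder URL

    threads = [threading.Thread(target=hit) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # cache.add should let exactly one request through to dispatch
    assert dispatched.call_count == 1
```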

@codecov

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.52%. Comparing base (4883f6c) to head (77c48b7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4571      +/-   ##
==========================================
+ Coverage   92.50%   92.52%   +0.02%     
==========================================
  Files         355      357       +2     
  Lines       10658    10690      +32     
  Branches      812      812              
==========================================
+ Hits         9859     9891      +32     
  Misses        687      687              
  Partials      112      112              

…ential backoff

- Extract get_accepted_submissions() to remove duplicated query between
  admin.py and tasks.py
- Add try/except in Celery task that caches error state so the frontend
  can show the actual failure instead of a timeout
- Frontend handles {"status": "error"} responses from cached error state
- Replace fixed 3s polling with exponential backoff (1s, 2s, 3s, 5s...)
…ndling, tests

- Add cache.add lock to prevent duplicate task dispatch on concurrent requests
- Pass combined_cache_key from view to task to avoid key mismatch from
  race conditions between dispatch and execution
- Handle Conference.DoesNotExist in task for deleted conferences
- Clean up computing lock in finally block
- Align frontend poll timeout (3min) with error cache TTL (2min)
- Add integration tests: task cache population, error caching, missing conference
- Add stampede prevention test (cache.add returns False)
…ation

- Move get_cache_key and get_embedding_text to reviews/cache_keys.py
  (lightweight module with no ML imports) so the admin view can import
  it without triggering torch/sentence-transformers loading (see the
  sketch after this list)
- Skip individual function caching in task (conference_id=None) since
  the combined result is cached; avoids triple cache storage
- Remove _get_cache_key mock from tests — real hash computation now
  exercises the cache key generation logic
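
For illustration, the module boundary might look like this; function bodies and field names are guesses, and only the no-ML-imports split is the point:

```python
# backend/reviews/cache_keys.py -- no torch/sentence-transformers imports,
# so the admin view can build cache keys without loading ML libraries.
import hashlib


def get_embedding_text(submission) -> str:
    # Field names are assumptions about the submission model.
    return f"{submission.title}\n{submission.abstract}"


def get_cache_key(prefix: str, submissions) -> str:
    payload = "\n".join(get_embedding_text(s) for s in submissions)
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"
```
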
…lignment

- Store Celery task ID in computing lock and check AsyncResult state to
  detect stale locks from crashed workers before dispatching duplicates
- Clean up computing lock on Conference.DoesNotExist early return
- Increase frontend poll timeout to 360s to exceed backend 300s lock TTL
- Extend polling backoff intervals to 15s for large conferences
- Add .order_by("id") to get_accepted_submissions for consistent ordering
- Extract RESULT_CACHE_TTL / ERROR_CACHE_TTL constants
- Add tests for stale lock detection, active lock preservation, and
  DoesNotExist lock cleanup
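
A rough sketch of the stale-lock check, assuming the lock value stores the dispatched task's id; note Celery reports PENDING for unknown task ids, so this is a heuristic:

```python
from celery.result import AsyncResult


def lock_is_stale(lock_value: str) -> bool:
    state = AsyncResult(lock_value).state
    # Terminal states mean the worker finished or died without clearing
    # the lock; in-flight states mean work is still running.
    return state in ("SUCCESS", "FAILURE", "REVOKED")
```
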
patrick91 merged commit bed7f4c into main on Feb 9, 2026
8 checks passed
patrick91 deleted the feat/celery-recap-analysis branch on February 9, 2026 at 19:37