patrick91
commented
Feb 9, 2026
- Rename recap to shortlist
- Recap screen
- Add similar talks POC
- WIP
- Order things
- Add color-coded similarity score badges
- Defer similar talks & topic clusters to button-triggered load
Orange for 60-79% similarity, red for 80%+ to highlight potentially overlapping talks at a glance.
Move ML computation (similar talks, topic clusters) behind a "Compute Analysis" button to avoid slow page loads on cache miss. Adds a JSON endpoint that returns both datasets, rendered client-side.
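The badge thresholds described in the commit message can be sketched as below. This is only an illustration: the function name and the returned color strings are hypothetical, and treating scores under 60% as "no badge color" is an assumption, since the commit message does not say what happens below that threshold.

```python
from typing import Optional


def similarity_badge_color(score_percent: float) -> Optional[str]:
    """Hypothetical helper: map a similarity percentage to a badge color.

    Thresholds follow the commit message: red for 80%+, orange for 60-79%.
    Returning None below 60% is an assumption made for this sketch.
    """
    if score_percent >= 80:
        return "red"
    if score_percent >= 60:
        return "orange"
    return None
```

Checking the higher threshold first keeps the two ranges from overlapping, so a score of exactly 80 is red rather than orange.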
This PR adds a recap screen with ML-based similar talks detection and topic clustering for accepted submissions. It also renames "recap" to "shortlist" throughout the adapter interface.

Issues Found

Architecture & Design

admin.py:369-379 - Missing tenant scoping in query (CRITICAL SECURITY ISSUE)

    def _get_accepted_submissions(self, conference):
        return (
            Submission.objects.filter(conference=conference)
            .filter(...)

This query in
Recommendation: The

similar_talks.py:250 - Insecure hash function for cache keys

    content_hash = hashlib.md5()

Using MD5 for cache key generation. While this isn't directly exploitable for cache poisoning since it's only used for cache keys (not cryptographic purposes), it's a code smell that could become a vulnerability if this pattern is copied elsewhere.
Recommendation: Use

similar_talks.py - Heavy ML dependencies without resource limits
Recommendations:
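On the cache-key hashing point raised earlier in this review, one possible direction is a SHA-256 content hash. The sketch below is an assumption-laden illustration, not the project's code: `build_cache_key`, its parameters, and the key layout are all hypothetical names introduced here.

```python
import hashlib


def build_cache_key(prefix: str, texts: list[str]) -> str:
    """Hypothetical helper: derive a stable cache key from submission texts."""
    digest = hashlib.sha256()
    for text in texts:
        encoded = text.encode("utf-8")
        # Length-prefix each item so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return f"{prefix}:{digest.hexdigest()}"
```

The same input list always yields the same key, so a cache lookup only misses when the submission content actually changes.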
Testing & Coverage

test_similar_talks.py - Insufficient edge case coverage
test_recap.py - Missing negative permission tests
Missing integration tests
Error Handling

admin.py:442-492 - No error handling for ML computation failures

    def review_recap_compute_analysis_view(self, request, review_session_id):
        # ... permission checks ...
        similar_talks = compute_similar_talks(...)  # Can fail
        topic_clusters = compute_topic_clusters(...)  # Can fail
        return JsonResponse({...})

If either ML function raises an exception (model loading fails, encoding fails, clustering fails), the user gets a 500 error with no helpful message. The functions can fail for many reasons:
Recommendation: Wrap ML calls in try-except and return user-friendly error messages at backend/reviews/admin.py:455-467.

similar_talks.py:186-187 - Silent failure on NLTK download

    except LookupError:
        logger.info("Downloading NLTK stopwords...")
        nltk.download("stopwords", quiet=True)

If the download fails (no internet, disk full, permission denied), the function continues silently and the subsequent
Recommendation: Add error handling for failed NLTK downloads at backend/reviews/similar_talks.py:186-188.

Performance

admin.py:399 - N+1 query potential

    accepted_submissions = self._get_accepted_submissions(conference)
    # Later in template context:
    for s in accepted_submissions

While the query uses
Fixed on line 377: The query includes

similar_talks.py:289 - Entire dataset loaded into memory

    texts = [get_embedding_text(s) for s in submissions_list]
    embeddings = model.encode(texts)

For a conference with 500+ submissions, this loads all text and all embeddings into memory at once. The
Recommendation: Add batch processing for conferences with >200 submissions at backend/reviews/similar_talks.py:289-290.

similar_talks.py:292 - O(n²) cosine similarity computation

    similarity_matrix = cosine_similarity(embeddings)

This computes similarity for every pair of submissions. For 500 submissions, that's 250,000 comparisons. While
Recommendation: If memory becomes an issue, consider computing similarities on-demand or using approximate nearest neighbor search (e.g., FAISS).

Other Issues

Missing index on UserReview.review_session_id

similar_talks.py:17 - Hardcoded cache timeout

    CACHE_TIMEOUT = 60 * 60 * 24  # 24 hours

This should be configurable via Django settings rather than hardcoded.

Inconsistent naming: "recap" vs "shortlist"
Recommendation: Add documentation clarifying the three-stage flow and what each screen is for.
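For the error-handling recommendation on the compute-analysis view earlier in this review, a minimal sketch of the try-except wrapping might look like the following. It is framework-free so it can run on its own; `run_analysis`, the payload keys, and the error wording are hypothetical, and the two callables stand in for the real `compute_similar_talks` and `compute_topic_clusters` calls, whose arguments are not shown in the PR excerpt.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def run_analysis(
    compute_similar: Callable[[], Any],
    compute_clusters: Callable[[], Any],
) -> tuple[int, dict[str, Any]]:
    """Hypothetical helper: return (http_status, json_payload) for the endpoint."""
    try:
        similar_talks = compute_similar()
        topic_clusters = compute_clusters()
    except Exception:
        # Log the full traceback server-side, but show the user a short message.
        logger.exception("Analysis computation failed")
        return 500, {"error": "Could not compute the analysis. Please try again later."}
    return 200, {"similar_talks": similar_talks, "topic_clusters": topic_clusters}
```

In a Django view, the result could then be returned with `JsonResponse(payload, status=status)`, so the client-side renderer can tell a failure from an empty result.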
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4567      +/-   ##
==========================================
- Coverage   92.70%   92.50%   -0.20%
==========================================
  Files         354      355       +1
  Lines       10510    10658     +148
  Branches      780      812      +32
==========================================
+ Hits         9743     9859     +116
- Misses        665      687      +22
- Partials      102      112      +10
- Add force_recompute param to compute_similar_talks/compute_topic_clusters
- Add ?recompute=1 query param support to compute-analysis endpoint
- Show "Recompute (ignore cache)" button after initial results load
- Replace globals with functools.cache, simplify _get_submission_languages
- Remove unused language mappings, keep only English and Italian
3ed6c11 to 3ff0cf7
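The `force_recompute` and `functools.cache` changes listed in the last commit message can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `get_model`, `compute_similar_talks_sketch`, the in-memory `_results` dict (standing in for the Django cache), and the placeholder model name are all hypothetical.

```python
import functools

_results: dict[str, list[str]] = {}  # stand-in for the real cache backend


@functools.cache
def get_model(name: str) -> dict:
    """Stand-in for an expensive model load; the body runs once per distinct name."""
    return {"name": name}


def compute_similar_talks_sketch(cache_key: str, force_recompute: bool = False) -> list[str]:
    """Return cached results unless force_recompute is set."""
    if not force_recompute and cache_key in _results:
        return _results[cache_key]
    model = get_model("placeholder-model")  # reused across calls via functools.cache
    result = [f"computed with {model['name']}"]
    _results[cache_key] = result
    return result
```

With this shape, a `?recompute=1` request would pass `force_recompute=True` and refresh the stored result, while the model itself is still loaded only once per process.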