Fix Reddit scrape pipeline and Decodo search reliability #9

Open: paulius-krutkis-dcd wants to merge 1 commit into main from fix-reddit-scrape-pipeline

Conversation

paulius-krutkis-dcd commented Jun 10, 2026

Summary

  • Fix Decodo Reddit search by using the universal target with headless: html for .json URLs (fixes 613/block-page failures)
  • Scrape via search queries only, with topic relevance filtering so results match the product/topic
  • Update default Claude model to claude-sonnet-4-6 and improve LLM/Decodo error messages
  • Add a MongoDB Docker workaround for macOS/OrbStack
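
For reviewers, the first bullet can be pictured roughly as follows. This is a minimal sketch and not the code in this PR: `DecodoScrapeRequest`, `buildDecodoRequest`, and `redditSearchUrl` are hypothetical names, the payload fields are inferred from the bullet wording (`universal`, `headless: html`), and the `sort=relevance` parameter is an assumption.

```typescript
// Hypothetical request body; field names are inferred from the PR summary,
// not copied from DecodoService.
interface DecodoScrapeRequest {
  target: "universal";
  url: string;
  headless?: "html";
}

// Build the request body for one URL.
// Assumption: only URLs whose path ends in ".json" get headless rendering.
function buildDecodoRequest(url: string): DecodoScrapeRequest {
  // Drop the query string and fragment before checking the extension.
  const path = url.split(/[?#]/, 1)[0] ?? url;
  return path.endsWith(".json")
    ? { target: "universal", url, headless: "html" }
    : { target: "universal", url };
}

// Build a Reddit search URL for a free-text query.
// Assumption: results are requested sorted by relevance.
function redditSearchUrl(query: string): string {
  return `https://www.reddit.com/search.json?q=${encodeURIComponent(query)}&sort=relevance`;
}
```

Keeping payload construction in a pure function like this would let the tests assert on the request shape without calling Decodo.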

Test plan

  • bun run dev with valid DECODO_BASIC_AUTH_TOKEN and ANTHROPIC_API_KEY
  • Run analysis for a product prompt (e.g. Firecrawl) — expect on-topic posts, not random hot-feed drama
  • Confirm scrape no longer fails with "scraping failed for all targets"

… and error handling

- Added support for headless mode in DecodoService to handle Reddit's JSON API more effectively.
- Updated error handling to throw ServiceUnavailableException for failed Decodo responses.
- Modified TrackerService to prioritize search results and deduplicate posts based on topic relevance.
- Adjusted scraping methods to use the universal target with headless mode for both search and subreddit scraping.
- Updated tests to reflect changes in service behavior and ensure proper error handling.
- Revised documentation to clarify the new scraping architecture and methods.
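
The relevance filtering and deduplication described in the notes above could look something like the sketch below. It is illustrative only: `RedditPost`, `topicTerms`, and `filterAndDedupe` are hypothetical names, and simple keyword matching stands in for whatever relevance check TrackerService actually uses.

```typescript
// Hypothetical minimal post shape; the real model likely has more fields.
interface RedditPost {
  id: string;
  title: string;
  selftext: string;
}

// Split a topic prompt into lowercase keywords, ignoring very short tokens.
function topicTerms(topic: string): string[] {
  return topic
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 2);
}

// Keep the first occurrence of each post id, and only posts whose title or
// body mentions at least one topic keyword. An empty keyword list keeps nothing.
function filterAndDedupe(posts: RedditPost[], topic: string): RedditPost[] {
  const terms = topicTerms(topic);
  const seen = new Set<string>();
  const kept: RedditPost[] = [];
  for (const post of posts) {
    const text = `${post.title} ${post.selftext}`.toLowerCase();
    const relevant = terms.some((term) => text.includes(term));
    if (seen.has(post.id) || !relevant) continue;
    seen.add(post.id);
    kept.push(post);
  }
  return kept;
}
```

With a topic of "Firecrawl", a post titled "Firecrawl vs Scrapy" returned by two different search queries would be kept once, and an unrelated hot-feed post would be dropped.
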
paulius-krutkis-dcd changed the title from "Enhance DecodoService and TrackerService for improved Reddit scraping…" to "Fix Reddit scrape pipeline and Decodo search reliability" on Jun 10, 2026
3 participants