Implementation Priority List for python-proxy-headers Extensions

Rankings are based on an analysis of each library, weighing GitHub stars, recent activity, technical feasibility, and expected user impact.


Priority Rankings

🔴 Priority 1: HIGH - Implement First

1. pycurl Extension

File: python_proxy_headers/pycurl_proxy.py

Metric         Value
GitHub Stars   1,146
Last Active    2026-01-30
Feasibility    ✅ HIGH
Impact         HIGH - Direct libcurl access

Why High Priority:

  • libcurl already supports CURLOPT_PROXYHEADER for sending custom headers
  • Can capture CONNECT response headers via CURLOPT_HEADERFUNCTION
  • Foundation for curl_cffi work

Implementation Plan:

# pycurl_proxy.py - Proposed API

class ProxyCurl:
    """PycURL wrapper with proxy header support."""
    
    def __init__(self, proxy_headers=None):
        self.proxy_headers = proxy_headers or {}
        self._response_proxy_headers = {}
    
    def get(self, url, proxy=None) -> ProxyResponse:
        """Make GET request with proxy header support."""
        pass
    
    @property
    def received_proxy_headers(self) -> dict:
        """Headers received from proxy during CONNECT."""
        return self._response_proxy_headers

def request(method, url, proxy=None, proxy_headers=None) -> ProxyResponse:
    """Convenience function for one-off requests."""
    pass

Technical Approach:

  1. Use pycurl.PROXYHEADER option to send custom headers
  2. Use HEADERFUNCTION callback to capture CONNECT response headers
  3. Parse headers to separate proxy headers from origin headers
  4. Expose via a clean API matching the existing python-proxy-headers style; a sketch follows
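
The sketch below shows the whole pipeline under two assumptions: libcurl is 7.37.0 or newer (the first release with CURLOPT_PROXYHEADER), and the helper name fetch_via_proxy is illustrative rather than the final API.

# Illustrative sketch, not the final pycurl_proxy.py API
import pycurl
from io import BytesIO

def fetch_via_proxy(url, proxy, proxy_headers):
    body = BytesIO()
    header_lines = []  # raw lines, CONNECT response included

    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.PROXY, proxy)
    # HEADER_SEPARATE makes PROXYHEADER values go only to the proxy,
    # never to the origin server
    c.setopt(pycurl.HEADEROPT, pycurl.HEADER_SEPARATE)
    c.setopt(pycurl.PROXYHEADER,
             ['%s: %s' % (k, v) for k, v in proxy_headers.items()])
    # HEADERFUNCTION sees every header line; step 3 splits the blocks
    # afterwards (each block ends with a blank b'\r\n' line)
    c.setopt(pycurl.HEADERFUNCTION, header_lines.append)
    c.setopt(pycurl.WRITEDATA, body)
    c.perform()
    c.close()
    return body.getvalue(), header_lines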

2. curl_cffi Extension

File: python_proxy_headers/curl_cffi_proxy.py

Metric         Value
GitHub Stars   4,873
Last Active    2026-01-30
Feasibility    ⚠️ MEDIUM-HIGH
Impact         VERY HIGH - Popular anti-bot library

Why High Priority:

  • Very popular for bypassing bot detection
  • Uses libcurl which has proxy header capabilities
  • Active development means potential upstream contributions

Implementation Plan:

# curl_cffi_proxy.py - Proposed API

from curl_cffi import Session

class ProxySession(Session):
    """curl_cffi Session with proxy header support."""
    
    def __init__(self, proxy_headers=None, **kwargs):
        super().__init__(**kwargs)
        self._proxy_headers = proxy_headers or {}
        self._last_proxy_response_headers = {}
    
    def request(self, method, url, **kwargs) -> ProxyResponse:
        """Make request capturing proxy headers."""
        pass
    
    @property
    def proxy_response_headers(self) -> dict:
        """Headers from last proxy CONNECT response."""
        return self._last_proxy_response_headers

# Convenience functions
def get(url, proxy=None, proxy_headers=None, impersonate=None, **kwargs):
    pass

Technical Approach:

  1. Investigate if curl_cffi exposes low-level curl options
  2. If yes: Use CURLOPT_PROXYHEADER directly
  3. If no: Create PR to curl_cffi to expose these options
  4. May need to work with the curl_cffi maintainers; a probe sketch follows
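
As a concrete starting point for step 1, a probe using curl_cffi's low-level Curl class. Whether CurlOpt.PROXYHEADER exists, and whether setopt accepts a header list and a Python callable here, is precisely what needs verifying; if it does not, that gap becomes the upstream PR from step 3.

# Exploratory probe; the CurlOpt usages are assumptions to verify
from curl_cffi import Curl, CurlOpt

captured = []  # raw header lines, CONNECT response included

c = Curl()
c.setopt(CurlOpt.URL, b"https://example.com/")
c.setopt(CurlOpt.PROXY, b"http://proxy.example:8080")
c.setopt(CurlOpt.PROXYHEADER, [b"X-Proxy-Auth: token"])
c.setopt(CurlOpt.HEADERFUNCTION, captured.append)
c.perform()
c.close()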

Upstream Contribution Opportunity:

  • File issue requesting proxy_headers parameter
  • Contribute PR if welcomed

3. cloudscraper Extension

File: python_proxy_headers/cloudscraper_proxy.py

Metric         Value
GitHub Stars   6,060
Last Active    2025-06-10
Feasibility    ✅ HIGH
Impact         HIGH - Popular for Cloudflare bypass

Why High Priority:

  • Built on requests - can use our existing adapter
  • Popular for accessing protected sites
  • Easy integration

Implementation Plan:

# cloudscraper_proxy.py - Proposed API

import cloudscraper
from .requests_adapter import HTTPProxyHeaderAdapter

class ProxyCloudScraper(cloudscraper.CloudScraper):
    """CloudScraper with proxy header support."""
    
    def __init__(self, proxy_headers=None, **kwargs):
        super().__init__(**kwargs)
        adapter = HTTPProxyHeaderAdapter(proxy_headers=proxy_headers)
        self.mount('https://', adapter)
        self.mount('http://', adapter)

def create_scraper(proxy_headers=None, **kwargs):
    """Create a CloudScraper with proxy header support."""
    return ProxyCloudScraper(proxy_headers=proxy_headers, **kwargs)

Technical Approach:

  1. Subclass cloudscraper.CloudScraper
  2. Mount our HTTPProxyHeaderAdapter
  3. Preserve all cloudscraper functionality
  4. Simple integration - likely under 50 lines of code; example usage follows
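
Hypothetical usage of the proposed helper; the proxy URL and header name are placeholders. Because CloudScraper subclasses requests.Session, the standard proxies argument works unchanged:

scraper = create_scraper(proxy_headers={'X-Proxy-Auth': 'token'})
resp = scraper.get('https://protected.example.com',
                   proxies={'https': 'http://proxy.example:8080'})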

🟡 Priority 2: MEDIUM - Implement Second

4. autoscraper Extension

File: python_proxy_headers/autoscraper_proxy.py

Metric         Value
GitHub Stars   7,082
Last Active    2025-06-09
Feasibility    ✅ HIGH
Impact         MEDIUM - Niche use case

Implementation Plan:

# autoscraper_proxy.py

from autoscraper import AutoScraper
from .requests_adapter import ProxySession

class ProxyAutoScraper(AutoScraper):
    """AutoScraper with proxy header support."""
    
    def __init__(self, proxy_headers=None):
        super().__init__()
        self._proxy_session = ProxySession(proxy_headers=proxy_headers)
    
    def build(self, url, wanted_list, proxy_headers=None, **kwargs):
        """Build scraper with proxy header support."""
        # Use our ProxySession for requests
        pass
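
Hypothetical usage, mirroring AutoScraper's documented build() signature; the URL and wanted_list values are placeholders:

scraper = ProxyAutoScraper(proxy_headers={'X-Proxy-Auth': 'token'})
result = scraper.build(url='https://example.com/products',
                       wanted_list=['Example Product Name'])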

5. treq Extension

File: python_proxy_headers/treq_proxy.py

Metric         Value
GitHub Stars   606
Last Active    2026-01-03
Feasibility    ⚠️ MEDIUM
Impact         MEDIUM - Twisted ecosystem

Implementation Plan:

# treq_proxy.py

from twisted.web.client import ProxyAgent

class ProxyHeaderAgent(ProxyAgent):
    """Twisted ProxyAgent with proxy header support."""

    def __init__(self, endpoint, proxy_headers=None, **kwargs):
        # ProxyAgent requires the proxy endpoint as its first argument
        super().__init__(endpoint, **kwargs)
        self._proxy_headers = proxy_headers or {}

    # Override connection methods to inject headers

Technical Approach:

  1. Subclass ProxyAgent
  2. Override the internal connection setup to add custom headers during CONNECT
  3. Capture the CONNECT response headers
  4. More complex due to Twisted's async nature; a starting sketch for the plain-HTTP case follows
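
The sketch below fleshes out the skeleton for the simpler plain-HTTP case, where every request already goes to the proxy and the extra headers can be injected in request() itself; the HTTPS CONNECT case needs the deeper override from step 2 and is not shown. Endpoint and header values are placeholders.

from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ClientEndpoint
from twisted.web.client import ProxyAgent
from twisted.web.http_headers import Headers

class PlainProxyHeaderAgent(ProxyAgent):
    def __init__(self, endpoint, proxy_headers=None, **kwargs):
        super().__init__(endpoint, **kwargs)
        self._proxy_headers = proxy_headers or {}

    def request(self, method, uri, headers=None, bodyProducer=None):
        # With a plain-HTTP proxy the full request goes to the proxy,
        # so adding the headers here is sufficient
        headers = headers or Headers()
        for name, value in self._proxy_headers.items():
            headers.addRawHeader(name, value)
        return super().request(method, uri, headers, bodyProducer)

endpoint = TCP4ClientEndpoint(reactor, 'proxy.example', 8080)
agent = PlainProxyHeaderAgent(endpoint,
                              proxy_headers={'X-Proxy-Auth': 'token'})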

6. crawlee-python Extension

File: python_proxy_headers/crawlee_proxy.py

Metric         Value
GitHub Stars   7,968
Last Active    2026-01-30
Feasibility    ⚠️ MEDIUM
Impact         MEDIUM - Only HTTP crawler portion

Implementation Plan:

# crawlee_proxy.py

from crawlee.crawlers import BeautifulSoupCrawler
from .httpx_proxy import HTTPProxyTransport

class ProxyBeautifulSoupCrawler(BeautifulSoupCrawler):
    """Crawler with proxy header support for HTTP requests."""
    
    def __init__(self, proxy_headers=None, **kwargs):
        # Configure httpx client with our transport
        pass

Note: This applies only to BeautifulSoupCrawler, not PlaywrightCrawler.
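
A possible wiring, under two assumptions that must be checked against the installed crawlee version: that HttpxHttpClient forwards extra keyword arguments (here transport) to the underlying httpx.AsyncClient, and that HTTPProxyTransport's constructor matches the existing httpx_proxy.py module.

from crawlee.crawlers import BeautifulSoupCrawler
from crawlee.http_clients import HttpxHttpClient
from .httpx_proxy import HTTPProxyTransport

def create_proxy_crawler(proxy_url, proxy_headers=None, **kwargs):
    # Route the crawler's HTTP traffic through our header-aware transport
    transport = HTTPProxyTransport(proxy_url, proxy_headers=proxy_headers)
    http_client = HttpxHttpClient(transport=transport)
    return BeautifulSoupCrawler(http_client=http_client, **kwargs)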


7. requestium Extension

File: python_proxy_headers/requestium_proxy.py

Metric         Value
GitHub Stars   1,838
Last Active    2026-01-26
Feasibility    ⚠️ MEDIUM
Impact         LOW - Requests portion only

Implementation Plan:

# requestium_proxy.py

from requestium import Session
from .requests_adapter import HTTPProxyHeaderAdapter

class ProxySession(Session):
    """Requestium Session with proxy header support."""
    
    def __init__(self, proxy_headers=None, **kwargs):
        super().__init__(**kwargs)
        adapter = HTTPProxyHeaderAdapter(proxy_headers=proxy_headers)
        self.mount('https://', adapter)
        self.mount('http://', adapter)
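
Hypothetical usage; Selenium-related constructor arguments are omitted, and the requests side behaves like a standard requests.Session:

s = ProxySession(proxy_headers={'X-Proxy-Auth': 'token'})
resp = s.get('https://example.com',
             proxies={'https': 'http://proxy.example:8080'})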

8. botasaurus Extension

File: python_proxy_headers/botasaurus_proxy.py

Metric         Value
GitHub Stars   3,808
Last Active    2026-01-10
Feasibility    ⚠️ MEDIUM
Impact         LOW - Request decorator only

Implementation Plan:

  • Investigate botasaurus's request module internals
  • May require monkey-patching or upstream PR

🟢 Priority 3: LOW - Browser-Based (Not Recommended)

These libraries use browser automation where proxy handling is delegated to the browser engine. Custom proxy header support is not feasible without browser extensions or significant browser-level modifications.

Library            Stars    Reason for Low Priority
crawl4ai           59,235   Browser-based (Playwright)
Scrapegraph-ai     22,434   Browser-based (Playwright)
playwright-python  14,209   Browser handles proxy
SeleniumBase       12,139   Browser handles proxy
Selenium           N/A      Browser handles proxy
splash             4,198    Qt WebKit-based

Recommendation: Do not implement extensions for these libraries. Instead, document that proxy header support is not possible due to browser architecture limitations.


Implementation Roadmap

Phase 1: Foundation (Weeks 1-2)

  1. ✅ pycurl extension
  2. ✅ cloudscraper extension (quick win)

Phase 2: High-Impact (Weeks 3-4)

  1. curl_cffi extension (may require upstream work)
  2. autoscraper extension

Phase 3: Ecosystem (Weeks 5-6)

  1. treq extension
  2. crawlee-python extension
  3. requestium extension

Phase 4: Optional

  1. botasaurus extension (if feasible)

File Structure

python_proxy_headers/
├── __init__.py
├── urllib3_proxy_manager.py     # Existing
├── requests_adapter.py          # Existing
├── httpx_proxy.py               # Existing
├── aiohttp_proxy.py             # Existing
├── pycurl_proxy.py              # NEW - Priority 1
├── curl_cffi_proxy.py           # NEW - Priority 1
├── cloudscraper_proxy.py        # NEW - Priority 1
├── autoscraper_proxy.py         # NEW - Priority 2
├── treq_proxy.py                # NEW - Priority 2
├── crawlee_proxy.py             # NEW - Priority 2
├── requestium_proxy.py          # NEW - Priority 2
└── botasaurus_proxy.py          # NEW - Priority 2

Documentation Updates

For each new extension, add:

  1. RST doc file in docs/
  2. Entry in docs/index.rst
  3. Usage example in README.md
  4. Example code in proxy-examples repo

Created: January 30, 2026