Zyte

Web data extraction platform with anti-ban technology, browser automation, and Scrapy Cloud hosting.

GitHub Education Access

Claim your free access at GitHub Education Pack - look for Zyte.

Dashboard: https://app.zyte.com/

Features

  • Zyte API: HTTP requests without bans
  • Browser Automation: Headless browser control
  • AI Parsing: Automatic data extraction
  • Scrapy Cloud: Hosted spider deployment
  • Web Scraping Copilot: Code generation

Quick Setup

Zyte API (HTTP)

pip install zyte-api
import os

from zyte_api import ZyteAPI

# Read the API key from the environment rather than hard-coding it
client = ZyteAPI(api_key=os.environ.get("ZYTE_API_KEY"))

# Fetch the page with browser rendering enabled
response = client.get({
    "url": "https://example.com",
    "browserHtml": True,
})
html = response["browserHtml"]

Scrapy with Zyte

pip install scrapy scrapy-zyte-api
# settings.py
import os

DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
# scrapy-zyte-api requires the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
# Send every request through Zyte API by default
ZYTE_API_TRANSPARENT_MODE = True
ZYTE_API_KEY = os.environ.get("ZYTE_API_KEY")
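
With ZYTE_API_TRANSPARENT_MODE enabled, every request goes through Zyte API. Per-request parameters can still be set through Request.meta; a minimal sketch using the zyte_api_automap meta key from scrapy-zyte-api (the URL is a placeholder):

# Inside a spider callback: request browser rendering for one URL
yield scrapy.Request(
    "https://example.com/js-heavy-page",
    meta={"zyte_api_automap": {"browserHtml": True}},
)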

Scrapy Cloud Deployment

pip install shub

# Login
shub login

# Deploy spider
shub deploy
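
Once deployed, a spider can also be started on Scrapy Cloud from the command line; a sketch assuming a spider named products:

# Schedule a run of the deployed spider
shub schedule products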

Configuration

scrapy.cfg

[settings]
default = myproject.settings

[deploy]
project = YOUR_PROJECT_ID

Environment Variables

ZYTE_API_KEY=your_api_key
# Get from: https://app.zyte.com/o/account/api-access
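
If you keep the key in a local .env file instead, one common pattern is loading it with python-dotenv (an extra dependency, not required by Zyte):

# pip install python-dotenv
import os

from dotenv import load_dotenv

load_dotenv()  # reads ZYTE_API_KEY from .env into the environment
api_key = os.environ["ZYTE_API_KEY"]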

Zyte API Features

Browser Rendering

response = client.get({
    "url": "https://example.com",
    "browserHtml": True,  # rendered page HTML
    "javascript": True,   # execute JavaScript during rendering
    "screenshot": True,   # capture a screenshot of the page
})
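
The screenshot is returned base64-encoded; a minimal sketch for writing it to disk (PNG is assumed here, the actual format follows screenshotOptions):

from base64 import b64decode

# Decode the base64 screenshot payload and save it
with open("screenshot.png", "wb") as f:
    f.write(b64decode(response["screenshot"]))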

AI Data Extraction

response = client.get({
    "url": "https://example.com/product",
    "product": True  # Auto-extract product data
})

product = response["product"]
print(product["name"], product["price"])
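
Automatic extraction covers other data types as well. For a listing page, productList returns all products found; a sketch following the same pattern (field names per the Zyte API product schema):

response = client.get({
    "url": "https://example.com/products",
    "productList": True,  # auto-extract every product on a listing page
})

for product in response["productList"]["products"]:
    print(product.get("name"), product.get("price"))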

Geolocation

response = client.get({
    "url": "https://example.com",
    "browserHtml": True,   # at least one output type must be requested
    "geolocation": "US",   # route the request through a US IP
})

Scrapy Spider Example

import scrapy


# With the scrapy-zyte-api settings above, requests are routed
# through Zyte API automatically; a plain Spider subclass is enough.
class ProductSpider(scrapy.Spider):
    name = "products"
    start_urls = ["https://example.com/products"]

    def parse(self, response):
        for product in response.css(".product"):
            yield {
                "name": product.css(".title::text").get(),
                "price": product.css(".price::text").get(),
            }

        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)
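
Run it locally before deploying (the output file name is just an example):

scrapy crawl products -O products.json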

Best Practices

  1. Use browser rendering sparingly: More expensive than HTTP
  2. Set appropriate delays: Respect rate limits
  3. Handle errors gracefully: Implement retry logic (see the sketch after this list)
  4. Use geolocation: When content varies by region
  5. Schedule in Scrapy Cloud: For recurring scrapes
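
A minimal retry sketch around the Python client. The official client already retries many transient errors internally, so treat this as an illustrative application-level wrapper; the attempt count, backoff, and broad except are arbitrary choices:

import os
import time

from zyte_api import ZyteAPI

client = ZyteAPI(api_key=os.environ.get("ZYTE_API_KEY"))


def get_with_retries(query, attempts=3, backoff=2.0):
    """Retry a Zyte API call with exponential backoff (illustrative)."""
    for attempt in range(1, attempts + 1):
        try:
            return client.get(query)
        except Exception:  # narrow to the client's error classes in real code
            if attempt == attempts:
                raise
            time.sleep(backoff ** attempt)


result = get_with_retries({"url": "https://example.com", "browserHtml": True})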

Resources