Web data extraction platform with anti-ban technology, browser automation, and Scrapy Cloud hosting.
Free access is available through the GitHub Student Developer Pack (look for Zyte among the offers).
- Main Dashboard: https://app.zyte.com
- API Access: https://app.zyte.com/o/account/api-access
- Zyte API: HTTP requests without bans
- Browser Automation: Headless browser control
- AI Parsing: Automatic data extraction
- Scrapy Cloud: Hosted spider deployment
- Web Scraping Copilot: Code generation
```
pip install zyte-api
```

```python
import os

from zyte_api import ZyteAPI

client = ZyteAPI(api_key=os.environ.get("ZYTE_API_KEY"))
response = client.get({
    "url": "https://example.com",
    "browserHtml": True,
})
```

```
pip install scrapy scrapy-zyte-api
```

```python
# settings.py
import os

DOWNLOAD_HANDLERS = {
    "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
}
# scrapy-zyte-api requires Twisted's asyncio reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
ZYTE_API_KEY = os.environ.get("ZYTE_API_KEY")
```

```
pip install shub
```
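Both the Python client and the Scrapy handler above wrap the same HTTP API. As a rough sketch of what a raw call looks like, here is a hypothetical helper that assembles the request pieces locally; the endpoint path and the Basic-auth scheme (API key as username, empty password) reflect my reading of the Zyte API docs and should be verified before use:

```python
import base64
import json

# Assumption (verify against the Zyte API reference): POST JSON to
# /v1/extract, HTTP Basic auth with the API key as username and an
# empty password.
ZYTE_ENDPOINT = "https://api.zyte.com/v1/extract"


def build_extract_request(url: str, api_key: str, **options) -> dict:
    """Assemble the pieces of a raw Zyte API call (hypothetical helper)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return {
        "endpoint": ZYTE_ENDPOINT,
        "headers": {
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"url": url, **options}),
    }
```

Any HTTP client can then POST `body` to `endpoint` with those headers, e.g. `build_extract_request("https://example.com", key, browserHtml=True)`.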
```
# Login
shub login

# Deploy spider
shub deploy
```

```
# scrapy.cfg
[settings]
default = myproject.settings

[deploy]
project = YOUR_PROJECT_ID
```

```
# .env
ZYTE_API_KEY=your_api_key
# Get from: https://app.zyte.com/o/account/api-access
```

```python
# Browser automation: render the page in a headless browser
response = client.get({
    "url": "https://example.com",
    "browserHtml": True,
    "javascript": True,
    "screenshot": True,
})
```

```python
# AI parsing: automatic data extraction
response = client.get({
    "url": "https://example.com/product",
    "product": True,  # Auto-extract product data
})
product = response["product"]
print(product["name"], product["price"])
```

```python
# Geolocation
response = client.get({
    "url": "https://example.com",
    "geolocation": "US",  # Request from a US IP
})
```
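When a screenshot is requested, the image comes back as a base64-encoded string (per my reading of the Zyte API reference; verify the field format for your plan). A minimal sketch for writing it to disk:

```python
import base64


def save_screenshot(api_response: dict, path: str) -> bytes:
    """Decode and persist the screenshot from a Zyte API response."""
    # Assumption: "screenshot" holds base64-encoded image bytes.
    raw = base64.b64decode(api_response["screenshot"])
    with open(path, "wb") as f:
        f.write(raw)
    return raw
```

Usage: `save_screenshot(response, "page.png")` after a request made with `"screenshot": True`.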
```python
import scrapy


class ProductSpider(scrapy.Spider):
    # With scrapy-zyte-api enabled in settings.py, requests from a
    # regular Spider are routed through Zyte API automatically.
    name = "products"
    start_urls = ["https://example.com/products"]

    def parse(self, response):
        for product in response.css(".product"):
            yield {
                "name": product.css(".title::text").get(),
                "price": product.css(".price::text").get(),
            }
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)
```
- Use browser rendering sparingly: More expensive than HTTP
- Set appropriate delays: Respect rate limits
- Handle errors gracefully: Implement retry logic
- Use geolocation: When content varies by region
- Schedule in Scrapy Cloud: For recurring scrapes
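The "handle errors gracefully" tip above can be sketched as a small generic retry wrapper with exponential backoff; `with_retries` is a hypothetical helper, and the exception types worth retrying depend on what your client actually raises:

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...


# e.g. with_retries(lambda: client.get({"url": "https://example.com"}))
```

Narrow `retry_on` to transient errors (timeouts, HTTP 429/5xx) so permanent failures such as bad requests fail fast instead of being retried.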