A complete guide and working code to scrape Instagram profiles, posts, and hashtags using the Crawlbase Crawling API. Extract public Instagram data at scale without getting blocked.
- Why Scrape Instagram Data?
- Getting Started
- Basic Scraping with Crawlbase
- instagram-post Scraper
- instagram-profile Scraper
- instagram-hashtag Scraper
- Overcoming Anti-Scraping Challenges
- Project Structure
- FAQ
Instagram, with over 2 billion active accounts, is a goldmine of public data. Here's what you can do with it:
- Market Research – Understand audience preferences, behaviors, and trends from profiles, posts, and comments
- Competitor Analysis – Study competitors' content strategies, post frequency, and engagement
- Influencer Marketing – Evaluate influencer profiles for engagement rates and audience relevance before hiring
- Content Strategy – Discover what content performs best in your niche
- Social Media Analytics – Track follower growth, post reach, and engagement over time
- Lead Generation – Identify ideal customers based on interests and activity
- Trend Analysis – Monitor viral content and emerging hashtags
- Academic Research – Gather social data for research and experiments
- Python 3.7+
- A free Crawlbase account – Sign up here (first 1,000 requests free, no credit card needed)
```bash
pip install crawlbase
```

After signing up, get your token from the Crawlbase dashboard.
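If you'd rather not hardcode the token in your scripts, one option is to read it from an environment variable. A minimal sketch (the variable name `CRAWLBASE_TOKEN` is just an illustrative choice, not something Crawlbase requires):

```python
import os

from crawlbase import CrawlingAPI

# Read the token from the environment instead of hardcoding it.
# CRAWLBASE_TOKEN is an arbitrary name used for this example.
crawlbase_token = os.environ.get('CRAWLBASE_TOKEN')
if not crawlbase_token:
    raise RuntimeError('Set the CRAWLBASE_TOKEN environment variable first')

api = CrawlingAPI({'token': crawlbase_token})
```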
```bash
touch instagram_scraper.py
```

The simplest usage – fetch the raw HTML of any Instagram page:
```python
from crawlbase import CrawlingAPI

# Set your Crawlbase token
crawlbase_token = 'YOUR_CRAWLBASE_TOKEN'

# URL of the Instagram page to scrape
instagram_page_url = 'https://www.instagram.com/apple/'

# Create a Crawlbase API instance with your token
api = CrawlingAPI({'token': crawlbase_token})

try:
    # Send a GET request to crawl the URL
    response = api.get(instagram_page_url)

    # Check if the response status code is 200 (OK)
    if 'status_code' in response:
        if response['status_code'] == 200:
            # Print the response body
            print(response['body'])
        else:
            print(f"Request failed with status code: {response['status_code']}")
    else:
        print("Response does not contain a status code.")
except Exception as e:
    print(f"An error occurred: {str(e)}")
```

This returns the raw HTML of the Instagram page. For structured JSON data, use the dedicated scrapers below.
Extract structured data from any Instagram post – likes, comments, captions, media, tags, and more.
```python
from crawlbase import CrawlingAPI

crawlbase_token = 'YOUR_CRAWLBASE_TOKEN'
instagram_post_url = 'https://www.instagram.com/p/B5LQhLiFFCX'

options = {
    'scraper': 'instagram-post',
}

api = CrawlingAPI({'token': crawlbase_token})

try:
    response = api.get(instagram_post_url, options=options)
    if response.get('status_code', 0) == 200:
        response_body_json = response.get('body', {})
        print(response_body_json)
    else:
        print(f"Request failed with status code: {response.get('status_code', 0)}")
except Exception as e:
    print(f"API request error: {str(e)}")
```

Example JSON response:

```json
{
"postedBy": {
"accountName": "apple",
"accountUserName": "apple",
"accountLink": "https://www.instagram.com/apple/"
},
"postLocation": {
"locationName": "Cheonan, Korea",
"link": "https://www.instagram.com/explore/locations/236722267/cheonan-korea/"
},
"caption": {
"text": "\"Nature can be a designer.\" #landscapephotography #ShotoniPhone by Chang D.",
"tags": [
{
"hashtag": "#landscapephotography",
"link": "https://www.instagram.com/explore/tags/landscapephotography/"
},
{
"hashtag": "#ShotoniPhone",
"link": "https://www.instagram.com/explore/tags/shotoniphone/"
}
]
},
"media": {
"images": [
"https://instagram.fccu1-1.fna.fbcdn.net/..."
],
"videos": []
},
"likesCount": 373174,
"viewsCount": 0,
"dateTime": "2019-11-22T17:21:42.000Z",
"repliesCount": 12,
"replies": [
{
"accountUserName": "user123",
"accountLink": "https://www.instagram.com/user123/",
"text": "Beautiful shot!",
"likesCount": 0,
"dateTime": "2020-03-26T05:48:15.000Z"
}
]
}
```
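Each scraper returns its structured data as JSON in the response body. A minimal sketch of parsing the post data and pulling out a few fields, assuming the body parses into the shape shown above (if your response wraps the scraped fields in an envelope, the sketch unwraps a nested `body` key defensively):

```python
import json

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})
response = api.get('https://www.instagram.com/p/B5LQhLiFFCX',
                   options={'scraper': 'instagram-post'})

if response.get('status_code') == 200:
    data = json.loads(response['body'])
    # Unwrap an envelope if the scraped fields are nested under "body".
    post = data.get('body', data) if isinstance(data, dict) else data
    # Field names follow the sample response above.
    print("Posted by:", post['postedBy']['accountUserName'])
    print("Likes:", post['likesCount'])
    print("Hashtags:", [t['hashtag'] for t in post['caption']['tags']])
```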
Extract full profile data – follower counts, bio, posts, stories, and IGTV content.

```python
from crawlbase import CrawlingAPI

crawlbase_token = 'YOUR_CRAWLBASE_TOKEN'
instagram_profile_url = 'https://www.instagram.com/apple/'

options = {
    'scraper': 'instagram-profile',
}

api = CrawlingAPI({'token': crawlbase_token})

try:
    response = api.get(instagram_profile_url, options=options)
    if response.get('status_code', 0) == 200:
        response_body_json = response.get('body', {})
        print(response_body_json)
    else:
        print(f"Request failed with status code: {response.get('status_code', 0)}")
except Exception as e:
    print(f"API request error: {str(e)}")
```

Example JSON response:

```json
{
"username": "apple",
"verified": true,
"postsCount": {
"value": "645",
"text": "645"
},
"followersCount": {
"value": "23,226,349",
"text": "23.2m"
},
"followingCount": {
"value": "6",
"text": "6"
},
"name": "apple",
"bio": {
"text": "Everyone has a story to tell. Tag #ShotoniPhone to take part.",
"tags": [
{
"hashtag": "#ShotoniPhone",
"link": "https://www.instagram.com/explore/tags/shotoniphone/"
}
]
},
"posts": [
{
"link": "https://www.instagram.com/p/B_XxvQvlsGe/",
"image": "https://scontent-ams4-1.cdninstagram.com/...",
"imageData": "Photo by apple on April 24, 2020."
}
],
"igtvs": [
{
"link": "https://www.instagram.com/tv/B9ex0TSlMCg/",
"caption": "Shifting Perspectives",
"duration": "1:44"
}
]
}
```
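A common next step is to persist a scraped profile so you can analyze it without re-crawling. A minimal sketch that saves the parsed JSON to disk (the output filename is an arbitrary choice, and the envelope unwrapping is the same defensive assumption as above):

```python
import json

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})
response = api.get('https://www.instagram.com/apple/',
                   options={'scraper': 'instagram-profile'})

if response.get('status_code') == 200:
    data = json.loads(response['body'])
    # Unwrap an envelope if the scraped fields are nested under "body".
    profile = data.get('body', data) if isinstance(data, dict) else data
    # Persist the profile so it can be analyzed later without re-crawling.
    with open('apple_profile.json', 'w', encoding='utf-8') as f:
        json.dump(profile, f, ensure_ascii=False, indent=2)
    print('Saved profile for', profile.get('username', 'unknown'))
```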
Extract posts, engagement metrics, and trending content from any public Instagram hashtag page.

```python
from crawlbase import CrawlingAPI

crawlbase_token = 'YOUR_CRAWLBASE_TOKEN'
instagram_hashtag_url = 'https://www.instagram.com/explore/tags/love/'

options = {
    'scraper': 'instagram-hashtag',
}

api = CrawlingAPI({'token': crawlbase_token})

try:
    response = api.get(instagram_hashtag_url, options=options)
    if response.get('status_code', 0) == 200:
        response_body_json = response.get('body', {})
        print(response_body_json)
    else:
        print(f"Request failed with status code: {response.get('status_code', 0)}")
except Exception as e:
    print(f"API request error: {str(e)}")
```

Example JSON response:

```json
{
"hashtag": "#love",
"postsCount": 1922533116,
"posts": [
{
"link": "https://www.instagram.com/p/CFr2LTkDGAL",
"shortcode": "CFr2LTkDGAL",
"caption": "Serious.\n#fitness #gym #love #lifestyle...",
"commentCount": 20,
"likeCount": 633,
"takenAt": "2020-09-28T15:23:11.000+00:00",
"isVideo": false
}
]
}
```
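To turn a hashtag crawl into something you can analyze, you might filter the posts and export them. A minimal sketch that keeps posts above an arbitrary like threshold and writes them to CSV (the threshold and filename are illustrative choices):

```python
import csv
import json

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})
response = api.get('https://www.instagram.com/explore/tags/love/',
                   options={'scraper': 'instagram-hashtag'})

if response.get('status_code') == 200:
    data = json.loads(response['body'])
    # Unwrap an envelope if the scraped fields are nested under "body".
    hashtag_data = data.get('body', data) if isinstance(data, dict) else data
    posts = hashtag_data.get('posts', [])
    # Keep posts above an (arbitrary) engagement threshold and export to CSV.
    popular = [p for p in posts if p.get('likeCount', 0) >= 500]
    with open('love_hashtag_posts.csv', 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['link', 'likeCount', 'commentCount', 'takenAt'])
        writer.writeheader()
        for p in popular:
            writer.writerow({k: p.get(k) for k in writer.fieldnames})
```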
Instagram employs several layers of protection:

- Rate Limiting – Restricts the number of requests per time window; exceeding limits results in temporary or permanent blocks
- CAPTCHA – Triggers during login or suspicious browsing activity
- Dynamic Content – Pages are frequently updated, breaking selector-based scrapers
- Session Cookies – Track user behavior and flag sudden pattern changes
- User-Agent Checks – Suspicious UA strings trigger detection
| Strategy | Description |
|---|---|
| Use Rotating Proxies | Distribute requests across multiple IPs to avoid rate limits |
| Randomize User Agents | Rotate UA strings to mimic different browsers and devices |
| Session Management | Maintain consistent sessions rather than creating new ones repeatedly |
| Limit Request Frequency | Add random delays between requests to mimic human behavior |
| Simulate Human Behavior | Scroll, click, and interact naturally rather than hammering endpoints |
| Scrape Off-Peak Hours | Less server load means fewer CAPTCHAs and rate limit triggers |
| Respect robots.txt | Check Instagram's scraping guidelines and adhere to them |
| Use Headless Browsers | Tools like Selenium render JavaScript for a more authentic experience |
Tip: Crawlbase handles all of these automatically – proxies, CAPTCHAs, rate limiting, and JS rendering are built in, so you can focus on the data.
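Even with Crawlbase doing the heavy lifting, it can be worth pacing your own batch jobs. A minimal sketch of a throttled crawl loop over several profiles (the URL list and delay range are arbitrary choices for illustration):

```python
import random
import time

from crawlbase import CrawlingAPI

api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

profile_urls = [
    'https://www.instagram.com/apple/',
    'https://www.instagram.com/instagram/',
]

for url in profile_urls:
    response = api.get(url, options={'scraper': 'instagram-profile'})
    if response.get('status_code') == 200:
        print(f"Fetched {url} ({len(response['body'])} bytes)")
    else:
        print(f"Failed {url}: {response.get('status_code')}")
    # Random delay between requests to keep the batch gentle.
    time.sleep(random.uniform(2, 5))
```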
```
instagram-scraper/
├── README.md
├── LICENSE
├── .gitignore
├── .gitattributes
├── requirements.txt
└── examples/
    ├── instagram_page_scraper.py      # Raw HTML scraping
    ├── instagram_post_scraper.py      # Structured post data
    ├── instagram_profile_scraper.py   # Full profile extraction
    └── instagram_hashtag_scraper.py   # Hashtag page scraping
```
An Instagram scraper is a tool that automates collecting public data from Instagram – including profiles, posts, comments, hashtags, and engagement metrics – without manual browsing.
Scraping is legal when limited to publicly accessible data (images, captions, likes, follower counts). Avoid scraping private information or violating copyright. Always comply with Instagram's terms of service and applicable data protection laws like GDPR.
- User Profiles – username, bio, follower/following counts, post count
- Posts – captions, images, videos, hashtags, likes, comments
- Comments – text, timestamps, usernames
- Hashtags – post count, trending posts under a tag
- Stories – public story content
- IGTV – video titles and durations
- Location Data – geotags on public posts
Respect user privacy, obtain consent where required, avoid collecting personal contact details, and use scraped data responsibly. Responsible scraping means not using data for spam, harassment, or re-selling personal information.
- Social media marketing optimization
- Influencer discovery and vetting
- Competitor content analysis
- Brand sentiment monitoring
- Trend identification and reporting
- Market research and academic studies
- Crawlbase Crawling API Docs
- Instagram Scraper Reference
- Full Blog Post: How to Scrape Instagram Data Using Python
- Crawlbase Pricing
- Email: support@crawlbase.com
- Docs: crawlbase.com/docs
- Status: status.crawlbase.com
MIT License – see LICENSE for details.
Start scraping today! Create a free Crawlbase account – no credit card required, first 1,000 requests are on us.