Learn how to start web scraping with Crawlee and Python
Crawlee is a versatile and modern web scraping framework for Python, designed to make crawling and scraping websites easy and efficient. In this guide, we'll explore the basics of using Crawlee, starting from installation and moving to a practical example.
Before you get started, make sure you have a recent version of Python (3.9 or later at the time of writing) and pip installed.
To install Crawlee together with its Playwright support (used in the example below), run the following commands:
pip install 'crawlee[playwright]'
playwright install
Let's walk through a simple example where we will scrape the title and meta description from a webpage.
# Import the necessary modules
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    # Create the crawler
    crawler = PlaywrightCrawler()

    # Define an asynchronous handler with the scraping logic and
    # register it as the crawler's default request handler
    @crawler.router.default_handler
    async def handle_request(context: PlaywrightCrawlingContext) -> None:
        page = context.page
        title = await page.title()
        description = await page.query_selector("meta[name='description']")
        if description:
            description_content = await description.get_attribute('content')
        else:
            description_content = 'No description'
        print(f'Page Title: {title}')
        print(f'Meta Description: {description_content}')

    # Run the crawler, starting from the given URL
    await crawler.run(['https://example.com'])


if __name__ == '__main__':
    asyncio.run(main())
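Under the hood, the handler's two lookups simply read the page's `<title>` text and the `content` attribute of the description `<meta>` tag. To see that extraction logic in isolation, here is a stdlib-only sketch using Python's built-in `html.parser` (the `MetaTitleParser` class is a hypothetical illustration, not part of Crawlee):

```python
from html.parser import HTMLParser


class MetaTitleParser(HTMLParser):
    """Collects the <title> text and the description <meta> content."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'title':
            self._in_title = True
        elif tag == 'meta' and attrs.get('name') == 'description':
            self.description = attrs.get('content')

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


html_doc = (
    "<html><head><title>Example</title>"
    "<meta name='description' content='A demo page'></head></html>"
)
parser = MetaTitleParser()
parser.feed(html_doc)
print(parser.title)                            # Example
print(parser.description or 'No description')  # A demo page
```

In the Crawlee version above, Playwright's `page.title()` and `query_selector()` do this work for you on the fully rendered page, which is what makes the approach usable on JavaScript-heavy sites.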
Crawlee provides multiple options to enhance your web scraping experience, such as automatic retries on failure, proxy rotation, and a persistent request queue. You can explore these and other advanced features in the official guides.
Crawlee offers a robust and flexible solution for web scraping in Python, making it easy to handle dynamic websites. In this quick-start guide, we covered how to scrape a webpage's title and meta description. For more advanced use cases, such as handling multiple pages, setting up concurrency, or customizing requests, be sure to dive deeper into Crawlee’s documentation.
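On the multi-page point: Crawlee manages a request queue for you, enqueueing newly discovered links and skipping URLs it has already seen. The idea behind that queue can be sketched in plain Python (the `crawl_order` helper below is purely illustrative, not Crawlee's API):

```python
from collections import deque


def crawl_order(start_url, link_graph, max_requests=10):
    """Breadth-first crawl frontier with de-duplication.

    link_graph maps each URL to the list of URLs found on that page.
    """
    queue = deque([start_url])  # pending requests
    seen = {start_url}          # URLs already enqueued, never enqueued twice
    visited = []                # order in which pages would be fetched
    while queue and len(visited) < max_requests:
        url = queue.popleft()
        visited.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


site = {
    'https://example.com': ['https://example.com/a', 'https://example.com/b'],
    'https://example.com/a': ['https://example.com'],  # back-link: not re-crawled
}
print(crawl_order('https://example.com', site))
```

The `max_requests` cap mirrors the common real-world need to bound a crawl; in Crawlee you get this behavior without writing any of the bookkeeping yourself.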
Created on Sept. 19, 2024, 7:10 p.m.