Want to dive into the world of web scraping without getting lost in complicated code?
With Python and the BeautifulSoup library, you can easily extract and organize data from a website in just a few lines.

Prerequisites for scraping with Python and BeautifulSoup
✅ Before you get started, it's important to have a few programming basics; this gives you a better understanding of how the code works. You don't need to be an expert, but knowing how to read and run a Python script will help you a lot.
Next, here's what you need to do before scraping with Python and BeautifulSoup:
- ✔ Install Python as well as a development environment.
- ✔ Install pip, the tool that makes it easy to add Python libraries.
- ✔ Install BeautifulSoup with the command: `pip install beautifulsoup4`
- ✔ Install Requests (to retrieve web pages) with the command: `pip install requests`
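Once both commands have completed, a quick way to confirm the libraries are importable (a minimal sanity check, nothing more):

```python
# Sanity check: both libraries import and report their versions
import requests
import bs4

print(requests.__version__)
print(bs4.__version__)
```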
How to web scrape with Python and BeautifulSoup?
Follow our tutorial for a simple web scraping project.

Project: retrieve the title of a page and all the links it contains.
Step 1: Retrieve page content with Requests
To perform an HTTP GET request to a URL, use the Requests library.
📌 When you send an HTTP request with Requests, the server always returns a status code. These codes indicate whether the request was successful or not.
- 200: success.
- 301 / 302: redirection.
- 404: page not found.
- 500: internal server error.

You can read this code from the response object via .status_code. Here is an example that sends a request to bonjour.com, checks the status code, and displays a snippet of the HTML content if all is well:
```python
import requests

# Target URL
url = "https://bonjour.com"

# Send a GET request
response = requests.get(url)

# Check the status code
if response.status_code == 200:
    print("Success: the page has been retrieved!")
    html = response.text  # HTML content of the page
    print("Extract of the HTML content:")
    print(html[:500])  # display only the first 500 characters
else:
    print(f"Error: status code {response.status_code}")
```
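As an alternative to testing status_code by hand, Requests can raise an exception on error statuses for you. A minimal sketch using the library's raise_for_status() method:

```python
import requests

url = "https://bonjour.com"  # same example URL as above
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
    print(response.text[:500])
except requests.RequestException as exc:  # covers timeouts, connection errors, HTTP errors
    print(f"Request failed: {exc}")
```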
Step 2: Analyze HTML code with BeautifulSoup
Once the page has been retrieved (via response.text), you get a character string containing all the page's HTML code. To manipulate this HTML easily, we use BeautifulSoup to create a BeautifulSoup object, specifying the parser ("html.parser"). This allows BeautifulSoup to interpret the HTML correctly and avoids warnings.

```python
from bs4 import BeautifulSoup
import requests

url = "https://bonjour.com"
response = requests.get(url)
html = response.text

# Specifying the parser is recommended
soup = BeautifulSoup(html, "html.parser")
```
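To confirm the parsing worked, you can poke at the soup object right away. A small sketch, assuming the page has a <title> tag:

```python
# The soup object can be navigated immediately
if soup.title is not None:
    print(soup.title.get_text())  # the text of the page's <title> tag
print(soup.prettify()[:300])      # re-indented HTML, first 300 characters
```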
Step 3: Find and extract elements
- Use find() and find_all()
```python
# Retrieve the <h1> title
h1 = soup.find("h1")
print(h1.get_text())

# Retrieve all <a> links
liens = soup.find_all("a")
for lien in liens:
    print(lien.get_text(), lien.get("href"))
```
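One common pitfall: find() returns None when no matching tag exists, and calling get_text() on None crashes. A defensive variant of the snippet above:

```python
# Guard against pages with no <h1>
h1 = soup.find("h1")
if h1 is not None:
    print(h1.get_text(strip=True))
else:
    print("No <h1> found on this page")

# Likewise, an <a> tag may lack an href; .get("href") then returns None
for lien in soup.find_all("a"):
    href = lien.get("href")
    if href:
        print(lien.get_text(strip=True), href)
```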
- Target elements by attribute
You can refine the search by attributes such as class, id or any other HTML attribute.
⚠️ Note: in Python, we write class_ instead of class to avoid a conflict with the reserved keyword class.
```python
# Retrieve a div with a specific id
container = soup.find("div", id="main")

# Retrieve all links with a specific class
liens_nav = soup.find_all("a", class_="nav-link")
```
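For attributes other than class and id, you can pass an attrs dictionary. A short sketch (the data-category attribute below is a hypothetical example):

```python
# Match an arbitrary attribute via the attrs dictionary
promos = soup.find_all("a", attrs={"data-category": "promo"})

# Match any <a> that simply has an href attribute
links_with_href = soup.find_all("a", href=True)
```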
- Using CSS selectors with select()
For more precise searches, use select() with CSS selectors.
```python
# All links in article titles
links_articles = soup.select("article h2 a")

# All <a> whose href attribute begins with "http"
links_http = soup.select('a[href^="http"]')
```
CSS selectors are very powerful when you want to target specific parts of a page without manually walking through all the HTML.
How to extract data from an HTML table with BeautifulSoup?

So far, we have seen how to retrieve titles, links, and text from a web page.
⚠ But often, real-world use cases are more complex: structured data extraction such as tables or lists, pagination management, and resolving common scraping errors. That's exactly what we're going to look at together.
Extract tables and lists
Websites often present their data in HTML tables (<table>, <tr>, <th>, <td>) or lists (<ul>/<ol>, <li>). To turn these structures into usable data, you need to learn how to walk through them row by row or element by element.

Whenever you want to extract an HTML table, the principle is simple:
- ✅ Retrieve the headers (<th>) to identify the column headings.
- ✅ Go through each row (<tr>) and look for the cells (<td>) that contain the actual data.
- ✅ Store the information in a list or dictionary.

For an HTML list (<ul> or <ol>):
- ✅ Locate all the <li> tags with find_all.
- ✅ Retrieve their content (text or link) and add it to a Python list.

In summary:
- The <table>, <tr>, <th> and <td> tags are used to rebuild a table.
- The <ul>/<ol> and <li> tags turn an HTML list into a Python list.

Here's an example with a table:
html = """ <table> <tr> <th>Last name</th> <th>Age</th> <th>Town</th> </tr> <tr> <td>Alice</td> <td>25</td> <td>Paris</td> </tr> <tr> <td>Bob</td> <td>30</td> <td>Lyon</td> </tr> </table> """ # Create BeautifulSoup object soup = BeautifulSoup(html, "html.parser") # Extract headers from array headers = [th.get_text(strip=True) for th in soup.find_all("th")] print("Headers:", headers) # Extract data rows (skip 1st row as these are the headers) rows = [] for tr in soup.find_all("tr")[1:]: cells = [td.get_text(strip=True) for td in tr.find_all("td")] if cells: rows.append(cells) print("Lines :", rows)Here,
find_all("th")retrieves headers andfind_all("td")retrieves the cells in each row. Loop over the<tr>to rebuild the table row by row.Here's an example on a list:
```python
from bs4 import BeautifulSoup

html_list = """
<ul>
  <li>Apple</li>
  <li>Banana</li>
  <li>Orange</li>
</ul>
"""

soup = BeautifulSoup(html_list, "html.parser")

# Each <li> becomes one element of a Python list
fruits = [li.get_text(strip=True) for li in soup.find_all("li")]
print(fruits)
```

Here, every <li> is turned directly into a Python list element, giving the result ["Apple", "Banana", "Orange"].

Manage pagination and links
In many cases, the data doesn't fit on a single page. It's spread across several pages via a "next page" link or numbered pagination (?page=1, ?page=2, ...).
📌 In both cases, you must crawl (browse in a loop) through all the pages and merge the data.
Example with a page parameter:
```python
import time
import requests
from bs4 import BeautifulSoup

# Example URL with pagination
BASE_URL = "https://bonjour.com/articles?page={}"
HEADERS = {"User-Agent": "Mozilla/5.0"}

all_articles = []

# Assume 5 pages to browse
for page in range(1, 6):
    url = BASE_URL.format(page)
    r = requests.get(url, headers=HEADERS, timeout=20)
    if r.status_code == 200:
        soup = BeautifulSoup(r.text, "html.parser")
        # Extract the article titles
        articles = [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="title")]
        all_articles.extend(articles)
    else:
        print(f"Error on page {page} (code: {r.status_code})")
    time.sleep(1.0)  # politeness

print("Articles retrieved:", all_articles)
```

Brief explanation:
- Prepare the URL with a {} placeholder to insert the page number: BASE_URL = "https://bonjour.com/articles?page={}".
- Some websites block requests without a "browser identity." Adding a User-Agent header prevents you from being mistaken for a bot: HEADERS = {"User-Agent": "Mozilla/5.0"}.
- Loop from page 1 to 5: for page in range(1, 6).
- Retrieve each page's HTML: requests.get(url, headers=HEADERS, timeout=20); the timeout limits the wait if the site doesn't respond.
- Parse the page: BeautifulSoup(r.text, "html.parser").
- Retrieve all the article titles: find_all("h2", class_="title").
- Add the items found to a global list: all_articles.extend(articles).
- Pause between requests to avoid overloading the server and getting banned: time.sleep(1.0).
- After the loop, all_articles contains the titles from all 5 pages.
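The snippet above assumes numbered pages. For sites that only expose a "next page" link, here's a hedged variant that follows the link until it disappears; the a[rel="next"] selector and the "title" class are assumptions to adapt to the real site:

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://bonjour.com/articles"  # hypothetical starting page
headers = {"User-Agent": "Mozilla/5.0"}
all_titles = []

while url:
    r = requests.get(url, headers=headers, timeout=20)
    if r.status_code != 200:
        break
    soup = BeautifulSoup(r.text, "html.parser")
    all_titles.extend(h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="title"))
    # Follow the "next page" link; the loop stops when no such link exists
    next_link = soup.select_one('a[rel="next"]')
    url = urljoin(url, next_link["href"]) if next_link else None
    time.sleep(1.0)  # politeness

print("Articles retrieved:", all_titles)
```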
Common mistakes and challenges
❗ Scraping isn't always as simple as pressing a button and watching everything work. You may run into frequent obstacles such as:
- HTTP errors
404: page not found
403: access forbidden
500: server-side error

Example:

```python
response = requests.get(url)
if response.status_code == 200:
    # Page OK
    print("Page retrieved successfully")
elif response.status_code == 404:
    print("Error: page not found")
else:
    print("Code returned:", response.status_code)
```

- Sites that block scraping
Some sites detect automated requests and block access.
- Dynamic pages (JavaScript)
BeautifulSoup only reads static HTML. If the page loads its content with JavaScript, you'll see nothing.
✅ In this case, use tools such as Selenium or Playwright.
On the other hand, if you want to scrape efficiently without getting blocked or damaging the site, here are the best practices:
- ✔ Respect the robots.txt file of a website.
- ✔ Set up delays between requests to avoid overloading the server (using time.sleep()).
- ✔ Use proxies and rotate them.
- ✔ Regularly change your User Agent.
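Here's a minimal sketch combining several of these practices, checking robots.txt with the standard library and picking a random User-Agent from a small pool (the URL and the User-Agent strings are example values):

```python
import random
import time
from urllib.robotparser import RobotFileParser

import requests

# 1. Check robots.txt before scraping (Python's built-in parser)
rp = RobotFileParser()
rp.set_url("https://bonjour.com/robots.txt")
rp.read()

url = "https://bonjour.com/articles"

# 2. A small pool of User-Agent strings to rotate through (example values)
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

if rp.can_fetch("*", url):
    headers = {"User-Agent": random.choice(user_agents)}
    response = requests.get(url, headers=headers, timeout=20)
    print(response.status_code)
    time.sleep(1.0)  # 3. Pause before the next request
else:
    print("robots.txt disallows fetching this URL")
```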
Web scraping with Selenium and BeautifulSoup?

Web scraping with Selenium and BeautifulSoup on Chrome. ©Christina for Alucare.fr

⚠ A reminder: BeautifulSoup is an excellent HTML parser, but it cannot execute JavaScript on a web page. That's where Selenium comes in handy!
Basically, Selenium controls a real browser: it executes JavaScript and renders the page as if a human were browsing. BeautifulSoup then analyzes the HTML once the page is fully rendered, so you can extract whatever you want.
Step 1: Install Selenium and BeautifulSoup
Here, instead of the Requests library, we will use Selenium. To install it, go through pip:

```
pip install selenium beautifulsoup4
```

Next, download and install a WebDriver that matches your browser version (e.g. ChromeDriver for Google Chrome).
✅ You can either place it in the same folder as your Python script or add it to your system's PATH environment variable.
Step 2: Configure Selenium
First and foremost, import webdriver from Selenium to control a browser.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
```

Next, launch a browser. This will open the web page and execute the JavaScript (example: Chrome).

```python
driver = webdriver.Chrome()
```

Tell the browser which page to visit.

```python
driver.get("https://www.exemple.com")
```

If the page takes a while to display certain elements, you can tell Selenium to wait a little.

```python
driver.implicitly_wait(10)
```
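If you'd rather wait for a specific element than rely on a blanket delay, Selenium also offers explicit waits. A short sketch, assuming the page eventually shows an <h2> (By is already imported above):

```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for an <h2> to appear in the DOM
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, "h2"))
)
```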
Step 3: Retrieving the page content
Once the page is loaded, retrieve the full DOM (the HTML source code after JavaScript execution).

```python
html_content = driver.page_source
```

Step 4: HTML analysis with BeautifulSoup
Now pass this source code to BeautifulSoup so you can work with it:

```python
from bs4 import BeautifulSoup

# Create a BeautifulSoup object
soup = BeautifulSoup(html_content, 'html.parser')

# Example: retrieve all the page's titles
titles = soup.find_all('h2')
for title in titles:
    print(title.get_text())
```

👉 BeautifulSoup offers powerful methods like find(), find_all(), and CSS selectors to target and extract HTML elements.
Step 5: Closing the browser
Very important: always close your browser after running the program to free up resources!
```python
driver.quit()
```

✅ And there you have it! You can now combine the power of Selenium to simulate human navigation (clicks, scrolls, etc.) with the efficiency of BeautifulSoup for HTML analysis.
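Putting the five steps together, here's a minimal end-to-end sketch (the URL and the <h2> targets are example assumptions; it requires Chrome and its WebDriver to be installed):

```python
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
try:
    # Load the page and let JavaScript run
    driver.get("https://www.exemple.com")
    driver.implicitly_wait(10)

    # Hand the rendered DOM over to BeautifulSoup
    soup = BeautifulSoup(driver.page_source, "html.parser")
    for title in soup.find_all("h2"):
        print(title.get_text(strip=True))
finally:
    driver.quit()  # always free the browser, even if an error occurred
```

The try/finally block guarantees driver.quit() runs even when the scraping code raises an error.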
FAQs
What's the best tool for web scraping in Python?
There is no such thing as the best universal tool, but rather solutions tailored to your project.
🔥 BeautifulSoup: a simple and effective HTML parser for extracting content quickly. Ideal for beginners and small projects.
🔥 Scrapy: a comprehensive framework designed to manage large volumes of data, with advanced features.
🔥 Playwright: perfect for complex JavaScript-generated sites, as it simulates a real browser and allows you to interact with the page like a human.
How to use BeautifulSoup to extract content from a <div> tag?
With BeautifulSoup, you can target a specific tag with a CSS selector. To extract content from a <div>, here are the steps:
- Retrieve the page with Requests, then parse it with BeautifulSoup

```python
from bs4 import BeautifulSoup
import requests

url = "URL_OF_YOUR_SITE"  # Replace with the real URL
response = requests.get(url)
html_content = response.text
soup = BeautifulSoup(html_content, "html.parser")
```

- Use the select() method, passing it your CSS selector, to target the <div>

To retrieve the first element, use soup.select_one(). To retrieve all of them, use soup.select().

HTML example:

```html
<div class="article">
  <h2>Article title</h2>
  <p>Here's what the paragraph says.</p>
</div>
```

Example with a CSS selector:

```python
# Retrieve the first div with the "article" class
div_article = soup.select_one("div.article")

# Display its text content
if div_article:
    print(div_article.get_text(strip=True))
```

Here, the CSS selector is div.article.

- Extract elements inside the <div>

```python
# Retrieve the title inside the div
title = soup.select_one("div.article h2").get_text()

# Retrieve the paragraph inside the div
paragraph = soup.select_one("div.article p").get_text()

print("Title:", title)
print("Paragraph:", paragraph)
```

How do I use Requests and BeautifulSoup together?
These two libraries are complementary.
- Requests retrieves the content of a web page with an HTTP request.
It sends an HTTP request to the target site and downloads the page's raw HTML code.
```python
import requests

url = "https://sitecible.com"
response = requests.get(url)  # HTTP request
print(response.text)          # displays the raw HTML
```

At this stage, all you have is a huge block of text full of tags (<html>, <div>, <p>, etc.).
- BeautifulSoup analyzes this HTML content to extract what's of interest to you.
It takes raw HTML and transforms it into an organized structure. This allows you to easily navigate within the HTML: locate, extract, and retrieve data.
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, "html.parser")  # parses the HTML
title = soup.find("h1").get_text()                  # extracts the content of an <h1>
print(title)
```

Why doesn't my web scraping code work on some sites?
Sometimes your script won't retrieve anything, because some sites don't provide all their content directly in HTML.
These sites use JavaScript to load data dynamically. However, BeautifulSoup cannot analyze data rendered by JavaScript.
In that case, you should turn to tools such as Playwright or Selenium.
What role does BeautifulSoup play in web scraping?
BeautifulSoup acts as an HTML parser.
It takes the source code of a page in plain text form and transforms it into a structured object that you can easily browse.
Without this library, you'd be looking at a huge block of unreadable text. Simply put, BeautifulSoup is the translator between raw HTML and your Python code.
Web scraping: BeautifulSoup vs Scrapy?
BeautifulSoup and Scrapy are very different, although both are used for web scraping.
BeautifulSoup: a simple library for parsing HTML and extracting data.
Scrapy: a complete framework that manages the entire scraping process (requests, link following, pagination, data export, error handling).

In summary, BeautifulSoup makes HTML data extraction in Python easier. This library is perfect for beginners, as it makes scraping quick and easy.
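To make the contrast concrete, here's what the pagination job from earlier might look like as a Scrapy spider — a hedged sketch, with a hypothetical URL and selectors to adapt:

```python
import scrapy

class ArticlesSpider(scrapy.Spider):
    """Minimal spider; run with: scrapy runspider articles_spider.py -o articles.json"""
    name = "articles"
    start_urls = ["https://bonjour.com/articles"]  # hypothetical listing page

    def parse(self, response):
        # Yield every article title on the current page
        for title in response.css("h2.title::text").getall():
            yield {"title": title}
        # Follow the "next page" link, if any; Scrapy manages the crawl loop itself
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```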
Otherwise, if you'd rather not code at all, the comprehensive tool Bright Data is also an excellent solution for web scraping.
👉 Now, tell us in comments what you managed to scrape!