How do I web scrape with Laravel?


Web scraping with Laravel combines the power of the PHP framework with specialized libraries to automate data extraction.

It is a robust solution for collecting, processing, and organizing information found online. In this article, let's find out how to do web scraping with Laravel.

Laravel uses PHP as its engine, offering an organized structure and integrated tools to facilitate web scraping. ©Christina for Alucare.fr

Prerequisites for scraping with Laravel

Laravel is a PHP framework widely used to develop modern web applications.

Thanks to its rich ecosystem, it offers an ideal environment for setting up web scraping with PHP in an organized and maintainable way. To get started, it's important to:

  • 🔥 Master the basics of PHP and Laravel.
  • 🔥 Understand HTML and CSS to target elements.
  • 🔥 Know how to use Composer to install packages.

👉 The essential tools are:

  • Goutte: the go-to PHP scraping library. It simplifies requests and data extraction.
  • Puppeteer/Headless Chrome: a headless browser, indispensable for scraping pages that rely heavily on JavaScript.
  • Laravel HTTP Client: lets you make requests with Http::get() to retrieve simple content.
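As a quick illustration of the last point, here is a minimal sketch with the Laravel HTTP Client (the URL is illustrative):

```php
use Illuminate\Support\Facades\Http;

// Fetch a page's raw HTML with the Laravel HTTP client
$response = Http::get('https://example.com');

if ($response->successful()) {
    $html = $response->body(); // raw HTML, ready to be parsed
}
```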

Tutorial for creating your first scraper with Laravel

Follow this step-by-step tutorial to create a functional scraper with Laravel.

⚠ Always respect the site's terms of use, its robots.txt, and local legislation. Limit the load (rate-limit), identify yourself with a User-Agent, and do not collect sensitive data.

Step 1: Installation and configuration

Create a new Laravel project and add Goutte (Laravel integration).

# 1) Create a new Laravel project
composer create-project laravel/laravel scraper-demo
cd scraper-demo

# 2) Add Goutte (Laravel integration)
composer require weidner/goutte

Step 2: Create an Artisan command

Generate a command containing your scraping logic:

php artisan make:command ScrapeData

The file is created at app/Console/Commands/ScrapeData.php.

Step 3: Writing scraper code

In the generated command, add:

  • ✅ An HTTP request to retrieve the HTML content.
  • ✅ CSS selectors to target the data.
  • ✅ A loop to iterate over the elements and display the results.

Here is an example of the complete code to scrape article titles from a blog:

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Symfony\Component\DomCrawler\Crawler;
use Weidner\Goutte\GoutteFacade as Goutte;

class ScrapeData extends Command
{
    protected $signature = 'scraper:run {url=https://example.com/blog}';

    protected $description = 'Scrape article titles from a blog';

    public function handle(): int
    {
        $url = $this->argument('url');
        $this->info("Scraping: {$url}");

        // 1) HTTP request to retrieve HTML
        $crawler = Goutte::request('GET', $url);

        // 2) Use CSS selectors
        $nodes = $crawler->filter('h2 a');

        // 3) Loop over elements and display
        $nodes->each(function (Crawler $node, $i) {
            $title = $node->text();
            $link = $node->attr('href');
            $this->line(($i + 1) . ". " . $title . " - " . $link);
        });

        return self::SUCCESS;
    }
}

Best practices for web scraping with Laravel

To get the most out of web scraping with Laravel, here are a few tips to keep in mind:

1. Task and queue management

Scraping can take several seconds per page. Imagine if you had to scrape 1,000 pages: your Laravel application would be stuck and unusable for quite a while. The solution: Laravel's jobs and queues.

  • A job is a task you want to run in the background.
  • A queue is where these jobs are stored so that they can be executed one by one, without blocking the rest.

👉 Here's an example of how to encapsulate the scraping logic in a Job:

// app/Jobs/ScrapePageJob.php
<?php

namespace App\Jobs;

use Goutte\Client; // or Guzzle/Http, depending on your stack
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ScrapePageJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected string $url;

    public function __construct(string $url)
    {
        $this->url = $url;
    }

    public function handle(): void
    {
        $client = new Client();

        $crawler = $client->request('GET', $this->url);

        // Simple example: extract all <h1> headings
        $titles = $crawler->filter('h1')->each(function ($node) {
            return $node->text();
        });

        // Persistence / logs / events...
        foreach ($titles as $title) {
            \Log::info("[Scraping] {$this->url} - H1: {$title}");
        }
    }
}

// app/Http/Controllers/ScraperController.php
<?php

namespace App\Http\Controllers;

use App\Jobs\ScrapePageJob;
use Illuminate\Http\Request;

class ScraperController extends Controller
{
    public function launch(Request $request)
    {
        foreach ($request->input('urls', []) as $url) {
            ScrapePageJob::dispatch($url)
                ->onQueue('scraping'); // if you want a dedicated queue
        }

        return response()->json(['status' => 'Scraping launched in background 🚀']);
    }
}

👉 As you have seen, jobs go into a queue. Laravel offers several systems for managing this queue. The most commonly used are:

  • The database queue: jobs are stored as rows in a SQL table, then executed one by one by a worker.
  • The Redis queue: jobs are placed in an ultra-fast in-memory queue, ideal for processing large volumes of work.
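For the database driver, for example, the setup boils down to a few Artisan commands, and a worker is then started to process the dedicated "scraping" queue:

```shell
# Database driver: create the jobs table, then migrate
php artisan queue:table
php artisan migrate

# (For Redis, set QUEUE_CONNECTION=redis in .env instead)

# Start a worker that processes the dedicated "scraping" queue
php artisan queue:work --queue=scraping
```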

2. Automation with Laravel's task scheduler

Laravel includes a task scheduler that lets you automate scraping.

You can thus schedule a scraping command to run at regular intervals, for example every hour.

👉 Here's how to configure it in app/Console/Kernel.php:

// app/Console/Kernel.php
<?php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    /**
     * Schedule the scraping command.
     */
    protected function schedule(Schedule $schedule): void
    {
        $schedule->command('scraper:run')->hourly();

        // Useful examples:
        // $schedule->command('scraper:run')->everyFifteenMinutes();
        // $schedule->command('scraper:run')->dailyAt('02:30')->timezone('Indian/Antananarivo');
    }

    /**
     * Register the application's commands.
     */
    protected function commands(): void
    {
        $this->load(__DIR__ . '/Commands');
    }
}
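On the server, the scheduler itself only needs a single cron entry that invokes it every minute (the project path is illustrative):

```shell
* * * * * cd /path-to-your-project && php artisan schedule:run >> /dev/null 2>&1
```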

3. Bypassing anti-scraping protection

Many websites implement protections against scrapers. To avoid being blocked, it is best to:

  • ✅ Change the User-Agent: simulate a real browser.
  • ✅ Manage delays: insert pauses (sleep, throttle) between requests to avoid overloading the target server.
  • ✅ Use proxies: distribute requests over several IP addresses.
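As a sketch of the first two points, here is a hypothetical polite request loop with a custom User-Agent and a pause between requests (the header value, URLs, and delay are illustrative):

```php
use Illuminate\Support\Facades\Http;

$urls = ['https://example.com/page/1', 'https://example.com/page/2'];

foreach ($urls as $url) {
    $response = Http::withHeaders([
        // Identify the scraper with a realistic User-Agent
        'User-Agent' => 'Mozilla/5.0 (compatible; MyScraper/1.0)',
    ])->timeout(10)->get($url);

    if ($response->successful()) {
        // ... process $response->body()
    }

    sleep(2); // pause between requests to limit the load on the server
}
```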

What are the alternatives to web scraping with Laravel?

Although Laravel is useful for integrating scraping into a PHP application, there are other solutions that are often more specialized.

  • Python

Python is the most widely used language for scraping. It offers powerful libraries such as Scrapy and BeautifulSoup.

  • Code-free tools

More and more tools let you scrape without coding, or with the help of AI. Among them: Bright Data, Octoparse, Apify, etc.

Solutions like Bright Data make it possible to collect data quickly without coding. ©Christina for Alucare.fr

FAQs

How do I scrape a login-protected website with Laravel?

This is one of the most common challenges in web scraping. To achieve this with Laravel, you need to:

  1. Simulate the login with a POST request, sending the email address and password.
  2. Manage cookies or session to access protected pages.
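With Goutte, for example, both steps can be sketched like this (the form button label, field names, and URLs are assumptions about the target site):

```php
use Goutte\Client;

$client = new Client();

// 1) Simulate the login by submitting the login form
$crawler = $client->request('GET', 'https://example.com/login');
$form = $crawler->selectButton('Log in')->form([
    'email'    => 'user@example.com',
    'password' => 'secret',
]);
$client->submit($form);

// 2) The client keeps the session cookies, so protected pages are now accessible
$dashboard = $client->request('GET', 'https://example.com/dashboard');
echo $dashboard->filter('h1')->text();
```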

How to manage pagination when web scraping with Laravel?

To manage navigation from one page to another with Laravel, you must:

  1. Scrape the first page.
  2. Detect the "next page" link with a CSS selector.
  3. Loop on each link until the end of the pagination.
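These three steps can be sketched with Goutte as follows (the "a.next" selector and the URL are assumptions about the target site):

```php
use Goutte\Client;

$client = new Client();
$url = 'https://example.com/blog';

while ($url) {
    // 1) Scrape the current page
    $crawler = $client->request('GET', $url);
    $crawler->filter('h2 a')->each(function ($node) {
        echo $node->text() . PHP_EOL;
    });

    // 2) Detect the "next page" link with a CSS selector
    $next = $crawler->filter('a.next');

    // 3) Loop until the link no longer exists
    $url = $next->count() ? $next->attr('href') : null;

    sleep(1); // stay polite between pages
}
```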

How do I export scraped data (to CSV, Excel, or JSON)?

With Laravel, you can use:

  • fputcsv() for CSV.
  • the Maatwebsite\Excel library for Excel.
  • the native json_encode() function to generate a JSON file.
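For CSV and JSON, a minimal plain-PHP sketch could look like this (the file names and row data are illustrative):

```php
<?php

// Illustrative rows, as they might come out of a scraper
$rows = [
    ['title' => 'First post',  'url' => 'https://example.com/1'],
    ['title' => 'Second post', 'url' => 'https://example.com/2'],
];

// CSV export with fputcsv()
$fh = fopen('articles.csv', 'w');
fputcsv($fh, array_keys($rows[0])); // header line
foreach ($rows as $row) {
    fputcsv($fh, $row);
}
fclose($fh);

// JSON export with json_encode()
file_put_contents('articles.json', json_encode($rows, JSON_PRETTY_PRINT));
```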

How do you handle errors and exceptions during scraping?

To handle failed requests with Laravel, you need to:

  1. Encapsulate requests in a try/catch block.
  2. Check HTTP status codes (404, 500, etc.). In case of error, log it or schedule a retry.
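With the Laravel HTTP client, both points can be sketched like this (the URL and timeout are illustrative):

```php
use Illuminate\Http\Client\ConnectionException;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Log;

try {
    // 1) Encapsulate the request in a try/catch
    $response = Http::timeout(10)->get('https://example.com/page');

    // 2) Check the HTTP status code
    if ($response->status() === 404) {
        Log::warning('Page not found, skipping.');
    } elseif ($response->serverError()) {
        // 5xx: log and schedule a retry (for instance by re-dispatching a job)
        Log::error('Server error ' . $response->status() . ', will retry later.');
    }
} catch (ConnectionException $e) {
    Log::error('Connection failed: ' . $e->getMessage());
}
```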

Is web scraping legal or illegal?

The legality of web scraping is a complex issue. It all depends on the target site and the use of the data.

📌 Web scraping in France is often discussed in the context of copyright and database protection.

💬 In short, web scraping with Laravel is powerful and flexible, but requires good practices to remain effective and legal. Tell us what you think in the comments.

Found this helpful? Share it with a friend!
