With the Bardeen scraper, you can extract data from any website and send it directly to your favorite apps. This means that you can ditch copy-pasting from your day-to-day processes forever.
The scraper allows you to do things like copying LinkedIn profile data to your Notion database with one click, saving interesting tweets to a Google Doc, and much more.
In this tutorial, you will learn how the scraper works and how you can leverage it to save time.
Scraper fundamentals
Let's break down web scraping in simple terms. Think of all the information on the internet as being stored in big digital libraries.
Websites use special tools called APIs to let you access the information stored in these libraries. An API (Application Programming Interface) is like a messenger that takes your request to the library and brings back the information you need. But not all information is easy to get through these tools. For example, when you read a tutorial on a website, it has a title, content, links, and other details. All this information is stored in a digital library, and the website shows it to you in a readable format.
However, you can't directly access the library to get the information in an organized way. But since the information is displayed on the website, we can still get it.
Web scraping is like copying specific parts of a webpage and turning them back into organized data. In short, web scraping is a method of gathering information from websites and organizing it into a structured format using special tools called scrapers.
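To make the idea concrete, here is a minimal sketch in Python that turns a small piece of page markup back into organized data. It only illustrates the general technique and is not how Bardeen works internally; the `PAGE` markup, the `TutorialParser` class, and the `scrape` function are all made up for this example.

```python
from html.parser import HTMLParser


class TutorialParser(HTMLParser):
    """Collects the <h1> title and every link URL found in a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only keep text that appears inside the <h1> element.
        if self._in_h1:
            self.title += data.strip()


# Hypothetical markup standing in for a real tutorial page.
PAGE = """
<html><body>
  <h1>Scraper tutorial</h1>
  <p>Read the <a href="/docs/intro">intro</a> and the
     <a href="/docs/advanced">advanced guide</a>.</p>
</body></html>
"""


def scrape(html):
    """Turn raw page markup into a structured record."""
    parser = TutorialParser()
    parser.feed(html)
    return {"title": parser.title, "links": parser.links}


record = scrape(PAGE)
print(record)
```

The page is written for people to read, and the scraper turns it into a record with named fields that another app can store.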
What’s a scraper template?
A scraper template tells the scraper what information to extract from a webpage and where to find it. These templates only work for specific types of pages. For example, a template for LinkedIn profile pages will only work on those pages, not on LinkedIn company pages. There are two types of scraper templates: one for individual pages and one for lists.
The individual page template grabs one piece of information for each data field, like getting just one "name" from a LinkedIn profile page.
The list scraper template looks for repeating elements on a page. For example, it can extract multiple names from a LinkedIn search results page, where each name appears once per search result.
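The difference between the two template types can be sketched in a few lines of Python. This is an assumption-laden illustration, not Bardeen's template format: the markup, the template dictionaries, and the function names are hypothetical, and the pages are simplified so that the standard library's XML parser can read them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real pages are far messier.
PROFILE_PAGE = """
<div>
  <h1 class="name">Jane Doe</h1>
  <p class="headline">Engineer</p>
</div>
"""

SEARCH_PAGE = """
<ul>
  <li class="result"><span class="name">Jane Doe</span></li>
  <li class="result"><span class="name">John Roe</span></li>
  <li class="result"><span class="name">Sam Poe</span></li>
</ul>
"""

# An individual template maps each field to one location on the page.
PROFILE_TEMPLATE = {
    "name": ".//h1[@class='name']",
    "headline": ".//p[@class='headline']",
}

# A list template first says what one list item looks like,
# then where each field sits inside that item.
SEARCH_TEMPLATE = {
    "item": ".//li[@class='result']",
    "fields": {"name": ".//span[@class='name']"},
}


def scrape_individual(markup, template):
    """Return one value per field."""
    root = ET.fromstring(markup)
    return {field: root.findtext(path) for field, path in template.items()}


def scrape_list(markup, template):
    """Return one row per repeating item."""
    root = ET.fromstring(markup)
    return [
        {field: item.findtext(path) for field, path in template["fields"].items()}
        for item in root.findall(template["item"])
    ]


profile = scrape_individual(PROFILE_PAGE, PROFILE_TEMPLATE)
results = scrape_list(SEARCH_PAGE, SEARCH_TEMPLATE)
print(profile)
print(results)
```

The individual template produces a single record, while the list template produces one row for every repeating element it finds.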
What can I scrape?
Image
Ex: profile images from LinkedIn
Click
Ex: Click on the “contact info” link to open a popup.
Input
Ex: Fill out a form.
You can also get fields that are not displayed on the page.
Page Link
Get the current website URL.
Page Title
This is the meta title text that shows in Google search or as a tab name.
Meta Image
The preview image that appears when you share a link on social media, also called an Open Graph image.
Time Stamp
The exact time when a page was scraped, useful for tracking when you scrape the same page multiple times.
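The fields that are not displayed on the page usually live in the page's `<head>` or come from the scraping run itself. The Python sketch below shows one plausible way to collect them; the `MetaParser` class, the `hidden_fields` function, the sample markup, and the example URL are assumptions made for illustration.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Reads the <title> text and the Open Graph image from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:image":
            self.meta_image = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def hidden_fields(url, html):
    """Collect the fields a visitor does not see in the page body."""
    parser = MetaParser()
    parser.feed(html)
    return {
        "page_link": url,
        "page_title": parser.title.strip(),
        "meta_image": parser.meta_image,
        # Recorded at scrape time, so repeated runs can be told apart.
        "time_stamp": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical page used only for this example.
HTML = """
<html><head>
  <title>Example product</title>
  <meta property="og:image" content="https://example.com/preview.png">
</head><body><p>Visible content</p></body></html>
"""

fields = hidden_fields("https://example.com/product", HTML)
print(fields)
```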
What are the available scraper actions?
Scrape data on an active tab
This action scrapes data from the currently open webpage. Use it when you need to copy one thing at a time. For example, if you're making a gift wish list on Notion from Amazon, find a product, launch Bardeen, and copy it with one click.
Scrape data on URLs in the background - Premium action
This action scrapes data from multiple links in the background, which works well if you don't want your computer occupied while extracting data. For example, if you have a list of LinkedIn profile links, Bardeen will visit each one and scrape the missing info. No more copying and pasting.
Trigger: when website data changes - Premium action
Instead of checking a website a million times a day to get updates, you can set this trigger to do it for you. The trigger scrapes a website every 10 minutes and returns any new information, which you can use in your Autobook, for example to send yourself a notification by email, Slack, or SMS. Use this to track competitor prices, government tenders, and product availability.
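Conceptually, a trigger like this compares each new scrape with the previous one and reports only what was added. The Python sketch below illustrates that idea under stated assumptions: `new_items`, `watch`, and the `id` key are hypothetical names, the snapshots are fake data, and the wait between runs is left out so the example finishes instantly.

```python
def new_items(previous, current, key="id"):
    """Return the rows in `current` whose key was not in `previous`."""
    seen = {row[key] for row in previous}
    return [row for row in current if row[key] not in seen]


def watch(scrape, notify, runs):
    """Scrape repeatedly and notify about rows that were not seen before.

    A real trigger would wait between runs (for example 10 minutes);
    the wait is omitted here to keep the sketch short.
    """
    previous = scrape()
    for _ in range(runs):
        current = scrape()
        fresh = new_items(previous, current)
        if fresh:
            notify(fresh)
        previous = current


# Fake scrape results: a second item appears on the third scrape.
snapshots = iter([
    [{"id": 1, "title": "Tender A"}],
    [{"id": 1, "title": "Tender A"}],
    [{"id": 1, "title": "Tender A"}, {"id": 2, "title": "Tender B"}],
])

notifications = []
watch(lambda: next(snapshots), notifications.append, runs=2)
print(notifications)
```

This sketch only detects new rows. Detecting a changed value on an existing row, such as a price update, would need a comparison of the row contents as well.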
Scrape data lists - Premium action
Instead of scraping items one by one, you can choose a list of items on a website, and Bardeen will scrape each list item into a row of data in your preferred apps.
Bardeen.ai is a Chrome Extension that can build automations for you with AI. It includes pre-built scraper templates for LinkedIn, Instagram, and 100+ other websites.
Activate with just one click, no coding required.
Creating a scraper template
A scraper template tells the scraper what information to extract and where to find it on the page. Since each website is different, you need a template for each site you want to scrape. Luckily, it's easy. You can create a scraper template in the Playbook builder or the popup window.
All scraper actions need a template to work. You can choose one of our ready-made templates, use one of your existing templates, or create a new one.
You can also create or edit templates from the popup window. Click the scraper icon and select “New Scraper Template.” Next, choose either an individual or list scraper type. Name your template so it's easy to find later.
One website might need multiple templates, like one for LinkedIn profiles and another for search results. Name them clearly. Click on an element you want to extract and select the data type.
If you need to select an item that our scraper isn’t picking up, check out our Advanced Scraping Tutorial.
Creating a list scraper - Premium action
When creating a list scraper template, there's an extra step – defining the list. You need to click on the same item in two different list items. This helps Bardeen know which lists to scrape since some pages have multiple lists.
Bardeen will highlight each list item with a box so you can confirm it is the exact data you want. Click on an item inside any box to add it to your template.
Loading more list items (pagination)
After you finish setting up your list scraper template, a new window will open. It will ask if you want to load more items (pagination). Most websites don’t load long lists all at once. Instead, they use infinite scroll or multiple pages.
You have two options for scraping long lists: infinite scroll and click pagination.
Websites like Facebook or Instagram load new items when you scroll to the bottom. For these, choose “infinite scroll.” Other websites like Google or LinkedIn require you to click a button to go to the next page.
For these, choose “click pagination” and select the button that takes you to the next page (usually it is the > icon).
What if a list has a million items but you only need a few hundred? You can set the maximum number of items or pages to scrape. If left blank, the scraper will try to get as many items as possible.
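The effect of the item and page limits can be sketched as a simple loop in Python. This is a rough model of click pagination, not Bardeen's implementation; `paginate`, `collect`, `fake_fetch_page`, and the 25-item fake site are assumptions made for the example.

```python
from itertools import islice


def paginate(fetch_page, max_pages=None):
    """Yield items page by page until a page is empty or max_pages is hit."""
    page = 0
    while max_pages is None or page < max_pages:
        items = fetch_page(page)
        if not items:
            return
        yield from items
        page += 1


def collect(fetch_page, max_items=None, max_pages=None):
    """Gather items, stopping early when either limit is reached.

    Leaving both limits as None collects as many items as possible.
    """
    return list(islice(paginate(fetch_page, max_pages), max_items))


# Fake site with 25 items, shown 10 per page.
ITEMS = [f"item-{n}" for n in range(25)]


def fake_fetch_page(page, size=10):
    return ITEMS[page * size:(page + 1) * size]


everything = collect(fake_fetch_page)
first_twelve = collect(fake_fetch_page, max_items=12)
one_page = collect(fake_fetch_page, max_pages=1)
print(len(everything), len(first_twelve), len(one_page))
```

Because the pages are fetched lazily, the item limit stops the loop as soon as enough items are collected, so later pages are never requested.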
If you want to stop a scraping job in progress, close the app window and click the "stop scraping" button at the bottom right corner of the screen. You can also do this from the Activities tab → Queue.
How to edit a scraper template
There are two common reasons to update your scraper template.
The first is when it breaks and doesn’t extract information correctly. This usually happens when a website changes.
The second reason is if you want to extract more data fields. Editing a scraper template is as easy as creating one.
Click the scraper icon in the popup window and choose the template you want to edit. A new window will open with the original web page you used to create the template. From there, you can add new data fields or delete existing ones.
Building Playbooks with scraper
Building automations with the scraper is similar to building any other Playbook. Go to the Builder, add an action, and choose the scraper template.
The scraper outputs data as a table. You can connect this data to other actions.
For example, to add LinkedIn profile data to your Notion database, click on a box next to a column name and map it to the related field from the scraper action.
When you use the list scraper, it will output a table with multiple rows. Bardeen will run every action once per row. In this example, Notion will create a new entry for each LinkedIn profile returned by the list scraper.
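The "once per row" behavior can be pictured as a loop over the scraper's table. In the Python sketch below, the column names, the `mapping` dictionary, and the `create_entry` function (a stand-in for an action such as creating a Notion entry) are hypothetical.

```python
def run_per_row(rows, mapping, action):
    """Run `action` once per scraped row, renaming columns via `mapping`."""
    results = []
    for row in rows:
        payload = {target: row.get(source) for source, target in mapping.items()}
        results.append(action(payload))
    return results


# Table returned by a hypothetical list scraper.
scraped_rows = [
    {"name": "Jane Doe", "headline": "Engineer"},
    {"name": "John Roe", "headline": "Designer"},
]

# Scraper column -> database field.
mapping = {"name": "Name", "headline": "Role"}

database = []  # stand-in for the destination app


def create_entry(payload):
    """Pretend to create a database entry and return its position."""
    database.append(payload)
    return len(database)


created = run_per_row(scraped_rows, mapping, create_entry)
print(database)
```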
Using multiple scraper templates in one Playbook (deep scraper) - Premium action
You can use multiple scraper templates in one Playbook. This is often done to scrape search results and then visit each page to get more data. This combination is called a “deep scraper.”
To build this type of Playbook, set up the first scraper action as usual. Then, use the links from the first scraper as the input for your second scraper action.
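The chaining of two scrapers can be sketched in Python as follows. The `SITE` dictionary fakes a search page and two profile pages, and `scrape_search`, `scrape_profile`, and `deep_scrape` are hypothetical names standing in for the two scraper actions and the Playbook that connects them.

```python
# Hypothetical site: one search page that links to two profile pages.
SITE = {
    "/search": {"links": ["/in/jane", "/in/john"]},
    "/in/jane": {"name": "Jane Doe", "headline": "Engineer"},
    "/in/john": {"name": "John Roe", "headline": "Designer"},
}


def scrape_search(url):
    """First scraper: a list scraper that returns one link per result."""
    return [{"profile_url": link} for link in SITE[url]["links"]]


def scrape_profile(url):
    """Second scraper: an individual scraper for a single profile page."""
    page = SITE[url]
    return {"name": page["name"], "headline": page["headline"]}


def deep_scrape(search_url):
    """Feed the links from the first scraper into the second one."""
    rows = []
    for result in scrape_search(search_url):
        detail = scrape_profile(result["profile_url"])
        rows.append({**result, **detail})
    return rows


rows = deep_scrape("/search")
print(rows)
```

Each output row combines the link found by the first scraper with the details gathered by the second.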
Advanced Scraper
We aim to scrape as many websites as possible, but some websites and elements remain difficult to scrape because of technical constraints. Known limitations include:
- Iframes.
- Shadow DOM.
- Pages blocking users (CAPTCHAs and similar).
- Pages making scraping difficult (usually solvable with custom models).
- Airtable is often tricky to scrape.
- Elements that do not have a selector.
- Inline JavaScript.
Explore scraper use cases
In the next tutorial, we will cover advanced scraper techniques.
Deep scraping LinkedIn profiles
LinkedIn profile scraper (XPath)