How to use the Bardeen scraper

Advanced

The Bardeen scraper lets you extract data from websites without writing code. We’ve created a point-and-click interface so that non-technical people can easily get information from static websites without copy-pasting.


The scraper is most often used as a part of a playbook to automate your day-to-day workflows.


Recruitment specialists may save job candidates from LinkedIn to Google Sheets. Venture capitalists may import promising startups from Crunchbase to Airtable. And makers may save Twitter profiles to Notion.


To build a workflow with the scraper, you will need to:

  • Create a scraper model
  • Get data using a scraper model
  • Create a playbook to streamline the entire process


1. Creating a scraper model


Navigate to the page that you’d like to scrape. Then create a scraper model by opening Bardeen and writing the following command:

Do create new scraper model


Pick one of the three scraper models and define it.


What’s a scraper model?

A scraper model is a set of rules used to tell the Bardeen engine what information to extract from a web page. 
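Bardeen’s internal model format is not public, but conceptually you can think of a scraper model as a mapping from field names to element-matching rules. The sketch below illustrates that idea with Python’s standard-library HTML parser; the `MODEL` rules, class names, and sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical "scraper model": field name -> (tag, CSS class) rule.
# This mirrors the concept only; Bardeen's real format differs.
MODEL = {
    "name":  ("h1", "profile-name"),
    "title": ("p",  "profile-title"),
}

class ModelScraper(HTMLParser):
    """Collects the text of every element matching a rule in the model."""
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.active_field = None   # field currently being captured
        self.result = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        for field, (rule_tag, rule_class) in self.model.items():
            if tag == rule_tag and rule_class in classes:
                self.active_field = field

    def handle_data(self, data):
        if self.active_field:
            self.result[self.active_field] = data.strip()
            self.active_field = None

html_doc = """
<h1 class="profile-name">Ada Lovelace</h1>
<p class="profile-title">Analyst and Metaphysician</p>
"""
scraper = ModelScraper(MODEL)
scraper.feed(html_doc)
print(scraper.result)
```

This is also why a model only works on one page type: the rules are tied to that page’s specific tags and classes, which differ on every site.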


Scraper models work only for pages of the same type. For example, a scraper model for LinkedIn profile pages will only work on profile pages and will fail if used on a company page.


Because all websites have a unique code structure, you will need to create a new scraper model for each page type and website.


There are three types of scraper models that you can create:

  1. Individual scraper (Ex: LinkedIn profile page)
  2. List scraper (Ex: LinkedIn jobs pages)
  3. Table scraper (Ex: Crunchbase featured funding rounds)


Individual Scraper

If you picked “individual scraper,” hover over the website element to be scraped. The element will be shown inside a black box.


Single-click on the desired element. Give it a name and pick the element type.



The following element types are available:

  • Text
  • Link
  • Image
  • Click
  • Input


The element will be “text” by default.


You can capture as many elements as you’d like. 


You can also extract the page title and URL by clicking on the plus icon.



When done, give your scraper model a name and hit the check icon to save it.


List Scraper

To build a list scraper model (full article), you will first need to help Bardeen recognize list items.


First, click on a text element inside the first list item. Then click on the same element inside the second list item.


If successful, all list items will be inside a blue box.
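Why two clicks? Clicking the same element in two different list items lets the scraper find where the two element paths diverge and generalize that step to match every sibling. This is a conceptual sketch, not Bardeen’s actual algorithm; the path representation here is invented for illustration.

```python
# Sketch: generalize two clicked-element paths into a pattern that
# matches the same element in EVERY list item. Paths are hypothetical
# lists of (tag, sibling index) steps from the page root.
def generalize(path_a, path_b):
    pattern = []
    for (tag_a, i_a), (tag_b, i_b) in zip(path_a, path_b):
        if tag_a != tag_b:
            return None  # structurally different; cannot generalize
        # Where the sibling index differs, wildcard it to match any item.
        pattern.append((tag_a, i_a if i_a == i_b else "*"))
    return pattern

# Same <span> clicked inside the 1st and 2nd <li> of a list:
first  = [("ul", 0), ("li", 0), ("span", 0)]
second = [("ul", 0), ("li", 1), ("span", 0)]
print(generalize(first, second))  # the "li" step becomes a wildcard
```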



From here, you can click on any list element inside the blue box and set the fields that you’d like to scrape, similar to the individual scraper model.


Finally, pick the pagination type. Learn more about list pagination here.

Table Scraper

The table scraper extracts data from HTML <table> elements.


After selecting the table scraper model, hover over the table and click on it when it appears inside a black box.


That’s it!


2. Get data using a scraper model

After defining a scraper model, you need to extract data from a web page using that model.


Use the following GET command:

Get data using scraper model [model]


The information will be returned in table format.


💡 Note: you have to run this command from the relevant web page (or specify the URL to scrape a page in the background).


3. Create a playbook with scraper

By now, we’ve created a scraper model and extracted information from a website. The data was returned in table format. 


From here, we want to send that data to an app to complete our workflow.


You can output the scraped data as a row in Google Sheets, Airtable, or Notion. Alternatively, you can create a Trello card or a Jira issue with this data.


Most scraper playbooks will consist of the following two commands:

  1. Get data using scraper model [model name]
  2. Do append table rows to Google Sheet [sheet name] from table [last table]



To learn more about creating playbooks, see this article.


💡 Note: the data was returned in the “table” format. All apps other than Sheets, Airtable, or Notion take inputs in “string” format. You can convert a table into a string with the “get string from table” command.
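Conceptually, that conversion just flattens rows and columns into delimited text. The helper below is a hypothetical stand-in for the “get string from table” command, using an invented sample table:

```python
# Hypothetical equivalent of "get string from table": flatten a scraped
# table (a list of dicts) into text for apps that take string input.
table = [
    {"name": "Ada Lovelace", "title": "Analyst"},
    {"name": "Alan Turing",  "title": "Logician"},
]

def table_to_string(rows, row_sep="\n", col_sep=", "):
    """One line per row, cells joined by col_sep."""
    return row_sep.join(
        col_sep.join(str(v) for v in row.values()) for row in rows
    )

print(table_to_string(table))
```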


How to edit a scraper model

To make changes to a scraper model, navigate to a web page of the type that the scraper model was built for (such as a LinkedIn profile page).


From there, write the following command:

Do edit scraper model [model name]


Edit the model the same way you created it.


Background Scraper

To scrape pages in the background, add the “on url” parameter to the “get data using scraper model” command. 



The background scraper is most commonly used when you need to scrape the same page multiple times.


You can scrape the same page periodically by creating a WHEN trigger:

WHEN time has passed the duration of 2h, then [playbook]



Deep Scraper

The deep scraper allows you to extract a list using the list scraper and then run an individual scraper model on each list item. We use the output of one scraper model as the input for another.


A playbook with deep scraping includes the following commands:

  1. Get data using scraper model [model name] with pagination limit [limit]
  2. Get array from last table column [column]
  3. Get data from multiple pages using scraper model [model] on urls [last string]


The first command scrapes a list. Make sure to get links for each list item by using the “link” element type.


The second command extracts a column with links from the generated table and turns it into a string. 


The third command scrapes links in the background using the second scraper model.
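The three-step pipeline above can be sketched as plain data flow. The functions `scrape_list` and `scrape_page` below are hypothetical stand-ins for the list scraper model and the individual scraper model; the URLs are invented:

```python
# Sketch of deep scraping: the list scraper's output feeds the
# individual scraper, one page per list item.
def scrape_list(page_items):
    # Step 1: the list scraper returns a table; each row must include
    # a "link" field captured with the "link" element type.
    return [{"title": t, "link": u} for t, u in page_items]

def scrape_page(url):
    # Step 3: the individual scraper, run on one URL in the background.
    return {"url": url, "details": f"details scraped from {url}"}

listing = [("Job A", "https://example.com/a"),
           ("Job B", "https://example.com/b")]

table   = scrape_list(listing)              # 1. scrape the list
links   = [row["link"] for row in table]    # 2. pull the link column
details = [scrape_page(u) for u in links]   # 3. scrape each link
print(details)
```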


Examples

Scrape Nasdaq


Scrape Upwork


Scrape LinkedIn