Bardeen scraper allows you to extract data from any website without code. We’ve created a point-and-click interface to make it easy for non-technical people to get information from static websites without copy-pasting.
The scraper is most often used as a part of a playbook to automate your day-to-day workflows.
Recruitment specialists may save job candidates from LinkedIn to Google Sheets. Venture capitalists may import promising startups from Crunchbase to Airtable. And makers may save Twitter profiles to Notion.
To build a workflow with the scraper, you will need to:
Navigate to the page that you’d like to scrape. Then create a scraper model by opening Bardeen and writing the following command:
Do create new scraper model
Pick one of the three scraper models and define it.
A scraper model is a set of rules used to tell the Bardeen engine what information to extract from a web page.
Scraper models work only for pages of the same type. For example, a scraper model for LinkedIn profile pages will only work on profile pages and will fail if used on a company page.
Because all websites have a unique code structure, you will need to create a new scraper model for each page type and website.
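Conceptually, a scraper model is a mapping from field names to element selectors, applied against the page's HTML. The sketch below is a hypothetical illustration of that idea (not Bardeen's actual engine or model format), using simple tag/class selectors and Python's standard-library HTML parser:

```python
from html.parser import HTMLParser

# Hypothetical illustration: a "scraper model" as a mapping from
# field names to (tag, class) selectors. Not Bardeen's real format.
MODEL = {
    "name": ("h1", None),
    "headline": ("p", "headline"),
}

class ModelScraper(HTMLParser):
    """Collects the text of every element matching a selector in MODEL."""
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.fields = {}
        self._active = None  # field currently being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.model.items():
            if tag == want_tag and (want_class is None or want_class in classes):
                self._active = field

    def handle_data(self, data):
        if self._active and data.strip():
            self.fields[self._active] = data.strip()
            self._active = None

html = '<h1>Jane Doe</h1><p class="headline">Recruiter at Acme</p>'
scraper = ModelScraper(MODEL)
scraper.feed(html)
print(scraper.fields)  # {'name': 'Jane Doe', 'headline': 'Recruiter at Acme'}
```

This also shows why a model is page-type specific: the selectors encode one page's structure, so a different layout needs a different model.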
There are three types of scraper models that you can create:
If you picked “individual scraper,” hover over the website element you’d like to scrape. The element will be shown inside a black box.
Single-click on the desired element. Give it a name and pick the element type.
The following element types are available:
The element type is “text” by default.
You can capture as many elements as you’d like.
You can also extract the page title and URL by clicking on the plus icon.
When done, give your scraper model a name and hit the check icon to save it.
To build a list scraper model (full article), you will first need to help Bardeen recognize list items.
Click on a text element inside the first list item, and then click on the same element inside the second list item.
If successful, all list items will be inside a blue box.
From here, you can click on any list element inside the blue box and set the fields that you’d like to scrape, similar to the individual scraper model.
Finally, pick the pagination type. Learn more about list pagination here.
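The two clicks are what let the engine generalize from a single element to the whole list: each click identifies a path through the page, and the point where the two paths diverge marks the repeating item. A hypothetical sketch of that idea (an assumption about how such detection can work, not Bardeen's actual algorithm):

```python
# Hypothetical sketch: each click yields a path of (tag, index) steps
# from the page root down to the clicked element. The first position
# where the two paths diverge marks the repeating list-item level.
def generalize(path_a, path_b):
    """Return the common prefix (the list container) and the depth
    at which the two clicked elements diverge (the item level)."""
    common = []
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        common.append(step_a)
    return common, len(common)

# Clicking the same text element inside the first and second list items:
first_click  = [("body", 0), ("ul", 0), ("li", 0), ("span", 0)]
second_click = [("body", 0), ("ul", 0), ("li", 1), ("span", 0)]

container, item_depth = generalize(first_click, second_click)
print(container)  # [('body', 0), ('ul', 0)] -> the <ul> holding the items
```

Everything below the divergence point can then be matched against every sibling item, which is why all list items light up in a blue box at once.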
The table scraper extracts data from HTML <table> elements.
After selecting the table scraper model, hover over a table and click on it once it appears inside a black box.
After defining a scraper model, you need to extract data from a web page using that model.
Use the following GET command:
Get data using scraper model [model]
The information will be returned in table format.
💡 Note: you have to run this command from the relevant web page (or specify the URL to scrape a page in the background).
By now, we’ve created a scraper model and extracted information from a website. The data was returned in table format.
From here, we want to send that data to an app to complete our workflow.
You can output the scraped data as a row in Google Sheets, Airtable, or Notion. Alternatively, you can create a Trello card or a Jira issue with this data.
Most scraper playbooks will consist of the following two commands:
To learn more about creating playbooks, see this article.
💡 Note: the data is returned in the “table” format. All apps other than Sheets, Airtable, and Notion take inputs in “string” format. You can convert a table into a string with the “get string from table” command.
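Conceptually, the “get string from table” step flattens the scraped rows into one text value. A minimal sketch of that conversion (the row and separator formatting here are an assumption, not Bardeen’s exact output):

```python
def table_to_string(rows):
    """Join a list of scraped rows (dicts) into one newline-separated string."""
    return "\n".join(", ".join(str(v) for v in row.values()) for row in rows)

# Example scraped table: one dict per row.
table = [
    {"name": "Jane Doe", "url": "https://example.com/jane"},
    {"name": "John Roe", "url": "https://example.com/john"},
]
print(table_to_string(table))
# Jane Doe, https://example.com/jane
# John Roe, https://example.com/john
```

The resulting string can then be passed to apps that expect a single text input, such as a Trello card description.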
To make changes to a scraper model, navigate to the appropriate web page type for that scraper model (such as a LinkedIn profile page).
From there, write the following command:
Do edit scraper model [model name]
Edit the model the same way you created it.
To scrape pages in the background, add the “on url” parameter to the “get data using scraper model” command.
The background scraper is most commonly used when you need to scrape the same page multiple times.
You can scrape the same page periodically by creating a WHEN trigger:
WHEN time has passed the duration of 2h, then [playbook]
The deep scraper allows you to extract a list using the list scraper and then run an individual scraper model on each list item. We use the output of one scraper model as the input for another.
A playbook with deep scraping includes the following commands:
The first command scrapes a list. Make sure to get links for each list item by using the “link” element type.
The second command extracts a column with links from the generated table and turns it into a string.
The third command scrapes links in the background using the second scraper model.
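The three deep-scraping steps compose naturally: scrape a list, pull out the link column, then run the individual scraper on each link. A hypothetical sketch of that pipeline, with stub functions standing in for the two Bardeen scraper models and made-up example data:

```python
# Stubs standing in for the two scraper models (hypothetical data).
def scrape_list(page_url):
    """List scraper model: returns a table that includes a "link" column."""
    return [{"name": "Post A", "link": "/a"}, {"name": "Post B", "link": "/b"}]

def scrape_item(url):
    """Individual scraper model, run on each link in the background."""
    return {"url": url, "body": f"full text of {url}"}

def column_to_links(table, column="link"):
    """Extract one column from the table (Bardeen's "get string from
    table" step turns this column into a string of links)."""
    return [row[column] for row in table]

listing = scrape_list("https://example.com/feed")
links = column_to_links(listing)
details = [scrape_item(link) for link in links]
print(details)
```

The key point is the data flow: the output of the list scraper model becomes the input of the individual scraper model, one run per link.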