Puppeteer is getting more attention as a web scraping tool these days. It has become popular because it is simple, it is open source, and it can handle single-page applications. Before learning Puppeteer web scraping, we should have a basic understanding of the command line, JavaScript, and the HTML DOM structure. This Puppeteer tutorial is split into several articles, which are listed in the table of contents below.
Puppeteer Tutorial
Puppeteer Tutorial #1: Puppeteer Overview
Puppeteer Tutorial #2: Puppeteer Environment Variables
Puppeteer Tutorial #3: Puppeteer Web Scraping and Puppeteer Test Automation Overview
Puppeteer Tutorial #4: Install Puppeteer
In this article of the Puppeteer Tutorial, we will discuss Puppeteer web scraping with an example and give an overview of Puppeteer test automation.
Puppeteer Web Scraping
Web scraping is the process of extracting data from web pages. It has two steps: first fetch the web page, then extract the data from it. After extraction, we can pass the data to an API or store it in a CSV file.
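As a rough illustration of the storage step, the sketch below turns scraped records into CSV text. It assumes the scraped data is an array of plain objects; the helper names `escapeCsvValue`, `toCsv`, and `saveCsv` are hypothetical and not part of Puppeteer.

```javascript
const fs = require("fs");

// Quote a value so that commas, quotes, and line breaks stay inside one CSV field.
function escapeCsvValue(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of plain objects into CSV text with a header row.
function toCsv(rows, columns) {
  const lines = [columns.join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeCsvValue(row[column])).join(","));
  }
  return lines.join("\n");
}

// Write the CSV text to disk (hypothetical helper; the caller chooses the path).
function saveCsv(filePath, rows, columns) {
  fs.writeFileSync(filePath, toCsv(rows, columns) + "\n", "utf8");
}
```

For example, `saveCsv("headers.csv", rows, ["url", "header"])` would write one line per scraped record, where `rows` and the file name are placeholders.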
Puppeteer is one of the best tools for web scraping with the Google Chrome or Chromium browser. Puppeteer web scraping is explained in detail with the example below.
Basic Puppeteer Web Scraping Example:
Step1# Puppeteer is a Node.js library. So, the first step is to include the puppeteer library before writing the web scraping script.
const puppeteerObj = require("puppeteer");
Step2# After including the Puppeteer library, we need to write an async function, because Puppeteer methods return promises and we wait for them with the await keyword. Inside that function, call the launch() method to start the browser and the newPage() method to create a web page instance.
const browserWeb = await puppeteerObj.launch();
const pageWeb = await browserWeb.newPage();
Step3# Now call the page.goto() method to provide the URL of the desired website.
await pageWeb.goto("https://themachine.science/");
Step4# Use the method page.evaluate() to capture the text of any particular element (in this example, we will capture the header text).
const data = await pageWeb.evaluate(() => {
  const header = document.querySelector(".uabb-heading-text").innerText;
  return { header };
});
We will discuss how to identify an element on a web page in an upcoming tutorial.
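Note that the selector in Step 4 throws an error when no element matches, because `querySelector` returns `null` in that case. A slightly more defensive version might look like the sketch below. The helper name `getElementText` is hypothetical, and `page` is assumed to be a Puppeteer page created as in Step 2.

```javascript
// Hypothetical helper: return the trimmed text of the first element that
// matches `selector`, or null when nothing matches.
// `page` is assumed to be a Puppeteer Page created with browser.newPage().
async function getElementText(page, selector) {
  // The callback runs inside the browser page, so the selector is passed in
  // as an argument instead of being captured from the Node.js scope.
  return page.evaluate((sel) => {
    const element = document.querySelector(sel);
    return element ? element.innerText.trim() : null;
  }, selector);
}
```

Inside an async function it could be called as `const header = await getElementText(pageWeb, ".uabb-heading-text");`.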
Step5# In this last step, we process the data and then close the browser. The complete Puppeteer web scraping code looks like this:
const puppeteer = require("puppeteer");

async function scrap() {
  // Launch the browser
  const browserApp = await puppeteer.launch();
  // Create a page instance
  const pageApp = await browserApp.newPage();
  // Open the web page to scrape
  await pageApp.goto("https://themachine.science/");
  // Read the text of the selected element inside the page
  const data = await pageApp.evaluate(() => {
    const header = document.querySelector(".uabb-heading-text").innerText;
    return { header };
  });
  // Here we can do anything with the data; this example prints it
  console.log(data.header);
  // Close the browser
  await browserApp.close();
}

scrap();
Step6# Now, assuming the script is saved as index.js, we can execute this Puppeteer web scraping code using the command: node index.js
Note: In the next article, “Install Puppeteer,” we will discuss the installation setup of Puppeteer and execute the above Puppeteer Web Scraping code.
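One thing the script in Step 5 does not handle is failure: if goto() or evaluate() throws, the browser is never closed. A possible variant wraps the work in try...finally. This is only a sketch; the function name `scrapeHeader` and its parameters are illustrative, and the Puppeteer module is passed in as an argument rather than required at the top.

```javascript
// Illustrative variant of the scraping script that always closes the browser.
// `puppeteerLib` is assumed to be the module returned by require("puppeteer").
async function scrapeHeader(puppeteerLib, url, selector) {
  const browser = await puppeteerLib.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Runs in the page context; returns null when the selector matches nothing.
    const header = await page.evaluate((sel) => {
      const element = document.querySelector(sel);
      return element ? element.innerText : null;
    }, selector);
    return { header };
  } finally {
    // Runs on success and on error, so no browser process is left behind.
    await browser.close();
  }
}

// Example call (requires Puppeteer to be installed):
// scrapeHeader(require("puppeteer"), "https://themachine.science/", ".uabb-heading-text")
//   .then((data) => console.log(data.header));
```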
Puppeteer Test Automation Overview
Apart from web scraping, Puppeteer can also perform the following activities:
- Capture screenshots of web pages.
- Save a web page as a PDF file.
- Automate manual steps to perform UI testing.
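The first two features in this list can be sketched as follows. The helper name `capturePage` and the file names are hypothetical, and `page` is assumed to be a Puppeteer page that has already opened a URL.

```javascript
// Hypothetical helper: save the current page as a PNG screenshot and as a PDF.
// `page` is assumed to be a Puppeteer Page that has already navigated to a URL.
async function capturePage(page, baseName) {
  const screenshotPath = `${baseName}.png`;
  const pdfPath = `${baseName}.pdf`;
  // fullPage captures the whole scrollable page, not only the visible viewport.
  await page.screenshot({ path: screenshotPath, fullPage: true });
  // format sets the paper size of the generated PDF.
  await page.pdf({ path: pdfPath, format: "A4" });
  return { screenshotPath, pdfPath };
}
```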
Combining all of these features, we can use Puppeteer for test automation. To understand Puppeteer test automation, we first need to be familiar with software testing.
Testing overview:
Testing is required to ensure that all software requirements are fulfilled without any issues. Different types of testing are carried out from the beginning of the software development process. Software can be tested manually or through an automated approach.
The purposes of software testing are:
- Verify the quality of the product.
- Find bugs in the product before the production deployment.
- Check that the requirements are satisfied.
- Test the product's performance.
The types of testing are explained here:
Unit Testing – The developers are responsible for performing unit testing during the code development phase.
Integration Testing – This testing is required after integrating the different components of the software product. The main purpose is to ensure that all the interfaces work smoothly.
System Testing – This is detailed testing done after integration to ensure that all the requirements are fulfilled.
User Acceptance Testing – This is also detailed testing, done by the end users of the product to ensure its quality.
Regression Testing – This is required to ensure that the core business processes keep working smoothly after any software enhancement.
Advantages of Test Automation:
- Shorter execution cycles.
- Lower chance of human error.
- Less test execution effort.
- Faster software releases.
- Wider test coverage, which reduces risk.
- Ability to run tests in parallel.
Why Puppeteer?
Most of the manual operations performed in the Chrome browser can be automated using Puppeteer. So, Puppeteer is a good choice for fast and easy unit-level testing of web applications.
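As an illustration of automating a manual operation, the sketch below types a query into a search box, submits it, and checks the resulting heading. The function name `checkSearch` and all selectors are placeholders rather than values from a real site, and `page` is assumed to be a Puppeteer page that has already opened the application under test.

```javascript
const assert = require("assert");

// Hypothetical UI check: type a query into a search box, submit the form,
// and verify that the results heading mentions the query.
// The selectors are placeholders, not taken from a real site.
async function checkSearch(page, query) {
  await page.type("#search-input", query);
  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click("#search-button"),
  ]);
  // $eval runs the callback in the page on the first element matching the selector.
  const heading = await page.$eval("h1", (element) => element.textContent.trim());
  assert.ok(heading.includes(query), `Expected "${heading}" to mention "${query}"`);
  return heading;
}
```

The check uses the assert module built into Node.js, so no test framework is needed to try it out.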
The limitations of Puppeteer as an automation testing tool are:
- It mainly supports the Chrome and Chromium browsers.
- Cross-browser testing is therefore limited.
- Native mobile application testing cannot be done.
Headless Chrome Testing:
A headless browser means that Puppeteer interacts with Chrome as a background application, so the Chrome UI is not visible on the screen. Headless Chrome testing therefore means that the automated tests run in a hidden browser. Even in headless mode, Puppeteer is able to capture the web page properly, for example as a screenshot.
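Headless mode is controlled through the options passed to launch(). The sketch below chooses between a hidden and a visible browser; the helper name `buildLaunchOptions` and the 100 ms delay are assumptions made for illustration, while `headless` and `slowMo` are Puppeteer launch options.

```javascript
// Hypothetical helper: choose launch options depending on whether we are debugging.
function buildLaunchOptions(debug) {
  if (debug) {
    // Visible browser window; slowMo delays each Puppeteer operation by 100 ms
    // so the steps can be followed on screen.
    return { headless: false, slowMo: 100 };
  }
  // Hidden browser, suitable for automated runs.
  return { headless: true };
}

// Example use inside an async function (requires Puppeteer to be installed):
// const browser = await puppeteer.launch(buildLaunchOptions(false));
```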
Puppeteer vs Selenium
The comparison between Puppeteer and Selenium as automation testing tools is explained below:
- Programming Language Support – Puppeteer supports only JavaScript, whereas Selenium supports Java, Python, Node.js, and C#.
- Browser Support – Puppeteer works mainly with the Chrome or Chromium browser, whereas Selenium also supports Firefox, Safari, IE, and Opera.
- Community Support – For Puppeteer, community support is restricted to Google Groups, GitHub, and Stack Overflow. For Selenium, wide community support is available across multiple forums.
- Execution Speed – Puppeteer scripts execute faster than Selenium scripts.
- Installation and Setup – Puppeteer installation and setup is easier and simpler.
- Cross-Platform Support – Puppeteer does not support it, but Selenium does.
- Recording – Recording features are not available in Puppeteer, but they are available in Selenium IDE.
- Screenshots – Puppeteer can capture a page as an image or as a PDF, whereas Selenium supports only the image format.
- Testing Platform Support – Puppeteer only supports web browsers, but Selenium can automate web and, together with Appium, mobile applications.
- Coding Skills – Coding skills are required for Puppeteer and Selenium WebDriver, but not for Selenium IDE.
Based on the above comparison, we can conclude that Puppeteer is the best choice when we have to perform unit-level testing of a web application and need a fast and flexible solution. Selenium is the better choice when mobile application and cross-platform application testing is needed.
Conclusion:
In this introductory article of the Puppeteer Tutorial, we have learned about Puppeteer web scraping and had an overview of Puppeteer test automation. In the next Puppeteer article, we will go through a step-by-step guide to installing Puppeteer and executing a small script.
Hi, I am K. Mondal, and I am associated with a leading organization. I have 12+ years of working experience across domains such as application development, automation testing, and IT consulting. I am very much interested in learning different technologies. I am here to fulfill my aspiration and currently contribute to LambdaGeeks as both an author and a website developer.