Puppeteer is an open-source Node.js library used for web automation and web scraping. You need a basic understanding of JavaScript and the HTML DOM structure to start working with it. This Puppeteer tutorial series is divided into the segments below, which together cover what you need to start working with Puppeteer.
Puppeteer Tutorial
Puppeteer Tutorial #1: Puppeteer Overview
Puppeteer Tutorial #2: Puppeteer Environment Variables
Puppeteer Tutorial #3: Puppeteer Web Scraping and Puppeteer Test Automation Overview
Puppeteer Tutorial #4: Install Puppeteer
Puppeteer Tutorial #5: Sample Puppeteer Project
Puppeteer Tutorial #6: Puppeteer Automation Testing
Puppeteer Tutorial #7: Puppeteer Class
Puppeteer Tutorial #8: Puppeteer Browser Class
Puppeteer Tutorial #9: Puppeteer Page Class
In this “Puppeteer Browser Class” tutorial, we take a closer look at the BrowserFetcher, Browser, and BrowserContext classes, covering the namespaces, events, and methods that are most often needed for web scraping with Puppeteer.
Puppeteer BrowserFetcher Class
The BrowserFetcher class is used to download and manage different browser versions. It operates on a revision string that specifies the version of the Chrome browser. The revision number can be obtained from here. In the case of Firefox, BrowserFetcher downloads Firefox Nightly and operates on the version number.
The example below shows how to download and launch the Chrome browser using the BrowserFetcher class.
const puppeteer = require('puppeteer');
(async () => {
  const browserFetcher = puppeteer.createBrowserFetcher();
  const revInfo = await browserFetcher.download('766890');
  const browserChrome = await puppeteer.launch({ executablePath: revInfo.executablePath });
})();
BrowserFetcher is not designed to work concurrently with other BrowserFetcher instances that share the same downloads directory. The frequently used methods of the BrowserFetcher class are explained in the next section.
Puppeteer BrowserFetcher Class – Methods:
The following methods are available in the BrowserFetcher class:
browserFetcher.canDownload(revision) – Given the revision number of the browser, this method checks whether the specified browser is available by sending a HEAD request. It returns a promise that resolves to a boolean value (true or false) based on availability.
const boolVar = await browserFetcher.canDownload('766890');
browserFetcher.download(revision[, progressCallback]) – This method downloads the Chrome browser for the given revision number. Here progressCallback is an optional function that is called with two arguments: downloaded bytes and total bytes. The method returns a promise that resolves to the revision information.
const revInfo = await browserFetcher.download('766890');
browserFetcher.host() – It returns the hostname that is used for downloading the browser.
const hostName = browserFetcher.host();
browserFetcher.localRevisions() – It returns a promise that resolves to the list of all revisions available on the local system.
const revList = await browserFetcher.localRevisions();
browserFetcher.platform() – It returns the platform name of the host, which is one of mac, linux, win32, or win64.
const platformName = browserFetcher.platform();
browserFetcher.product() – It returns the browser name, which is either chrome or firefox.
const productName = browserFetcher.product();
browserFetcher.remove(revision) – This method removes the specified revision for the current product/browser. It returns a promise that is resolved once the removal is complete.
await browserFetcher.remove('766890');
browserFetcher.revisionInfo(revision) – It returns an object with the revision information, which includes revision, folderPath, executablePath, url, local, and product.
const revInfo = browserFetcher.revisionInfo('766890');
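The methods above can be combined so that a revision is downloaded only when it is not already on disk. The following is a minimal sketch rather than an official recipe: ensureRevision and its onProgress parameter are hypothetical names, and the fetcher argument is assumed to be the object returned by puppeteer.createBrowserFetcher() in a Puppeteer version that still provides the BrowserFetcher API described in this tutorial.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the revision info,
// downloading the browser only when it is not available locally.
// `fetcher` is assumed to come from puppeteer.createBrowserFetcher().
async function ensureRevision(fetcher, revision, onProgress = () => {}) {
  // Reuse a copy that is already on disk.
  const localRevisions = await fetcher.localRevisions();
  if (localRevisions.includes(revision)) {
    return fetcher.revisionInfo(revision);
  }

  // Check availability before starting a large download.
  const available = await fetcher.canDownload(revision);
  if (!available) {
    throw new Error(`Revision ${revision} is not available for download`);
  }

  // The progress callback receives downloaded bytes and total bytes;
  // here it is converted to a whole-number percentage.
  return fetcher.download(revision, (downloadedBytes, totalBytes) => {
    onProgress(Math.round((downloadedBytes / totalBytes) * 100));
  });
}
```

Assuming Puppeteer is installed, the helper could be called as ensureRevision(puppeteer.createBrowserFetcher(), '766890', pct => console.log(pct + '%')) and the executablePath of the result passed to puppeteer.launch.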
Reference: See the official Puppeteer API documentation to learn more about the BrowserFetcher class methods.
Puppeteer Browser Class
A Browser object is created when Puppeteer launches or connects to a browser through the puppeteer.launch or puppeteer.connect method.
The example below shows how to create a Browser object and then a Page using the browser reference.
const puppeteer = require('puppeteer');
(async () => {
  const browserChrome = await puppeteer.launch();
  const pageChrome = await browserChrome.newPage();
  await pageChrome.goto('https://www.google.com');
  await browserChrome.close();
})();
The frequently used events and methods of the Browser class are explained in the next sections.
Puppeteer Browser Class – Events:
The following events are available in the Browser class:
- browser.on('disconnected') – This event is triggered when the browser is closed or has crashed, or when the browser.disconnect method is called.
- browser.on('targetchanged') – This event is triggered when the URL of a target changes.
- browser.on('targetcreated') – This event is triggered when a target is created, for example when a new page is opened in a new tab or window by browser.newPage or window.open.
- browser.on('targetdestroyed') – This event is triggered when a target is destroyed, for example when a page is closed.
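The events above can be observed by attaching listeners to the Browser object. The sketch below is only an illustration: recordBrowserEvents is a hypothetical helper name, and the browser argument is assumed to be a Browser obtained from puppeteer.launch or puppeteer.connect.

```javascript
// Hypothetical helper (not part of Puppeteer): records browser-level events
// as strings in an array so they can be inspected later.
function recordBrowserEvents(browser, log = []) {
  // The three target events pass the affected Target to the listener.
  for (const eventName of ['targetcreated', 'targetchanged', 'targetdestroyed']) {
    browser.on(eventName, (target) => {
      log.push(`${eventName}: ${target.url()}`);
    });
  }

  // 'disconnected' fires when the browser closes, crashes,
  // or browser.disconnect is called; it carries no target.
  browser.on('disconnected', () => {
    log.push('disconnected');
  });

  return log;
}
```

With a real browser, calling const log = recordBrowserEvents(browser) right after launch and then opening and closing a page should leave entries for the created and destroyed targets in log.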
Puppeteer Browser Class – Methods:
The following methods are available in the Browser class:
- browser.browserContexts() – It returns the list of all open browser contexts. For a newly launched browser, this method returns a single BrowserContext instance.
- browser.close() – This method closes the Chromium browser along with all of its open pages. The Browser object cannot be used after it has been closed.
await browser.close();
- browser.createIncognitoBrowserContext() – It creates and returns a new incognito browser context, which never shares cookies or cache with any other browser context. In the example below, the web page (google) is opened in incognito mode.
const puppeteer = require('puppeteer');
(async () => {
const chromeBrowser = await puppeteer.launch();
// Create new incognito browser context.
const context = await chromeBrowser.createIncognitoBrowserContext();
const pageChrome = await context.newPage();
await pageChrome.goto('https://www.google.com');
await chromeBrowser.close();
})();
- browser.defaultBrowserContext() – It returns the default browser context, which cannot be destroyed or closed.
- browser.disconnect() – It disconnects Puppeteer from the browser. The browser itself keeps running in this case.
- browser.isConnected() – This method checks whether the browser is connected. It returns a boolean value directly, so no await is needed.
const boolFlag = browser.isConnected();
- browser.newPage() – This method will create a new page and return the instance of the page.
const page = await browser.newPage();
- browser.pages() – This method returns the list of all pages which are currently in the open state.
const pageList = await browser.pages();
- browser.process() – This method returns the spawned browser process. If the browser instance was created with the puppeteer.connect method, it returns null.
- browser.target() – This method returns the target associated with the browser.
const target = browser.target();
- browser.targets() – It returns the list of all active targets within the browser.
const targetList = browser.targets();
- browser.userAgent() – It returns a promise that resolves to the original user agent of the browser.
- browser.version() – It returns a promise that resolves to the version of the browser, in the format 'HeadlessChrome/xx.x.xxxx.x' for headless Chrome and 'Chrome/xx.x.xxxx.x' for non-headless Chrome. The format can change in a future release.
- browser.waitForTarget(predicate[, options]) – It searches all the browser contexts and waits until a target matching the predicate appears. The example below finds the target of a page opened with window.open.
await pageChrome.evaluate(() => window.open('https://themachine.science/'));
const newWindowTarget = await browser.waitForTarget(target => target.url() === 'https://themachine.science/');
- browser.wsEndpoint() – It returns the WebSocket URL of the browser.
const wsUrl = browser.wsEndpoint();
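Several of the Browser methods above return promises while others return values directly, which is easy to mix up. The sketch below shows both kinds side by side and also how wsEndpoint, disconnect, and puppeteer.connect can work together. It is an illustration under assumptions: describeBrowser and reconnect are hypothetical helper names, and the method set is the one described in this tutorial, which may differ in newer Puppeteer releases.

```javascript
// Hypothetical helper (not part of Puppeteer): collects a few facts
// about a running browser using the methods listed above.
async function describeBrowser(browser) {
  const pages = await browser.pages(); // returns a promise
  return {
    connected: browser.isConnected(),                   // synchronous boolean
    version: await browser.version(),                   // returns a promise
    userAgent: await browser.userAgent(),               // returns a promise
    openPages: pages.length,
    targetUrls: browser.targets().map((t) => t.url()),  // synchronous array
    wsEndpoint: browser.wsEndpoint(),                   // synchronous string
  };
}

// Hypothetical helper: detaches Puppeteer from the browser and attaches again.
// `puppeteer` is assumed to be the module returned by require('puppeteer').
async function reconnect(puppeteer, browser) {
  // Remember the WebSocket URL before disconnecting.
  const browserWSEndpoint = browser.wsEndpoint();
  // The browser process keeps running after disconnect().
  await browser.disconnect();
  // Attach a new Browser object to the same running browser.
  return puppeteer.connect({ browserWSEndpoint });
}
```

The Browser object returned by reconnect is a new object, so listeners attached to the old one would need to be registered again.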
Reference: See the official Puppeteer API documentation to learn more about the Browser class events and methods.
Puppeteer BrowserContext Class
The BrowserContext class provides a way to operate multiple independent browser sessions. After launching a browser instance, a single BrowserContext is used by default. The browserChrome.newPage() method creates a page in this default BrowserContext object. If a web page opens another page, for example through the window.open() method, the new page belongs to the browser context of the parent page.
Puppeteer can also create a browser context in 'incognito' mode, as the example below shows. An 'incognito' browser context does not write any browsing data to disk.
// Incognito browser context creation
const contextIncognito = await browserChrome.createIncognitoBrowserContext();
// New page creation through the browser context.
const pageChrome = await contextIncognito.newPage();
await pageChrome.goto('https://www.google.com');
// Close context after use
await contextIncognito.close();
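Because an incognito context should be closed after use, it can help to wrap the create/use/close steps so that the context is closed even when the work inside it fails. The following is a minimal sketch of that idea: withIncognitoPage and its task parameter are hypothetical names, and it assumes a Puppeteer version in which browser.createIncognitoBrowserContext() exists, as in this tutorial.

```javascript
// Hypothetical helper (not part of Puppeteer): runs `task` with a page that
// lives in a fresh incognito context, then always closes the context.
async function withIncognitoPage(browser, task) {
  // Each incognito context has its own cookies and cache.
  const context = await browser.createIncognitoBrowserContext();
  try {
    const page = await context.newPage();
    // Return whatever the task returns (for example scraped data).
    return await task(page);
  } finally {
    // Closing the context also closes the pages that belong to it.
    await context.close();
  }
}
```

A possible call, assuming browserChrome is a launched Browser, is await withIncognitoPage(browserChrome, async (page) => { await page.goto('https://www.google.com'); return page.title(); }).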
The frequently used events and methods of BrowserContext class are explained in the next section.
Puppeteer BrowserContext Class – Events:
The following events are available in the BrowserContext class:
- browserContext.on('targetchanged') – This event is triggered when the URL of a target within the browser context changes.
- browserContext.on('targetcreated') – This event is triggered when a new target is created inside the browser context. The methods window.open and browserContext.newPage are responsible for this event.
- browserContext.on('targetdestroyed') – This event is triggered when a target is destroyed within the browser context.
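Taken together, the three BrowserContext events are enough to keep an up-to-date view of the targets in one context. The sketch below illustrates this; trackContextTargets is a hypothetical helper name, and browserContext is assumed to be any BrowserContext, such as the one returned by browser.defaultBrowserContext().

```javascript
// Hypothetical helper (not part of Puppeteer): keeps a Map from each live
// target in the context to its most recently seen URL.
function trackContextTargets(browserContext) {
  const urlByTarget = new Map();

  // A new page or window.open call adds a target.
  browserContext.on('targetcreated', (target) => {
    urlByTarget.set(target, target.url());
  });

  // Navigation changes the URL of an existing target.
  browserContext.on('targetchanged', (target) => {
    urlByTarget.set(target, target.url());
  });

  // Closing a page removes its target.
  browserContext.on('targetdestroyed', (target) => {
    urlByTarget.delete(target);
  });

  return urlByTarget;
}
```

Only targets created after the helper is called are tracked; existing ones could be added up front from browserContext.targets().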
Puppeteer BrowserContext Class – Methods:
The following methods are available in the BrowserContext class:
- browserContext.browser() – This method returns the browser that the browser context belongs to.
- browserContext.clearPermissionOverrides() – This method removes all permission overrides from the browser context. The example below shows how to use this method:
const browserContext = browser.defaultBrowserContext();
await browserContext.overridePermissions('https://www.google.com', ['clipboard-read']);
await browserContext.clearPermissionOverrides();
- browserContext.close() – This method closes the browser context. All the targets (pages) that belong to the browser context are closed with it.
await browserContext.close();
- browserContext.isIncognito() – This method checks whether the browser context is an 'incognito' context. It returns a boolean value (true for an incognito context, false otherwise). The default browser context is the only non-incognito browser context.
const boolIsIncognito = browserContext.isIncognito();
- browserContext.newPage() – This method creates a new page in the same browser context.
const page = await browserContext.newPage();
- browserContext.overridePermissions(origin, permissions) – This method grants the specified permissions to the origin, i.e., the target URL. The permissions available to grant are:
- ‘geolocation’
- ‘midi-sysex’ (system-exclusive midi)
- ‘midi’
- ‘push’
- ‘camera’
- ‘notifications’
- ‘microphone’
- ‘ambient-light-sensor’
- ‘accelerometer’
- ‘background-sync’
- ‘gyroscope’
- ‘accessibility-events’
- ‘clipboard-read’
- ‘magnetometer’
- ‘clipboard-write’
- ‘payment-handler’
The example below shows how to grant a permission:
const browserContext = browser.defaultBrowserContext();
await browserContext.overridePermissions('https://www.google.com', ['geolocation']);
- browserContext.pages() – This method returns a promise that resolves to the list of all open pages in the browser context. Non-visible pages are not listed here.
const openPageList = await browserContext.pages();
- browserContext.targets() – This method returns the list of all active targets in the browser context.
const activeTargetList = browserContext.targets();
- browserContext.waitForTarget(predicate[, options]) – This method waits for a matching target to appear and returns a promise that resolves to the target object. The predicate argument is a function that is called for each target. Optionally, configuration values such as timeout can be passed as the second argument.
await pageChrome.evaluate(() => window.open('https://www.google.com/'));
const newWindowTarget = await browserContext.waitForTarget(target => target.url() === 'https://www.google.com/');
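The overridePermissions and clearPermissionOverrides methods are often used as a pair: grant, do some work, then reset. The sketch below wraps that pairing. It is an illustration only; withPermissions and its task parameter are hypothetical names, and browserContext is assumed to be a BrowserContext as described above.

```javascript
// Hypothetical helper (not part of Puppeteer): grants permissions to an origin
// for the duration of `task`, then clears the overrides again.
async function withPermissions(browserContext, origin, permissions, task) {
  // `permissions` is an array such as ['geolocation'] or ['clipboard-read'].
  await browserContext.overridePermissions(origin, permissions);
  try {
    return await task();
  } finally {
    // Note: this clears ALL permission overrides of the context,
    // not only the ones granted above.
    await browserContext.clearPermissionOverrides();
  }
}
```

For example, await withPermissions(browserContext, 'https://www.google.com', ['geolocation'], () => pageChrome.goto('https://www.google.com')) would load the page with geolocation granted and reset the overrides afterwards.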
Reference: See the official Puppeteer API documentation to read more about the BrowserContext class events and methods.
Conclusion:
In this “Puppeteer Browser Class” tutorial, we have explained the BrowserFetcher, Browser, and BrowserContext classes, including the important namespaces (if any), events (if any), and the methods that are frequently used in Puppeteer web scraping, with examples. In the next article, we will explain the Page, Frame, and Dialog classes.