# Query Selectors
Queries are the primary mechanism for interacting with the DOM on your site. For example, a typical workflow looks like this:
```ts
// Import puppeteer
import puppeteer from 'puppeteer';
(async () => {
  // Launch the browser
  const browser = await puppeteer.launch();
  // Create a page
  const page = await browser.newPage();
  // Go to your site
  await page.goto('YOUR_SITE');
  // Query for an element handle.
  const element = await page.waitForSelector('div > .class-name');
  // Do something with element...
  await element.click(); // Just an example.
  // Dispose of handle
  await element.dispose();
  // Close browser.
  await browser.close();
})();
```
## `P` Selectors

Puppeteer uses a superset of the CSS selector syntax for querying. We call this syntax _P selectors_, and it's supercharged with extra capabilities such as deep combinators and text selection.

:::caution

Although P selectors look like real CSS selectors (we intentionally designed them this way), they should not be used for actual CSS styling. They are designed only for Puppeteer.
:::

:::note

P selectors only work at the first "depth" of a selector, not when nested inside another selector; for example, `:is(div >>> a)` will not work.

:::
### `>>>` and `>>>>` combinators
The `>>>` and `>>>>` combinators are called the _deep descendant_ and _deep_ combinators, respectively. Both descend into shadow hosts: `>>>` goes into every shadow host under a node, while `>>>>` goes only into the immediate one (if the node is a shadow host; otherwise, it's a no-op).

:::note

A common question is when to choose `>>>>` over `>>>`, given the flexibility of `>>>`. The same question can be asked about `>` and a space: choose `>` if you do not need to query all elements under a given node, and a space otherwise. The answer extends naturally to `>>>>` (the analogue of `>`) and `>>>` (the analogue of a space).
:::
#### Example
Suppose we have the following markup:
```html
<custom-element>
  <template shadowrootmode="open">
    <slot></slot>
  </template>
  <custom-element>
    <template shadowrootmode="open">
      <slot></slot>
    </template>
    <custom-element>
      <template shadowrootmode="open">
        <slot></slot>
      </template>
      <h2>Light content</h2>
    </custom-element>
  </custom-element>
</custom-element>
```
> Note: `<template shadowrootmode="open">` is not supported in older versions of Firefox.
> You can read more about it in the [MDN `<template>` reference](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/template#attributes).

Then `custom-element >>> h2` will return `h2`, but `custom-element >>>> h2` will return nothing since the inner `h2` is in a deeper shadow root.
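
To check a claim like this on your own page, it can help to count how many elements each selector matches. The following is a minimal sketch, not part of Puppeteer: `countMatches` and the `SelectorQueryable` interface are hypothetical names introduced here, and the interface is assumed to be structurally compatible with Puppeteer's `Page`, whose `$$` method resolves to an array of disposable handles.

```ts
// Hypothetical minimal shape for this sketch; a Puppeteer `Page` is assumed
// to satisfy it because `page.$$()` resolves to an array of element handles.
interface SelectorQueryable {
  $$(selector: string): Promise<Array<{dispose(): Promise<void>}>>;
}

// Runs every selector against the target and records how many elements matched.
async function countMatches(
  target: SelectorQueryable,
  selectors: readonly string[]
): Promise<Map<string, number>> {
  const counts = new Map<string, number>();
  for (const selector of selectors) {
    const handles = await target.$$(selector);
    counts.set(selector, handles.length);
    // Only the count is needed, so release the handles right away.
    for (const handle of handles) {
      await handle.dispose();
    }
  }
  return counts;
}
```

With a real `page`, calling `await countMatches(page, ['custom-element >>> h2', 'custom-element >>>> h2'])` lets you compare the two combinators side by side on the markup above.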
### `P`-elements
`P` elements are [pseudo-elements](https://developer.mozilla.org/en-US/docs/Web/CSS/Pseudo-elements) with a `-p` vendor prefix. They allow you to enhance your selectors with Puppeteer-specific query engines such as XPath, text queries, and ARIA.
#### Text selectors (`-p-text`)
Text selectors will select "minimal" elements containing the given text, even within (open) shadow roots. Here, "minimal" means the deepest elements that contain the given text, but not their parents (which technically also contain the given text).
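
The idea of a "minimal" element can be illustrated with a simplified model. The sketch below is not Puppeteer's implementation and does not touch the DOM: `SketchNode`, `fullText`, and `minimalMatches` are hypothetical names that operate on a plain object tree, only to show why a parent is skipped when one of its children already contains the text.

```ts
// Hypothetical, simplified stand-in for a DOM element (sketch only).
interface SketchNode {
  tag: string;
  text?: string; // text placed directly inside this node
  children?: SketchNode[];
}

// Concatenates a node's own text with the text of all of its descendants.
function fullText(node: SketchNode): string {
  let result = node.text ?? '';
  for (const child of node.children ?? []) {
    result += fullText(child);
  }
  return result;
}

// Returns the deepest nodes whose text contains `needle`.
// A node is reported only if none of its children contains the text on its own.
function minimalMatches(node: SketchNode, needle: string): SketchNode[] {
  if (!fullText(node).includes(needle)) {
    return [];
  }
  const deeper: SketchNode[] = [];
  for (const child of node.children ?? []) {
    deeper.push(...minimalMatches(child, needle));
  }
  return deeper.length > 0 ? deeper : [node];
}
```

In this model, a `div` wrapping a `p` with the text `My name is Jun` yields only the `p`. If the text is split across two sibling children, neither child contains it alone, so the parent itself is the minimal match.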
##### Example
```ts
const element = await page.waitForSelector('div ::-p-text(My name is Jun)');
// You can also use escapes.
const escapedElement = await page.waitForSelector(
  ':scope >>> ::-p-text(My name is Jun \\(pronounced like "June"\\))'
);
// or quotes
const quotedElement = await page.waitForSelector(
  'div >>>> ::-p-text("My name is Jun (pronounced like \\"June\\")"):hover'
);
```
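
When the text comes from a variable, writing the escapes by hand is error-prone. Here is a minimal sketch of a helper that produces the quoted form shown above. `pText` is a hypothetical name, not a Puppeteer API, and the sketch assumes that backslash-escaping `\` and `"` inside a double-quoted argument is sufficient, which matches the quoted example but is not otherwise confirmed here.

```ts
// Hypothetical helper (not part of Puppeteer): wraps arbitrary text in a
// double-quoted `::-p-text(...)` selector.
function pText(text: string): string {
  const escaped = text
    .replace(/\\/g, '\\\\') // escape backslashes first
    .replace(/"/g, '\\"'); // then escape double quotes
  return `::-p-text("${escaped}")`;
}

// Builds the same text selector as the quoted example above.
const selector = `div >>>> ${pText('My name is Jun (pronounced like "June")')}`;
```

The resulting string can be passed to `page.waitForSelector` like any other P selector.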
#### XPath selectors (`-p-xpath`)

XPath selectors will use the browser's native [`Document.evaluate`](https://developer.mozilla.org/en-US/docs/Web/API/Document/evaluate) to query for elements.

##### Example

```ts
const element = await page.waitForSelector('::-p-xpath(//h2)');
```
#### ARIA selectors (`-p-aria`)

ARIA selectors can be used to find elements with a given ARIA label. These labels are computed using Chrome's internal representation.

##### Example

```ts
const node = await page.waitForSelector('::-p-aria(Submit)');
// You can also match on both the name and the role.
const button = await page.waitForSelector(
  '::-p-aria([name="Click me"][role="button"])'
);
```
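
If the name and role are only known at runtime, the attribute form can be assembled from values. The following is a minimal sketch under stated assumptions: `pAria` and `AriaQuery` are hypothetical names, not Puppeteer APIs, and the name-only bracket form `[name="..."]` is assumed to work like the combined form shown above. Values containing `"` or `\` are rejected because this guide does not show how they would be escaped.

```ts
// Hypothetical input shape for this sketch.
interface AriaQuery {
  name: string;
  role?: string;
}

// Formats one `[key="value"]` pair, refusing values this sketch cannot represent safely.
function ariaAttribute(key: 'name' | 'role', value: string): string {
  if (/["\\]/.test(value)) {
    throw new Error(`Unsupported character in ${key}: ${value}`);
  }
  return `[${key}="${value}"]`;
}

// Hypothetical helper (not part of Puppeteer): builds a `::-p-aria(...)` selector.
function pAria(query: AriaQuery): string {
  let body = ariaAttribute('name', query.name);
  if (query.role !== undefined) {
    body += ariaAttribute('role', query.role);
  }
  return `::-p-aria(${body})`;
}
```

For example, `pAria({name: 'Click me', role: 'button'})` produces the same selector string as the second example above.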
### Custom selectors

Puppeteer lets you add your own query selectors using [Puppeteer.registerCustomQueryHandler](../api/puppeteer.registercustomqueryhandler.md). This is useful for creating custom selectors based on framework objects or other vendor-specific objects.
#### Custom Selectors
You can register a custom query handler that allows you to create custom selectors. For example, define a query handler for `getById` selectors:
```ts
Puppeteer.registerCustomQueryHandler('getById', {
  queryOne: (elementOrDocument, selector) => {
    return elementOrDocument.querySelector(`[id="${CSS.escape(selector)}"]`);
  },
  // Note: for demonstration purposes only; an `id` should be unique within a page.
  queryAll: (elementOrDocument, selector) => {
    return elementOrDocument.querySelectorAll(`[id="${CSS.escape(selector)}"]`);
  },
});
```
You can now use it as follows:
```ts
const node = await page.waitForSelector('::-p-getById(elementId)');
// OR used in conjunction with other selectors
const moreSpecificNode = await page.waitForSelector(
  '.side-bar ::-p-getById(elementId)'
);
```
#### Custom framework components selector
:::caution
Be careful when relying on internal APIs of libraries or frameworks. They can change at any time.
:::
Find Vue components by name by using Vue internals for querying:
```ts
Puppeteer.registerCustomQueryHandler('vue', {
  queryOne: (element, name) => {
    // Walk every element under (and including) the starting element.
    const walker = document.createTreeWalker(element, NodeFilter.SHOW_ELEMENT);
    do {
      const currentNode = walker.currentNode;
      // `__vnode` is a Vue internal; compare component names case-insensitively.
      if (
        currentNode.__vnode?.ctx?.type?.name?.toLowerCase() ===
        name.toLowerCase()
      ) {
        return currentNode;
      }
    } while (walker.nextNode());
    return null;
  },
});
```
Query the Vue component as follows:
```ts
const element = await page.$('::-p-vue(MyComponent)');
```
#### Web Components
Web Components define their own tags, so you can query them by tag name:
```ts
const element = await page.$('my-web-component');
```
Extend `HTMLElementTagNameMap` to define types for custom tags. This allows Puppeteer to infer the return type for the ElementHandle:
```ts
declare global {
  interface HTMLElementTagNameMap {
    'my-web-component': MyWebComponent;
  }
}
```