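Most examples on this page run against books.toscrape.com and quotes.toscrape.com, two public sites built for scraping practice, so you can paste them into the API unchanged. Examples 2, 3, and 11 use placeholder domains (example.com and hotels.example.com) to illustrate a pattern. Replace their URLs and selectors with your own target site before running them.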
1. Simple Page Scraping: No Actions Needed
When the content you want is fully visible on one page, you do not need actions at all. Just send the URL with a prompt and let the AI extract what you need.
Use this when: the list is short, fits on one page, and does not require any clicking.
{
"urls": [{ "url": "https://quotes.toscrape.com" }],
"prompt": "List every quote and its author. Return as a JSON array: [{quote, author}]",
"output": "json"
}
Response:
{
"content": [
{ "quote": "The world as we have created it is a process of our thinking...", "author": "Albert Einstein" },
{ "quote": "It is our choices, Harry, that show what we truly are...", "author": "J.K. Rowling" },
...
]
}
The page has 10 quotes. A prompt handles all 10 easily without any forEach. If the same site had 200 quotes across 20 pages, that is when you would reach for forEach with pagination.
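This page does not show the HTTP call itself. The following is a minimal sketch using only the Python standard library. It assumes a JSON POST endpoint and an API-key header. Both `API_URL` and `API_KEY_HEADER` are hypothetical placeholders, so substitute the values from your Spidra account. The sketch wraps the Example 1 payload in a request object without sending it:

```python
import json
import urllib.request

# Hypothetical placeholders: the real endpoint and auth header are not
# documented on this page. Replace them with your account's values.
API_URL = "https://YOUR-SPIDRA-ENDPOINT/scrape"
API_KEY_HEADER = "x-api-key"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap a scrape payload in a JSON POST request (not sent yet)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


# Example 1: no actions, just a URL, a prompt, and JSON output.
simple_payload = {
    "urls": [{"url": "https://quotes.toscrape.com"}],
    "prompt": "List every quote and its author. Return as a JSON array: [{quote, author}]",
    "output": "json",
}

req = build_request(simple_payload, api_key="YOUR_KEY")
# To send it: json.load(urllib.request.urlopen(req)), then read ["content"].
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.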
2. Dismiss a banner, then scrape
Some pages load a cookie banner or modal that blocks the content. Add a click action to dismiss it before the scrape runs.
Use this when: a consent banner or popup is covering the content.
{
"urls": [{
"url": "https://example.com/products",
"actions": [
{ "type": "click", "value": "Accept all cookies button" },
{ "type": "wait", "duration": 800 }
]
}],
"prompt": "List all product names and prices"
}
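Pre-actions are plain JSON objects, so you can reuse them across requests. This sketch prepends a dismiss-banner sequence to any `urls` entry. Like this example, it assumes that actions run in list order (click first, then wait). `with_pre_actions` is a hypothetical helper name.

```python
import copy


def with_pre_actions(url_entry: dict, *actions: dict) -> dict:
    """Return a copy of a urls[] entry with extra actions placed first."""
    entry = copy.deepcopy(url_entry)  # leave the caller's dict untouched
    entry["actions"] = list(actions) + entry.get("actions", [])
    return entry


# The click-then-wait sequence from this example.
dismiss_banner = (
    {"type": "click", "value": "Accept all cookies button"},
    {"type": "wait", "duration": 800},
)

entry = with_pre_actions({"url": "https://example.com/products"}, *dismiss_banner)
payload = {"urls": [entry], "prompt": "List all product names and prices"}
```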
3. Search, wait for results, then scrape
Type a search query and scrape the results page.
Use this when: the content only appears after submitting a search form.
{
"urls": [{
"url": "https://example.com",
"actions": [
{ "type": "type", "selector": "input[name='q']", "value": "wireless headphones" },
{ "type": "click", "selector": "button[type='submit']" },
{ "type": "wait", "duration": 2000 }
]
}],
"prompt": "Extract every product name, price, and rating from the search results"
}
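The three actions map one-to-one onto the form interaction: type the query, submit the form, and wait for results. The sketch below is a payload builder with the hypothetical name `search_payload`. It uses this example's selectors as defaults. On a real site, pass the selectors that match its search form.

```python
def search_payload(site: str, query: str, prompt: str,
                   input_selector: str = "input[name='q']",
                   submit_selector: str = "button[type='submit']",
                   wait_ms: int = 2000) -> dict:
    """Type a query, submit the form, wait for results, then scrape."""
    return {
        "urls": [{
            "url": site,
            "actions": [
                {"type": "type", "selector": input_selector, "value": query},
                {"type": "click", "selector": submit_selector},
                {"type": "wait", "duration": wait_ms},
            ],
        }],
        "prompt": prompt,
    }


payload = search_payload(
    "https://example.com",
    "wireless headphones",
    "Extract every product name, price, and rating from the search results",
)
```

To run several queries, build one payload per query, for example `[search_payload(site, q, prompt) for q in queries]`.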
4. Inline forEach: Collect a List with itemPrompt
Process every card on a page without navigating away from it. Each card’s content is read directly and passed through itemPrompt for AI extraction.
Use this when: all the data is visible on the listing page itself and you want a clean, structured result per item.
{
"urls": [{
"url": "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book cards in the product grid",
"mode": "inline",
"captureSelector": "article.product_pod",
"maxItems": 10,
"itemPrompt": "Extract the book title, price, and star rating (One/Two/Three/Four/Five). Return as JSON: {title, price, star_rating}"
}]
}]
}
Response (markdownContent):
## Item 1
{"title": "Sharp Objects", "price": "£47.82", "star_rating": "Four"}
---
## Item 2
{"title": "In a Dark, Dark Wood", "price": "£19.63", "star_rating": "One"}
---
## Item 3
{"title": "When We Collided", "price": "£31.77", "star_rating": "One"}
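Several examples on this page return `markdownContent` in the same shape. Each item has a `## Item N` heading, one JSON object (on one line or spread over several), and a `---` separator. If you want Python objects instead of text, a small parser can split on those markers. This sketch assumes exactly the layout seen in the sample responses here. It is not an official client, and `parse_markdown_items` is a hypothetical name.

```python
import json
import re

ITEM_HEADING = re.compile(r"^## Item \d+$")


def parse_markdown_items(markdown: str) -> list:
    """Decode the JSON body under each '## Item N' heading, in order."""
    items: list = []
    buffer: list = []
    in_item = False

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if in_item and body:
            items.append(json.loads(body))

    for line in markdown.splitlines():
        stripped = line.strip()
        if ITEM_HEADING.match(stripped):
            flush()  # finish the previous item before starting a new one
            in_item, buffer = True, []
        elif stripped == "---":
            continue  # separator between items
        else:
            buffer.append(line)
    flush()
    return items


sample = (
    "## Item 1\n"
    '{"title": "Sharp Objects", "price": "£47.82", "star_rating": "Four"}\n'
    "---\n"
    "## Item 2\n"
    "{\n"
    '"title": "In a Dark, Dark Wood",\n'
    '"price": "£19.63"\n'
    "}"
)
books = parse_markdown_items(sample)
```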
5. Inline forEach with Pagination: Follow the Next Link
When a listing spans multiple pages, use pagination to keep collecting after the first page is exhausted.
Use this when: the catalogue has a “Next” button and you need more items than fit on one page.
{
"urls": [{
"url": "https://quotes.toscrape.com",
"actions": [{
"type": "forEach",
"observe": "Find all quote blocks on the page",
"mode": "inline",
"captureSelector": ".quote",
"maxItems": 30,
"itemPrompt": "Extract the quote text and author name. Return as JSON: {quote, author}",
"pagination": {
"nextSelector": "li.next > a",
"maxPages": 3
}
}]
}]
}
This collects up to 30 quotes, following the Next link across up to 3 extra pages. It stops as soon as it hits 30 total or runs out of pages.
Response (markdownContent):
## Item 1
{"quote": "The world as we have created it is a process of our thinking...", "author": "Albert Einstein"}
---
## Item 2
{"quote": "It is our choices, Harry, that show what we truly are...", "author": "J.K. Rowling"}
---
...
## Item 30
{"quote": "...", "author": "..."}
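The following small Python model makes the stopping rules concrete. It is not Spidra's implementation. Based on the description above, it assumes that `maxPages` counts the pages followed after the first one. `pages` stands in for the site, with one inner list per page.

```python
def collect_with_pagination(pages: list, max_items: int, max_pages: int) -> tuple:
    """Model of the documented rules: stop at max_items or when pages run out.

    Assumption: max_pages counts Next-link follows, so at most
    1 + max_pages pages are visited.
    """
    if max_items <= 0:
        return [], 0
    collected: list = []
    visited = 0
    for page in pages[: 1 + max_pages]:
        visited += 1
        for item in page:
            collected.append(item)
            if len(collected) >= max_items:
                return collected, visited  # item limit reached mid-run
    return collected, visited  # ran out of pages or hit the page cap


# quotes.toscrape.com shows 10 quotes per page; model 10 such pages.
site = [[f"page{p}-quote{i}" for i in range(10)] for p in range(10)]
items, visited = collect_with_pagination(site, max_items=30, max_pages=3)
```

Under these assumptions, the settings in this example stop on the third page, because `maxItems` is reached before the page cap.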
6. Navigate Mode: Follow Links to Detail Pages
Click into each product page to capture the rich detail that only exists there. This gives you descriptions, specifications, availability, and anything else that is not on the listing page.
Use this when: the listing page only shows a preview and the full content is on the individual item page.
{
"urls": [{
"url": "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book title links in the product grid",
"mode": "navigate",
"captureSelector": "article.product_page",
"maxItems": 6,
"waitAfterClick": 800,
"itemPrompt": "Extract the book title, price, star rating (One through Five), and whether it is in stock. Return as JSON: {title, price, star_rating, availability}"
}]
}]
}
Response (markdownContent):
## Item 1
{
"title": "Sharp Objects",
"price": "£47.82",
"star_rating": "Four",
"availability": "In stock"
}
---
## Item 2
{
"title": "In a Dark, Dark Wood",
"price": "£19.63",
"star_rating": "One",
"availability": "In stock"
}
Navigate mode loads a full page per item, so it is slower than inline. Keep maxItems reasonable (6–10) unless you have a lot of time to spare.
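Navigate-mode items come back as strings: star ratings as words and prices with a currency symbol. A minimal post-processing sketch turns them into typed values. It assumes the field names requested by the itemPrompt above (`title`, `price`, `star_rating`, `availability`) and a price string whose only digits and dot are the amount. `normalize_book` and `STAR_WORDS` are illustrative names.

```python
import re
from decimal import Decimal

STAR_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def normalize_book(item: dict) -> dict:
    """Convert itemPrompt string fields into typed values."""
    # Keep only digits and the decimal point: "£47.82" -> "47.82".
    amount = re.sub(r"[^0-9.]", "", item["price"])
    return {
        "title": item["title"],
        "price": Decimal(amount),  # Decimal avoids float rounding on money
        "stars": STAR_WORDS[item["star_rating"]],
        "in_stock": item.get("availability", "").strip().lower().startswith("in stock"),
    }


book = normalize_book({
    "title": "Sharp Objects",
    "price": "£47.82",
    "star_rating": "Four",
    "availability": "In stock",
})
```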
7. Click a category first, then forEach navigate
Use an action to navigate to a specific section of the site before the forEach starts. The forEach runs on whatever page the browser is on after your pre-actions finish.
Use this when: the items you want are behind a category link, tab, or filter on the homepage.
{
"urls": [{
"url": "https://books.toscrape.com",
"actions": [
{ "type": "click", "selector": "a[href='catalogue/category/books/travel_2/index.html']" },
{
"type": "forEach",
"observe": "Find all book title links in the product grid",
"mode": "navigate",
"captureSelector": "article.product_page",
"maxItems": 4,
"waitAfterClick": 800,
"itemPrompt": "Extract the book title and price. Return as JSON: {title, price}"
}
]
}]
}
Response (markdownContent):
## Item 1
{"title": "It's Only the Himalayas", "price": "£45.17"}
---
## Item 2
{"title": "Full Moon over Noah's Ark", "price": "£49.43"}
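On books.toscrape.com, the sidebar links follow a single `catalogue/category/books/<slug>/index.html` pattern, so you can generate the pre-action from a category slug. This is a sketch with hypothetical helper names (`category_click`, `category_navigate_payload`). It assumes slugs like `travel_2`, copied from the site's own URLs.

```python
def category_click(slug: str) -> dict:
    """Click the sidebar link for a books.toscrape.com category slug."""
    href = f"catalogue/category/books/{slug}/index.html"
    return {"type": "click", "selector": f"a[href='{href}']"}


def category_navigate_payload(slug: str, item_prompt: str, max_items: int = 4) -> dict:
    """Pre-action click into a category, then a navigate-mode forEach."""
    return {
        "urls": [{
            "url": "https://books.toscrape.com",
            "actions": [
                category_click(slug),
                {
                    "type": "forEach",
                    "observe": "Find all book title links in the product grid",
                    "mode": "navigate",
                    "captureSelector": "article.product_page",
                    "maxItems": max_items,
                    "waitAfterClick": 800,
                    "itemPrompt": item_prompt,
                },
            ],
        }]
    }


payload = category_navigate_payload(
    "travel_2", "Extract the book title and price. Return as JSON: {title, price}"
)
```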
8. Click by plain-English description, then forEach
When you do not know the exact CSS selector for a navigation element, describe it in plain English using the value field on a click action. Spidra uses AI to locate the element and click it, then the forEach runs on the resulting page.
Use this when: the element you need to click does not have a stable CSS selector or is easier to describe in words.
{
"urls": [{
"url": "https://books.toscrape.com",
"actions": [
{ "type": "click", "value": "Science category in the left sidebar" },
{
"type": "forEach",
"observe": "Find all product cards",
"mode": "inline",
"captureSelector": "article.product_pod",
"maxItems": 20,
"itemPrompt": "Extract book title and price. Return as JSON: {title, price}",
"pagination": {
"nextSelector": "li.next > a",
"maxPages": 1
}
}
]
}]
}
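In JSON terms, the two click styles differ only in which key carries the target: `selector` for CSS, `value` for a plain-English description. The helper below (a hypothetical name) builds either form. Rejecting calls that pass both or neither is a convention of this sketch, not a documented API rule.

```python
from typing import Optional


def click(selector: Optional[str] = None, description: Optional[str] = None) -> dict:
    """Build a click action from a CSS selector or a plain-English description."""
    if (selector is None) == (description is None):
        raise ValueError("pass exactly one of selector or description")
    if selector is not None:
        return {"type": "click", "selector": selector}
    return {"type": "click", "value": description}  # AI locates the element


science = click(description="Science category in the left sidebar")
next_link = click(selector="li.next > a")
```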
9. Scrape multiple categories in one request
Pass up to 3 URLs in a single request. Spidra processes all of them in parallel and returns one result per URL.
Use this when: you need data from several categories, brands, or pages at the same time.
{
"urls": [
{
"url": "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book cards",
"mode": "inline",
"captureSelector": "article.product_pod",
"maxItems": 4,
"itemPrompt": "Return JSON: {title, price, category: 'Mystery'}"
}]
},
{
"url": "https://books.toscrape.com/catalogue/category/books/travel_2/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book cards",
"mode": "inline",
"captureSelector": "article.product_pod",
"maxItems": 4,
"itemPrompt": "Return JSON: {title, price, category: 'Travel'}"
}]
},
{
"url": "https://books.toscrape.com/catalogue/category/books/historical-fiction_4/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book cards",
"mode": "inline",
"captureSelector": "article.product_pod",
"maxItems": 4,
"itemPrompt": "Return JSON: {title, price, category: 'Historical Fiction'}"
}]
}
]
}
The response data array has three entries, one per URL, each with its own list of books.
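If you have more than three categories, you can stay under the 3-URL limit by building the entries programmatically and splitting them across several requests. This is a sketch with hypothetical helper names that reuses the inline forEach settings from this example. `categories` maps a books.toscrape.com slug to the label written into itemPrompt.

```python
MAX_URLS = 3  # per-request limit stated above

BASE = "https://books.toscrape.com/catalogue/category/books/{slug}/index.html"


def multi_category_payload(categories: dict, max_items: int = 4) -> dict:
    """One urls[] entry per category, each with an inline forEach."""
    if len(categories) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per request")
    urls = []
    for slug, label in categories.items():
        urls.append({
            "url": BASE.format(slug=slug),
            "actions": [{
                "type": "forEach",
                "observe": "Find all book cards",
                "mode": "inline",
                "captureSelector": "article.product_pod",
                "maxItems": max_items,
                # Doubled braces produce literal { } in the prompt text.
                "itemPrompt": f"Return JSON: {{title, price, category: '{label}'}}",
            }],
        })
    return {"urls": urls}


def batched_payloads(categories: dict, max_items: int = 4) -> list:
    """Split any number of categories into requests of at most MAX_URLS."""
    pairs = list(categories.items())
    return [
        multi_category_payload(dict(pairs[i:i + MAX_URLS]), max_items)
        for i in range(0, len(pairs), MAX_URLS)
    ]


cats = {"mystery_3": "Mystery", "travel_2": "Travel", "historical-fiction_4": "Historical Fiction"}
payload = multi_category_payload(cats)
```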
10. Navigate + per-item scroll to reveal hidden content
After navigating to each item’s page, scroll down before capturing. Some pages lazy-load their content or hide a full description below the fold.
Use this when: the destination page has content that only appears after scrolling, such as full product descriptions, reviews, and spec tables.
{
"urls": [{
"url": "https://books.toscrape.com",
"actions": [
{ "type": "click", "selector": "a[href='catalogue/category/books/poetry_23/index.html']" },
{
"type": "forEach",
"observe": "Find all book title links in the product grid",
"mode": "navigate",
"captureSelector": "article.product_page",
"maxItems": 3,
"waitAfterClick": 1000,
"actions": [
{ "type": "scroll", "to": "50%" }
],
"itemPrompt": "Extract the book title, price, star rating (One through Five), and the full product description paragraph. Return as JSON: {title, price, star_rating, description}"
}
]
}]
}
What happens step by step:
- Opens the homepage
- Clicks the Poetry category link
- Finds all book title links
- For each book (up to 3): navigates to the book page, waits 1 second, scrolls to 50% to load the full description, captures the product area, runs AI extraction
- Combines all results into numbered items
Response (markdownContent):
## Item 1
{
"title": "Poetry Unbound: 50 Poems to Open Your World",
"price": "£23.00",
"star_rating": "Five",
"description": "Selected and introduced by Padraig O Tuama, this anthology brings together 50 poems that crack open the world..."
}
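The key structural detail is where the scroll lives. Actions in the URL entry's own list run once, before the loop. The `actions` list inside the forEach runs on every item page. This sketch builds the request so that the distinction stays explicit. The helper names `navigate_each` and `scroll_to` are hypothetical.

```python
def scroll_to(percent: int) -> dict:
    """Scroll action in the '50%' string form used above."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return {"type": "scroll", "to": f"{percent}%"}


def navigate_each(observe: str, item_prompt: str, max_items: int,
                  wait_ms: int, per_item_actions=()) -> dict:
    """Navigate-mode forEach; per_item_actions run on each item page."""
    action = {
        "type": "forEach",
        "observe": observe,
        "mode": "navigate",
        "captureSelector": "article.product_page",
        "maxItems": max_items,
        "waitAfterClick": wait_ms,
        "itemPrompt": item_prompt,
    }
    if per_item_actions:
        action["actions"] = list(per_item_actions)
    return action


poetry_payload = {
    "urls": [{
        "url": "https://books.toscrape.com",
        "actions": [
            # Runs once: open the Poetry category.
            {"type": "click", "selector": "a[href='catalogue/category/books/poetry_23/index.html']"},
            # Runs per book: wait, scroll, capture, extract.
            navigate_each(
                "Find all book title links in the product grid",
                "Extract the book title, price, star rating (One through Five), and the full "
                "product description paragraph. Return as JSON: {title, price, star_rating, description}",
                max_items=3,
                wait_ms=1000,
                per_item_actions=[scroll_to(50)],
            ),
        ],
    }]
}
```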
11. Click Mode: Expand Items and Capture Modal Content
For pages where clicking an item opens a modal, drawer, or expanded section (such as hotel room cards or FAQ accordions), use click mode to open each one, capture its content, and move on.
Use this when: the detail content appears only after you click an element, and it opens on the same page rather than navigating away.
{
"urls": [{
"url": "https://hotels.example.com/hotel/grand-plaza",
"actions": [{
"type": "forEach",
"observe": "Find all room type cards",
"mode": "click",
"captureSelector": "[role='dialog']",
"maxItems": 8,
"waitAfterClick": 1200,
"itemPrompt": "Extract the room name, bed type, price per night, and list of amenities. Return as JSON: {room, bed_type, price_per_night, amenities}"
}]
}]
}
Response (markdownContent):
## Item 1
{
"room": "Deluxe King Room",
"bed_type": "1 King Bed",
"price_per_night": "$189",
"amenities": ["Free WiFi", "City view", "Air conditioning", "Mini bar"]
}
---
## Item 2
{
"room": "Standard Twin Room",
"bed_type": "2 Twin Beds",
"price_per_night": "$129",
"amenities": ["Free WiFi", "Garden view", "Air conditioning"]
}
If no modal appears after clicking, Spidra falls back to capturing the full page.
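Because of that fallback, an item may be extracted from the whole page instead of the dialog and come back with missing or empty fields. A quick client-side check can flag those item numbers for review. It assumes the parsed items use the keys from this itemPrompt. `incomplete_items` is a hypothetical helper.

```python
REQUIRED = ("room", "bed_type", "price_per_night", "amenities")


def incomplete_items(items: list, required=REQUIRED) -> list:
    """Return 1-based item numbers with a missing or empty required field."""
    flagged = []
    for number, item in enumerate(items, start=1):
        if any(item.get(field) in (None, "", []) for field in required):
            flagged.append(number)
    return flagged


rooms = [
    {"room": "Deluxe King Room", "bed_type": "1 King Bed",
     "price_per_night": "$189", "amenities": ["Free WiFi", "City view"]},
    {"room": "Standard Twin Room", "bed_type": "",
     "price_per_night": "$129", "amenities": []},
]
suspect = incomplete_items(rooms)
```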
12. Full Pipeline: Navigate, Paginate, and Extract
Combine forEach pagination with a top-level prompt to get one clean final output. itemPrompt extracts fields from each item during scraping. The top-level prompt then makes a final pass over all items together to restructure them.
Use this when: you want structured data from a multi-page catalogue in one clean API response.
{
"urls": [{
"url": "https://books.toscrape.com/catalogue/category/books/mystery_3/index.html",
"actions": [{
"type": "forEach",
"observe": "Find all book title links in the product grid",
"mode": "navigate",
"captureSelector": "article.product_page",
"maxItems": 20,
"waitAfterClick": 800,
"itemPrompt": "Extract title, price, star rating (One through Five), availability. Return as JSON: {title, price, star_rating, availability}",
"pagination": {
"nextSelector": "li.next > a",
"maxPages": 2
}
}]
}],
"prompt": "Return a clean JSON array of all books. Sort by price ascending.",
"output": "json"
}
itemPrompt runs on each book page as it is scraped. When all 20 are collected, the top-level prompt takes the combined output and sorts the final list.
Response:
{
"content": [
{ "title": "...", "price": "£13.99", "star_rating": "Three", "availability": "In stock" },
{ "title": "...", "price": "£15.00", "star_rating": "Five", "availability": "In stock" },
...
]
}
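The top-level prompt asks the model to do the sorting. A numeric check on the client side can catch a sort done in string order, where "£9.99" lands after "£15.00". This is a sketch with hypothetical helper names. It assumes each price string contains only the currency symbol and the amount.

```python
import re
from decimal import Decimal


def price_value(text: str) -> Decimal:
    """Strip everything but digits and the dot: '£13.99' -> Decimal('13.99')."""
    return Decimal(re.sub(r"[^0-9.]", "", text))


def is_sorted_by_price(books: list) -> bool:
    """True if prices are non-decreasing when compared as numbers."""
    prices = [price_value(b["price"]) for b in books]
    return all(a <= b for a, b in zip(prices, prices[1:]))


def sort_by_price(books: list) -> list:
    """Re-sort locally by numeric price if the check fails."""
    return sorted(books, key=lambda b: price_value(b["price"]))


sample = [{"title": "A", "price": "£15.00"},
          {"title": "B", "price": "£9.99"},
          {"title": "C", "price": "£13.99"}]
ordered = sort_by_price(sample)
```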
Choosing the right pattern
| Scenario | Pattern to use |
|---|---|
| Short list, all on one page | Top-level prompt only, no actions needed |
| Content behind a cookie banner or popup | click pre-action to dismiss it, then scrape |
| Content that only appears after a search | type and click pre-actions plus a wait, then scrape |
| Long list across multiple pages | inline forEach with pagination |
| Detail data only on individual item pages | navigate mode forEach |
| Content hidden behind a click/expand | click mode forEach |
| Category navigation before scraping | click pre-action (CSS selector or plain English), then forEach |
| Multiple categories at once | Up to 3 URLs in one request, each with forEach |
| Lazy-loaded or below-the-fold content | Per-item scroll action inside a navigate-mode forEach |
| Large dataset with structured field extraction | itemPrompt on each item + optional top-level prompt for final shaping |