Queue URLs for scraping with optional browser actions and AI extraction
Submitting a scrape request returns a jobId immediately. You then poll GET /scrape/{jobId} until status is "completed" and results are ready.

Setting output: "json" without a prompt still triggers a default AI extraction pass. If you want raw markdown with no AI processing, omit both output and prompt.

When AI extraction fails (for example, on a near-empty page), Spidra falls back to returning the raw page markdown in markdownContent. Check the ai_extraction_failed flag in the response to detect this case and handle degraded results in your code.

Provide a schema to tell the AI exactly what shape to return. Instead of getting whatever JSON the AI decides to produce, you get back a JSON object that matches your schema every time. Nullable fields come back as null rather than being omitted. Field names match exactly what you defined.
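The fallback behavior above can be handled with a small helper. This is a sketch, not official client code: the field name content for the AI output is an assumption, while markdownContent and ai_extraction_failed come from the description above.

```python
def extract_result(job: dict) -> tuple[str, bool]:
    """Pick the usable content from a completed job response.

    When AI extraction fails, Spidra falls back to raw page markdown in
    markdownContent and sets the ai_extraction_failed flag; the second
    return value reports whether the result is degraded.
    """
    if job.get("ai_extraction_failed"):
        # Degraded result: raw page markdown instead of AI output.
        return job.get("markdownContent", ""), True
    # "content" is an assumed name for the AI output field.
    return job.get("content", ""), False

# Simulated completed job where extraction failed on a near-empty page.
job = {
    "status": "completed",
    "ai_extraction_failed": True,
    "markdownContent": "# Empty page",
}
content, degraded = extract_result(job)
```

Calling code can then branch on the `degraded` flag instead of parsing markdown as JSON by mistake.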
output is automatically set to "json" when a schema is provided. The schema is validated before the job is queued; if it is malformed, a 422 is returned with descriptive errors. Non-fatal issues (for example, unsupported keywords) are reported as schema_warnings in the job status response.
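A request body with a schema might look like the following sketch. The urls field name is assumed from the parameter list below; the schema keywords are standard JSON Schema.

```python
import json

# JSON Schema describing the exact shape of the AI output.
# The root must be type "object"; output is set to "json" automatically.
payload = {
    "urls": ["https://example.com/pricing"],
    "schema": {
        "type": "object",
        "properties": {
            "plan": {"type": "string"},
            # Nullable field: returned as null rather than omitted.
            "price": {"type": ["number", "null"]},
        },
        "required": ["plan", "price"],
    },
}
body = json.dumps(payload)
```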
| Action | What it does | Quick example |
|---|---|---|
click | Clicks any element on the page: buttons, links, tabs, toggles | {"type": "click", "selector": "#load-more"} |
type | Types text into an input field or search box | {"type": "type", "selector": "#search", "value": "laptops"} |
check | Checks a checkbox | {"type": "check", "selector": "#in-stock-only"} |
uncheck | Unchecks a checkbox | {"type": "uncheck", "selector": "#newsletter"} |
wait | Pauses for a number of milliseconds | {"type": "wait", "duration": 2000} |
scroll | Scrolls the page to a percentage of its height | {"type": "scroll", "to": "80%"} |
forEach | Finds all matching elements and processes each one individually. Supports navigate, click, and inline modes. | {"type": "forEach", "observe": "Find all product cards", "mode": "navigate"} |
forEach is the most powerful action. It finds a set of repeating elements (product cards, links, accordion rows) and runs a mini-scrape on each one. It supports three modes (click, inline, navigate), automatic pagination, per-item AI extraction, and per-element sub-actions.
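An actions list combining simple actions with a forEach might be built like this. The type, selector, duration, observe, and mode keys come from the table above; the per-item prompt key is an assumption for illustration.

```python
# Click "load more", wait for content, then visit each matched card's
# page ("navigate" mode) and run a per-item extraction there.
actions = [
    {"type": "click", "selector": "#load-more"},
    {"type": "wait", "duration": 2000},
    {
        "type": "forEach",
        "observe": "Find all product cards",
        "mode": "navigate",
        # Assumed per-item extraction key; check the API reference.
        "prompt": "Extract the product name and price",
    },
]
```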
Set "useProxy": true to enable stealth mode with proxy rotation, and optionally add "proxyCountry" to target a specific location.
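A geo-targeted request body would then look roughly like this (urls is assumed from the parameter list below; useProxy and proxyCountry are documented above):

```python
# Stealth scraping through a German proxy. proxyCountry accepts a
# country code ('us', 'uk', 'de') or region ('global', 'asia', 'eu')
# and only takes effect when useProxy is true.
payload = {
    "urls": ["https://example.com"],
    "useProxy": True,
    "proxyCountry": "de",
}
```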
| Option | Description |
|---|---|
screenshot: true | Capture the visible viewport |
fullPageScreenshot: true | Capture the entire scrollable page (requires screenshot: true) |
Captured images are returned in the screenshots array of the response.
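Putting the two options together, a full-page capture request is a sketch like this (urls is an assumed field name from the parameter list below):

```python
# fullPageScreenshot only takes effect when screenshot is also true;
# the captured images come back in the response's screenshots array.
payload = {
    "urls": ["https://example.com"],
    "screenshot": True,
    "fullPageScreenshot": True,
}

# Reading the result from a (simulated) completed job response.
job = {"status": "completed", "screenshots": ["https://cdn.example/shot1.png"]}
shots = job.get("screenshots", [])
```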
Array of URLs to scrape (1-3 URLs per request)
1-3 elements
Optional LLM prompt for extracting or transforming the scraped content
Output format for the extracted content
json, markdown
Enable stealth mode with proxy rotation to avoid detection
Country code (e.g., 'us', 'uk', 'de') or region ('global', 'asia', 'eu') for geo-targeted proxy routing. Requires useProxy: true
Session cookies for authenticated scraping. Supports standard format (name=value; name2=value2) or raw Chrome DevTools paste format
Capture a screenshot of each page after scraping
Capture full page screenshot instead of just the viewport. Requires screenshot: true
Remove headers, footers, navigation, and other non-content elements from the scraped output
JSON Schema object describing the exact shape of the AI output. When provided, the AI must return JSON matching this schema. Output is automatically set to 'json'. Root must be type 'object'. Maximum nesting depth: 5. Maximum size: 10KB.
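The documented limits can be pre-checked client-side before queuing a job, avoiding a round trip that ends in a 422. This is a sketch under one assumption: it counts nesting depth over the schema document itself, which may not be exactly how the server counts it.

```python
import json

MAX_DEPTH = 5          # documented maximum nesting depth
MAX_BYTES = 10 * 1024  # documented maximum size: 10KB

def schema_depth(node, depth=1):
    """Maximum nesting depth of a JSON value; objects/arrays add a level."""
    if isinstance(node, dict):
        return max((schema_depth(v, depth + 1) for v in node.values()), default=depth)
    if isinstance(node, list):
        return max((schema_depth(v, depth + 1) for v in node), default=depth)
    return depth

def validate_schema(schema: dict) -> list[str]:
    """Mirror the documented server-side checks; returns a list of errors."""
    errors = []
    if schema.get("type") != "object":
        errors.append("root must be type 'object'")
    if len(json.dumps(schema).encode()) > MAX_BYTES:
        errors.append("schema exceeds 10KB")
    if schema_depth(schema) > MAX_DEPTH:
        errors.append("nesting deeper than 5 levels")
    return errors
```

An empty return value means the schema passes these checks; the server may still report non-fatal schema_warnings for unsupported keywords.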