Retrieve all pages from a crawl job, including the AI-extracted data and temporary signed URLs to the original HTML and markdown content.
Each page in the response includes a `data` field containing whatever your `transformInstruction` asked for.
The response also includes time-limited signed URLs pointing to the raw HTML and markdown files stored in Spidra’s object storage. These URLs are valid for one hour.
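As an illustration, fetching a job's pages might be set up like the sketch below. The base URL, the path `/v1/crawl/{job_id}/pages`, and the `Authorization` header scheme are assumptions for this sketch, not the documented API surface; substitute the values from your own Spidra account.

```python
# Sketch of building the "list pages" request. The base URL, path template,
# and Bearer auth scheme are assumptions, not documented values.
from urllib.parse import urljoin

BASE_URL = "https://api.spidra.example/"  # hypothetical base URL

def pages_request(job_id: str, api_key: str) -> tuple[str, str, dict]:
    """Return (method, url, headers) for fetching all pages of a crawl job."""
    url = urljoin(BASE_URL, f"v1/crawl/{job_id}/pages")
    headers = {"Authorization": f"Bearer {api_key}"}
    return ("GET", url, headers)

method, url, headers = pages_request("job_123", "sk_test")
```

From here, any HTTP client can issue the request; the response body has the shape described in the table below.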
| Field | Type | Description |
|---|---|---|
| `pages` | array | List of all pages processed by this job |
| `pages[].id` | string | Unique page ID. Use this when calling `POST /crawl//retry/` |
| `pages[].url` | string | The URL of this specific page |
| `pages[].title` | string | Page title as detected during crawling |
| `pages[].status` | string | `success`, `failed`, or `pending` |
| `pages[].data` | object or string | The AI-extracted data for this page. The shape matches your `transformInstruction` |
| `pages[].error_message` | string or null | Error details if `status` is `failed` |
| `pages[].html_url` | string or null | Signed URL to the raw HTML file (valid for 1 hour) |
| `pages[].markdown_url` | string or null | Signed URL to the markdown version of the page (valid for 1 hour) |
| `pages[].created_at` | string | ISO 8601 timestamp when this page was processed |
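Putting the fields above together, a client might partition pages by status and then act on each group. The sketch below uses a payload whose shape follows the table; the sample page IDs and URLs are made up for illustration.

```python
# Sketch of consuming the pages payload described in the table above.
# The sample payload contents are hypothetical.
from collections import defaultdict

def partition_by_status(payload: dict) -> dict:
    """Group pages by their status (success / failed / pending)."""
    groups = defaultdict(list)
    for page in payload.get("pages", []):
        groups[page["status"]].append(page)
    return dict(groups)

payload = {
    "pages": [
        {"id": "pg_1", "url": "https://example.com/a", "status": "success",
         "data": {"title": "A"}, "error_message": None,
         "markdown_url": "https://storage.example/pg_1.md?sig=abc"},
        {"id": "pg_2", "url": "https://example.com/b", "status": "failed",
         "data": None, "error_message": "Timeout while rendering",
         "markdown_url": None},
    ]
}

groups = partition_by_status(payload)
for page in groups.get("failed", []):
    print(f"{page['id']} failed: {page['error_message']}")

# Signed URLs expire after one hour, so download promptly, e.g.:
# urllib.request.urlretrieve(page["markdown_url"], f"{page['id']}.md")
```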
Pages with `status: "failed"` still appear in the response, so you have a full picture of what was and was not processed. Use the `error_message` field to understand what went wrong. To re-run extraction on a specific failed page, use the retry endpoint available on your account.
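A retry loop over failed pages might be sketched as follows. The reference above elides the identifiers in the retry path (`POST /crawl//retry/`), so the URL template here, with job and page IDs filled in, is an assumption.

```python
# Hypothetical helper that builds one retry target per failed page.
# The /crawl/{job_id}/retry/{page_id} template is an assumption; the
# documented path elides its identifiers.
def retry_targets(job_id: str, pages: list) -> list:
    """Return an assumed retry URL for every page whose status is 'failed'."""
    return [
        f"/crawl/{job_id}/retry/{page['id']}"
        for page in pages
        if page["status"] == "failed"
    ]

pages = [
    {"id": "pg_1", "status": "success"},
    {"id": "pg_2", "status": "failed"},
]
print(retry_targets("job_123", pages))  # ['/crawl/job_123/retry/pg_2']
```

Each target would then be issued as a `POST` with the same authentication as the original request.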