Retrieve all pages from a crawl job, including the AI-extracted data and temporary signed URLs to the original HTML and markdown content.
Once a crawl job completes, this endpoint returns every page that was processed, along with the data extracted by the AI. Each page record includes the original URL, extraction status, and the structured data field containing whatever your transformInstruction asked for.
The response also includes time-limited signed URLs pointing to the raw HTML and markdown files stored in Spidra’s object storage. These URLs are valid for one hour.
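
Because the signed URLs expire, it can help to record when the response was fetched and check the remaining window before starting a download. The sketch below is one way to do that. It assumes the one-hour window starts when the response is generated; the function name `signed_urls_usable` and the five-minute safety margin are illustrative choices, not part of the Spidra API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Documented validity window for html_url and markdown_url.
SIGNED_URL_TTL = timedelta(hours=1)
# Hypothetical safety margin so a download does not start right before expiry.
SAFETY_MARGIN = timedelta(minutes=5)


def signed_urls_usable(fetched_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if URLs from a response fetched at `fetched_at` are likely still valid.

    Assumption: the one-hour window starts when the response is generated.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - fetched_at < SIGNED_URL_TTL - SAFETY_MARGIN
```

If the check fails, requesting the pages again should return fresh signed URLs, so there is little reason to store the URLs themselves for later use.
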

| Field | Type | Description |
|---|---|---|
| pages | array | List of all pages processed by this job |
| pages[].id | string | Unique page ID. Use this when calling POST /crawl//retry/ |
| pages[].url | string | The URL of this specific page |
| pages[].title | string | Page title as detected during crawling |
| pages[].status | string | One of success, failed, or pending |
| pages[].data | object or string | The AI-extracted data for this page. The shape matches your transformInstruction |
| pages[].error_message | string or null | Error details if status is failed |
| pages[].html_url | string or null | Signed URL to the raw HTML file (valid for 1 hour) |
| pages[].markdown_url | string or null | Signed URL to the markdown version of the page (valid for 1 hour) |
| pages[].created_at | string | ISO 8601 timestamp when this page was processed |

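
As a rough illustration of working with these fields, the sketch below counts pages by status and collects the extracted data for successful pages. It operates on an already-parsed response body. The sample payload, its values, and the helper names `summarize_pages` and `extracted_data_by_url` are hypothetical and only mirror the field table above.

```python
from collections import Counter
from typing import Any, Dict

# Hypothetical sample shaped like the field table; all values are illustrative.
sample_response: Dict[str, Any] = {
    "pages": [
        {"id": "page-1", "url": "https://example.com/a", "title": "A",
         "status": "success", "data": {"price": "10"}, "error_message": None,
         "html_url": "https://storage.example/a.html",
         "markdown_url": "https://storage.example/a.md",
         "created_at": "2024-01-01T12:00:00Z"},
        {"id": "page-2", "url": "https://example.com/b", "title": "B",
         "status": "failed", "data": None, "error_message": "Timed out",
         "html_url": None, "markdown_url": None,
         "created_at": "2024-01-01T12:00:05Z"},
        {"id": "page-3", "url": "https://example.com/c", "title": "C",
         "status": "pending", "data": None, "error_message": None,
         "html_url": None, "markdown_url": None,
         "created_at": "2024-01-01T12:00:09Z"},
    ]
}


def summarize_pages(response: Dict[str, Any]) -> Counter:
    """Count pages per status value (success, failed, pending)."""
    return Counter(page["status"] for page in response.get("pages", []))


def extracted_data_by_url(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map each successful page's URL to its AI-extracted data."""
    return {
        page["url"]: page["data"]
        for page in response.get("pages", [])
        if page["status"] == "success"
    }
```

Since `data` can be an object or a string depending on the transformInstruction, downstream code should check its type before indexing into it.
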
Pages with status: "failed" still appear in the response, so you have a full picture of what was and was not processed. Use the error_message field to understand what went wrong. To re-run extraction on a specific failed page, use the retry endpoint available on your account.
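
A minimal sketch of picking out failed pages from a parsed response follows. It only gathers each page's ID and error message so they can be passed to the retry endpoint; it does not call that endpoint. The helper name `failed_pages`, the fallback text, and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def failed_pages(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (page id, error message) pairs for every page whose status is failed."""
    return [
        # Fall back to a placeholder if error_message is null or missing.
        (page["id"], page.get("error_message") or "no error message provided")
        for page in response.get("pages", [])
        if page.get("status") == "failed"
    ]


# Hypothetical response fragment, trimmed to the fields this helper reads.
example = {
    "pages": [
        {"id": "page-1", "status": "success", "error_message": None},
        {"id": "page-2", "status": "failed", "error_message": "Timed out"},
        {"id": "page-3", "status": "failed", "error_message": None},
    ]
}

for page_id, reason in failed_pages(example):
    print(f"{page_id}: {reason}")
```
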