# Get Crawled Pages

> Retrieve all pages from a crawl job, including the AI-extracted data and temporary signed URLs to the original HTML and markdown content.

Once a crawl job completes, this endpoint returns every page that was processed, along with the data extracted by the AI. Each page record includes the original URL, extraction status, and the structured `data` field containing whatever your `transformInstruction` asked for.

The response also includes time-limited signed URLs pointing to the raw HTML and markdown files stored in Spidra's object storage. These URLs are valid for one hour.

## Example Request

<CodeGroup>
  ```bash cURL
  curl https://api.spidra.io/api/crawl/abc-123/pages \
    -H "x-api-key: YOUR_API_KEY"
  ```

  ```python Python
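  import requests

  # Replace abc-123 with your crawl job ID.
  response = requests.get(
      "https://api.spidra.io/api/crawl/abc-123/pages",
      headers={"x-api-key": "YOUR_API_KEY"},
  )
  response.raise_for_status()  # raise on 401/403 and other HTTP errors
  pages = response.json()["pages"]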
  ```

  ```javascript Node.js
  // Replace abc-123 with your crawl job ID.
  const response = await fetch("https://api.spidra.io/api/crawl/abc-123/pages", {
    headers: { "x-api-key": "YOUR_API_KEY" }
  });
  const { pages } = await response.json();
  ```
</CodeGroup>

## Response Fields

| Field                   | Type             | Description                                                                                                          |
| ----------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------- |
| `pages`                 | array            | List of all pages processed by this job                                                                              |
| `pages[].id`            | string           | Unique page ID. Use this when calling [POST /crawl/{jobId}/retry/{pageId}](/api-reference/crawling/crawl-retry-page) |
| `pages[].url`           | string           | The URL of this specific page                                                                                        |
| `pages[].title`         | string           | Page title as detected during crawling                                                                               |
| `pages[].status`        | string           | `success`, `failed`, or `pending`                                                                                    |
| `pages[].data`          | object, string, or null | The AI-extracted data for this page. The shape matches your `transformInstruction`; `null` if extraction failed |
| `pages[].error_message` | string or null   | Error details if `status` is `failed`; otherwise `null`                                                              |
| `pages[].html_url`      | string or null   | Signed URL to the raw HTML file (valid for 1 hour)                                                                   |
| `pages[].markdown_url`  | string or null   | Signed URL to the markdown version of the page (valid for 1 hour)                                                    |
| `pages[].created_at`    | string           | ISO 8601 timestamp when this page was processed                                                                      |

## Example Response

```json
{
  "pages": [
    {
      "id": "page-uuid-1",
      "url": "https://example.com/blog/how-to-scrape",
      "title": "How to Scrape the Web Without Getting Blocked",
      "status": "success",
      "data": {
        "title": "How to Scrape the Web Without Getting Blocked",
        "author": "Jane Smith",
        "published": "2025-11-20",
        "summary": "A guide to rotating proxies and handling JavaScript-heavy pages."
      },
      "error_message": null,
      "html_url": "https://storage.spidra.io/signed/...",
      "markdown_url": "https://storage.spidra.io/signed/...",
      "created_at": "2025-12-17T15:02:10Z"
    },
    {
      "id": "page-uuid-2",
      "url": "https://example.com/blog/javascript-rendering",
      "title": "JavaScript Rendering Explained",
      "status": "failed",
      "data": null,
      "error_message": "AI transformation failed: content too short to extract",
      "html_url": null,
      "markdown_url": null,
      "created_at": "2025-12-17T15:02:55Z"
    }
  ]
}
```
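
A response like the one above can be split by `status` before further processing. The following is a minimal sketch of one way to do that. The helper name `group_pages_by_status` is illustrative and not part of the Spidra API, and the code assumes the payload has the shape shown in the example response.

```python
from collections import defaultdict


def group_pages_by_status(payload: dict) -> dict[str, list[dict]]:
    """Group page records by their `status` field (hypothetical helper)."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for page in payload.get("pages", []):
        grouped[page["status"]].append(page)
    return dict(grouped)


# Trimmed-down copy of the example response above.
payload = {
    "pages": [
        {
            "id": "page-uuid-1",
            "status": "success",
            "data": {"author": "Jane Smith", "published": "2025-11-20"},
            "error_message": None,
        },
        {
            "id": "page-uuid-2",
            "status": "failed",
            "data": None,
            "error_message": "AI transformation failed: content too short to extract",
        },
    ]
}

grouped = group_pages_by_status(payload)

# Successful pages carry the extracted data.
for page in grouped.get("success", []):
    print(page["id"], page["data"])

# Failed pages carry an error message instead.
for page in grouped.get("failed", []):
    print(page["id"], page["error_message"])
```

Using `.get("success", [])` keeps the loops safe when a job has no pages in a given status.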

## Handling Failed Pages

Pages with `status: "failed"` still appear in the response, so you get a complete picture of what was and was not processed. Check each page's `error_message` to see what went wrong. To re-run extraction on a specific failed page, pass its `id` to [POST /crawl/{jobId}/retry/{pageId}](/api-reference/crawling/crawl-retry-page).
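
As a rough sketch, the snippet below collects the IDs of failed pages and builds the retry URL for each one. It assumes the retry path `/crawl/{jobId}/retry/{pageId}` from the Response Fields table is served under the same base URL as this endpoint. The helper names `failed_page_ids` and `retry_url` are illustrative. The retry endpoint's request body and response are not described on this page, so the code only builds the URLs and does not send anything.

```python
from urllib.parse import quote

# Base URL taken from the example requests on this page.
BASE_URL = "https://api.spidra.io/api"


def failed_page_ids(payload: dict) -> list[str]:
    """Return the IDs of all pages whose status is "failed"."""
    return [
        page["id"]
        for page in payload.get("pages", [])
        if page.get("status") == "failed"
    ]


def retry_url(job_id: str, page_id: str) -> str:
    """Build the retry URL for one page (assumed to share BASE_URL)."""
    return f"{BASE_URL}/crawl/{quote(job_id, safe='')}/retry/{quote(page_id, safe='')}"


payload = {
    "pages": [
        {"id": "page-uuid-1", "status": "success"},
        {"id": "page-uuid-2", "status": "failed"},
    ]
}

for page_id in failed_page_ids(payload):
    print("POST", retry_url("abc-123", page_id))
```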

<Tip>
  The signed URLs for `html_url` and `markdown_url` expire after one hour. If you need permanent access to the raw content, download it promptly or use [Download Crawl Results](/api-reference/crawling/crawl-download) to get a full ZIP archive.
</Tip>
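
To keep the raw content past the one-hour window, you can save it to disk as soon as you receive the response. The sketch below shows one possible approach using only the Python standard library. It assumes the signed URLs can be fetched with a plain GET and no `x-api-key` header, which is typical for signed URLs but not confirmed on this page. The names `fetch_bytes` and `save_raw_content` are hypothetical.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    """Default fetcher: plain GET (assumes signed URLs need no API key)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def save_raw_content(
    page: dict,
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> list[Path]:
    """Save the HTML and markdown for one page, skipping null URLs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for field, suffix in (("html_url", ".html"), ("markdown_url", ".md")):
        url = page.get(field)
        if url is None:  # failed pages have null URLs in the example response
            continue
        target = out_dir / f"{page['id']}{suffix}"
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

Calling `save_raw_content(page, Path("crawl_output"))` for each entry in `pages` writes files named after the page ID. The `fetch` parameter lets you swap in a different HTTP client or a stub for testing.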


## OpenAPI

````yaml GET /crawl/{jobId}/pages
openapi: 3.1.0
info:
  title: Spidra API
  version: 1.0.0
  description: >-
    Public API endpoints for web scraping via Spidra. Authentication is via API
    key passed in the `x-api-key` header.
servers:
  - url: https://api.spidra.io/api
security:
  - ApiKeyAuth: []
paths:
  /crawl/{jobId}/pages:
    get:
      tags:
        - Crawling
      summary: Get Crawled Pages
      parameters:
        - name: jobId
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: List of crawled pages with extracted data
          content:
            application/json:
              schema:
                type: object
                properties:
                  pages:
                    type: array
                    items:
                      type: object
                      properties:
                        id:
                          type: string
                        url:
                          type: string
                        title:
                          type: string
                        status:
                          type: string
                          enum:
                            - success
                            - failed
                            - pending
                        data:
                          type: object
                          description: Extracted data from this page
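                        error_message:
                          type:
                            - string
                            - 'null'
                        html_url:
                          type:
                            - string
                            - 'null'
                          description: Signed URL to the raw HTML file (valid for 1 hour)
                        markdown_url:
                          type:
                            - string
                            - 'null'
                          description: Signed URL to the markdown version of the page (valid for 1 hour)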
                        created_at:
                          type: string
                          format: date-time
              example:
                pages:
                  - id: page-1
                    url: https://example.com/blog/post-1
                    title: First Post
                    status: success
                    data:
                      title: First Post
                      author: John
                      date: '2025-01-01'
                    error_message: null
                    created_at: '2025-12-17T15:00:00Z'
        '401':
          description: Invalid or missing API key
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Access token invalid or expired
        '403':
          description: Not authorized to access this job
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Unauthorized access or job not found
components:
  schemas:
    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
      required:
        - status
        - message
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key

````