# Get Scrape Job Status

> Poll for job progress and results

## Polling Pattern

Scrape jobs are processed asynchronously. When you submit a job, you get a `jobId` back immediately. Poll this endpoint every 2 to 5 seconds until `status` is `completed` or `failed`.

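```javascript
// Polls the job every 3 seconds until it reaches a final status.
async function waitForResult(jobId) {
  while (true) {
    const res = await fetch(`https://api.spidra.io/api/scrape/${jobId}`, {
      headers: { 'x-api-key': 'YOUR_API_KEY' }
    });
    // 403 and 404 responses carry an error body instead of a job.
    if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
    const data = await res.json();

    if (data.status === 'completed') return data.result;
    if (data.status === 'failed') throw new Error(data.error);

    // Any other status means the job is still in progress: wait, then poll again.
    await new Promise(r => setTimeout(r, 3000));
  }
}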
```
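
The loop above polls indefinitely if a job never reaches a final status. As a minimal sketch, assuming you want an upper bound on the wait, the variant below adds a deadline. The name `pollUntilDone`, the injected `fetchStatus` function, and the default interval and timeout are illustrative assumptions, not part of the Spidra API.

```javascript
// Hypothetical helper: polls until the job finishes or a deadline passes.
// `fetchStatus` is an async function you supply that returns the parsed JSON
// body of GET /scrape/{jobId}; injecting it keeps the loop testable offline.
async function pollUntilDone(fetchStatus, { intervalMs = 3000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const job = await fetchStatus();

    if (job.status === 'completed') return job.result;
    if (job.status === 'failed') throw new Error(job.error);

    // Any other status is treated as still in progress.
    // Give up if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job still ${job.status} after ${timeoutMs} ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

A caller would typically pass a function that wraps the `fetch` call shown earlier and returns `res.json()`.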

***

## Status Values

| Status      | Meaning                                               |
| ----------- | ----------------------------------------------------- |
| `waiting`   | In queue, not started yet                             |
| `delayed`   | Listed in the OpenAPI schema; not final, keep polling |
| `active`    | Running right now                                     |
| `completed` | Done, results are ready                               |
| `failed`    | Something went wrong, check `error`                   |
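
While a job is still running, the response also carries a `progress` object with a `message` and a fraction from 0 to 1 (see the OpenAPI definition for this endpoint). The sketch below turns a status response into a log line and a `done` flag; the helper name `describeJob` and the output format are assumptions made for illustration.

```javascript
// Hypothetical helper: summarizes a status response for logging.
function describeJob(job) {
  // Only `completed` and `failed` end the polling loop.
  const done = job.status === 'completed' || job.status === 'failed';

  // `progress.progress` is a fraction from 0 to 1; it may be absent.
  const fraction = job.progress?.progress;
  const percent = typeof fraction === 'number' ? ` ${Math.round(fraction * 100)}%` : '';
  const message = job.progress?.message ? ` - ${job.progress.message}` : '';

  return { done, line: `${job.status}${percent}${message}` };
}
```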

***

## Response Structure

When `status` is `completed`, everything you need is inside `result`.

```json
{
  "status": "completed",
  "progress": {
    "message": "Scrape completed successfully",
    "progress": 1
  },
  "result": {
    "content": "...",
    "data": [
      {
        "url": "https://example.com",
        "title": "Example Domain",
        "markdownContent": "...",
        "success": true,
        "screenshotUrl": null
      }
    ],
    "screenshots": [],
    "ai_extraction_failed": false,
    "stats": {
      "durationMs": 4200,
      "captchaSolvedCount": 0,
      "inputTokens": 312,
      "outputTokens": 84,
      "totalTokens": 396
    }
  },
  "error": null
}
```

### result.content

This is the main output field. What it contains depends on whether you provided a `prompt`:

* **With `prompt`**: the AI-extracted result, formatted according to `output` (`"markdown"` or `"json"`)
* **Without `prompt`**: the raw scraped page content as markdown

If AI extraction fails for any reason, `content` still returns the raw markdown as a fallback, and `ai_extraction_failed` is set to `true` so you can detect this.
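
The rules above can be sketched as a small classifier. It assumes the caller remembers whether the job was submitted with a `prompt`; the helper name `readContent` and the `kind` labels are illustrative and do not come from the API.

```javascript
// Hypothetical helper: labels what `result.content` holds.
// `usedPrompt` is whether the original request included a `prompt`.
function readContent(result, { usedPrompt }) {
  if (!usedPrompt) {
    // No prompt: content is the raw scraped markdown.
    return { kind: 'raw', content: result.content };
  }
  if (result.ai_extraction_failed) {
    // Prompt given, but extraction failed: content fell back to raw markdown.
    return { kind: 'fallback', content: result.content };
  }
  // Prompt given and extraction succeeded.
  return { kind: 'extracted', content: result.content };
}
```

Checking for `kind === 'fallback'` lets you retry or alert instead of treating raw markdown as extracted output.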

### result.data

An array with one entry per URL you submitted. Each entry contains:

| Field             | Description                                                                                                                                                     |
| ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url`             | The URL that was scraped                                                                                                                                        |
| `title`           | The page title from the browser                                                                                                                                 |
| `markdownContent` | The full raw scraped content for this URL as markdown. If you used `forEach`, this contains all the collected items formatted as `## Item 1`, `## Item 2`, etc. |
| `success`         | `true` if the page was scraped successfully, `false` if it failed                                                                                               |
| `screenshotUrl`   | URL to the screenshot on S3, or `null` if you did not request one                                                                                               |
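
Because each entry has its own `success` flag, a job can complete while individual URLs fail. A minimal sketch for separating the two cases follows; the helper name `splitByOutcome` and the shape of its return value are assumptions.

```javascript
// Hypothetical helper: separates scraped pages from URLs that failed.
function splitByOutcome(result) {
  const pages = result.data ?? []; // tolerate a missing array

  return {
    succeeded: pages.filter(page => page.success),
    failedUrls: pages.filter(page => !page.success).map(page => page.url),
  };
}
```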

### result.stats

Timing and usage information for the job.

| Field                | Description                                       |
| -------------------- | ------------------------------------------------- |
| `durationMs`         | How long the whole job took in milliseconds       |
| `captchaSolvedCount` | Number of CAPTCHAs that were automatically solved |
| `inputTokens`        | Tokens sent to the AI model                       |
| `outputTokens`       | Tokens returned from the AI model                 |
| `totalTokens`        | Total tokens used (input + output)                |
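
As a worked example of these fields, the sketch below converts the duration to seconds and checks the stated relation `totalTokens = inputTokens + outputTokens`. The helper name `summarizeStats` and its output fields are illustrative assumptions.

```javascript
// Hypothetical helper: derives a few convenience values from `result.stats`.
function summarizeStats(stats) {
  return {
    seconds: stats.durationMs / 1000,
    // The table above defines totalTokens as input + output.
    tokensConsistent: stats.inputTokens + stats.outputTokens === stats.totalTokens,
    captchasSolved: stats.captchaSolvedCount,
  };
}
```

With the sample response shown earlier (`durationMs: 4200`, `inputTokens: 312`, `outputTokens: 84`, `totalTokens: 396`), this gives 4.2 seconds and a consistent token total.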

***

## Failed Jobs

When `status` is `failed`, the `error` field contains the reason:

```json
{
  "status": "failed",
  "error": "Failed to scrape https://example.com — net::ERR_NAME_NOT_RESOLVED"
}
```
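
A failed job is different from a failed request: per the OpenAPI definition for this endpoint, 403 and 404 responses return a body of the form `{ "status": "error", "message": "..." }` rather than a job object. The sketch below distinguishes the two; the helper name `jobError` and the wording of the error messages are assumptions.

```javascript
// Hypothetical helper: returns an Error for a failed request or failed job,
// or null when there is nothing to report.
function jobError(httpStatus, body) {
  // Request-level failure (for example 403 or 404): the body has `message`.
  if (httpStatus >= 400) {
    return new Error(`HTTP ${httpStatus}: ${body.message ?? 'unknown error'}`);
  }
  // Job-level failure: the request succeeded but the scrape did not.
  if (body.status === 'failed') {
    return new Error(`Scrape failed: ${body.error ?? 'no reason given'}`);
  }
  return null;
}
```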


## OpenAPI

````yaml GET /scrape/{jobId}
openapi: 3.1.0
info:
  title: Spidra API
  version: 1.0.0
  description: >-
    Public API endpoints for web scraping via Spidra. Authentication is via API
    key passed in the `x-api-key` header.
servers:
  - url: https://api.spidra.io/api
security:
  - ApiKeyAuth: []
paths:
  /scrape/{jobId}:
    get:
      tags:
        - Scraping
      summary: Get Scrape Job Status
      parameters:
        - name: jobId
          in: path
          required: true
          schema:
            type: string
          description: The job ID returned from POST /scrape
      responses:
        '200':
          description: Job status and results
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    $ref: '#/components/schemas/JobStatus'
                  progress:
                    type: object
                    properties:
                      message:
                        type: string
                        description: Human-readable progress message
                      progress:
                        type: number
                        minimum: 0
                        maximum: 1
                        description: Progress from 0 to 1
                  result:
                    $ref: '#/components/schemas/ScrapeResult'
                    nullable: true
                    description: Present only when status is 'completed'
                  error:
                    type: string
                    nullable: true
                    description: Error message if status is 'failed'
              examples:
                in_progress:
                  summary: Job in progress
                  value:
                    status: active
                    progress:
                      message: Processing content with AI...
                      progress: 0.6
                    result: null
                    error: null
                completed:
                  summary: Job completed
                  value:
                    status: completed
                    progress:
                      message: Scrape completed successfully
                      progress: 1
                    result:
                      content:
                        heading: Example Domain
                        paragraph: This domain is for use in examples.
                      screenshots:
                        - https://storage.spidra.io/screenshots/abc123.png
                      stats:
                        durationMs: 5420
                        captchaSolvedCount: 0
                        inputTokens: 1234
                        outputTokens: 89
                        totalTokens: 1323
                    error: null
        '403':
          description: Not authorized to access this job
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: You do not have permission to access this job.
        '404':
          description: Job not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Scrape not found
components:
  schemas:
    JobStatus:
      type: string
      enum:
        - waiting
        - active
        - completed
        - failed
        - delayed
      description: Current status of the scrape job
    ScrapeResult:
      type: object
      properties:
        content:
          type:
            - string
            - object
          description: Scraped content (or AI-processed content if prompt was provided)
        screenshots:
          type: array
          items:
            type: string
            format: uri
          description: URLs of page screenshots
        stats:
          type: object
          properties:
            durationMs:
              type: number
              description: Total scrape duration in milliseconds
            captchaSolvedCount:
              type: number
              description: Number of CAPTCHAs solved
            inputTokens:
              type: number
              description: LLM input tokens used
            outputTokens:
              type: number
              description: LLM output tokens used
            totalTokens:
              type: number
              description: Total LLM tokens used
    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
      required:
        - status
        - message
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key

````