# Get Crawl Job Details

> Retrieve the full configuration and statistics for a crawl job, including the instructions used, token consumption, and credit cost.

Returns the complete record for a crawl job: the configuration you submitted, current status, token usage, and credit cost. It does not return the extracted page data. For the actual content from each page, use [GET /crawl/{jobId}/pages](/api-reference/crawling/crawl-pages).

## When to Use This Endpoint

* Inspect the instructions and options that were used for a job
* Check token and credit costs for accounting or reporting
* Confirm job configuration before re-running extraction with [POST /crawl/{jobId}/extract](/api-reference/crawling/crawl-extract)

## Example Request

<CodeGroup>
  ```bash cURL theme={null}
  curl https://api.spidra.io/api/crawl/job/abc-123 \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```python Python theme={null}
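  import requests

  response = requests.get(
      "https://api.spidra.io/api/crawl/job/abc-123",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
      timeout=30,
  )
  response.raise_for_status()  # raises on 401 (bad key) or 404 (unknown job)
  job = response.json()
  print(job["status"], job["credits_used"])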
  ```

  ```javascript Node.js theme={null}
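  const response = await fetch("https://api.spidra.io/api/crawl/job/abc-123", {
    headers: { Authorization: "Bearer YOUR_API_KEY" }
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  const job = await response.json();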
  ```
</CodeGroup>
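
The OpenAPI definition for this endpoint lists `401` and `404` responses that share a `{"status": "error", "message": ...}` body. A minimal sketch of handling them with the standard library might look like the following. `CrawlJobError`, `parse_job_response`, and `get_crawl_job` are hypothetical names introduced for illustration, not part of any Spidra SDK.

```python
import json
import urllib.error
import urllib.request

API_BASE = "https://api.spidra.io/api"  # server URL from the OpenAPI definition


class CrawlJobError(Exception):
    """Hypothetical exception carrying the HTTP status and the API message."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def parse_job_response(status_code, body_text):
    """Return the job record for a 200 response, otherwise raise CrawlJobError."""
    try:
        payload = json.loads(body_text) if body_text else {}
    except ValueError:
        payload = {}  # body was not JSON (for example, a proxy error page)
    if not isinstance(payload, dict):
        payload = {}
    if status_code == 200:
        return payload
    # 401 and 404 bodies are documented as {"status": "error", "message": "..."}.
    raise CrawlJobError(status_code, payload.get("message", "unexpected response"))


def get_crawl_job(job_id, api_key):
    """Fetch job details. Defined only; calling it requires a real API key."""
    request = urllib.request.Request(
        f"{API_BASE}/crawl/job/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return parse_job_response(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still exposes the body.
        return parse_job_response(err.code, err.read().decode("utf-8"))
```

Keeping the parsing step separate from the network call makes the error mapping easy to exercise with canned responses.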

## Response Fields

### Job configuration

| Field                   | Type              | Description                                                                                                                             |
| ----------------------- | ----------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `id`                    | string            | Unique job identifier.                                                                                                                  |
| `base_url`              | string            | The starting URL that was crawled.                                                                                                      |
| `crawl_instruction`     | string            | The instruction used to decide which pages to follow.                                                                                   |
| `transform_instruction` | string or null    | The prompt used to extract data from each page. `null` if extraction used a schema or if no extraction was configured.                  |
| `output_schema`         | object or null    | The JSON Schema used for structured extraction. `null` if a transform instruction was used instead, or if no extraction was configured. |
| `max_pages`             | integer           | The page limit set for this job.                                                                                                        |
| `max_depth`             | integer or null   | The link depth limit. `null` if not set (unlimited).                                                                                    |
| `include_paths`         | string\[] or null | Path patterns that were included. `null` if not set.                                                                                    |
| `exclude_paths`         | string\[] or null | Path patterns that were excluded. `null` if not set.                                                                                    |
| `allow_subdomains`      | boolean           | Whether subdomain links were followed.                                                                                                  |
| `crawl_entire_domain`   | boolean           | Whether all paths on the root domain were eligible.                                                                                     |
| `ignore_query_params`   | boolean           | Whether URLs differing only by query string were deduplicated.                                                                          |
| `webhook_url`           | string or null    | The webhook URL that received per-page events, if configured.                                                                           |
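
Because `transform_instruction` and `output_schema` are each `null` when the other was used (or when no extraction was configured), the pair can be read together to tell how a job extracted data. The sketch below assumes the job record has already been parsed into a `dict`; the labels `"schema"`, `"prompt"`, and `"none"` are hypothetical names chosen for this example.

```python
def extraction_mode(job):
    """Classify a job by which extraction field is populated."""
    if job.get("output_schema") is not None:
        return "schema"  # structured extraction with a JSON Schema
    if job.get("transform_instruction") is not None:
        return "prompt"  # free-form extraction driven by an instruction
    return "none"        # crawl only, no extraction configured


job = {
    "transform_instruction": "Extract title, author, and publish date",
    "output_schema": None,
}
print(extraction_mode(job))  # prints "prompt"
```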

### Status and stats

| Field           | Type           | Description                                                                                  |
| --------------- | -------------- | -------------------------------------------------------------------------------------------- |
| `status`        | string         | `waiting`, `active`, `completed`, `failed`, or `cancelled`.                                  |
| `pages_crawled` | integer        | Number of pages successfully processed.                                                      |
| `input_tokens`  | number or null | Total input tokens consumed by AI across all pages. `null` when no AI extraction was used.   |
| `output_tokens` | number or null | Total output tokens generated by AI across all pages. `null` when no AI extraction was used. |
| `credits_used`  | number or null | Total credits charged for this job.                                                          |
| `created_at`    | string         | ISO 8601 timestamp when the job was created.                                                 |
| `updated_at`    | string         | ISO 8601 timestamp of the last status change.                                                |
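
The `status` field can also drive a simple polling loop. The sketch below assumes that `completed`, `failed`, and `cancelled` are final states while `waiting` and `active` may still change; the table above lists the values but does not state this explicitly. `wait_for_job` and its parameters are hypothetical, and `fetch_job` stands for any callable that returns the parsed job record.

```python
import time

# Assumption: these three statuses never transition to another state.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def is_terminal(status):
    """Return True when the job is assumed to be finished."""
    return status in TERMINAL_STATUSES


def wait_for_job(fetch_job, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_job() until the status is terminal or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if is_terminal(job["status"]):
            return job
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # pause before the next request
    raise TimeoutError(f"job still running after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in normal use the default `time.sleep` applies.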

## Example Response

```json theme={null}
{
  "id": "abc-123",
  "base_url": "https://example.com/blog",
  "crawl_instruction": "Crawl all blog post pages",
  "transform_instruction": "Extract title, author, publish date, and the first 200 words of the body",
  "output_schema": null,
  "max_pages": 10,
  "max_depth": 2,
  "include_paths": ["/blog/*"],
  "exclude_paths": ["/blog/tag/*"],
  "allow_subdomains": false,
  "crawl_entire_domain": false,
  "ignore_query_params": true,
  "webhook_url": null,
  "status": "completed",
  "pages_crawled": 8,
  "input_tokens": 14820,
  "output_tokens": 3210,
  "credits_used": 25,
  "created_at": "2025-12-17T15:00:00Z",
  "updated_at": "2025-12-17T15:03:42Z"
}
```

<Note>
  For the actual extracted content from each page, call [GET /crawl/{jobId}/pages](/api-reference/crawling/crawl-pages).
</Note>
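
For accounting or reporting, the stats in the example response above can be reduced to a few derived numbers. This is a minimal sketch: `summarize_usage` is a hypothetical helper, and the elapsed time is only the gap between `created_at` and the last status change, which may not match the actual crawl duration.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp; the 'Z' suffix is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_usage(job):
    """Derive total tokens, credits per page, and elapsed seconds from a job record."""
    # Token fields are null when no AI extraction was used, so treat null as 0.
    total_tokens = (job.get("input_tokens") or 0) + (job.get("output_tokens") or 0)
    credits = job.get("credits_used")
    pages = job.get("pages_crawled") or 0
    elapsed = parse_timestamp(job["updated_at"]) - parse_timestamp(job["created_at"])
    return {
        "total_tokens": total_tokens,
        "credits_per_page": credits / pages if credits is not None and pages else None,
        "elapsed_seconds": elapsed.total_seconds(),
    }


example = {
    "pages_crawled": 8,
    "input_tokens": 14820,
    "output_tokens": 3210,
    "credits_used": 25,
    "created_at": "2025-12-17T15:00:00Z",
    "updated_at": "2025-12-17T15:03:42Z",
}
print(summarize_usage(example))
# {'total_tokens': 18030, 'credits_per_page': 3.125, 'elapsed_seconds': 222.0}
```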


## OpenAPI

````yaml GET /crawl/job/{jobId}
openapi: 3.1.0
info:
  title: Spidra API
  version: 1.0.0
  description: >-
    Public API endpoints for web scraping via Spidra. Authenticate with
    `Authorization: Bearer YOUR_API_KEY`.
servers:
  - url: https://api.spidra.io/api
security:
  - BearerAuth: []
  - ApiKeyAuth: []
paths:
  /crawl/job/{jobId}:
    get:
      tags:
        - Crawling
      summary: Get Crawl Job Details
      parameters:
        - name: jobId
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Job details
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                  base_url:
                    type: string
                  crawl_instruction:
                    type: string
                  transform_instruction:
                    type: string
                    nullable: true
                  output_schema:
                    type: object
                    nullable: true
                    description: >-
                      The JSON Schema used for structured extraction, if one was
                      provided.
                  max_pages:
                    type: integer
                  max_depth:
                    type: integer
                    nullable: true
                  include_paths:
                    type: array
                    items:
                      type: string
                    nullable: true
                  exclude_paths:
                    type: array
                    items:
                      type: string
                    nullable: true
                  allow_subdomains:
                    type: boolean
                  crawl_entire_domain:
                    type: boolean
                  ignore_query_params:
                    type: boolean
                  webhook_url:
                    type: string
                    nullable: true
                  pages_crawled:
                    type: integer
                  status:
                    type: string
                    enum:
                      - waiting
                      - active
                      - completed
                      - failed
                      - cancelled
                  created_at:
                    type: string
                    format: date-time
                  updated_at:
                    type: string
                    format: date-time
                  input_tokens:
                    type: number
                    nullable: true
                  output_tokens:
                    type: number
                    nullable: true
                  credits_used:
                    type: number
                    nullable: true
              example:
                id: abc-123
                base_url: https://example.com/blog
                crawl_instruction: Crawl all blog post pages
                transform_instruction: Extract title, author, and publish date
                output_schema: null
                max_pages: 10
                max_depth: 2
                include_paths:
                  - /blog/*
                exclude_paths:
                  - /blog/tag/*
                allow_subdomains: false
                crawl_entire_domain: false
                ignore_query_params: true
                webhook_url: null
                status: completed
                pages_crawled: 8
                input_tokens: 14820
                output_tokens: 3210
                credits_used: 25
                created_at: '2025-12-17T15:00:00Z'
                updated_at: '2025-12-17T15:03:42Z'
        '401':
          description: Invalid or missing API key
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Access token invalid or expired
        '404':
          description: Job not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Crawl job not found
components:
  schemas:
    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
      required:
        - status
        - message
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API key
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key

````