# Submit a Batch Scrape Job

> Queue up to 50 URLs for parallel scraping in a single request

## How It Works

Batch scrape jobs are asynchronous: submitting a job returns a `batchId` immediately, and the URLs are then processed in parallel by independent workers.

1. **Submit** — Send your URL list. Receive `batchId` in the response.
2. **Process** — Each URL is opened in a real browser, CAPTCHAs are solved, and the content is extracted.
3. **Poll** — Call `GET /api/batch/scrape/{batchId}` every 2–5 seconds until `status` is terminal.

<Note>
  Credits are reserved upfront when you submit. The final amount is reconciled per item once processing completes.
</Note>
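
The submit-then-poll flow above can be sketched as a small helper. This is a minimal illustration, not an official client: `poll_batch`, its parameters, and the terminal status names in `ASSUMED_TERMINAL` are all assumptions, since this page does not list the terminal values (see the Get Batch Status reference for those).

```python
import time
from typing import Any, Callable, Dict, Iterable

# Assumption: placeholder names for terminal states; replace them with the
# values documented on the Get Batch Status page.
ASSUMED_TERMINAL = frozenset({"completed", "failed", "cancelled"})


def poll_batch(
    fetch_status: Callable[[str], Dict[str, Any]],
    batch_id: str,
    interval: float = 3.0,
    timeout: float = 600.0,
    terminal: Iterable[str] = ASSUMED_TERMINAL,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(batch_id) until `status` is terminal or time runs out."""
    terminal = frozenset(terminal)
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status(batch_id)
        if body.get("status") in terminal:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} still {body.get('status')!r}")
        # The guide suggests polling every 2-5 seconds; 3 is the default here.
        sleep(interval)
```

Here `fetch_status` would typically wrap a `GET` to `https://api.spidra.io/api/batch/scrape/{batchId}` using the same `Authorization` header as the submission. Passing it in as a callable keeps the loop independent of any particular HTTP library.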

***

## Minimal Example

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST https://api.spidra.io/api/batch/scrape \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "urls": ["https://example.com/page-1", "https://example.com/page-2"],
      "prompt": "Extract the headline and summary",
      "output": "json"
    }'
  ```

  ```javascript Node.js theme={null}
  const res = await fetch("https://api.spidra.io/api/batch/scrape", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      urls: ["https://example.com/page-1", "https://example.com/page-2"],
      prompt: "Extract the headline and summary",
      output: "json",
    }),
  });

  if (!res.ok) throw new Error(`Batch submission failed with HTTP ${res.status}`);
  const { batchId, total } = await res.json();
  // batchId → poll GET /api/batch/scrape/{batchId}
  ```

  ```python Python theme={null}
  import requests

  resp = requests.post(
      "https://api.spidra.io/api/batch/scrape",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
      json={
          "urls": ["https://example.com/page-1", "https://example.com/page-2"],
          "prompt": "Extract the headline and summary",
          "output": "json",
      },
  )
  resp.raise_for_status()  # surface 4xx/5xx responses instead of a KeyError below
  batch_id = resp.json()["batchId"]
  ```
</CodeGroup>

**Response `202 Accepted`:**

```json theme={null}
{
  "status": "queued",
  "batchId": "f3a2b1c0-0000-0000-0000-000000000000",
  "total": 2
}
```

The `Location` response header is also set to `/api/batch/scrape/{batchId}` for convenience.
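
Because the `Location` value shown above is a path, not a full URL, a client has to resolve it against the API origin before polling. One way to do that is sketched below; `status_url` is a hypothetical helper name, and the fallback path is simply the documented polling route.

```python
from urllib.parse import urljoin

API_ORIGIN = "https://api.spidra.io"


def status_url(headers: dict, batch_id: str, origin: str = API_ORIGIN) -> str:
    """Prefer the Location header; fall back to the documented polling path."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location") or f"/api/batch/scrape/{batch_id}"
    # urljoin resolves a relative path against the origin and leaves an
    # already-absolute URL unchanged.
    return urljoin(origin, location)
```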

***

## With Structured Output

Pass a `schema` to receive a consistent JSON shape for every item:

```json theme={null}
{
  "urls": [
    "https://shop.example.com/item/100",
    "https://shop.example.com/item/101"
  ],
  "prompt": "Extract the product details",
  "schema": {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
      "name":      { "type": "string" },
      "price":     { "type": "number" },
      "currency":  { "type": ["string", "null"] },
      "available": { "type": ["boolean", "null"] }
    }
  }
}
```

When `schema` is provided, `output` is automatically forced to `"json"`. Non-fatal schema issues are returned as `schema_warnings` in the submission response.

<Tip>
  Use the [Spidra JSON Schema Generator](https://spidra.io/tools/json-schema-generator) to build and preview your schema visually before pasting it here.
</Tip>

***

## With Proxy

```json theme={null}
{
  "urls": ["https://amazon.de/dp/B123", "https://amazon.de/dp/B456"],
  "prompt": "Extract price and availability",
  "output": "json",
  "useProxy": true,
  "proxyCountry": "de"
}
```

***

## With Screenshots

```json theme={null}
{
  "urls": ["https://example.com"],
  "screenshot": true,
  "fullPageScreenshot": true
}
```

Screenshot URLs are returned in each item's `screenshotUrl` field once processing is complete.

***

## Request Body

| Field                | Type                     | Required | Default  | Description                                                                                                          |
| -------------------- | ------------------------ | -------- | -------- | -------------------------------------------------------------------------------------------------------------------- |
| `urls`               | `string[]`               | Yes      | —        | URLs to scrape. **1–50 URLs per request.** Must be `http://` or `https://`. Private/internal IPs are rejected.       |
| `prompt`             | `string`                 | No       | —        | AI extraction instruction applied to every URL in the batch                                                          |
| `output`             | `"json"` \| `"markdown"` | No       | `"json"` | Output format for extracted content. Automatically `"json"` when `schema` is set                                     |
| `schema`             | `object`                 | No       | —        | JSON Schema object that constrains the AI output. Validated before queuing — returns `422` if invalid                |
| `useProxy`           | `boolean`                | No       | `false`  | Route each URL through residential stealth proxies. Usage is billed from your bandwidth quota                       |
| `proxyCountry`       | `string`                 | No       | —        | ISO country code (`"us"`, `"de"`, `"gb"`) or region (`"eu"`, `"global"`). Requires `useProxy: true`                  |
| `crawlerMode`        | `string`                 | No       | `"default"` | Browser rendering mode: `"default"`, `"fast"`, or `"ai"`                                                            |
| `extractContentOnly` | `boolean`                | No       | `false`  | Strip navigation, headers, and sidebars — keeps only the main content                                                |
| `cookies`            | `string`                 | No       | —        | Session cookies for authenticated pages. **Never persisted to the database** — passed ephemerally to the worker only |
| `screenshot`         | `boolean`                | No       | `false`  | Capture a viewport screenshot of each page                                                                           |
| `fullPageScreenshot` | `boolean`                | No       | `false`  | Capture the full scrollable page. Requires `screenshot: true`                                                        |
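
Several of the constraints in this table can be checked locally before a request is sent. The sketch below is one possible client-side guard, not part of any Spidra SDK: `build_batch_payload` and its snake_case parameters are hypothetical, and it does not attempt the private/internal IP check, which is left to the server.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

MAX_URLS = 50  # documented per-request limit


def build_batch_payload(
    urls: List[str],
    prompt: Optional[str] = None,
    output: str = "json",
    schema: Optional[Dict[str, Any]] = None,
    use_proxy: bool = False,
    proxy_country: Optional[str] = None,
    screenshot: bool = False,
    full_page_screenshot: bool = False,
) -> Dict[str, Any]:
    """Build a request body, raising ValueError on documented rule violations."""
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"urls must contain 1-{MAX_URLS} entries, got {len(urls)}")
    for i, url in enumerate(urls, start=1):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"URL {i}: {url!r} must be an http:// or https:// URL")
    if output not in ("json", "markdown"):
        raise ValueError('output must be "json" or "markdown"')
    if proxy_country and not use_proxy:
        raise ValueError("proxyCountry requires useProxy: true")
    if full_page_screenshot and not screenshot:
        raise ValueError("fullPageScreenshot requires screenshot: true")

    # A schema forces JSON output on the server, so mirror that here.
    payload: Dict[str, Any] = {
        "urls": list(urls),
        "output": "json" if schema is not None else output,
    }
    if prompt:
        payload["prompt"] = prompt
    if schema is not None:
        payload["schema"] = schema
    if use_proxy:
        payload["useProxy"] = True
    if proxy_country:
        payload["proxyCountry"] = proxy_country
    if screenshot:
        payload["screenshot"] = True
    if full_page_screenshot:
        payload["fullPageScreenshot"] = True
    return payload
```

The returned dictionary can be passed directly as the `json=` argument in the Python example earlier on this page.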

***

## Response

| Field             | Type       | Description                                                                              |
| ----------------- | ---------- | ---------------------------------------------------------------------------------------- |
| `status`          | `"queued"` | Always `"queued"` on a successful submission                                             |
| `batchId`         | `string`   | UUID — use this to poll status and manage the batch                                      |
| `total`           | `number`   | Number of URLs accepted into the batch                                                   |
| `schema_warnings` | `string[]` | Non-fatal schema issues (e.g., unsupported keywords). Only present if there are warnings |
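
Since `schema_warnings` is only present when there is something to report, client code should treat it as optional. A minimal sketch of reading the submission response follows; `parse_submission` is a hypothetical helper, and the shape of the raised error is an illustrative choice.

```python
from typing import Any, Dict, List, Tuple


def parse_submission(status_code: int, body: Dict[str, Any]) -> Tuple[str, int, List[str]]:
    """Return (batchId, total, schema_warnings) from a 202 submission response."""
    if status_code != 202 or body.get("status") != "queued":
        # Error bodies carry a `message` and, for 422, an `errors` list.
        details = "; ".join(body.get("errors", []))
        raise RuntimeError(
            f"submission failed ({status_code}): {body.get('message', '')} {details}".strip()
        )
    # The key is omitted when there are no warnings, so default to an empty list.
    warnings = list(body.get("schema_warnings", []))
    return body["batchId"], body["total"], warnings
```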

***

## Errors

| Code  | Reason                                                                                                     |
| ----- | ---------------------------------------------------------------------------------------------------------- |
| `400` | `urls` is missing, not an array, empty, or exceeds 50 items                                                |
| `401` | No API key supplied in the `Authorization: Bearer` or `x-api-key` header                                   |
| `402` | Payment overdue — update your payment method                                                               |
| `403` | Monthly credit limit reached                                                                               |
| `422` | One or more URLs are invalid, or `schema` is malformed. An `errors` array is returned with per-URL details |
| `429` | More than 20 batch submissions per minute                                                                  |

**Validation error example:**

```json theme={null}
{
  "status": "error",
  "message": "Request validation failed. Fix the errors below and try again.",
  "errors": [
    "URL 2: \"ftp://example.com\" is not a valid URL — must use http or https",
    "URL 4: private and internal URLs are not allowed"
  ]
}
```
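
Two of the limits in the error table, 50 URLs per request and 20 submissions per minute, matter as soon as a URL list grows beyond one batch. The sketch below splits a long list into batches and spaces out the submissions. The helper names are hypothetical, `submit` stands in for whatever function performs the POST and returns a `batchId`, and the fixed gap is a simple pacing choice, since the page does not say how the server measures its rate window.

```python
import time
from typing import Callable, Iterator, List, Sequence

MAX_URLS_PER_BATCH = 50       # above this the API answers 400
MAX_SUBMISSIONS_PER_MIN = 20  # above this the API answers 429


def chunk_urls(urls: Sequence[str], size: int = MAX_URLS_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def submit_all(
    urls: Sequence[str],
    submit: Callable[[List[str]], str],
    sleep: Callable[[float], None] = time.sleep,
    per_minute: int = MAX_SUBMISSIONS_PER_MIN,
) -> List[str]:
    """Submit every chunk via submit(chunk) and collect the returned batch IDs."""
    gap = 60.0 / per_minute  # seconds to wait between consecutive submissions
    batch_ids: List[str] = []
    for index, chunk in enumerate(chunk_urls(urls)):
        if index:
            sleep(gap)
        batch_ids.append(submit(chunk))
    return batch_ids
```

For 120 URLs this produces three submissions of 50, 50, and 20 URLs, with a pause before the second and third.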

***

<CardGroup cols={2}>
  <Card title="Get Batch Status" icon="magnifying-glass" href="/api-reference/scraping/batch-scrape-status">
    Poll for results
  </Card>

  <Card title="Batch Scraping Guide" icon="book" href="/features/batch-scraping">
    Full feature walkthrough
  </Card>
</CardGroup>


## OpenAPI

````yaml POST /batch/scrape
openapi: 3.1.0
info:
  title: Spidra API
  version: 1.0.0
  description: >-
    Public API endpoints for web scraping via Spidra. Authenticate with
    `Authorization: Bearer YOUR_API_KEY`.
servers:
  - url: https://api.spidra.io/api
security:
  - BearerAuth: []
  - ApiKeyAuth: []
paths:
  /batch/scrape:
    post:
      tags:
        - Batch Scraping
      summary: Submit a Batch Scrape Job
      description: >-
        Queue up to 50 URLs for parallel scraping. Returns a batchId immediately
        — poll GET /batch/scrape/{batchId} for results.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - urls
              properties:
                urls:
                  type: array
                  minItems: 1
                  maxItems: 50
                  items:
                    type: string
                    format: uri
                  description: >-
                    URLs to scrape. 1–50 per request. Must be http:// or
                    https://. Private/internal IPs are rejected.
                prompt:
                  type: string
                  description: AI extraction instruction applied to every URL in the batch
                output:
                  type: string
                  enum:
                    - json
                    - markdown
                  default: json
                  description: >-
                    Output format. Automatically set to 'json' when schema is
                    provided
                schema:
                  type: object
                  description: >-
                    JSON Schema that constrains AI output shape. Validated
                    before queuing — returns 422 if invalid
                useProxy:
                  type: boolean
                  default: false
                  description: >-
                    Route each URL through residential stealth proxies. Usage is
                    billed from your bandwidth quota.
                proxyCountry:
                  type: string
                  description: >-
                    ISO country code (e.g. 'us', 'de', 'gb') or region ('eu',
                    'global'). Requires useProxy: true
                crawlerMode:
                  type: string
                  default: default
                  description: 'Browser rendering mode: ''default'', ''fast'', or ''ai'''
                extractContentOnly:
                  type: boolean
                  default: false
                  description: >-
                    Strip navigation, headers, and sidebars — keep only the main
                    content
                cookies:
                  type: string
                  description: >-
                    Session cookies for authenticated pages. Never persisted —
                    passed ephemerally to the worker
                screenshot:
                  type: boolean
                  default: false
                  description: Capture a viewport screenshot of each page
                fullPageScreenshot:
                  type: boolean
                  default: false
                  description: 'Capture the full scrollable page. Requires screenshot: true'
            example:
              urls:
                - https://example.com/product/1
                - https://example.com/product/2
              prompt: Extract the product name, price, and availability
              output: json
      responses:
        '202':
          description: Batch accepted and queued
          headers:
            Location:
              description: URL to poll for batch status
              schema:
                type: string
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    enum:
                      - queued
                  batchId:
                    type: string
                    format: uuid
                  total:
                    type: integer
                  schema_warnings:
                    type: array
                    items:
                      type: string
                    description: >-
                      Non-fatal schema issues. Only present if there are
                      warnings
              example:
                status: queued
                batchId: f3a2b1c0-0000-0000-0000-000000000000
                total: 2
        '400':
          description: urls is missing, not an array, empty, or exceeds 50 items
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Missing API key authentication header
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '402':
          description: Payment overdue
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Monthly credit limit reached
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: You have exceeded your monthly credit limit.
        '422':
          description: One or more URLs are invalid, or schema is malformed
          content:
            application/json:
              schema:
                allOf:
                  - $ref: '#/components/schemas/ErrorResponse'
                  - type: object
                    properties:
                      errors:
                        type: array
                        items:
                          type: string
              example:
                status: error
                message: Request validation failed. Fix the errors below and try again.
                errors:
                  - >-
                    URL 2: "ftp://example.com" is not a valid URL — must use
                    http or https
                  - 'URL 4: private and internal URLs are not allowed'
        '429':
          description: Rate limit exceeded — more than 20 batch submissions per minute
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
      security:
        - BearerAuth: []
        - ApiKeyAuth: []
components:
  schemas:
    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
      required:
        - status
        - message
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API key
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key

````