
# Batch Scrape

> Scrape up to 50 URLs in a single request with full per-item results, credit tracking, and retry support

Batch scraping lets you queue up to **50 URLs** in one API call. Each URL is processed independently and in parallel. You get back per-item results with status, content, credits used, and timestamps — all under a single `batchId`.

Use batch scraping when:

* You have a list of product, article, or listing URLs to extract
* You want one API call per dataset rather than managing dozens of individual jobs
* You need to retry only the URLs that failed without re-running the whole set
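
If your dataset has more than 50 URLs, one way to stay within the per-request limit is to split the list client-side and submit each slice as its own batch. The following is a minimal sketch; `chunk_urls` and `MAX_BATCH_SIZE` are illustrative names, not part of the Spidra API.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 50  # per-request URL limit stated above


def chunk_urls(urls: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/product/{i}" for i in range(1, 121)]
batches = list(chunk_urls(urls))
print([len(batch) for batch in batches])  # [50, 50, 20]
```

Each slice would then be submitted separately and tracked under its own `batchId`.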

***

## How It Works

<Steps>
  <Step title="Submit">
    Send a `POST /api/batch/scrape` request with your URL list and extraction options. The response returns a `batchId` immediately, and the job is queued for processing.
  </Step>

  <Step title="Process">
    Spidra processes each URL independently using a real browser. CAPTCHA solving, proxy routing, and AI extraction all run per-item.
  </Step>

  <Step title="Poll">
    Call `GET /api/batch/scrape/{batchId}` every 2–5 seconds until `status` reaches a terminal value. The response includes live progress counters (`completedCount`, `failedCount`) and per-item results.
  </Step>

  <Step title="Handle failures">
    If any items fail, call `POST /api/batch/scrape/{batchId}/retry`. Only the failed items are re-queued — successful ones are untouched.
  </Step>
</Steps>

***

## Quick Start

<CodeGroup>
  ```bash cURL theme={null}
  # 1. Submit
  curl -X POST https://api.spidra.io/api/batch/scrape \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "urls": [
        "https://example.com/product/1",
        "https://example.com/product/2",
        "https://example.com/product/3"
      ],
      "prompt": "Extract the product name, price, and availability",
      "output": "json"
    }'

  # Response:
  # { "status": "queued", "batchId": "abc-123", "total": 3 }

  # 2. Poll until done
  curl https://api.spidra.io/api/batch/scrape/abc-123 \
    -H "Authorization: Bearer YOUR_API_KEY"
  ```

  ```javascript Node.js theme={null}
  async function scrapeProducts(urls) {
    // Submit the batch; the API responds with a batchId right away
    const submit = await fetch("https://api.spidra.io/api/batch/scrape", {
      method: "POST",
      headers: {
        Authorization: "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        urls,
        prompt: "Extract the product name, price, and availability",
        output: "json",
      }),
    });
    // Surface HTTP errors instead of destructuring an error body
    if (!submit.ok) throw new Error(`Submit failed: HTTP ${submit.status}`);

    const { batchId } = await submit.json();

    // Poll every 3 seconds until the batch reaches a terminal status
    while (true) {
      const res = await fetch(
        `https://api.spidra.io/api/batch/scrape/${batchId}`,
        { headers: { Authorization: "Bearer YOUR_API_KEY" } }
      );
      if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);
      const data = await res.json();

      if (["completed", "failed", "cancelled"].includes(data.status)) {
        return data.items;
      }

      console.log(`${data.completedCount}/${data.totalUrls} done...`);
      await new Promise((r) => setTimeout(r, 3000));
    }
  }
  ```

  ```python Python theme={null}
  import time
  import requests

  BASE = "https://api.spidra.io/api"
  HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

  def scrape_products(urls):
      # Submit the batch; the API responds with a batchId right away
      resp = requests.post(
          f"{BASE}/batch/scrape",
          headers=HEADERS,
          json={
              "urls": urls,
              "prompt": "Extract the product name, price, and availability",
              "output": "json",
          },
          timeout=30,
      )
      resp.raise_for_status()  # surface 4xx/5xx instead of a KeyError below
      batch_id = resp.json()["batchId"]

      # Poll every 3 seconds until the batch reaches a terminal status
      while True:
          poll = requests.get(
              f"{BASE}/batch/scrape/{batch_id}", headers=HEADERS, timeout=30
          )
          poll.raise_for_status()
          status = poll.json()

          if status["status"] in ("completed", "failed", "cancelled"):
              return status["items"]

          print(f"{status['completedCount']}/{status['totalUrls']} done...")
          time.sleep(3)
  ```
</CodeGroup>

***

## Polling Pattern

Batch jobs are asynchronous. Poll `GET /api/batch/scrape/{batchId}` every **2–5 seconds** until `status` is a terminal value.

| `status`    | Meaning                                                         |
| ----------- | --------------------------------------------------------------- |
| `pending`   | Queued, no items have started yet                               |
| `running`   | At least one item is being processed                            |
| `completed` | All items finished (some may have failed — check `failedCount`) |
| `failed`    | The entire batch failed unexpectedly                            |
| `cancelled` | You cancelled it via `DELETE /api/batch/scrape/{batchId}`       |

<Note>
  `completed` does not mean every URL succeeded. A batch is `completed` when all items have reached a terminal state (`completed` or `failed`). Always check `failedCount` and inspect individual item statuses.
</Note>
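
The polling rules above could be wrapped in a small helper that stops on any terminal status, gives up after a deadline, and then separates out the failed items. This is only a sketch: `fetch_status` is a hypothetical callable you supply that performs the `GET` and returns the decoded JSON, and the 3-second interval and 600-second timeout are illustrative defaults, not documented limits.

```python
import time
from typing import Callable, Dict, List

TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_batch(
    fetch_status: Callable[[], Dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the batch reaches a terminal status or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["status"] in TERMINAL:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {data['status']} after {timeout}s")
        sleep(interval)


def failed_items(batch: Dict) -> List[Dict]:
    """Return the items whose own status is `failed`."""
    return [item for item in batch.get("items", []) if item["status"] == "failed"]


# Demo with canned responses instead of real HTTP calls
responses = iter([
    {"status": "running", "items": []},
    {"status": "completed", "failedCount": 1, "items": [
        {"url": "https://example.com/a", "status": "completed"},
        {"url": "https://example.com/b", "status": "failed"},
    ]},
])
batch = wait_for_batch(lambda: next(responses), sleep=lambda seconds: None)
print([item["url"] for item in failed_items(batch)])  # ['https://example.com/b']
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access; in real use, `fetch_status` would wrap the status request shown in the Quick Start.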

***

## Per-Item Results

Each item in the `items` array represents one URL:

```json theme={null}
{
  "uuid": "item-uuid",
  "url": "https://example.com/product/1",
  "jobId": "bull-job-id",
  "status": "completed",
  "result": { "name": "Widget Pro", "price": 49.99, "available": true },
  "error": null,
  "creditsUsed": 3,
  "startedAt": "2024-01-15T10:00:01Z",
  "finishedAt": "2024-01-15T10:00:06Z",
  "screenshotUrl": null
}
```

| Field           | Description                                                                    |
| --------------- | ------------------------------------------------------------------------------ |
| `uuid`          | Unique ID for this batch item                                                  |
| `url`           | The URL that was scraped                                                       |
| `status`        | `pending`, `running`, `completed`, or `failed`                                 |
| `result`        | Extracted content (object if JSON, string if markdown). `null` until completed |
| `error`         | Error message if `status` is `failed`, otherwise `null`                        |
| `creditsUsed`   | Credits consumed by this item. `0` for failed items                            |
| `startedAt`     | When the worker picked up this item                                            |
| `finishedAt`    | When this item reached a terminal state                                        |
| `screenshotUrl` | S3 URL if `screenshot: true` was set, otherwise `null`                         |
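
As an illustration of working with these fields, the sketch below totals `creditsUsed`, lists failed URLs, and derives each item's processing time from `startedAt` and `finishedAt`. The helper names are hypothetical, and the sample items are made up to mirror the shape shown above.

```python
from datetime import datetime
from typing import Dict, List


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarise(items: List[Dict]) -> Dict:
    """Total credits, failed URLs, and seconds spent on each finished item."""
    durations = {}
    for item in items:
        if item.get("startedAt") and item.get("finishedAt"):
            delta = parse_ts(item["finishedAt"]) - parse_ts(item["startedAt"])
            durations[item["url"]] = delta.total_seconds()
    return {
        "credits": sum(item.get("creditsUsed") or 0 for item in items),
        "failed": [item["url"] for item in items if item["status"] == "failed"],
        "durations": durations,
    }


items = [
    {"url": "https://example.com/product/1", "status": "completed", "creditsUsed": 3,
     "startedAt": "2024-01-15T10:00:01Z", "finishedAt": "2024-01-15T10:00:06Z"},
    {"url": "https://example.com/product/2", "status": "failed", "creditsUsed": 0,
     "startedAt": "2024-01-15T10:00:01Z", "finishedAt": "2024-01-15T10:00:03Z"},
]
summary = summarise(items)
print(summary["credits"], summary["failed"], summary["durations"])
```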

***

## Structured Output

Pass a `schema` to enforce a specific output shape across all URLs in the batch. Each item's `result` is then returned as JSON matching your schema.

```json theme={null}
{
  "urls": [
    "https://shop.example.com/item/1",
    "https://shop.example.com/item/2"
  ],
  "prompt": "Extract the product details",
  "schema": {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
      "name":      { "type": "string" },
      "price":     { "type": "number" },
      "currency":  { "type": ["string", "null"] },
      "available": { "type": ["boolean", "null"] }
    }
  }
}
```

When a `schema` is provided, `output` is automatically set to `"json"`. The schema is validated before the batch is queued — a `422` is returned if it is malformed.
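
On the client side, you may still want a quick sanity check that each result carries the fields your schema marks as `required` before loading it into a downstream system. The sketch below is one hedged way to do that with the standard library only; `items_missing_fields` is a hypothetical helper, and it checks key presence only, not types.

```python
from typing import Dict, List


def items_missing_fields(items: List[Dict], schema: Dict) -> Dict[str, List[str]]:
    """Map each item URL to the required keys absent from its result."""
    required = schema.get("required", [])
    report = {}
    for item in items:
        result = item.get("result")
        if not isinstance(result, dict):
            continue  # failed or unfinished items have no result object
        missing = [key for key in required if key not in result]
        if missing:
            report[item["url"]] = missing
    return report


schema = {"type": "object", "required": ["name", "price"]}
items = [
    {"url": "https://shop.example.com/item/1", "status": "completed",
     "result": {"name": "Widget Pro", "price": 49.99}},
    {"url": "https://shop.example.com/item/2", "status": "completed",
     "result": {"name": "Widget Mini"}},
    {"url": "https://shop.example.com/item/3", "status": "failed", "result": None},
]
print(items_missing_fields(items, schema))
# {'https://shop.example.com/item/2': ['price']}
```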

<Card title="Structured Output Guide" icon="brackets-curly" href="/features/structured-output">
  Full guide on nested objects, arrays, nullable fields, and schema limits
</Card>

***

## Retrying Failed Items

When a batch completes with some failures, retry only those items — no need to re-run the whole batch:

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST https://api.spidra.io/api/batch/scrape/abc-123/retry \
    -H "Authorization: Bearer YOUR_API_KEY"

  # Response:
  # { "status": "queued", "retriedCount": 2 }
  ```

  ```javascript Node.js theme={null}
  const res = await fetch(
    `https://api.spidra.io/api/batch/scrape/${batchId}/retry`,
    { method: "POST", headers: { Authorization: "Bearer YOUR_API_KEY" } }
  );
  const { retriedCount } = await res.json();
  console.log(`${retriedCount} items re-queued`);
  ```

  ```python Python theme={null}
  resp = requests.post(
      f"{BASE}/batch/scrape/{batch_id}/retry",
      headers=HEADERS,
  )
  print(f"{resp.json()['retriedCount']} items re-queued")
  ```
</CodeGroup>

The batch status resets to `running`, and you poll the same `batchId` until it reaches a terminal status again. Items that already completed successfully are not re-run.
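
A retry can be repeated a bounded number of times until no failures remain. The following sketch assumes two hypothetical callables you provide: `wait_for_terminal`, which polls the batch and returns its final JSON, and `request_retry`, which sends the retry `POST`. The cap of three rounds is an arbitrary illustrative choice.

```python
from typing import Callable, Dict


def retry_until_clean(
    wait_for_terminal: Callable[[], Dict],
    request_retry: Callable[[], object],
    max_rounds: int = 3,
) -> Dict:
    """Retry failed items up to `max_rounds` times; return the last batch state."""
    batch = wait_for_terminal()
    for _ in range(max_rounds):
        # Stop if the batch itself failed or was cancelled, or nothing is left to retry
        if batch["status"] != "completed" or batch.get("failedCount", 0) == 0:
            break
        request_retry()
        batch = wait_for_terminal()
    return batch


# Demo with canned batch states instead of real HTTP calls
states = iter([
    {"status": "completed", "failedCount": 2},
    {"status": "completed", "failedCount": 0},
])
retries = []
final = retry_until_clean(lambda: next(states), lambda: retries.append("retry"))
print(final["failedCount"], len(retries))  # 0 1
```

Capping the rounds avoids looping forever on URLs that fail deterministically, such as pages that no longer exist.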

***

## Cancelling a Batch

Cancel a running or pending batch to stop processing and refund credits for items that have not started yet:

```bash theme={null}
curl -X DELETE https://api.spidra.io/api/batch/scrape/abc-123 \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```json theme={null}
{
  "status": "cancelled",
  "cancelledItems": 8,
  "creditsRefunded": 16
}
```

Items already running will complete normally. Only pending items are cancelled and refunded.
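
For reference, the same cancel call might look like this in Python using only the standard library. This is a sketch rather than an official client: `cancel_batch` and `API_ROOT` are illustrative names, and the `opener` parameter exists only so the function can be exercised without a network connection.

```python
import json
import urllib.request

API_ROOT = "https://api.spidra.io/api"


def cancel_batch(batch_id: str, api_key: str, opener=urllib.request.urlopen) -> dict:
    """Send DELETE /batch/scrape/{batch_id} and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_ROOT}/batch/scrape/{batch_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `cancel_batch("abc-123", "YOUR_API_KEY")` should return a dictionary with the `cancelledItems` and `creditsRefunded` fields shown in the response above.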

***

## Proxy & Geo-Targeting

Apply stealth proxy routing to every URL in the batch with `useProxy` and `proxyCountry`:

```json theme={null}
{
  "urls": ["https://amazon.de/dp/B123", "https://amazon.de/dp/B456"],
  "prompt": "Extract price and availability",
  "output": "json",
  "useProxy": true,
  "proxyCountry": "de"
}
```
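
If you build request bodies in code, a small helper can keep the proxy fields consistent and reject oversized URL lists before they reach the API. This is a sketch under assumptions: `build_batch_payload` is a hypothetical name, and the lower bound of one URL is assumed rather than documented here.

```python
from typing import Dict, List, Optional


def build_batch_payload(
    urls: List[str], prompt: str, proxy_country: Optional[str] = None
) -> Dict:
    """Build a batch request body, adding proxy fields only when a country is given."""
    if not 1 <= len(urls) <= 50:
        raise ValueError("a batch takes between 1 and 50 URLs")
    payload = {"urls": list(urls), "prompt": prompt, "output": "json"}
    if proxy_country:
        payload["useProxy"] = True
        payload["proxyCountry"] = proxy_country
    return payload


payload = build_batch_payload(
    ["https://amazon.de/dp/B123", "https://amazon.de/dp/B456"],
    "Extract price and availability",
    proxy_country="de",
)
print(payload["useProxy"], payload["proxyCountry"])  # True de
```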

<Card title="Stealth Mode & Geo-Targeting" icon="shield-halved" href="/features/stealth-mode">
  Full country list, EU rotation, and billing details
</Card>

***

## Cookies & Authenticated Pages

Pass session cookies to scrape pages behind a login. Cookies are never stored — they are passed ephemerally to the worker and discarded after processing.

```json theme={null}
{
  "urls": [
    "https://app.example.com/reports/q1",
    "https://app.example.com/reports/q2"
  ],
  "cookies": "session=eyJ...; auth_token=abc123",
  "prompt": "Extract the report summary",
  "output": "json"
}
```
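
The `cookies` value in the example is a single string of `name=value` pairs separated by `; `. If your cookies live in a dictionary, for instance after exporting them from a browser session, a helper along these lines could produce that string. `cookie_string` is a hypothetical name, and the cookie values below are placeholders.

```python
from typing import Dict


def cookie_string(cookies: Dict[str, str]) -> str:
    """Join name/value pairs as `name=value; name2=value2`."""
    for name, value in cookies.items():
        # Separator characters inside a name or value would corrupt the string
        if ";" in name or "=" in name or ";" in value:
            raise ValueError(f"cookie {name!r} contains a separator character")
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


print(cookie_string({"session": "s3ss10n", "auth_token": "abc123"}))
# session=s3ss10n; auth_token=abc123
```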

<Card title="Authenticated Scraping" icon="lock-open" href="/features/authenticated-scraping">
  Full guide on obtaining and formatting cookies
</Card>

***

<CardGroup cols={2}>
  <Card title="Submit a Batch" icon="plus" href="/api-reference/scraping/batch-scrape">
    Full request reference
  </Card>

  <Card title="Get Batch Status" icon="magnifying-glass" href="/api-reference/scraping/batch-scrape-status">
    Polling and response shape
  </Card>

  <Card title="List Batches" icon="list" href="/api-reference/scraping/batch-scrape-list">
    See all your batch jobs
  </Card>

  <Card title="Cancel & Retry" icon="rotate" href="/api-reference/scraping/batch-scrape-cancel">
    Stop a batch or re-run failures
  </Card>
</CardGroup>