Batch scraping lets you queue up to 50 URLs in one API call. Each URL is processed independently and in parallel. You get back per-item results with status, content, credits used, and timestamps — all under a single batchId. Use batch scraping when:
  • You have a list of product, article, or listing URLs to extract
  • You want one API call per dataset rather than managing dozens of individual jobs
  • You need to retry only the URLs that failed without re-running the whole set

How It Works

1. Submit: Send a POST /api/batch/scrape with your URL list and extraction options. You get a batchId back immediately; the job is queued.

2. Process: Spidra processes each URL independently using a real browser. CAPTCHA solving, proxy routing, and AI extraction all run per item.

3. Poll: Call GET /api/batch/scrape/{batchId} every few seconds. The response includes live progress counters (completedCount, failedCount) and per-item results.

4. Handle failures: If any items fail, call POST /api/batch/scrape/{batchId}/retry. Only the failed items are re-queued; successful items are untouched.

Quick Start

# 1. Submit
curl -X POST https://api.spidra.io/api/batch/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/product/1",
      "https://example.com/product/2",
      "https://example.com/product/3"
    ],
    "prompt": "Extract the product name, price, and availability",
    "output": "json"
  }'

# Response:
# { "status": "queued", "batchId": "abc-123", "total": 3 }

# 2. Poll until done
curl https://api.spidra.io/api/batch/scrape/abc-123 \
  -H "x-api-key: YOUR_API_KEY"

Polling Pattern

Batch jobs are asynchronous. Poll GET /api/batch/scrape/{batchId} every 2–5 seconds until status is a terminal value.
status       Meaning
pending      Queued; no items have started yet
running      At least one item is being processed
completed    All items finished (some may have failed; check failedCount)
failed       The entire batch failed unexpectedly
cancelled    You cancelled it via DELETE /api/batch/scrape/{batchId}
completed does not mean every URL succeeded. A batch is completed when all items have reached a terminal state (completed or failed). Always check failedCount and inspect individual item statuses.
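The polling pattern above can be sketched in Python. The function and its get_status parameter are illustrative names, not part of any Spidra SDK: get_status stands in for whatever performs GET /api/batch/scrape/{batchId} and returns the parsed JSON body.

```python
import time

# Terminal statuses per the table above.
TERMINAL = {"completed", "failed", "cancelled"}

def poll_until_done(get_status, interval=3.0, timeout=300.0):
    """Call get_status() every `interval` seconds until the batch
    reaches a terminal status, or raise if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        batch = get_status()
        if batch["status"] in TERMINAL:
            return batch
        if time.monotonic() > deadline:
            raise TimeoutError("batch did not reach a terminal status in time")
        time.sleep(interval)
```

Remember that a returned status of completed still requires a failedCount check, as noted above.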

Per-Item Results

Each item in the items array represents one URL:
{
  "uuid": "item-uuid",
  "url": "https://example.com/product/1",
  "jobId": "bull-job-id",
  "status": "completed",
  "result": { "name": "Widget Pro", "price": 49.99, "available": true },
  "error": null,
  "creditsUsed": 3,
  "startedAt": "2024-01-15T10:00:01Z",
  "finishedAt": "2024-01-15T10:00:06Z",
  "screenshotUrl": null
}
Field          Description
uuid           Unique ID for this batch item
url            The URL that was scraped
status         pending, running, completed, failed, or cancelled
result         Extracted content (object for JSON output, string for markdown); null until completed
error          Error message when status is failed; otherwise null
creditsUsed    Credits consumed by this item; 0 for failed items
startedAt      When a worker picked up this item
finishedAt     When this item reached a terminal state
screenshotUrl  S3 URL if screenshot: true was set; otherwise null
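Given those fields, a small helper (an illustrative sketch, not part of any SDK) can split a status payload into successes and failures and total up credits:

```python
def split_results(batch):
    """Partition the items array of a batch-status payload into
    extracted results and (url, error) pairs, and sum credits used."""
    items = batch["items"]
    ok = [i["result"] for i in items if i["status"] == "completed"]
    failed = [(i["url"], i["error"]) for i in items if i["status"] == "failed"]
    credits = sum(i["creditsUsed"] for i in items)
    return ok, failed, credits
```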

Structured Output

Pass a schema to enforce a specific output shape across all URLs in the batch. The AI will return JSON matching your schema for every item.
{
  "urls": [
    "https://shop.example.com/item/1",
    "https://shop.example.com/item/2"
  ],
  "prompt": "Extract the product details",
  "schema": {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
      "name":      { "type": "string" },
      "price":     { "type": "number" },
      "currency":  { "type": ["string", "null"] },
      "available": { "type": ["boolean", "null"] }
    }
  }
}
When a schema is provided, output is automatically set to "json". The schema is validated before the batch is queued — a 422 is returned if it is malformed.
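The 422 only catches a malformed schema at submit time, so a cheap client-side sanity check on returned results can still be useful. check_required below is a hypothetical helper, not a Spidra API; it only verifies that each field in the schema's required list came back non-null, and does no type checking:

```python
def check_required(result, schema):
    """Return the names of required schema fields that are missing
    or null in an extracted result (empty list means it looks OK)."""
    return [key for key in schema.get("required", [])
            if result.get(key) is None]
```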

Structured Output Guide

Full guide on nested objects, arrays, nullable fields, and schema limits

Retrying Failed Items

When a batch completes with some failures, retry only those items — no need to re-run the whole batch:
curl -X POST https://api.spidra.io/api/batch/scrape/abc-123/retry \
  -H "x-api-key: YOUR_API_KEY"

# Response:
# { "status": "queued", "retriedCount": 2 }
The batch status resets to running and you poll the same batchId until it completes again. Successfully completed items are never touched.
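The retry flow can be sketched as a loop (illustrative Python, not SDK code): poll stands in for a call that blocks until the batch is terminal and returns the status body, and retry_failed stands in for POST /api/batch/scrape/{batchId}/retry.

```python
def run_with_retries(poll, retry_failed, max_retries=2):
    """Drive a batch toward full success: poll to a terminal state,
    and while items failed, re-queue them and poll again, up to
    max_retries retry rounds."""
    batch = poll()
    for _ in range(max_retries):
        if batch.get("failedCount", 0) == 0:
            break
        retry_failed()  # only failed items are re-queued
        batch = poll()  # same batchId, poll until terminal again
    return batch
```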

Cancelling a Batch

Cancel a running or pending batch to stop processing and refund credits for items that have not started yet:
curl -X DELETE https://api.spidra.io/api/batch/scrape/abc-123 \
  -H "x-api-key: YOUR_API_KEY"
{
  "status": "cancelled",
  "cancelledItems": 8,
  "creditsRefunded": 16
}
Items already running will complete normally. Only pending items are cancelled and refunded.

Proxy & Geo-Targeting

Apply stealth proxy routing to every URL in the batch with useProxy and proxyCountry:
{
  "urls": ["https://amazon.de/dp/B123", "https://amazon.de/dp/B456"],
  "prompt": "Extract price and availability",
  "output": "json",
  "useProxy": true,
  "proxyCountry": "de"
}

Stealth Mode & Geo-Targeting

Full country list, EU rotation, and billing details

Cookies & Authenticated Pages

Pass session cookies to scrape pages behind a login. Cookies are never stored — they are passed ephemerally to the worker and discarded after processing.
{
  "urls": [
    "https://app.example.com/reports/q1",
    "https://app.example.com/reports/q2"
  ],
  "cookies": "session=eyJ...; auth_token=abc123",
  "prompt": "Extract the report summary",
  "output": "json"
}

Authenticated Scraping

Full guide on obtaining and formatting cookies

Submit a Batch

Full request reference

Get Batch Status

Polling and response shape

List Batches

See all your batch jobs

Cancel & Retry

Stop a batch or re-run failures