Batch scraping lets you queue up to 50 URLs in one API call. Each URL is processed independently and in parallel. You get back per-item results with status, content, credits used, and timestamps — all under a single batchId. Use batch scraping when:
  • You have a list of product, article, or listing URLs to extract
  • You want one API call per dataset rather than managing dozens of individual jobs
  • You need to retry only the URLs that failed without re-running the whole set

How It Works

1. Submit: Send a POST /api/batch/scrape with your URL list and extraction options. You get a batchId back immediately; the job is queued.

2. Process: Spidra processes each URL independently using a real browser. CAPTCHA solving, proxy routing, and AI extraction all run per item.

3. Poll: Call GET /api/batch/scrape/{batchId} every few seconds. The response includes live progress counters (completedCount, failedCount) and per-item results.

4. Handle failures: If any items fail, call POST /api/batch/scrape/{batchId}/retry. Only the failed items are re-queued; successful items are untouched.

Quick Start

# 1. Submit
curl -X POST https://api.spidra.io/api/batch/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/product/1",
      "https://example.com/product/2",
      "https://example.com/product/3"
    ],
    "prompt": "Extract the product name, price, and availability",
    "output": "json"
  }'

# Response:
# { "status": "queued", "batchId": "abc-123", "total": 3 }

# 2. Poll until done
curl https://api.spidra.io/api/batch/scrape/abc-123 \
  -H "x-api-key: YOUR_API_KEY"

Polling Pattern

Batch jobs are asynchronous. Poll GET /api/batch/scrape/{batchId} every 2–5 seconds until status is a terminal value.
status       Meaning
pending      Queued; no items have started yet
running      At least one item is being processed
completed    All items finished (some may have failed; check failedCount)
failed       The entire batch failed unexpectedly
cancelled    You cancelled it via DELETE /api/batch/scrape/{batchId}
completed does not mean every URL succeeded. A batch is completed when all items have reached a terminal state (completed or failed). Always check failedCount and inspect individual item statuses.
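The polling pattern above can be sketched in Python. The function and its get_status parameter are illustrative names, not part of any Spidra SDK: get_status stands in for whatever performs GET /api/batch/scrape/{batchId} and returns the parsed JSON body.

```python
import time

# Terminal statuses per the table above.
TERMINAL = {"completed", "failed", "cancelled"}

def poll_until_done(get_status, interval=3.0, timeout=300.0):
    """Call get_status() every `interval` seconds until the batch
    reaches a terminal status, or raise if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        batch = get_status()
        if batch["status"] in TERMINAL:
            return batch
        if time.monotonic() > deadline:
            raise TimeoutError("batch did not reach a terminal status in time")
        time.sleep(interval)
```

Remember that a returned status of completed still requires a failedCount check, as noted above.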

Per-Item Results

Each item in the items array represents one URL:
{
  "uuid": "item-uuid",
  "url": "https://example.com/product/1",
  "jobId": "bull-job-id",
  "status": "completed",
  "result": { "name": "Widget Pro", "price": 49.99, "available": true },
  "error": null,
  "creditsUsed": 3,
  "startedAt": "2024-01-15T10:00:01Z",
  "finishedAt": "2024-01-15T10:00:06Z",
  "screenshotUrl": null
}
Field          Description
uuid           Unique ID for this batch item
url            The URL that was scraped
status         pending, running, completed, failed, or cancelled
result         Extracted content (object for JSON output, string for markdown); null until completed
error          Error message when status is failed; otherwise null
creditsUsed    Credits consumed by this item; 0 for failed items
startedAt      When a worker picked up this item
finishedAt     When this item reached a terminal state
screenshotUrl  S3 URL if screenshot: true was set; otherwise null
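Given those fields, a small helper (an illustrative sketch, not part of any SDK) can split a status payload into successes and failures and total up credits:

```python
def split_results(batch):
    """Partition the items array of a batch-status payload into
    extracted results and (url, error) pairs, and sum credits used."""
    items = batch["items"]
    ok = [i["result"] for i in items if i["status"] == "completed"]
    failed = [(i["url"], i["error"]) for i in items if i["status"] == "failed"]
    credits = sum(i["creditsUsed"] for i in items)
    return ok, failed, credits
```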

Structured Output

Pass a schema to enforce a specific output shape across all URLs in the batch. The AI will return JSON matching your schema for every item.
{
  "urls": [
    "https://shop.example.com/item/1",
    "https://shop.example.com/item/2"
  ],
  "prompt": "Extract the product details",
  "schema": {
    "type": "object",
    "required": ["name", "price"],
    "properties": {
      "name":      { "type": "string" },
      "price":     { "type": "number" },
      "currency":  { "type": ["string", "null"] },
      "available": { "type": ["boolean", "null"] }
    }
  }
}
When a schema is provided, output is automatically set to "json". The schema is validated before the batch is queued — a 422 is returned if it is malformed.
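The 422 only catches a malformed schema at submit time, so a cheap client-side sanity check on returned results can still be useful. check_required below is a hypothetical helper, not a Spidra API; it only verifies that each field in the schema's required list came back non-null, and does no type checking:

```python
def check_required(result, schema):
    """Return the names of required schema fields that are missing
    or null in an extracted result (empty list means it looks OK)."""
    return [key for key in schema.get("required", [])
            if result.get(key) is None]
```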

Structured Output Guide

Full guide on nested objects, arrays, nullable fields, and schema limits

Retrying Failed Items

When a batch completes with some failures, retry only those items — no need to re-run the whole batch:
curl -X POST https://api.spidra.io/api/batch/scrape/abc-123/retry \
  -H "x-api-key: YOUR_API_KEY"

# Response:
# { "status": "queued", "retriedCount": 2 }
The batch status resets to running and you poll the same batchId until it completes again. Successfully completed items are never touched.
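The retry flow can be sketched as a loop (illustrative Python, not SDK code): poll stands in for a call that blocks until the batch is terminal and returns the status body, and retry_failed stands in for POST /api/batch/scrape/{batchId}/retry.

```python
def run_with_retries(poll, retry_failed, max_retries=2):
    """Drive a batch toward full success: poll to a terminal state,
    and while items failed, re-queue them and poll again, up to
    max_retries retry rounds."""
    batch = poll()
    for _ in range(max_retries):
        if batch.get("failedCount", 0) == 0:
            break
        retry_failed()  # only failed items are re-queued
        batch = poll()  # same batchId, poll until terminal again
    return batch
```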

Cancelling a Batch

Cancel a running or pending batch to stop processing and refund credits for items that have not started yet:
curl -X DELETE https://api.spidra.io/api/batch/scrape/abc-123 \
  -H "x-api-key: YOUR_API_KEY"
{
  "status": "cancelled",
  "cancelledItems": 8,
  "creditsRefunded": 16
}
Items already running will complete normally. Only pending items are cancelled and refunded.

Proxy & Geo-Targeting

Apply stealth proxy routing to every URL in the batch with useProxy and proxyCountry:
{
  "urls": ["https://amazon.de/dp/B123", "https://amazon.de/dp/B456"],
  "prompt": "Extract price and availability",
  "output": "json",
  "useProxy": true,
  "proxyCountry": "de"
}

Stealth Mode & Geo-Targeting

Full country list, EU rotation, and billing details

Cookies & Authenticated Pages

Pass session cookies to scrape pages behind a login. Cookies are never stored — they are passed ephemerally to the worker and discarded after processing.
{
  "urls": [
    "https://app.example.com/reports/q1",
    "https://app.example.com/reports/q2"
  ],
  "cookies": "session=eyJ...; auth_token=abc123",
  "prompt": "Extract the report summary",
  "output": "json"
}

Authenticated Scraping

Full guide on obtaining and formatting cookies

Submit a Batch

Full request reference

Get Batch Status

Polling and response shape

List Batches

See all your batch jobs

Cancel & Retry

Stop a batch or re-run failures