Submit a Crawl Job

POST /crawl
curl --request POST \
  --url https://api.spidra.io/api/crawl \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "baseUrl": "https://example.com/blog",
  "crawlInstruction": "Crawl all blog post pages",
  "transformInstruction": "Extract title, author, date, and content from each post",
  "maxPages": 10
}
'
Example response:
{
  "status": "queued",
  "jobId": "abc-123",
  "message": "Crawl job queued. Poll /api/crawl/abc-123 for results."
}
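The curl example above can also be built with Python's standard library. This is a sketch, not an official client: it constructs the same POST request with the documented headers and body, leaving the actual send to the caller.

```python
import json
import urllib.request

API_URL = "https://api.spidra.io/api/crawl"

def build_crawl_request(api_key, body):
    """Build the POST /api/crawl request shown in the curl example above."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_crawl_request(key, body))` and parse the JSON response to get the `jobId`.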

How It Works

Crawl jobs are asynchronous. Submitting a job returns a jobId immediately; poll GET /crawl/{jobId} until the job finishes.
  1. Submit - Send your request, receive a jobId right away
  2. Discover - Spidra loads your base URL and finds links matching your crawlInstruction
  3. Crawl - Visits each page (up to maxPages)
  4. Solve - Automatically handles CAPTCHAs
  5. Transform - Runs your transformInstruction on each page to extract structured data
  6. Poll - Check GET /crawl/{jobId} until status is completed
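The submit-and-poll flow above can be sketched as a small helper. It takes a `fetch_status` callable (anything that returns the JSON body of GET /crawl/{jobId} as a dict) so it stays easy to test; `poll_interval` and `max_attempts` are illustrative names, not API parameters, and the terminal `failed` status is an assumption — this page only shows `queued` and `completed`.

```python
import time

def poll_until_done(fetch_status, poll_interval=2.0, max_attempts=30):
    """Call fetch_status() until the crawl job reports a terminal status.

    fetch_status: callable returning the parsed JSON of GET /crawl/{jobId},
    e.g. {"status": "queued"} or {"status": "completed", ...}.
    """
    for _ in range(max_attempts):
        body = fetch_status()
        if body["status"] in ("completed", "failed"):  # "failed" assumed
            return body
        time.sleep(poll_interval)
    raise TimeoutError("crawl job did not finish in time")
```

In practice `fetch_status` would be a closure that GETs the job URL with your API key and decodes the response.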

Proxy and Geo-Targeting

Route crawl requests through residential proxies to bypass bot detection or access geo-restricted content.
{
  "baseUrl": "https://example.com",
  "crawlInstruction": "Find all product pages",
  "transformInstruction": "Extract product name and price",
  "useProxy": true,
  "proxyCountry": "us"
}
Use "proxyCountry": "global" (or omit it) for no country preference, "eu" to rotate across all 27 EU member states, or a two-letter ISO code for a specific country.
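As a sketch of the rules above, a helper that adds the proxy fields to a crawl payload — omitting `proxyCountry` when there is no country preference, since "global" and omission behave the same:

```python
def with_proxy(payload, country=None):
    """Return a copy of a crawl payload with stealth-mode proxy fields set.

    country: two-letter ISO code (e.g. "us", "de"), "eu" to rotate across
    EU member states, or None/"global" for no country preference.
    """
    out = dict(payload, useProxy=True)
    if country and country != "global":
        out["proxyCountry"] = country
    return out
```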

Stealth Mode & Geo-Targeting Guide

Full country list, EU rotation, examples, and credit costs

Authentication

Crawl protected pages by providing session cookies:
{
  "baseUrl": "https://app.example.com/dashboard",
  "crawlInstruction": "Find all report pages",
  "transformInstruction": "Extract report titles and dates",
  "maxPages": 10,
  "cookies": "session_id=abc123; auth_token=xyz789"
}

Authenticated Scraping

Full guide on getting cookies and formats
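The `cookies` field takes the standard `name=value; name2=value2` header format shown above; the body reference below also mentions accepting a raw Chrome DevTools paste. As a sketch, a converter from a tab-separated DevTools cookie-table paste to the standard format — it assumes each row's first two columns are name and value (Chrome's Application > Cookies table layout; an assumption here, not something this page specifies):

```python
def cookies_from_devtools_paste(paste):
    """Convert a tab-separated DevTools cookie-table paste into the
    standard 'name=value; name2=value2' cookie string."""
    pairs = []
    for line in paste.strip().splitlines():
        cols = line.split("\t")
        if len(cols) >= 2:  # name and value are assumed to be cols 0 and 1
            pairs.append(f"{cols[0]}={cols[1]}")
    return "; ".join(pairs)
```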

Authorizations

x-api-key
string
header
required

Body

application/json
baseUrl
string<uri>
required

The starting URL to crawl from

crawlInstruction
string
required

Instruction for which pages to crawl (e.g., 'all product pages', 'blog posts only')

transformInstruction
string
required

How to extract/transform data from each crawled page

maxPages
integer
default:5

Maximum number of pages to crawl

Required range: 1 <= x <= 20
useProxy
boolean
default:false

Enable stealth mode with proxy rotation

proxyCountry
string

Country code (e.g., 'us', 'uk', 'de') or region ('global', 'asia', 'eu') for geo-targeted proxy routing. Requires useProxy: true

cookies
string

Session cookies for authenticated crawling. Supports standard format (name=value; name2=value2) or raw Chrome DevTools paste format
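The body parameters above can be checked client-side before submitting. A sketch enforcing the documented constraints — the three required fields, the `maxPages` range of 1–20 with a default of 5, and `proxyCountry` requiring `useProxy: true` (the function name and error messages are illustrative):

```python
REQUIRED_FIELDS = ("baseUrl", "crawlInstruction", "transformInstruction")

def validate_crawl_body(body):
    """Validate a crawl request body against the documented schema.

    Returns a copy with the maxPages default applied, or raises ValueError.
    """
    for field in REQUIRED_FIELDS:
        if field not in body:
            raise ValueError(f"missing required field: {field}")
    out = dict(body)
    out.setdefault("maxPages", 5)  # documented default
    if not 1 <= out["maxPages"] <= 20:
        raise ValueError("maxPages must satisfy 1 <= x <= 20")
    if out.get("proxyCountry") and not out.get("useProxy"):
        raise ValueError("proxyCountry requires useProxy: true")
    return out
```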

Response

Crawl job queued

status
enum<string>
Available options:
queued
jobId
string
message
string