Submit a Crawl Job

POST /crawl
Example request:

curl --request POST \
  --url https://api.spidra.io/api/crawl \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "baseUrl": "https://example.com/blog",
  "crawlInstruction": "Crawl all blog post pages",
  "transformInstruction": "Extract title, author, date, and content from each post",
  "maxPages": 10
}
'

Example response:

{
  "status": "queued",
  "jobId": "abc-123",
  "message": "Crawl job queued. Poll /api/crawl/abc-123 for results."
}

How It Works

  1. Start - Loads your base URL
  2. Discover - Finds links matching your instructions
  3. Crawl - Visits each page (up to maxPages)
  4. Solve - Automatically handles CAPTCHAs
  5. Transform - Extracts data from each page

Authentication (Optional)

Crawl protected pages by providing session cookies from your logged-in browser:
{
  "baseUrl": "https://app.example.com/dashboard",
  "crawlInstruction": "Find all report pages",
  "transformInstruction": "Extract report titles and dates",
  "maxPages": 10,
  "cookies": "session_id=abc123; auth_token=xyz789"
}

How to Get Cookies

  1. Log into the target website in your browser
  2. Open DevTools (F12) → Application → Cookies
  3. Copy the relevant cookie names and values
  4. Format as name=value; name2=value2
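
Step 4's format is the standard Cookie-header syntax (semicolon-separated name=value pairs). A small Python helper, hypothetical and not part of the API, can join the pairs you copied:

```python
def format_cookies(cookies: dict) -> str:
    """Join name/value pairs into the 'name=value; name2=value2' string the API expects."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

# Example (hypothetical values):
format_cookies({"session_id": "abc123", "auth_token": "xyz789"})
# → "session_id=abc123; auth_token=xyz789"
```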

Legal Responsibility: You are solely responsible for ensuring your authenticated crawling complies with applicable laws and the target website’s Terms of Service. Only crawl content you’re authorized to access. Cookies are processed transiently and never stored by Spidra.

Authorizations

x-api-key (string, header, required)

Body (application/json)

baseUrl (string<uri>, required)
  The starting URL to crawl from.

crawlInstruction (string, required)
  Instruction for which pages to crawl (e.g., "all product pages", "blog posts only").

transformInstruction (string, required)
  How to extract/transform data from each crawled page.

maxPages (integer, default: 5)
  Maximum number of pages to crawl. Required range: 1 <= x <= 20.

useProxy (boolean, default: false)
  Enable stealth mode with proxy rotation.
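
To catch schema mistakes before sending a request, the constraints above (three required strings, maxPages between 1 and 20 with a default of 5, boolean useProxy) can be checked client-side. This validator is an illustrative Python sketch, not part of the Spidra API:

```python
def validate_crawl_body(body: dict) -> list[str]:
    """Check a request body against the documented schema; return a list of problems."""
    errors = []
    # The three instruction fields are required, non-empty strings.
    for field in ("baseUrl", "crawlInstruction", "transformInstruction"):
        if not isinstance(body.get(field), str) or not body.get(field):
            errors.append(f"{field} is required and must be a non-empty string")
    # maxPages defaults to 5 and must fall in the documented 1..20 range.
    max_pages = body.get("maxPages", 5)
    if not isinstance(max_pages, int) or not 1 <= max_pages <= 20:
        errors.append("maxPages must be an integer between 1 and 20")
    if not isinstance(body.get("useProxy", False), bool):
        errors.append("useProxy must be a boolean")
    return errors
```

An empty list means the body matches the schema; anything else lists the violated constraints.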

Response (Crawl job queued)

status (enum<string>)
  Available options: queued

jobId (string)

message (string)