Submit a Crawl Job

POST /crawl

Example request:
curl --request POST \
  --url https://api.spidra.io/api/crawl \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "baseUrl": "https://example.com/blog",
  "crawlInstruction": "Crawl all blog post pages",
  "transformInstruction": "Extract title, author, date, and content from each post",
  "maxPages": 10
}
'
Example response:

{
  "status": "queued",
  "jobId": "abc-123",
  "message": "Crawl job queued. Poll /api/crawl/abc-123 for results."
}
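
A crawl job runs asynchronously: the response returns immediately with a jobId, and you fetch results by polling the endpoint named in the message. A minimal polling loop might look like the sketch below. It assumes the GET endpoint accepts the same x-api-key header and that the status field moves off "queued" once the job finishes; the terminal status values and the result shape aren't documented on this page. jq is used purely for convenience.

# Sketch: poll the job until it leaves the "queued" state.
# Assumes the same x-api-key header works for GET requests and
# that "queued" is the only non-terminal status (terminal values
# are not documented here). Requires jq.
JOB_ID="abc-123"
while true; do
  STATUS=$(curl --silent \
    --url "https://api.spidra.io/api/crawl/$JOB_ID" \
    --header 'x-api-key: <api-key>' \
    | jq -r '.status')
  echo "status: $STATUS"
  [ "$STATUS" != "queued" ] && break
  sleep 5
done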

How It Works

  1. Start - Loads your base URL
  2. Discover - Finds links matching your instructions
  3. Crawl - Visits each page (up to maxPages)
  4. Solve - Automatically handles CAPTCHAs
  5. Transform - Extracts data from each page

Authentication

Crawl protected pages by providing session cookies:
{
  "baseUrl": "https://app.example.com/dashboard",
  "crawlInstruction": "Find all report pages",
  "transformInstruction": "Extract report titles and dates",
  "maxPages": 10,
  "cookies": "session_id=abc123; auth_token=xyz789"
}

See the Authenticated Scraping guide for a full walkthrough of getting session cookies and the supported formats.

Authorizations

x-api-key (string, header, required)

Body

application/json

baseUrl (string<uri>, required)
The starting URL to crawl from.

crawlInstruction (string, required)
Instruction for which pages to crawl (e.g., 'all product pages', 'blog posts only').

transformInstruction (string, required)
How to extract/transform data from each crawled page.

maxPages (integer, default: 5)
Maximum number of pages to crawl. Required range: 1 <= x <= 20.

useProxy (boolean, default: false)
Enable stealth mode with proxy rotation.

cookies (string)
Session cookies for authenticated crawling. Supports the standard format (name=value; name2=value2) or the raw Chrome DevTools paste format.
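
Putting the body parameters together, a request body that exercises every documented field might look like this (values are illustrative):

{
  "baseUrl": "https://app.example.com/dashboard",
  "crawlInstruction": "Find all report pages",
  "transformInstruction": "Extract report titles and dates",
  "maxPages": 20,
  "useProxy": true,
  "cookies": "session_id=abc123; auth_token=xyz789"
}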

Response

Crawl job queued.

status (enum<string>)
Available options: queued

jobId (string)

message (string)