Submit a Crawl Job

POST /crawl
Example request:

curl --request POST \
  --url https://api.spidra.io/api/crawl \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "baseUrl": "https://example.com/blog",
  "crawlInstruction": "Crawl all blog post pages",
  "transformInstruction": "Extract title, author, date, and content from each post",
  "maxPages": 10
}
'

Example response:

{
  "status": "queued",
  "jobId": "abc-123",
  "message": "Crawl job queued. Poll /api/crawl/abc-123 for results."
}

How It Works

  1. Start - Loads your base URL
  2. Discover - Finds links matching your instructions
  3. Crawl - Visits each page (up to maxPages)
  4. Solve - Automatically handles CAPTCHAs
  5. Transform - Extracts data from each page

Authentication (Optional)

Crawl protected pages by providing session cookies from your logged-in browser:
{
  "baseUrl": "https://app.example.com/dashboard",
  "crawlInstruction": "Find all report pages",
  "transformInstruction": "Extract report titles and dates",
  "maxPages": 10,
  "cookies": "session_id=abc123; auth_token=xyz789"
}

How to Get Cookies

  1. Log into the target website in your browser
  2. Open DevTools (F12) → Application → Cookies
  3. Copy the relevant cookie names and values
  4. Format as name=value; name2=value2
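
Step 4's format is the standard Cookie-header syntax (semicolon-separated name=value pairs). A small Python helper, hypothetical and not part of the API, can join the pairs you copied:

```python
def format_cookies(cookies: dict) -> str:
    """Join name/value pairs into the 'name=value; name2=value2' string the API expects."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

# Example (hypothetical values):
format_cookies({"session_id": "abc123", "auth_token": "xyz789"})
# → "session_id=abc123; auth_token=xyz789"
```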

Legal Responsibility: You are solely responsible for ensuring your authenticated crawling complies with applicable laws and the target website’s Terms of Service. Only crawl content you’re authorized to access. Cookies are processed transiently and never stored by Spidra.

Authorizations

x-api-key (string, header, required)

Body (application/json)

baseUrl (string<uri>, required)
  The starting URL to crawl from.

crawlInstruction (string, required)
  Instruction for which pages to crawl (e.g., "all product pages", "blog posts only").

transformInstruction (string, required)
  How to extract/transform data from each crawled page.

maxPages (integer, default: 5)
  Maximum number of pages to crawl. Required range: 1 <= x <= 20.

useProxy (boolean, default: false)
  Enable stealth mode with proxy rotation.
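
To catch schema mistakes before sending a request, the constraints above (three required strings, maxPages between 1 and 20 with a default of 5, boolean useProxy) can be checked client-side. This validator is an illustrative Python sketch, not part of the Spidra API:

```python
def validate_crawl_body(body: dict) -> list[str]:
    """Check a request body against the documented schema; return a list of problems."""
    errors = []
    # The three instruction fields are required, non-empty strings.
    for field in ("baseUrl", "crawlInstruction", "transformInstruction"):
        if not isinstance(body.get(field), str) or not body.get(field):
            errors.append(f"{field} is required and must be a non-empty string")
    # maxPages defaults to 5 and must fall in the documented 1..20 range.
    max_pages = body.get("maxPages", 5)
    if not isinstance(max_pages, int) or not 1 <= max_pages <= 20:
        errors.append("maxPages must be an integer between 1 and 20")
    if not isinstance(body.get("useProxy", False), bool):
        errors.append("useProxy must be a boolean")
    return errors
```

An empty list means the body matches the schema; anything else lists the violated constraints.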

Response (Crawl job queued)

status (enum<string>)
  Available options: queued

jobId (string)

message (string)