POST /api/scrape
curl --request POST \
  --url https://api.spidra.io/api/scrape \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "urls": [
    {
      "url": "https://example.com"
    }
  ],
  "prompt": "Extract the main heading and first paragraph",
  "output": "json"
}
'
Example response:

{
  "status": "queued",
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Scrape job has been queued. Poll /api/scrape/550e8400-e29b-41d4-a716-446655440000 to get the result."
}
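The queue-and-poll flow above can be sketched in Python. `poll_job` below is an illustrative helper, not an official client; the transport is injected as a callable so the polling logic can be shown without a live API key (in real use it would wrap an HTTP GET to the polling URL with the `x-api-key` header).

```python
import time

API_BASE = "https://api.spidra.io/api"

def poll_job(job_id, fetch, interval=2.0, max_attempts=30):
    """Poll /api/scrape/{jobId} until the job leaves the 'queued' state.

    `fetch` is any callable that takes a URL and returns the parsed
    JSON response as a dict.
    """
    url = f"{API_BASE}/scrape/{job_id}"
    for _ in range(max_attempts):
        result = fetch(url)
        if result.get("status") != "queued":
            return result
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still queued after {max_attempts} polls")

# Stub transport standing in for a real HTTP client:
responses = iter([
    {"status": "queued"},
    {"status": "completed", "data": {"heading": "Example Domain"}},
])

final = poll_job("550e8400-e29b-41d4-a716-446655440000",
                 fetch=lambda url: next(responses),
                 interval=0)
print(final["status"])  # completed
```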

How It Works

  1. Load - Opens each URL in a real browser
  2. Execute - Runs your browser actions (clicks, scrolls, etc.)
  3. Solve - Automatically handles CAPTCHAs
  4. Process - Runs AI extraction (if prompt provided)

Browser Actions

Interact with pages before scraping:
{
  "urls": [{
    "url": "https://example.com/products",
    "actions": [
      {"type": "click", "selector": "#accept-cookies"},
      {"type": "wait", "value": 1000},
      {"type": "click", "selector": ".load-more-btn"},
      {"type": "scroll", "value": "50%"}
    ]
  }],
  "prompt": "List all product names and prices",
  "output": "json"
}

Available Actions

Action   Description          Example
click    Click an element     {"type": "click", "selector": "#btn"}
type     Type into an input   {"type": "type", "selector": "#search", "value": "query"}
wait     Wait (milliseconds)  {"type": "wait", "value": 2000}
scroll   Scroll the page      {"type": "scroll", "value": "50%"}
Use browser DevTools (right-click → Inspect) to find CSS selectors.
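Assembling an action sequence programmatically keeps the JSON shape consistent. A small sketch (the helper function names are illustrative, not part of the API):

```python
import json

def click(selector):
    return {"type": "click", "selector": selector}

def type_text(selector, value):
    return {"type": "type", "selector": selector, "value": value}

def wait(ms):
    return {"type": "wait", "value": ms}

def scroll(amount):
    return {"type": "scroll", "value": amount}

# Reproduce the request body from the Browser Actions example above:
payload = {
    "urls": [{
        "url": "https://example.com/products",
        "actions": [
            click("#accept-cookies"),
            wait(1000),
            click(".load-more-btn"),
            scroll("50%"),
        ],
    }],
    "prompt": "List all product names and prices",
    "output": "json",
}

print(json.dumps(payload, indent=2))
```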

Authentication (Optional)

Scrape protected pages by providing session cookies from your logged-in browser. Two formats are supported.

Format 1: Semicolon String

The traditional name=value format, separated by semicolons:
{
  "urls": [{"url": "https://example.com/company/stripe"}],
  "prompt": "Extract company details",
  "output": "json",
  "cookies": "authcookie=eyJ...; cf_clearance=2B08..."
}

Format 2: Raw DevTools Paste

Copy-paste directly from Chrome DevTools (Application → Cookies → Select All → Copy):
{
  "urls": [{"url": "https://example.com/company"}],
  "prompt": "Extract company details",
  "output": "json",
  "cookies": "authcookie\teyJhbGciOiJIUzUxMiJ9...\t.example.com\t/\t2026-06-30T14:29:30.522Z\t881\t\t\tLax\t\t\tMedium\ncf_clearance\tQnuFniylefl3k3FTfCbnp...\t.example.com\t/\t2027-01-01T14:29:32.591Z\t310\t\t\tNone\thttps://example.com\t\tMedium\n_ga\tGA1.1.1832229719.1766335524\t.example.com\t/\t2027-01-27T11:29:56.430Z\t30\t\t\t\t\t\tMedium"
}
The API auto-detects the format based on the presence of tabs and newlines. It extracts Name, Value, Domain, and Path from each row - extra columns (Size, HttpOnly, Secure, etc.) are ignored.
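The detection rule described above (tab characters signal the raw DevTools format) can be sketched as follows. This mirrors the documented behavior but is not Spidra's actual implementation:

```python
def normalize_cookies(raw: str) -> str:
    """Convert either cookie format into a semicolon-separated name=value string.

    Raw DevTools pastes are tab-separated rows (Name, Value, Domain, Path, ...);
    only the first two columns matter here. Extra columns (Size, HttpOnly,
    Secure, etc.) are ignored, matching the documented behavior.
    """
    if "\t" not in raw:
        return raw  # already in "name=value; name2=value2" form
    pairs = []
    for row in raw.strip().split("\n"):
        cols = row.split("\t")
        if len(cols) >= 2 and cols[0]:
            pairs.append(f"{cols[0]}={cols[1]}")
    return "; ".join(pairs)

devtools_paste = "authcookie\teyJ...\t.example.com\t/\ncf_clearance\tQnuF...\t.example.com\t/"
print(normalize_cookies(devtools_paste))  # authcookie=eyJ...; cf_clearance=QnuF...
```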

How to Get Cookies

  1. Log into the target website in your browser
  2. Open DevTools (F12) → Application → Cookies
  3. For Standard format: Copy individual cookie names and values, format as name=value; name2=value2
  4. For Raw format: Select all rows (Ctrl/Cmd+A), copy (Ctrl/Cmd+C), and paste directly
Legal Responsibility: You are solely responsible for ensuring your authenticated scraping complies with applicable laws and the target website’s Terms of Service. Only scrape content you’re authorized to access. When uncertain, seek written permission from the platform. Misuse of session cookies may violate terms of service or applicable laws. Cookies are processed transiently and never stored by Spidra.

Authorizations

x-api-key
string, header, required

Body

application/json

urls
object[], required
Array of URLs to scrape (1-3 URLs per request).

prompt
string, optional
LLM prompt for extracting or transforming the scraped content.

output
enum<string>, default: json
Output format for the extracted content. Available options: json, markdown.

useProxy
boolean, default: false
Enable stealth mode with proxy rotation to avoid detection.
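Putting the body parameters together, a complete request payload might look like this (values are illustrative):

```python
import json

body = {
    "urls": [{"url": "https://example.com"}],  # required, 1-3 entries
    "prompt": "Extract the main heading",      # optional LLM prompt
    "output": "markdown",                      # "json" (default) or "markdown"
    "useProxy": True,                          # stealth mode with proxy rotation
}

# The API enforces 1-3 URLs per request; validate client-side too:
assert 1 <= len(body["urls"]) <= 3

print(json.dumps(body, indent=2))
```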

Response

Job successfully queued.

status
enum<string>
Available options: queued.

jobId
string
Unique job identifier for polling.

message
string
Human-readable status message, including the polling URL.