> ## Documentation Index
> Fetch the complete documentation index at: https://docs.spidra.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Submit a Crawl Job

> Crawl multiple pages and extract data from each

## How It Works

Crawl jobs are asynchronous. You get a `jobId` back immediately when you submit and then poll `GET /crawl/{jobId}` until the job finishes.

1. **Submit** - Send your request, receive a `jobId` right away
2. **Discover** - Spidra loads your base URL and finds links matching your `crawlInstruction`
3. **Crawl** - Visits each page (up to `maxPages`)
4. **Solve** - Automatically handles CAPTCHAs
5. **Transform** - Runs your `transformInstruction` on each page to extract structured data
6. **Poll** - Check `GET /crawl/{jobId}` until `status` is `completed`

***

## Proxy and Geo-Targeting

Route crawl requests through residential proxies to bypass bot detection or access geo-restricted content.

```json theme={null}
{
  "baseUrl": "https://example.com",
  "crawlInstruction": "Find all product pages",
  "transformInstruction": "Extract product name and price",
  "useProxy": true,
  "proxyCountry": "us"
}
```

Use `"proxyCountry": "global"` (or omit it) for no country preference. Use `"eu"` to rotate across all 27 EU member states. For a specific country pass its two-letter ISO code.

<Card title="Stealth Mode & Geo-Targeting Guide" icon="shield-halved" href="/features/stealth-mode">
  Full country list, EU rotation, examples, and credit costs
</Card>

***

## Authentication

Crawl protected pages by providing session cookies:

```json theme={null}
{
  "baseUrl": "https://app.example.com/dashboard",
  "crawlInstruction": "Find all report pages",
  "transformInstruction": "Extract report titles and dates",
  "maxPages": 10,
  "cookies": "session_id=abc123; auth_token=xyz789"
}
```

<Card title="Authenticated Scraping" icon="lock-open" href="/features/authenticated-scraping">
  Full guide on getting cookies and formats
</Card>


## OpenAPI

````yaml POST /crawl
openapi: 3.1.0
info:
  title: Spidra API
  version: 1.0.0
  description: >-
    Public API endpoints for web scraping via Spidra. Authentication is via API
    key passed in the `x-api-key` header.
servers:
  - url: https://api.spidra.io/api
security:
  - ApiKeyAuth: []
paths:
  /crawl:
    post:
      tags:
        - Crawling
      summary: Submit a Crawl Job
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - baseUrl
                - crawlInstruction
                - transformInstruction
              properties:
                baseUrl:
                  type: string
                  format: uri
                  description: The starting URL to crawl from
                crawlInstruction:
                  type: string
                  description: >-
                    Instruction for which pages to crawl (e.g., 'all product
                    pages', 'blog posts only')
                transformInstruction:
                  type: string
                  description: How to extract/transform data from each crawled page
                maxPages:
                  type: integer
                  default: 5
                  minimum: 1
                  maximum: 20
                  description: Maximum number of pages to crawl
                useProxy:
                  type: boolean
                  default: false
                  description: Enable stealth mode with proxy rotation
                proxyCountry:
                  type: string
                  description: >-
                    Country code (e.g., 'us', 'uk', 'de') or region ('global',
                    'asia', 'eu') for geo-targeted proxy routing. Requires
                    useProxy: true
                cookies:
                  type: string
                  description: >-
                    Session cookies for authenticated crawling. Supports
                    standard format (name=value; name2=value2) or raw Chrome
                    DevTools paste format
            example:
              baseUrl: https://example.com/blog
              crawlInstruction: Crawl all blog post pages
              transformInstruction: Extract title, author, date, and content from each post
              maxPages: 10
      responses:
        '202':
          description: Crawl job queued
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    enum:
                      - queued
                  jobId:
                    type: string
                  message:
                    type: string
              example:
                status: queued
                jobId: abc-123
                message: Crawl job queued. Poll /api/crawl/abc-123 for results.
        '400':
          description: Invalid request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: >-
                  Missing required fields: baseUrl, crawlInstruction,
                  transformInstruction
        '403':
          description: Quota exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
      required:
        - status
        - message
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key

````