> ## Documentation Index
> Fetch the complete documentation index at: https://docs.spidra.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Submit a Scrape Job

> Queue URLs for scraping with optional browser actions and AI extraction

## How It Works

Spidra runs scrape jobs asynchronously. When you submit a request, you get a `jobId` back immediately. You then poll `GET /scrape/{jobId}` until `status` is `completed` and results are ready.

1. **Submit** - Send your request, receive a `jobId` in the response right away
2. **Load** - Spidra opens each URL in a real browser
3. **Execute** - Runs your browser actions (clicks, scrolls, etc.)
4. **Solve** - Automatically handles CAPTCHAs
5. **Process** - Runs AI extraction if a `prompt` is provided
6. **Poll** - Check `GET /scrape/{jobId}` until `status: "completed"`

<Note>
  Setting `output: "json"` without a `prompt` still triggers a default AI extraction pass. If you want raw markdown with no AI processing, omit both `output` and `prompt`.

  When AI extraction fails (for example, on a near-empty page), Spidra falls back to returning the raw page markdown in `result.content`. Check the `ai_extraction_failed` flag in the response to detect this case and handle degraded results in your code.
</Note>

***

## Structured Output

Pass a `schema` to tell the AI exactly what shape to return. Instead of getting whatever JSON the AI decides to produce, you get back a JSON object that matches your schema every time. Nullable fields come back as `null` rather than being omitted. Field names match exactly what you defined.

```json theme={null}
{
  "urls": [{ "url": "https://jobs.example.com/engineer" }],
  "prompt": "Extract the job details. Normalize salary to a plain number in USD.",
  "schema": {
    "type": "object",
    "required": ["title", "company", "remote", "employment_type"],
    "properties": {
      "title":           { "type": "string" },
      "company":         { "type": "string" },
      "remote":          { "type": ["boolean", "null"] },
      "salary_min":      { "type": ["number", "null"] },
      "salary_max":      { "type": ["number", "null"] },
      "employment_type": {
        "type": ["string", "null"],
        "enum": ["full_time", "part_time", "contract", null]
      }
    }
  }
}
```

`output` is automatically set to `"json"` when a schema is provided. The schema is validated before the job is queued and a `422` is returned with descriptive errors if the schema is malformed. Non-fatal issues (unsupported keywords) are returned as `schema_warnings` in the job status response.

<CardGroup cols={3}>
  <Card title="Structured Output Guide" icon="brackets-curly" href="/features/structured-output">
    Full guide: nested objects, arrays, nullable fields, the required rule, and schema limits
  </Card>

  <Card title="Zod and Pydantic" icon="code" href="/features/structured-output#generating-schemas-with-zod-or-pydantic">
    Generate your schema from an existing Zod or Pydantic model
  </Card>

  <Card title="Schema Generator" icon="wand-magic-sparkles" href="https://spidra.io/tools/json-schema-generator">
    Build and preview your schema visually — no JSON knowledge required
  </Card>
</CardGroup>

***

## Browser Actions

Interact with the page before scraping — click buttons, fill forms, scroll to load content, dismiss modals, and iterate over lists of elements.

```json theme={null}
{
  "urls": [{
    "url": "https://example.com/products",
    "actions": [
      {"type": "click", "selector": "#accept-cookies"},
      {"type": "wait", "duration": 1500},
      {"type": "scroll", "to": "80%"}
    ]
  }],
  "prompt": "List all product names and prices",
  "output": "json"
}
```

### Available Actions

| Action    | What it does                                                                                                 | Quick example                                                                  |
| --------- | ------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------ |
| `click`   | Clicks any element on the page, buttons, links, tabs, toggles                                                | `{"type": "click", "selector": "#load-more"}`                                  |
| `type`    | Types text into an input field or search box                                                                 | `{"type": "type", "selector": "#search", "value": "laptops"}`                  |
| `check`   | Checks a checkbox                                                                                            | `{"type": "check", "selector": "#in-stock-only"}`                              |
| `uncheck` | Unchecks a checkbox                                                                                          | `{"type": "uncheck", "selector": "#newsletter"}`                               |
| `wait`    | Pauses for a number of milliseconds                                                                          | `{"type": "wait", "duration": 2000}`                                           |
| `scroll`  | Scrolls the page to a percentage of its height                                                               | `{"type": "scroll", "to": "80%"}`                                              |
| `forEach` | Finds all matching elements and processes each one individually. Supports navigate, click, and inline modes. | `{"type": "forEach", "observe": "Find all product cards", "mode": "navigate"}` |

<Tip>
  For `click`, `check`, and `uncheck` you can target elements by CSS selector like `"selector": "#id"` or by a plain English description like `"value": "Accept cookies button"`. Both work.
</Tip>

`forEach` is the most powerful action. It finds a set of repeating elements (product cards, links, accordion rows) and runs a mini-scrape on each one. It supports three modes (`click`, `inline`, `navigate`), automatic pagination, per-item AI extraction, and per-element sub-actions.

<Card title="Full Browser Actions Guide" icon="arrow-right" href="/features/actions">
  Detailed explanations for every action with real examples, all forEach options, chaining patterns, and sample responses
</Card>

***

## Proxy and Geo-Targeting

Route requests through residential proxies to avoid detection or access geo-restricted content. Set `"useProxy": true` and optionally add `"proxyCountry"` to target a specific location.

```json theme={null}
{
  "urls": [{"url": "https://amazon.de/dp/B123456"}],
  "prompt": "Extract the product price in euros",
  "output": "json",
  "useProxy": true,
  "proxyCountry": "de"
}
```

<Card title="Stealth Mode & Geo-Targeting Guide" icon="shield-halved" href="/features/stealth-mode">
  Full guide: country list, EU rotation, examples, and credit costs
</Card>

***

## Extract Content Only

Remove navigation, headers, footers, and sidebars before processing. Useful when you only want the main article or product content.

```json theme={null}
{
  "urls": [{"url": "https://blog.example.com/article"}],
  "prompt": "Summarize this article",
  "output": "json",
  "extractContentOnly": true
}
```

***

## Screenshots

Capture screenshots of scraped pages for debugging or archival.

```json theme={null}
{
  "urls": [{"url": "https://example.com"}],
  "prompt": "Extract page title",
  "screenshot": true,
  "fullPageScreenshot": true
}
```

| Option                     | Description                                                      |
| -------------------------- | ---------------------------------------------------------------- |
| `screenshot: true`         | Capture the visible viewport                                     |
| `fullPageScreenshot: true` | Capture the entire scrollable page (requires `screenshot: true`) |

Screenshot URLs are returned in the `screenshots` array of the response.

***

## Authentication

Scrape protected pages by providing session cookies:

```json theme={null}
{
  "urls": [{"url": "https://example.com/dashboard"}],
  "prompt": "Extract account details",
  "output": "json",
  "cookies": "session=eyJ...; auth_token=abc123..."
}
```

<Card title="Authenticated Scraping" icon="lock-open" href="/features/authenticated-scraping">
  Full guide on getting cookies and formats
</Card>

***

<CardGroup cols={2}>
  <Card title="Check Job Status" icon="magnifying-glass" href="/api-reference/scraping/scrape-status">
    Poll for results
  </Card>

  <Card title="View Logs" icon="list" href="/api-reference/logs/scrape-logs">
    See your scrape history
  </Card>
</CardGroup>


## OpenAPI

````yaml POST /scrape
openapi: 3.1.0
info:
  title: Spidra API
  version: 1.0.0
  description: >-
    Public API endpoints for web scraping via Spidra. Authenticate with
    `Authorization: Bearer YOUR_API_KEY`.
servers:
  - url: https://api.spidra.io/api
security:
  - BearerAuth: []
  - ApiKeyAuth: []
paths:
  /scrape:
    post:
      tags:
        - Scraping
      summary: Submit a Scrape Job
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - urls
              properties:
                urls:
                  type: array
                  minItems: 1
                  maxItems: 3
                  items:
                    $ref: '#/components/schemas/ScrapeUrl'
                  description: Array of URLs to scrape (1-3 URLs per request)
                prompt:
                  type: string
                  description: >-
                    Optional LLM prompt for extracting or transforming the
                    scraped content
                output:
                  type: string
                  enum:
                    - json
                    - markdown
                  default: json
                  description: Output format for the extracted content
                useProxy:
                  type: boolean
                  default: false
                  description: Enable stealth mode with proxy rotation to avoid detection
                proxyCountry:
                  type: string
                  description: >-
                    Country code (e.g., 'us', 'uk', 'de') or region ('global',
                    'asia', 'eu') for geo-targeted proxy routing. Requires
                    useProxy: true
                cookies:
                  type: string
                  description: >-
                    Session cookies for authenticated scraping. Supports
                    standard format (name=value; name2=value2) or raw Chrome
                    DevTools paste format
                screenshot:
                  type: boolean
                  default: false
                  description: Capture a screenshot of each page after scraping
                fullPageScreenshot:
                  type: boolean
                  default: false
                  description: >-
                    Capture full page screenshot instead of just the viewport.
                    Requires screenshot: true
                extractContentOnly:
                  type: boolean
                  default: false
                  description: >-
                    Remove headers, footers, navigation, and other non-content
                    elements from the scraped output
                schema:
                  type: object
                  description: >-
                    JSON Schema object describing the exact shape of the AI
                    output. When provided, the AI must return JSON matching this
                    schema. Output is automatically set to 'json'. Root must be
                    type 'object'. Maximum nesting depth: 5. Maximum size: 10KB.
            examples:
              simple:
                summary: Simple scrape
                value:
                  urls:
                    - url: https://example.com
                  prompt: Extract the main heading and first paragraph
                  output: json
              with_actions:
                summary: Scrape with browser actions
                value:
                  urls:
                    - url: https://example.com/products
                      actions:
                        - type: click
                          selector: '#load-more'
                        - type: wait
                          value: 2000
                        - type: scroll
                          value: 50%
                  prompt: List all product names and prices
                  output: json
              with_forEach:
                summary: Iterate over expandable elements
                value:
                  urls:
                    - url: https://booking.com/hotel/example
                      actions:
                        - type: forEach
                          value: Find all room category cards
                  prompt: Extract room name, price, and amenities for each room
                  output: json
              geo_targeted:
                summary: Geo-targeted proxy
                value:
                  urls:
                    - url: https://amazon.co.uk/dp/B123456
                  prompt: Extract the product price
                  output: json
                  useProxy: true
                  proxyCountry: uk
              with_screenshot:
                summary: Capture full page screenshot
                value:
                  urls:
                    - url: https://example.com
                  prompt: Extract page title
                  screenshot: true
                  fullPageScreenshot: true
              with_schema:
                summary: Structured output with JSON Schema
                value:
                  urls:
                    - url: https://jobs.example.com/senior-engineer
                  prompt: >-
                    Extract the job details. Normalize salary to a plain number
                    in USD.
                  schema:
                    type: object
                    required:
                      - title
                      - company
                      - remote
                      - employment_type
                    properties:
                      title:
                        type: string
                      company:
                        type: string
                      remote:
                        type:
                          - boolean
                          - 'null'
                      salary_min:
                        type:
                          - number
                          - 'null'
                      salary_max:
                        type:
                          - number
                          - 'null'
                      employment_type:
                        type:
                          - string
                          - 'null'
                        enum:
                          - full_time
                          - part_time
                          - contract
                          - null
      responses:
        '202':
          description: Job successfully queued
          content:
            application/json:
              schema:
                type: object
                properties:
                  status:
                    type: string
                    enum:
                      - queued
                  jobId:
                    type: string
                    description: Unique job identifier for polling
                  message:
                    type: string
                  deduplicated:
                    type: boolean
                    description: >-
                      True if an identical request was made within the last 5
                      seconds and this returns the existing job ID
              example:
                status: queued
                jobId: 550e8400-e29b-41d4-a716-446655440000
                message: >-
                  Scrape job has been queued. Poll
                  /api/scrape/550e8400-e29b-41d4-a716-446655440000 to get the
                  result.
        '400':
          description: Invalid request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: You must pass 1 to 3 URLs with actions.
        '403':
          description: Quota exceeded or plan limit reached
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Monthly scrape limit exceeded for your plan.
components:
  schemas:
    ScrapeUrl:
      type: object
      required:
        - url
      properties:
        url:
          type: string
          format: uri
          description: The URL to scrape
        actions:
          type: array
          items:
            $ref: '#/components/schemas/ScrapeAction'
          description: Optional browser actions to perform before scraping
        useProxy:
          type: boolean
          description: Override the request-level useProxy setting for this specific URL
    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
      required:
        - status
        - message
    ScrapeAction:
      type: object
      required:
        - type
      properties:
        type:
          type: string
          enum:
            - click
            - type
            - wait
            - scroll
            - check
            - uncheck
            - forEach
          description: Action type to perform on the page
        selector:
          type: string
          description: >-
            CSS selector or XPath for the target element (e.g., '#button',
            '.submit-btn')
        value:
          type:
            - string
            - number
          description: >-
            For click/check/uncheck: plain English element description for AI
            targeting. For type: the text to type. For wait: milliseconds. For
            scroll: percentage ('50%') or pixels. For forEach: natural language
            instruction describing elements to find.
        timeout:
          type: number
          description: Timeout in milliseconds for element visibility
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API key
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key

````