# Download Crawl Results

> Download the results of a completed crawl job as a ZIP archive containing HTML, markdown, and extracted data files.

Downloads a ZIP archive of all successfully crawled pages from a completed job. Each page is saved as one or more files inside the archive, named after the page's hostname and path. Use the `include` query parameter to control which content types are bundled in the ZIP.

## Content Types

| Value      | What is included                                                                                              |
| ---------- | ------------------------------------------------------------------------------------------------------------- |
| `html`     | Raw HTML file for each page                                                                                   |
| `markdown` | Markdown version of each page                                                                                 |
| `data`     | AI-extracted data in JSON, CSV, or Markdown format (format is auto-detected from your `transformInstruction`) |

If you omit the `include` parameter, all three types are included by default.
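
As a sketch of how a client might assemble the `include` value, the helper below checks a list of content types against the three accepted values and joins them with commas. The name `build_include` is hypothetical and not part of any Spidra SDK, and the client-side validation is an assumption made for convenience; how the API itself responds to unrecognized values is not documented here.

```python
# Hypothetical helper: not part of any Spidra SDK.
ACCEPTED_TYPES = ("html", "markdown", "data")


def build_include(content_types=None):
    """Return query params for the download endpoint.

    An empty or missing list yields no `include` parameter, which the
    API treats as "all three types".
    """
    if not content_types:
        return {}
    unknown = [t for t in content_types if t not in ACCEPTED_TYPES]
    if unknown:
        raise ValueError(f"Unsupported content types: {unknown}")
    # Drop duplicates while preserving the caller's order.
    unique = list(dict.fromkeys(content_types))
    return {"include": ",".join(unique)}
```

The returned dictionary can be passed as `params=` to `requests.get`. For example, `build_include(["markdown", "data"])` gives `{"include": "markdown,data"}`.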

## Example Requests

<CodeGroup>
  ```bash cURL theme={null}
  # Download everything (HTML, markdown, and extracted data)
  # -f makes curl exit with an error instead of saving a JSON error body to disk
  curl -fOJ "https://api.spidra.io/api/crawl/abc-123/download" \
    -H "x-api-key: YOUR_API_KEY"

  # Download only the extracted data
  curl -fOJ "https://api.spidra.io/api/crawl/abc-123/download?include=data" \
    -H "x-api-key: YOUR_API_KEY"

  # Download markdown and data only
  curl -fOJ "https://api.spidra.io/api/crawl/abc-123/download?include=markdown,data" \
    -H "x-api-key: YOUR_API_KEY"
  ```

  ```python Python theme={null}
  import requests

  # Download everything
  response = requests.get(
      "https://api.spidra.io/api/crawl/abc-123/download",
      headers={"x-api-key": "YOUR_API_KEY"}
  )
  # Error responses are JSON, so fail here rather than save them as a ZIP
  response.raise_for_status()
  with open("crawl-abc-123.zip", "wb") as f:
      f.write(response.content)

  # Download only extracted data
  response = requests.get(
      "https://api.spidra.io/api/crawl/abc-123/download",
      headers={"x-api-key": "YOUR_API_KEY"},
      params={"include": "data"}
  )
  response.raise_for_status()
  # Use a separate file name so the full archive above is not overwritten
  with open("crawl-abc-123-data.zip", "wb") as f:
      f.write(response.content)
  ```

  ```javascript Node.js theme={null}
  import { writeFileSync } from "fs";

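  // Download everything
  const response = await fetch(
    "https://api.spidra.io/api/crawl/abc-123/download",
    { headers: { "x-api-key": "YOUR_API_KEY" } }
  );
  // Error responses are JSON, so stop here rather than save them as a ZIP
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  const buffer = await response.arrayBuffer();
  writeFileSync("crawl-abc-123.zip", Buffer.from(buffer));

  // Download only extracted data
  const dataOnly = await fetch(
    "https://api.spidra.io/api/crawl/abc-123/download?include=data",
    { headers: { "x-api-key": "YOUR_API_KEY" } }
  );
  if (!dataOnly.ok) throw new Error(`Download failed: ${dataOnly.status}`);
  const buf = await dataOnly.arrayBuffer();
  // Use a separate file name so the full archive above is not overwritten
  writeFileSync("crawl-abc-123-data.zip", Buffer.from(buf));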
  ```
</CodeGroup>

## ZIP Archive Structure

When a single content type is requested, files are placed at the root of the archive, one per page, with an extension that matches the content. The example below shows `include=data` with JSON output:

```
crawl-abc-123.zip
  example.com_blog_post-one.json
  example.com_blog_post-two.json
```

When multiple content types are requested, which includes the default of all three, each page gets its own folder:

```
crawl-abc-123.zip
  example.com_blog_post-one/
    data.json
    index.html
    markdown.md
  example.com_blog_post-two/
    data.json
    index.html
    markdown.md
```
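
To work with either layout after downloading, one option is to group the archive entries by page. The following sketch uses Python's standard `zipfile` module and assumes only the two layouts shown above: a flat archive where each file name is the page name plus an extension, or one folder per page. The function name `group_by_page` is hypothetical.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_page(zip_source):
    """Map each page name to the list of archive entries belonging to it.

    `zip_source` is a path or a file-like object. Assumes the two layouts
    described above; other layouts are not handled.
    """
    pages = defaultdict(list)
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            path = PurePosixPath(name)
            if len(path.parts) > 1:
                # Multi-type layout: the top-level folder is the page.
                page = path.parts[0]
            else:
                # Single-type layout: strip the extension to get the page.
                page = path.stem
            pages[page].append(name)
    return dict(pages)
```

Calling `group_by_page("crawl-abc-123.zip")` on a multi-type archive would map `example.com_blog_post-one` to its `data.json`, `index.html`, and `markdown.md` entries, so downstream code can treat both layouts the same way.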

## Response

The response is a binary ZIP file with the following headers:

| Header                | Value                                    |
| --------------------- | ---------------------------------------- |
| `Content-Type`        | `application/zip`                        |
| `Content-Disposition` | `attachment; filename=crawl-{jobId}.zip` |

<Note>
  Only pages with `status: "success"` are included in the download. If no successful pages exist, the API returns a 404 error.
</Note>
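
A client may want to save the archive under the server-provided name and report the documented error cases clearly. The sketch below reads the file name from the `Content-Disposition` header using Python's standard library and maps the status codes documented for this endpoint to short messages. The names `filename_from_disposition` and `describe_status` are hypothetical, and the fallback file name is an assumption.

```python
from email.message import Message


def filename_from_disposition(header_value, fallback="crawl-results.zip"):
    """Extract the file name from a Content-Disposition header value.

    Returns `fallback` (an assumed default) when the header is missing
    or carries no filename parameter.
    """
    if not header_value:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename() or fallback


def describe_status(status_code):
    """Translate the status codes documented for this endpoint."""
    messages = {
        200: "ok",
        401: "Invalid or missing API key",
        403: "Not authorized to access this job",
        404: "No successful pages found for this crawl",
    }
    return messages.get(status_code, f"Unexpected status {status_code}")
```

With `requests`, this could be used as `filename_from_disposition(response.headers.get("Content-Disposition"))` after checking `response.status_code`, so that a 404 for a crawl with no successful pages is reported instead of being written to disk.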


## OpenAPI

````yaml GET /crawl/{jobId}/download
openapi: 3.1.0
info:
  title: Spidra API
  version: 1.0.0
  description: >-
    Public API endpoints for web scraping via Spidra. Authentication is via API
    key passed in the `x-api-key` header.
servers:
  - url: https://api.spidra.io/api
security:
  - ApiKeyAuth: []
paths:
  /crawl/{jobId}/download:
    get:
      tags:
        - Crawling
      summary: Download Crawl Results as ZIP
      description: >-
        Download the results of a completed crawl job as a ZIP archive. Each
        successfully crawled page is included as one or more files, depending
        on the content types selected with the `include` query parameter.
      parameters:
        - name: jobId
          in: path
          required: true
          schema:
            type: string
          description: The ID of the completed crawl job to download
        - name: include
          in: query
          required: false
          schema:
            type: string
          description: >-
            Comma-separated list of content types to include in the ZIP.
            Accepted values: `html`, `markdown`, `data`. Defaults to all three.
            Example: `include=data,markdown`
      responses:
        '200':
          description: ZIP archive containing the crawl results
          content:
            application/zip:
              schema:
                type: string
                format: binary
          headers:
            Content-Disposition:
              schema:
                type: string
              description: Attachment filename in the format `crawl-{jobId}.zip`
        '401':
          description: Invalid or missing API key
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Access token invalid or expired
        '403':
          description: Not authorized to access this job
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: Unauthorized access or job not found
        '404':
          description: No successful pages found for this crawl
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              example:
                status: error
                message: No successful pages found for this crawl
components:
  schemas:
    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum:
            - error
        message:
          type: string
      required:
        - status
        - message
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key

````