# n8n

> Trigger Spidra web scraping and crawling jobs directly from n8n workflows. No code required — connect extracted data to any downstream node.

The Spidra n8n node lets you trigger scrape jobs, batch-process URLs, and crawl entire websites as steps in any n8n workflow. Configure your extraction prompt, connect the output to whatever node comes next, and you're done.

## Installation

In your n8n instance, go to **Settings > Community Nodes** and install:

```
n8n-nodes-spidra
```

<Note>
  Requires n8n 1.0 or higher. After installing, restart n8n for the node to appear in the editor.
</Note>

## Authentication

Add a new **Spidra API** credential and enter your API key. You can get your key from [app.spidra.io](https://app.spidra.io) under **Settings > API Keys**.

If you are running a self-hosted Spidra instance, change the **Base URL** field to point at your server. The default is `https://api.spidra.io/api`.

## Resources and operations

The node has five resources. Each one maps directly to the Spidra API.

| Resource         | Operations                                           |
| ---------------- | ---------------------------------------------------- |
| **Scrape**       | Run, Submit, Get Status                              |
| **Batch Scrape** | Run, Submit, Get Status, List, Cancel, Retry Failed  |
| **Crawl**        | Run, Submit, Get Status, Get Pages, Extract, History |
| **Logs**         | List, Get                                            |
| **Usage**        | Get Stats                                            |

## Run vs Submit + Get Status

Every resource that creates a job has two ways to handle it.

**Run** submits the job and keeps the workflow waiting until results come back. This is the simplest option and works well for short jobs. You set a **Max Wait Time** (default 120 seconds). If the job finishes in time, the node outputs the full result. If it times out, the node outputs a `{ status: "timeout", jobId: "..." }` response so you can chain a **Get Status** node and check progress later.

**Submit** returns the job ID immediately without waiting. Use this to kick off a long job, then pass the job ID to a **Get Status** node in a later step or a separate workflow run.
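If a step after **Run** needs to branch on timeouts, the logic is small. The TypeScript below is a minimal sketch of that branching, for example inside an n8n **Code** node or mirrored with an **IF** node. Only the `{ status: "timeout", jobId }` shape comes from this page; treating every other output as a finished result, and the `routeRunOutput` helper itself, are assumptions for illustration.

```typescript
// Shape the Run operation outputs when Max Wait Time is exceeded (documented).
type RunTimeout = { status: "timeout"; jobId: string };

// What the next step should do with a Run output.
type NextStep =
  | { kind: "useResult"; result: Record<string, unknown> }
  | { kind: "checkStatusLater"; jobId: string };

// Type guard: true only for the documented timeout response.
function isRunTimeout(output: Record<string, unknown>): output is RunTimeout {
  return output["status"] === "timeout" && typeof output["jobId"] === "string";
}

// Assumption: any output that is not a timeout is a finished result.
function routeRunOutput(output: Record<string, unknown>): NextStep {
  if (isRunTimeout(output)) {
    // Hand the job ID to a Get Status node.
    return { kind: "checkStatusLater", jobId: output.jobId };
  }
  return { kind: "useResult", result: output };
}
```

Items routed to `checkStatusLater` can feed a **Get Status** node using their `jobId`.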

## Scraping

Select **Resource: Scrape** and **Operation: Run** to scrape up to three URLs in one request. Add your URLs using the **Add URL** button. Each URL can include an optional **Browser Actions** JSON array for interactions like clicking, scrolling, or filling a form before the AI extracts data.

Set **Output Format** to JSON or Markdown. Use the **Options** collection to add an extraction prompt, a JSON schema for structured output, proxy settings, cookies, and screenshot capture.

**Extraction Prompt** tells the AI what to pull from the page in plain English. **Extraction Schema** enforces an exact output shape and takes precedence over the prompt when both are set.
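As an illustration, here is a small JSON Schema for a product page, written as a TypeScript object and serialized so it can be pasted into **Extraction Schema**. It assumes the field accepts standard JSON Schema keywords (`type`, `properties`, `required`); the property names and the `missingRequired` helper are hypothetical.

```typescript
// Hypothetical schema for a product page. Assumes Extraction Schema accepts
// standard JSON Schema keywords (type, properties, required).
const productSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    currency: { type: "string" },
    inStock: { type: "boolean" },
  },
  required: ["name", "price"],
} as const;

// Pretty-printed text to paste into the Extraction Schema field.
const schemaText: string = JSON.stringify(productSchema, null, 2);

// Lists required keys that are absent from an extracted object.
function missingRequired(result: Record<string, unknown>): string[] {
  return productSchema.required.filter((key) => !(key in result));
}
```

A downstream check like `missingRequired` is optional; it only flags results that lack the keys the schema marks as required.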

## Batch scraping

Select **Resource: Batch Scrape** to process up to 50 URLs in a single job. Enter each URL on its own line in the **URLs** field; Spidra processes them in parallel.
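When a list is longer than 50 URLs, one approach is to split it across several Batch Scrape jobs. This sketch (the `toBatchUrlFields` name is hypothetical) trims blank entries, cuts the list into groups of at most 50, and joins each group with newlines to match the one-URL-per-line format of the **URLs** field.

```typescript
// Per-job URL limit for Batch Scrape, as documented.
const MAX_URLS_PER_BATCH = 50;

// Splits a URL list into newline-separated groups, one group per Batch Scrape job.
function toBatchUrlFields(urls: string[], size: number = MAX_URLS_PER_BATCH): string[] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  // Drop surrounding whitespace and empty entries so no blank lines are submitted.
  const cleaned = urls.map((u) => u.trim()).filter((u) => u.length > 0);
  const fields: string[] = [];
  for (let i = 0; i < cleaned.length; i += size) {
    fields.push(cleaned.slice(i, i + size).join("\n"));
  }
  return fields;
}
```

Each string in the returned array would go into its own Batch Scrape job, for example by looping over items in n8n.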

The same options available in Scrape (prompt, schema, proxy, cookies, screenshots) are available here too.

If some items fail, use **Retry Failed** with the batch ID to re-queue only the failed URLs, without re-running the ones that already completed. Use **Cancel** to stop a running batch; credits are refunded for any URLs that have not started yet.
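To reason about which URLs **Retry Failed** and **Cancel** affect, the sketch below classifies batch items. The `BatchItem` shape, the `pending`/`running`/`completed`/`failed` status names, and mapping "not started" to `pending` are all hypothetical; check the actual **Get Status** output before relying on them.

```typescript
// Hypothetical status values; the real Batch Scrape Get Status output may differ.
type ItemStatus = "pending" | "running" | "completed" | "failed";

interface BatchItem {
  url: string;
  status: ItemStatus;
}

// URLs that Retry Failed would re-queue: failed ones only; completed ones are left alone.
function urlsToRetry(items: BatchItem[]): string[] {
  return items.filter((item) => item.status === "failed").map((item) => item.url);
}

// URLs whose credits Cancel would refund: items that have not started yet.
function urlsRefundedOnCancel(items: BatchItem[]): string[] {
  return items.filter((item) => item.status === "pending").map((item) => item.url);
}
```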

## Crawling

Select **Resource: Crawl** to start from a URL and let Spidra discover and process pages on its own.

Three fields are required:

* **Start URL**: the root page the crawler starts from
* **Navigation Instruction**: plain English instructions for which links to follow and which to skip
* **Extraction Instruction**: what data to pull from each page the crawler visits

Under **Options**, set **Max Pages** to cap how many pages the crawl visits. Proxy and cookie options work the same as in Scrape.
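Here is one way to write the three required crawl fields plus **Max Pages**, with a basic pre-flight check. The property names mirror the UI labels and are not the node's internal parameter names; the example instructions and the `validateCrawl` helper are illustrative assumptions.

```typescript
// Property names mirror the node's field labels; they are illustrative only.
interface CrawlSettings {
  startUrl: string;
  navigationInstruction: string;
  extractionInstruction: string;
  maxPages?: number;
}

const blogCrawl: CrawlSettings = {
  startUrl: "https://example.com/blog",
  // Plain-English rules for which links to follow and which to skip.
  navigationInstruction:
    "Follow links to individual blog posts and to the next archive page. Skip tag, author, and login pages.",
  // What to pull from every page the crawler visits.
  extractionInstruction: "Extract the post title, author, publish date, and a one-sentence summary.",
  maxPages: 100,
};

// Returns a list of problems; an empty list means the settings look usable.
function validateCrawl(s: CrawlSettings): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//.test(s.startUrl)) problems.push("Start URL must be an http(s) URL");
  if (s.navigationInstruction.trim() === "") problems.push("Navigation Instruction is required");
  if (s.extractionInstruction.trim() === "") problems.push("Extraction Instruction is required");
  if (s.maxPages !== undefined && (!Number.isInteger(s.maxPages) || s.maxPages < 1)) {
    problems.push("Max Pages must be a positive integer");
  }
  return problems;
}
```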

Once a crawl completes, use **Get Pages** with the job ID to retrieve signed download URLs for the raw HTML and Markdown of every crawled page. These download URLs expire after one hour.
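Because the download links are time-limited, a workflow that fetches pages later should check their age first. This sketch assumes the one-hour window starts when **Get Pages** returns the URLs and keeps a five-minute safety margin; both the margin and the `isSignedUrlUsable` name are choices made for illustration.

```typescript
// Documented lifetime of Get Pages download URLs.
const SIGNED_URL_TTL_MS = 60 * 60 * 1000;

// Assumes the window starts when Get Pages returned the URLs (retrievedAtMs).
// The margin avoids starting a download that expires mid-transfer.
function isSignedUrlUsable(
  retrievedAtMs: number,
  nowMs: number,
  safetyMarginMs: number = 5 * 60 * 1000,
): boolean {
  return nowMs - retrievedAtMs < SIGNED_URL_TTL_MS - safetyMarginMs;
}
```

If the check fails, requesting the URLs again with **Get Pages** is presumably the way to get fresh links.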

Use **Extract** to re-run AI extraction on a completed crawl with a new instruction, without re-crawling any pages. This charges only transformation credits.

## Logs and Usage

**Logs: List** returns paginated scrape logs for your account. Filter by status (success, error, in progress) and search by URL or preset name using the **Filters** collection.

**Logs: Get** returns the full detail of a single log entry including the AI extraction output.

**Usage: Get Stats** returns credit usage, request counts, and bandwidth broken down by day or week. Choose the time window from the **Time Range** dropdown: Last 7 Days, Last 30 Days, or This Week.
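If you want a single total rather than the per-day or per-week breakdown, a reduction over the rows is enough. The `UsageRow` field names below are hypothetical placeholders for whatever the **Get Stats** output actually contains.

```typescript
// Hypothetical shape of one day or week in the Get Stats breakdown.
interface UsageRow {
  period: string; // e.g. a date string for the day or week
  credits: number;
  requests: number;
  bandwidthBytes: number;
}

interface UsageTotals {
  credits: number;
  requests: number;
  bandwidthBytes: number;
}

// Sums every row in the selected Time Range into one set of totals.
function totalUsage(rows: UsageRow[]): UsageTotals {
  return rows.reduce<UsageTotals>(
    (acc, row) => ({
      credits: acc.credits + row.credits,
      requests: acc.requests + row.requests,
      bandwidthBytes: acc.bandwidthBytes + row.bandwidthBytes,
    }),
    { credits: 0, requests: 0, bandwidthBytes: 0 },
  );
}
```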

## Using as an AI tool

The Spidra node has `usableAsTool` enabled, which means you can connect it directly to an **AI Agent** node in n8n. The agent can call Spidra to fetch live web data as part of its reasoning without any additional setup on your end.

## Error handling

Enable **Continue On Fail** on the node to prevent a single failed item from stopping the whole workflow. When enabled, errors are returned as `{ error: "..." }` in the output and execution continues with the next item.
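With **Continue On Fail** enabled, a downstream step can separate failed items from successful ones by looking for the `error` key. A minimal sketch, assuming a failed item carries `error` as a string and a successful result never does; the `partitionResults` name is hypothetical:

```typescript
type NodeItem = Record<string, unknown>;

interface FailedItem {
  index: number; // position of the item in the node's output
  error: string;
}

// Splits node output into successful results and `{ error: "..." }` items.
function partitionResults(items: NodeItem[]): { ok: NodeItem[]; failed: FailedItem[] } {
  const ok: NodeItem[] = [];
  const failed: FailedItem[] = [];
  items.forEach((item, index) => {
    const err = item["error"];
    if (typeof err === "string") {
      failed.push({ index, error: err });
    } else {
      ok.push(item);
    }
  });
  return { ok, failed };
}
```

If a successful extraction could itself contain an `error` field (for example, from your own schema), this check would misclassify it, so use a more specific test in that case.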