> ## Documentation Index
> Fetch the complete documentation index at: https://docs.spidra.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Node

> Scrape pages, run browser actions, batch-process URLs, and crawl entire sites using the Spidra Node SDK.

Scrape pages, run browser actions, batch-process URLs, and crawl entire sites. All results come back as structured data ready to feed into your LLM pipelines or store directly.

## Installation

To install the Spidra Node SDK, you can use npm:

```bash theme={null}
npm install spidra-js
```

<Note>
  Get your API key from [app.spidra.io](https://app.spidra.io) under **Settings**
  → **API Keys**. Never hardcode it in source files — use an environment variable
  instead.
</Note>

## Setup

Here’s an example of initializing the Spidra client in a Node.js or TypeScript project.

```typescript theme={null}
import { SpidraClient } from 'spidra-js';

const spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY });
```

## Scraping

All scrape jobs run asynchronously. `run()` submits a job and polls until it finishes. For manual control, use `submit()` and `get()` directly. Up to **3 URLs** can be passed per request and are processed in parallel.

### Scrape a web page

Submit a scrape job and wait for results.

```typescript theme={null}
const job = await spidra.scrape.run({
	urls: [{ url: 'https://example.com/pricing' }],
	prompt: 'Extract all pricing plans with name, price, and included features',
	output: 'json',
});

console.log(job.result.content);
// { plans: [{ name: "Starter", price: "$9/mo", features: [...] }, ...] }
```

**Parameters**

| Parameter            | Type                                    | Description                                                         |
| -------------------- | --------------------------------------- | ------------------------------------------------------------------- |
| `urls`               | `{ url: string, actions?: Action[] }[]` | URLs to scrape, with optional per-URL browser actions               |
| `prompt`             | `string`                                | AI extraction instruction                                           |
| `output`             | `"markdown"` \| `"json"`                | Response format. Defaults to `"markdown"`                           |
| `schema`             | `object`                                | JSON Schema for guaranteed output shape (use with `output: "json"`) |
| `useProxy`           | `boolean`                               | Route through a residential proxy                                   |
| `proxyCountry`       | `string`                                | Two-letter country code, e.g. `"us"`, `"de"`, `"jp"`                |
| `extractContentOnly` | `boolean`                               | Strip navigation, ads, and boilerplate before AI extraction         |
| `screenshot`         | `boolean`                               | Capture a screenshot of the page                                    |
| `fullPageScreenshot` | `boolean`                               | Capture a full-page (scrolled) screenshot                           |
| `cookies`            | `string`                                | Raw `Cookie` header string for authenticated pages                  |

### Fire-and-forget approach

Fire-and-forget approach: submit a job immediately and poll on your own schedule.

```typescript theme={null}
// Submit — returns immediately with a jobId
const { jobId } = await spidra.scrape.submit({
	urls: [{ url: 'https://example.com' }],
	prompt: 'Extract the main headline',
});

// Check status at any time
const status = await spidra.scrape.get(jobId);

if (status.status === 'completed') {
	console.log(status.result.content);
} else if (status.status === 'failed') {
	console.error(status.error);
}
```

**Job statuses:** `waiting` · `active` · `completed` · `failed`

### Structured JSON output

Pass a `schema` to enforce an exact output shape. Missing fields come back as `null` rather than hallucinated values.

```typescript theme={null}
const job = await spidra.scrape.run({
	urls: [{ url: 'https://jobs.example.com/senior-engineer' }],
	prompt: 'Extract the job listing details',
	output: 'json',
	schema: {
		type: 'object',
		required: ['title', 'company', 'remote'],
		properties: {
			title: { type: 'string' },
			company: { type: 'string' },
			remote: { type: ['boolean', 'null'] },
			salary_min: { type: ['number', 'null'] },
			salary_max: { type: ['number', 'null'] },
			skills: { type: 'array', items: { type: 'string' } },
		},
	},
});
```

### Geo-targeted scraping

Route through a residential proxy in a specific country for geo-restricted content or localized pricing.

```typescript theme={null}
const job = await spidra.scrape.run({
	urls: [{ url: 'https://www.amazon.de/gp/bestsellers' }],
	prompt: 'List the top 10 products with name and price',
	useProxy: true,
	proxyCountry: 'de',
});
```

Supported codes include `us`, `gb`, `de`, `fr`, `jp`, `au`, `ca`, `br`, `in`, `nl`, `sg`, `es`, `it`, `mx`, and [40+ more](/features/stealth-mode#country-targeting). Use `"global"` or `"eu"` for regional routing.

### Authenticated pages

Pass session cookies as a raw header string to scrape pages behind a login.

```typescript theme={null}
const job = await spidra.scrape.run({
	urls: [{ url: 'https://app.example.com/dashboard' }],
	prompt: 'Extract the monthly revenue and active user count',
	cookies: 'session=abc123; auth_token=xyz789',
});
```

### Browser actions

Run actions against the page before extraction. They execute in order — the scrape happens after all actions complete.

```typescript theme={null}
const job = await spidra.scrape.run({
	urls: [
		{
			url: 'https://example.com/products',
			actions: [
				{ type: 'click', selector: '#accept-cookies' },
				{ type: 'wait', duration: 1000 },
				{ type: 'scroll', to: '80%' },
			],
		},
	],
	prompt: 'Extract all product names and prices',
});
```

**Available actions**

| Action    | Required fields       | Description                                          |
| --------- | --------------------- | ---------------------------------------------------- |
| `click`   | `selector` or `value` | Click a button, link, or any element                 |
| `type`    | `selector`, `value`   | Type text into an input or textarea                  |
| `check`   | `selector` or `value` | Check a checkbox                                     |
| `uncheck` | `selector` or `value` | Uncheck a checkbox                                   |
| `wait`    | `duration` (ms)       | Pause for a set number of milliseconds               |
| `scroll`  | `to` (`0–100%`)       | Scroll the page to a percentage of its height        |
| `forEach` | `observe`             | Loop over every matched element and process each one |

Use `selector` for a CSS selector or XPath. Use `value` for plain English — Spidra locates the element using AI.

```typescript theme={null}
{ type: 'click', selector: "button[data-testid='submit']" }   // CSS selector
{ type: 'click', value: 'Accept all cookies button' }          // plain English
{ type: 'type',  selector: "input[name='q']", value: 'wireless headphones' }
{ type: 'wait',  duration: 2000 }
{ type: 'scroll', to: '100%' }
```

### forEach — loop over every element

`forEach` finds a set of matching elements on the page and processes each one individually. Use it when you need to collect data from a list of items, paginate across pages, or click into each item's detail page.

<Tip>
  You don't need `forEach` if all the data fits on a single page — a plain
  `prompt` is simpler and works just as well.
</Tip>

**Use `forEach` when:**

* The list spans multiple pages and you need `pagination`
* You need to click into each item's detail page (`navigate` mode)
* You have 20+ items and want consistent per-item AI extraction (`itemPrompt`)

#### inline mode

Read each element's content directly without navigating away. Best for product cards, search results, and table rows.

```typescript theme={null}
const job = await spidra.scrape.run({
	urls: [
		{
			url: 'https://books.toscrape.com',
			actions: [
				{
					type: 'forEach',
					observe: 'Find all book cards in the product grid',
					mode: 'inline',
					captureSelector: 'article.product_pod',
					maxItems: 20,
					itemPrompt:
						'Extract title, price, and star rating as JSON: {title, price, star_rating}',
				},
			],
		},
	],
	prompt: 'Return a clean JSON array of all books',
	output: 'json',
});
```

#### navigate mode

Follow each element's link to its destination page and capture content there. Best for product listings where full details are only on individual pages.

```typescript theme={null}
{
  type:            'forEach',
  observe:         'Find all book title links in the product grid',
  mode:            'navigate',
  captureSelector: 'article.product_page',
  maxItems:        10,
  waitAfterClick:  800,
  itemPrompt:      'Extract title, price, star rating, and availability as JSON',
}
```

#### click mode

Click each element, capture the content that appears (modal, drawer, or expanded section), then move on. Best for hotel room cards, FAQ accordions, or any UI where clicking reveals hidden content.

```typescript theme={null}
{
  type:            'forEach',
  observe:         'Find all room type cards',
  mode:            'click',
  captureSelector: "[role='dialog']",
  maxItems:        8,
  waitAfterClick:  1200,
  itemPrompt:      'Extract room name, bed type, price per night, and amenities as JSON',
}
```

#### Pagination

After processing all elements on the current page, follow the next-page link and continue.

```typescript theme={null}
{
  type:     'forEach',
  observe:  'Find all book title links',
  mode:     'navigate',
  maxItems: 40,
  pagination: {
    nextSelector: 'li.next > a',
    maxPages:     3,   // 3 additional pages beyond the first
  },
}
```

`maxItems` applies across all pages combined. The loop stops when you hit `maxItems`, run out of elements, or reach `maxPages`.

#### Per-element actions

Run extra browser actions on each item after navigating or clicking into it, before content is captured. Useful for scrolling below the fold or expanding collapsed sections.

```typescript theme={null}
{
  type:            'forEach',
  observe:         'Find all book title links',
  mode:            'navigate',
  captureSelector: 'article.product_page',
  maxItems:        5,
  waitAfterClick:  1000,
  actions: [
    { type: 'scroll', to: '50%' },
  ],
  itemPrompt: 'Extract title, price, and full description as JSON',
}
```

#### itemPrompt vs top-level prompt

Both are optional and serve different purposes:

|        | `itemPrompt`                    | `prompt`                      |
| ------ | ------------------------------- | ----------------------------- |
| Runs   | During scraping, once per item  | After all items are collected |
| Sees   | One item's content              | All items combined            |
| Output | `result.data[].markdownContent` | `result.content`              |

Use `itemPrompt` to extract fields from each item individually. Use the top-level `prompt` to filter, sort, or reshape the combined output. They can be used together.

### Poll options

`scrape.run()`, `batch.run()`, and `crawl.run()` accept a second argument to control polling behavior.

```typescript theme={null}
const job = await spidra.scrape.run(params, {
	pollInterval: 3000, // ms between status checks (default: 3000)
	timeout: 120_000, // max wait in ms before throwing (default: 120000)
});
```

***

## Batch scraping

Submit up to 50 URLs in one request. All URLs are processed in parallel. Each URL is a plain string (not an object).

### batch.run()

```typescript theme={null}
const batch = await spidra.batch.run({
	urls: [
		'https://shop.example.com/product/1',
		'https://shop.example.com/product/2',
		'https://shop.example.com/product/3',
	],
	prompt: 'Extract product name, price, and availability',
	output: 'json',
	useProxy: true,
});

console.log(`${batch.completedCount}/${batch.totalUrls} succeeded`);

for (const item of batch.items) {
	if (item.status === 'completed') console.log(item.url, item.result);
	if (item.status === 'failed') console.error(item.url, item.error);
}
```

**Item statuses:** `pending` · `running` · `completed` · `failed`

**Batch statuses:** `pending` · `running` · `completed` · `failed` · `cancelled`

### batch.submit() + batch.get()

```typescript theme={null}
const { batchId } = await spidra.batch.submit({
	urls: ['https://example.com/1', 'https://example.com/2'],
	prompt: 'Extract the page title',
});

const result = await spidra.batch.get(batchId);
console.log(result.status, result.completedCount, '/', result.totalUrls);
```

### Retry failed items

Re-queue only the items that failed — successful items are not re-run.

```typescript theme={null}
const result = await spidra.batch.get(batchId);

if (result.failedCount > 0) {
	const { retriedCount } = await spidra.batch.retry(batchId);
	console.log(`Retrying ${retriedCount} items`);
}
```

### Cancel a batch

Stops all pending items and refunds credits for unprocessed work.

```typescript theme={null}
const { cancelledItems, creditsRefunded } = await spidra.batch.cancel(batchId);
console.log(
	`Cancelled ${cancelledItems} items, refunded ${creditsRefunded} credits`,
);
```

### List past batches

```typescript theme={null}
const { jobs, pagination } = await spidra.batch.list({ page: 1, limit: 20 });

for (const job of jobs) {
	console.log(job.uuid, job.status, `${job.completedCount}/${job.totalUrls}`);
}
```

## Crawling

Give Spidra a starting URL and instructions for which links to follow. It discovers pages automatically, extracts structured data from each one, and returns everything when the crawl is done.

### crawl.run()

```typescript theme={null}
const job = await spidra.crawl.run({
	baseUrl: 'https://competitor.com/blog',
	crawlInstruction: 'Follow blog post links only, skip tag and category pages',
	transformInstruction:
		'Extract the title, author, publish date, and a one-sentence summary',
	maxPages: 30,
	useProxy: true,
});

for (const page of job.result) {
	console.log(page.url, page.data);
}
```

**Parameters**

| Parameter              | Type      | Description                                        |
| ---------------------- | --------- | -------------------------------------------------- |
| `baseUrl`              | `string`  | Starting URL for the crawl                         |
| `crawlInstruction`     | `string`  | Which links to follow and which to skip            |
| `transformInstruction` | `string`  | What to extract from each page                     |
| `maxPages`             | `number`  | Maximum number of pages to crawl                   |
| `useProxy`             | `boolean` | Route through a residential proxy                  |
| `proxyCountry`         | `string`  | Two-letter country code, e.g. `"us"`               |
| `cookies`              | `string`  | Raw `Cookie` header string for authenticated sites |

### crawl.submit() + crawl.get()

```typescript theme={null}
const { jobId } = await spidra.crawl.submit({
	baseUrl: 'https://example.com/docs',
	crawlInstruction: 'Find all documentation pages',
	transformInstruction: 'Extract the page title and main content summary',
	maxPages: 50,
});

// Poll manually
const status = await spidra.crawl.get(jobId);
// status: "waiting" | "active" | "running" | "completed" | "failed"
```

### crawl.pages() — download crawled content

Returns signed S3 URLs for the raw HTML and Markdown of each crawled page. Links expire after **1 hour**.

```typescript theme={null}
const { pages } = await spidra.crawl.pages(jobId);

for (const page of pages) {
	console.log(page.url, page.status);
	// page.html_url     — download raw HTML
	// page.markdown_url — download Markdown version
}
```

### crawl.extract() — re-extract without re-crawling

Apply a new AI prompt to an existing completed crawl without fetching the pages again. Only transformation credits are charged.

```typescript theme={null}
const { jobId: newJobId } = await spidra.crawl.extract(
	sourceJobId,
	'Extract only the product SKUs and prices as a CSV',
);

// Poll the new extraction job
const result = await spidra.crawl.get(newJobId);
```

### History and stats

```typescript theme={null}
// List past crawl jobs
const { jobs, total, page, totalPages } = await spidra.crawl.history({
	page: 1,
	limit: 10,
});

// Total crawl job count for your account
const { total: totalCrawls } = await spidra.crawl.stats();
```

## Logs

Every API scrape job is logged automatically. Access your full history with optional filters.

### logs.list()

```typescript theme={null}
const { logs, total } = await spidra.logs.list({
	status: 'failed', // "success" | "failed"
	searchTerm: 'amazon.com',
	channel: 'api', // "api" | "playground"
	dateStart: '2024-01-01',
	dateEnd: '2024-12-31',
	page: 1,
	limit: 20,
});

for (const log of logs) {
	console.log(log.urls[0]?.url, log.status, log.credits_used);
}
```

**Filter parameters**

| Parameter    | Type                      | Description                                   |
| ------------ | ------------------------- | --------------------------------------------- |
| `status`     | `"success"` \| `"failed"` | Filter by outcome                             |
| `searchTerm` | `string`                  | Search by URL or prompt                       |
| `channel`    | `string`                  | `"api"` or `"playground"`                     |
| `dateStart`  | `string`                  | ISO date — return logs on or after this date  |
| `dateEnd`    | `string`                  | ISO date — return logs on or before this date |
| `page`       | `number`                  | Page number (default: 1)                      |
| `limit`      | `number`                  | Results per page (default: 20)                |

### logs.get()

Get a single log entry including the full AI extraction result.

```typescript theme={null}
const log = await spidra.logs.get(logUuid);
console.log(log.result_data); // full AI output for that job
```

## Usage statistics

Returns credit and request usage broken down by day or week.

```typescript theme={null}
// Range options: "7d" | "30d" | "weekly"
const rows = await spidra.usage.get('30d');

for (const row of rows) {
	console.log(row.date, row.requests, row.credits);
}
```

| Range      | Description                    |
| ---------- | ------------------------------ |
| `"7d"`     | Last 7 days, one row per day   |
| `"30d"`    | Last 30 days, one row per day  |
| `"weekly"` | Last 7 weeks, one row per week |

## Error handling

Every non-2xx response throws a typed error class. Catch the specific class you care about, or fall back to the base `SpidraError`.

```typescript theme={null}
import {
	SpidraClient,
	SpidraError,
	SpidraAuthenticationError,
	SpidraInsufficientCreditsError,
	SpidraRateLimitError,
	SpidraServerError,
} from 'spidra-js';

try {
	await spidra.scrape.run({
		urls: [{ url: 'https://example.com' }],
		prompt: '...',
	});
} catch (err) {
	if (err instanceof SpidraAuthenticationError) {
		console.error('Invalid or missing API key'); // 401
	} else if (err instanceof SpidraInsufficientCreditsError) {
		console.error('Out of credits — top up your account'); // 403
	} else if (err instanceof SpidraRateLimitError) {
		console.error('Rate limited — back off and retry'); // 429
	} else if (err instanceof SpidraServerError) {
		console.error('Server error — try again shortly'); // 500
	} else if (err instanceof SpidraError) {
		console.error(`API error ${err.status}: ${err.message}`);
	}
}
```

**Error classes**

| Class                            | Status | When                                  |
| -------------------------------- | ------ | ------------------------------------- |
| `SpidraAuthenticationError`      | 401    | Missing or invalid `x-api-key` header |
| `SpidraInsufficientCreditsError` | 403    | Account has no remaining credits      |
| `SpidraRateLimitError`           | 429    | Too many requests                     |
| `SpidraServerError`              | 500    | Unexpected error on Spidra's side     |
| `SpidraError`                    | other  | Any other non-2xx response            |

All error classes expose `err.status` (HTTP status code) and `err.message`.

## AI agent integration

Spidra works as a tool inside AI agent pipelines. Here is an example using the Vercel AI SDK with Claude:

```typescript theme={null}
import { generateText, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { SpidraClient } from 'spidra-js';
import { z } from 'zod';

const spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY });

const result = await generateText({
	model: anthropic('claude-opus-4-6'),
	maxSteps: 5,
	tools: {
		scrapeUrl: tool({
			description: 'Fetch and extract structured data from a URL',
			parameters: z.object({
				url: z.string().describe('The URL to scrape'),
				prompt: z.string().describe('What data to extract'),
			}),
			execute: async ({ url, prompt }) => {
				const job = await spidra.scrape.run({ urls: [{ url }], prompt });
				return JSON.stringify(job.result.content);
			},
		}),
	},
	prompt: 'What are the top 3 trending repositories on GitHub today?',
});

console.log(result.text);
```

<CardGroup cols={2}>
  <Card title="SDKs Overview" icon="puzzle-piece" href="/sdks/overview">
    Browse all official Spidra SDKs in one place.
  </Card>

  <Card title="PHP" icon="php" href="/sdks/php">
    Official PHP SDK — idiomatic helpers, typed exceptions, and configurable polling.
  </Card>
</CardGroup>
