Node - Spidra

The SDK handles the plumbing you’d otherwise write yourself like job submission and polling, automatic retries with backoff for transient failures, typed errors for everything that can go wrong, and streaming results for long-running jobs. It has zero dependencies and runs anywhere fetch exists like Node 18+, browsers, and edge runtimes like Cloudflare Workers.

Installation

To install the Spidra Node SDK, you can use npm:

npm install spidra

Get your API key from app.spidra.io under Settings > API Keys. Never hardcode it in source files. Use an environment variable instead.

Setup

Here’s an example of initializing the Spidra client in a Node.js or TypeScript project.

import { SpidraClient } from 'spidra';

const spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY });

Scraping

All scrape jobs run asynchronously. run() submits a job and polls until it finishes. For manual control, use submit() and get() directly. Up to 3 URLs can be passed per request and are processed in parallel.

Scrape a web page

Submit a scrape job and wait for results.

const job = await spidra.scrape.run({
	urls: [{ url: 'https://example.com/pricing' }],
	prompt: 'Extract all pricing plans with name, price, and included features',
	output: 'json',
});

console.log(job.result.content);
// { plans: [{ name: "Starter", price: "$9/mo", features: [...] }, ...] }

Parameters

Parameter	Type	Description
`urls`	`{ url: string, actions?: Action[] }[]`	URLs to scrape, with optional per-URL browser actions
`prompt`	`string`	AI extraction instruction
`output`	`"markdown"` \| `"json"`	Response format. Defaults to `"markdown"`
`schema`	`object`	JSON Schema for guaranteed output shape (use with `output: "json"`)
`useProxy`	`boolean`	Route through a residential proxy
`proxyCountry`	`string`	Two-letter country code, e.g. `"us"`, `"de"`, `"jp"`
`extractContentOnly`	`boolean`	Strip navigation, ads, and boilerplate before AI extraction
`screenshot`	`boolean`	Capture a screenshot of the page
`fullPageScreenshot`	`boolean`	Capture a full-page (scrolled) screenshot
`cookies`	`string`	Raw `Cookie` header string for authenticated pages

Fire-and-forget approach

Fire-and-forget approach: submit a job immediately and poll on your own schedule.

// Submit — returns immediately with a jobId
const { jobId } = await spidra.scrape.submit({
	urls: [{ url: 'https://example.com' }],
	prompt: 'Extract the main headline',
});

// Check status at any time
const status = await spidra.scrape.get(jobId);

if (status.status === 'completed') {
	console.log(status.result.content);
} else if (status.status === 'failed') {
	console.error(status.error);
}

Job statuses: queued · waiting · active · completed · failed

Structured JSON output

Pass a schema to enforce an exact output shape. Missing fields come back as null rather than hallucinated values — which matters when the output feeds a database or a typed pipeline downstream.

Define every field you want extracted. An untyped object with no properties (or an array of them) gives the AI nothing to fill in, so those members come back empty.

Use the Spidra JSON Schema Generator to build and preview your schema visually before pasting it here.

const job = await spidra.scrape.run({
	urls: [{ url: 'https://jobs.example.com/senior-engineer' }],
	prompt: 'Extract the job listing details',
	output: 'json',
	schema: {
		type: 'object',
		required: ['title', 'company', 'remote'],
		properties: {
			title: { type: 'string' },
			company: { type: 'string' },
			remote: { type: ['boolean', 'null'] },
			salary_min: { type: ['number', 'null'] },
			salary_max: { type: ['number', 'null'] },
			skills: { type: 'array', items: { type: 'string' } },
		},
	},
});

Structured output with Zod

If you already use Zod, skip the JSON Schema entirely — pass your Zod schema and the SDK converts it for you. The result is typed from your schema, so content needs no casting:

import { z } from 'zod';

const JobListing = z.object({
	title: z.string(),
	company: z.string(),
	remote: z.boolean().nullable(),
	skills: z.array(z.string()),
});

const job = await spidra.scrape.run({
	urls: [{ url: 'https://jobs.example.com/senior-engineer' }],
	prompt: 'Extract the job listing details',
	output: 'json',
	schema: JobListing,
});

job.result.content.title; // typed as string — no casting needed

The same works on batch.run() (types each item’s result) and crawl.run() (types each page’s data). Zod is an optional peer dependency — install it only if you use this. Passing MySchema.shape by mistake throws a helpful error instead of failing silently.

Geo-targeted scraping

Route through a residential proxy in a specific country for geo-restricted content or localized pricing.

const job = await spidra.scrape.run({
	urls: [{ url: 'https://www.amazon.de/gp/bestsellers' }],
	prompt: 'List the top 10 products with name and price',
	useProxy: true,
	proxyCountry: 'de',
});

Supported codes include us, gb, de, fr, jp, au, ca, br, in, nl, sg, es, it, mx, and 40+ more. Use "global" or "eu" for regional routing.

Authenticated pages

Pass session cookies as a raw header string to scrape pages behind a login.

const job = await spidra.scrape.run({
	urls: [{ url: 'https://app.example.com/dashboard' }],
	prompt: 'Extract the monthly revenue and active user count',
	cookies: 'session=abc123; auth_token=xyz789',
});

Browser actions

Run actions against the page before extraction. They execute in order — the scrape happens after all actions complete.

const job = await spidra.scrape.run({
	urls: [
		{
			url: 'https://example.com/products',
			actions: [
				{ type: 'click', selector: '#accept-cookies' },
				{ type: 'wait', duration: 1000 },
				{ type: 'scroll', to: '80%' },
			],
		},
	],
	prompt: 'Extract all product names and prices',
});

Available actions

Action	Required fields	Description
`click`	`selector` or `value`	Click a button, link, or any element
`type`	`selector`, `value`	Type text into an input or textarea
`check`	`selector` or `value`	Check a checkbox
`uncheck`	`selector` or `value`	Uncheck a checkbox
`wait`	`duration` (ms)	Pause for a set number of milliseconds
`scroll`	`to` (`0–100%`)	Scroll the page to a percentage of its height
`forEach`	`observe`	Loop over every matched element and process each one

Use selector for a CSS selector or XPath. Use value for plain English — Spidra locates the element using AI.

{ type: 'click', selector: "button[data-testid='submit']" }   // CSS selector
{ type: 'click', value: 'Accept all cookies button' }          // plain English
{ type: 'type',  selector: "input[name='q']", value: 'wireless headphones' }
{ type: 'wait',  duration: 2000 }
{ type: 'scroll', to: '100%' }

forEach — loop over every element

forEach finds a set of matching elements on the page and processes each one individually. Use it when you need to collect data from a list of items, paginate across pages, or click into each item’s detail page.

You don’t need forEach if all the data fits on a single page — a plain prompt is simpler and works just as well.

Use forEach when:

The list spans multiple pages and you need pagination
You need to click into each item’s detail page (navigate mode)
You have 20+ items and want consistent per-item AI extraction (itemPrompt)

inline mode

Read each element’s content directly without navigating away. Best for product cards, search results, and table rows.

const job = await spidra.scrape.run({
	urls: [
		{
			url: 'https://books.toscrape.com',
			actions: [
				{
					type: 'forEach',
					observe: 'Find all book cards in the product grid',
					mode: 'inline',
					captureSelector: 'article.product_pod',
					maxItems: 20,
					itemPrompt:
						'Extract title, price, and star rating as JSON: {title, price, star_rating}',
				},
			],
		},
	],
	prompt: 'Return a clean JSON array of all books',
	output: 'json',
});

navigate mode

Follow each element’s link to its destination page and capture content there. Best for product listings where full details are only on individual pages.

{
  type:            'forEach',
  observe:         'Find all book title links in the product grid',
  mode:            'navigate',
  captureSelector: 'article.product_page',
  maxItems:        10,
  waitAfterClick:  800,
  itemPrompt:      'Extract title, price, star rating, and availability as JSON',
}

click mode

Click each element, capture the content that appears (modal, drawer, or expanded section), then move on. Best for hotel room cards, FAQ accordions, or any UI where clicking reveals hidden content.

{
  type:            'forEach',
  observe:         'Find all room type cards',
  mode:            'click',
  captureSelector: "[role='dialog']",
  maxItems:        8,
  waitAfterClick:  1200,
  itemPrompt:      'Extract room name, bed type, price per night, and amenities as JSON',
}

Pagination

After processing all elements on the current page, follow the next-page link and continue.

{
  type:     'forEach',
  observe:  'Find all book title links',
  mode:     'navigate',
  maxItems: 40,
  pagination: {
    nextSelector: 'li.next > a',
    maxPages:     3,   // 3 additional pages beyond the first
  },
}

maxItems applies across all pages combined. The loop stops when you hit maxItems, run out of elements, or reach maxPages.

Per-element actions

Run extra browser actions on each item after navigating or clicking into it, before content is captured. Useful for scrolling below the fold or expanding collapsed sections.

{
  type:            'forEach',
  observe:         'Find all book title links',
  mode:            'navigate',
  captureSelector: 'article.product_page',
  maxItems:        5,
  waitAfterClick:  1000,
  actions: [
    { type: 'scroll', to: '50%' },
  ],
  itemPrompt: 'Extract title, price, and full description as JSON',
}

itemPrompt vs top-level prompt

Both are optional and serve different purposes:

	`itemPrompt`	`prompt`
Runs	During scraping, once per item	After all items are collected
Sees	One item’s content	All items combined
Output	Feeds into the top-level `prompt`	`result.content`

Use itemPrompt to extract fields from each item individually. Use the top-level prompt to filter, sort, or reshape the combined output. They can be used together.

Controlling how long run() waits

By default run() waits until the job finishes, however long that takes — so a big crawl just works. If you’d rather cap the wait, pass a timeout; when it fires, a SpidraTimeoutError is thrown and the job keeps running server-side, so you can check it later with get() or cancel it.

const controller = new AbortController();

const job = await spidra.scrape.run(params, {
	pollInterval: 3000,              // ms between status checks (default: 3000)
	timeout: 600_000,                // max wait in ms (default: none — wait until done)
	signal: controller.signal,       // stop waiting without cancelling the job
});

Transient hiccups mid-wait — a 502 blip, a dropped connection, a rate limit — don’t kill the wait; the SDK keeps polling unless several happen in a row. The same options work on batch.run() and crawl.run().

Batch scraping

When you have a list of URLs to process — a product catalog, a set of listings, a pile of article links — batch is the right tool. Submit up to 50 URLs in one request and they all run in parallel. Each URL is a plain string (not an object).

Scrape a list of URLs

const batch = await spidra.batch.run({
	urls: [
		'https://shop.example.com/product/1',
		'https://shop.example.com/product/2',
		'https://shop.example.com/product/3',
	],
	prompt: 'Extract product name, price, and availability',
	output: 'json',
	useProxy: true,
});

console.log(`${batch.completedCount}/${batch.totalUrls} succeeded`);

for (const item of batch.items) {
	if (item.status === 'completed') console.log(item.url, item.result);
	if (item.status === 'failed') console.error(item.url, item.error);
}

Item statuses: pending · running · completed · failed Batch statuses: pending · running · completed · failed · cancelled

Submit now, check later

If you don’t want to hold a connection open while 50 pages scrape, submit the batch and come back whenever:

const { batchId } = await spidra.batch.submit({
	urls: ['https://example.com/1', 'https://example.com/2'],
	prompt: 'Extract the page title',
});

// Later...
const result = await spidra.batch.get(batchId);
console.log(result.status, result.completedCount, '/', result.totalUrls);

Stream results as they finish

Rather than waiting for the whole batch, watch() hands you each item the moment it completes — useful for writing results to a database as they arrive or updating a progress bar:

const { batchId } = await spidra.batch.submit({ urls, prompt: 'Extract product data' });

const watcher = spidra.batch.watch(batchId);

watcher.on('item', (item) => {
	console.log(item.url, item.status, item.result); // fires once per finished URL
});

const final = await watcher.wait(); // resolves with the terminal batch state

Every item is delivered exactly once — including ones that already finished before you started watching. watcher.stop() stops listening without cancelling the batch.

Retry failed items

Re-queue only the items that failed — successful items are not re-run.

const result = await spidra.batch.get(batchId);

if (result.failedCount > 0) {
	const { retriedCount } = await spidra.batch.retry(batchId);
	console.log(`Retrying ${retriedCount} items`);
}

Cancel a batch

Stops all pending items and refunds credits for unprocessed work.

const { cancelledItems, creditsRefunded } = await spidra.batch.cancel(batchId);
console.log(
	`Cancelled ${cancelledItems} items, refunded ${creditsRefunded} credits`,
);

List past batches

const { jobs, pagination } = await spidra.batch.list({ page: 1, limit: 20 });

for (const job of jobs) {
	console.log(job.uuid, job.status, `${job.completedCount}/${job.totalUrls}`);
}

Crawling

Crawling is different from scraping: you give it a starting URL and it discovers pages on its own, following links according to your instructions. Good for indexing a docs site, monitoring a competitor’s blog, or building a structured dataset from an entire section of a site.

Crawl a site

const job = await spidra.crawl.run({
	baseUrl: 'https://competitor.com/blog',
	crawlInstruction: 'Follow blog post links only, skip tag and category pages',
	transformInstruction: 'Extract the title, author, publish date, and a one-sentence summary',
	maxPages: 30,
	useProxy: true,
});

for (const page of job.result) {
	console.log(page.url, page.data);
}

Parameters

Parameter	Type	Default	Description
`baseUrl`	`string`	required	Starting URL for the crawl
`crawlInstruction`	`string`	required	Which links to follow, in plain language
`transformInstruction`	`string`	—	What to extract from each page. Omit for raw markdown mode (no AI, no token credits)
`schema`	`object`	—	JSON Schema defining the exact output structure per page
`maxPages`	`number`	`5`	Maximum pages to crawl (1–50)
`maxDepth`	`number`	unlimited	Max link depth from the base URL. `0` = base URL only
`includePaths`	`string[]`	—	URL path patterns to include, e.g. `["/blog/*"]`
`excludePaths`	`string[]`	—	URL path patterns to skip, e.g. `["/tag/*"]`
`allowSubdomains`	`boolean`	`false`	Follow links to subdomains of the base domain
`crawlEntireDomain`	`boolean`	`false`	Follow any link on the same root domain
`ignoreQueryParams`	`boolean`	`false`	Treat URLs differing only by query string as the same page
`webhookUrl`	`string`	—	Receive a POST request for each processed page and on job completion
`useProxy`	`boolean`	`false`	Route through a residential proxy
`proxyCountry`	`string`	`"global"`	Two-letter country code, e.g. `"us"`. Requires `useProxy: true`
`cookies`	`string`	—	Raw `Cookie` header string for authenticated sites

Raw content mode

Omit both transformInstruction and schema to get the raw markdown of each page with no AI processing. No token credits are charged:

const job = await spidra.crawl.run({
	baseUrl: 'https://docs.example.com',
	crawlInstruction: 'Crawl all documentation pages',
	maxPages: 50,
});

for (const page of job.result) {
	// page.data contains the raw markdown of each page
	console.log(page.url, page.data);
}

Structured output with schema

Use schema when you need every page to return the same fields in the same format:

Use the Spidra JSON Schema Generator to build and preview your schema visually before pasting it here.

const job = await spidra.crawl.run({
	baseUrl: 'https://example.com/jobs',
	crawlInstruction: 'Crawl all job listing pages',
	schema: {
		type: 'object',
		properties: {
			title:    { type: 'string' },
			location: { type: 'string' },
			salary:   { type: 'string' },
			remote:   { type: 'boolean' },
		},
	},
	maxPages: 20,
});

Scoped crawling with path filters

Use includePaths and excludePaths to keep crawls focused. Both accept glob-style patterns:

const job = await spidra.crawl.run({
	baseUrl: 'https://example.com',
	crawlInstruction: 'Crawl all documentation pages',
	transformInstruction: 'Extract the page title and main content',
	includePaths: ['/docs/*'],
	excludePaths: ['/docs/changelog/*', '/docs/legacy/*'],
	maxPages: 30,
});

Submit now, check later

const { jobId } = await spidra.crawl.submit({
	baseUrl: 'https://example.com/docs',
	crawlInstruction: 'Find all documentation pages',
	transformInstruction: 'Extract the page title and main content summary',
	maxPages: 50,
});

// Poll manually
const status = await spidra.crawl.get(jobId);
// status.status: "waiting" | "active" | "running" | "completed" | "failed" | "cancelled"

Watch a crawl page-by-page

A 50-page crawl can take a while. Instead of waiting for the whole thing, watch() streams each page to you the moment it’s crawled:

const { jobId } = await spidra.crawl.submit({
	baseUrl: 'https://competitor.com/blog',
	crawlInstruction: 'Follow blog post links only',
	transformInstruction: 'Extract title, author, and publish date',
	maxPages: 50,
});

const watcher = spidra.crawl.watch(jobId);

watcher.on('page', (page) => {
	console.log(page.url, page.data); // fires once per crawled page
});
watcher.on('error', (err) => console.error(err));

const final = await watcher.wait();

Every page arrives exactly once — including pages crawled before you started watching — and the SDK only re-fetches page content when the crawl actually makes progress, so watching stays cheap. watcher.stop() stops listening without cancelling the crawl.

Cancel a crawl

Cancel a queued or running job at any time. Pages already processed are preserved:

await spidra.crawl.cancel(jobId);

// Retrieve whatever completed before cancellation
const { pages } = await spidra.crawl.pages(jobId);
for (const page of pages) {
	if (page.status === 'success') console.log(page.url, page.data);
}

Download the raw HTML and Markdown

crawl.pages() returns signed URLs for the raw HTML and Markdown of each crawled page. Links expire after 1 hour.

const { pages } = await spidra.crawl.pages(jobId);

for (const page of pages) {
	console.log(page.url, page.status);
	// page.html     — signed URL for the raw HTML snapshot
	// page.markdown — signed URL for the Markdown version
}

Re-extract with a different prompt

Crawled a site and want to pull out different information? You don’t have to re-crawl. crawl.extract() runs a new AI pass over the already-crawled content and charges only transformation credits.

const { jobId: newJobId } = await spidra.crawl.extract(
	sourceJobId,
	'Extract only the product SKUs and prices as a CSV',
);

const result = await spidra.crawl.get(newJobId);

History and stats

// List past crawl jobs
const { jobs, total, page, totalPages } = await spidra.crawl.history({
	page: 1,
	limit: 10,
});

// Total crawl job count for your account
const { total: totalCrawls } = await spidra.crawl.stats();

Logs

Every API scrape job is logged automatically. Access your full history with optional filters.

List and filter your logs

const { logs, total } = await spidra.logs.list({
	status: 'failed', // "success" | "failed"
	searchTerm: 'amazon.com',
	channel: 'api', // "api" | "playground"
	dateStart: '2024-01-01',
	dateEnd: '2024-12-31',
	page: 1,
	limit: 20,
});

for (const log of logs) {
	console.log(log.urls[0]?.url, log.status, log.credits_used);
}

Filter parameters

Parameter	Type	Description
`status`	`"success"` \| `"failed"`	Filter by outcome
`searchTerm`	`string`	Search by URL or prompt
`channel`	`string`	`"api"` or `"playground"`
`dateStart`	`string`	ISO date — return logs on or after this date
`dateEnd`	`string`	ISO date — return logs on or before this date
`page`	`number`	Page number (default: 1)
`limit`	`number`	Results per page (default: 20)

Get one log with its full output

Fetch a single log entry, including the complete AI extraction result for that job.

const log = await spidra.logs.get(logUuid);
console.log(log.result_data); // full AI output for that job

Usage statistics

Returns credit and request usage broken down by day or week.

// Range options: "7d" | "30d" | "weekly"
const rows = await spidra.usage.get('30d');

for (const row of rows) {
	console.log(row.date, row.requests, row.credits);
}

Range	Description
`"7d"`	Last 7 days, one row per day
`"30d"`	Last 30 days, one row per day
`"weekly"`	Last 7 weeks, one row per week

Retries and reliability

You don’t have to write retry loops. Transient failures — network blips, 502/503/504 gateway errors — are retried automatically with exponential backoff, so a single hiccup never fails your call. Both knobs are configurable:

const spidra = new SpidraClient({
	apiKey: process.env.SPIDRA_API_KEY,
	maxRetries: 3,      // retry attempts for transient failures (default: 3, 0 disables)
	backoffFactor: 500, // base ms — delay is backoffFactor * 2^attempt (default: 500)
});

The retry policy is designed so it can never double-charge you: 4xx client errors are never retried, and job submissions are only retried when the server explicitly rejected them — never on network errors or gateway timeouts, where the job may already have been queued. When the server sends a Retry-After hint, the SDK honors it instead of its own backoff.

Error handling

Every non-2xx response throws a typed error class. Catch the specific class you care about, or fall back to the base SpidraError.

import {
	SpidraClient,
	SpidraError,
	SpidraAuthenticationError,
	SpidraInsufficientCreditsError,
	SpidraValidationError,
	SpidraRateLimitError,
	SpidraServerError,
	SpidraJobError,
	SpidraTimeoutError,
} from 'spidra';

try {
	await spidra.scrape.run({
		urls: [{ url: 'https://example.com' }],
		prompt: '...',
	});
} catch (err) {
	if (err instanceof SpidraAuthenticationError) {
		console.error('Invalid or missing API key'); // 401
	} else if (err instanceof SpidraInsufficientCreditsError) {
		console.error('Out of credits — top up your account'); // 403
	} else if (err instanceof SpidraValidationError) {
		console.error('Bad request:', err.errors); // 422 — one entry per problem
	} else if (err instanceof SpidraRateLimitError) {
		// 429 — the error tells you exactly how long to wait
		console.error(`Rate limited: ${err.remaining}/${err.limit} left, retry in ${err.retryAfterMs}ms`);
	} else if (err instanceof SpidraJobError) {
		// the job itself failed or was cancelled (not a transport error)
		console.error(`Job ${err.jobId} ${err.jobStatus}: ${err.message}`);
	} else if (err instanceof SpidraTimeoutError) {
		// your poll timeout elapsed — the job is still running server-side
		console.error(`Still running after ${err.timeoutMs}ms, check ${err.jobId} later`);
	} else if (err instanceof SpidraServerError) {
		console.error('Server error — already retried automatically'); // 5xx
	} else if (err instanceof SpidraError) {
		console.error(`API error ${err.status}: ${err.message}`);
	}
}

Error classes

Class	Status	When
`SpidraAuthenticationError`	401	Missing or invalid API key
`SpidraPaymentRequiredError`	402	Subscription payment overdue
`SpidraInsufficientCreditsError`	403	Account has no remaining credits
`SpidraNotFoundError`	404	Job, batch, or log does not exist
`SpidraValidationError`	422	Request body failed validation — `err.errors` lists each problem
`SpidraRateLimitError`	429	Too many requests — carries `err.limit`, `err.remaining`, `err.resetAt`, `err.retryAfterMs`
`SpidraServerError`	5xx	Unexpected error on Spidra’s side
`SpidraJobError`	—	The job itself failed or was cancelled — carries `err.jobId`, `err.jobStatus`
`SpidraTimeoutError`	—	Your poll `timeout` elapsed; the job is still running — carries `err.jobId`
`SpidraError`	other	Any other non-2xx response

All error classes expose err.status (the HTTP status code, or 0 for non-HTTP errors like job failures and timeouts) and err.message. API errors also carry err.code (a machine-readable identifier like SERVICE_BUSY) and err.details (the raw error body).

Verifying webhooks

Crawl jobs can push crawl.page, crawl.completed, and crawl.failed events to your webhookUrl. Spidra signs each delivery with HMAC-SHA256 in the X-Spidra-Signature header, and the SDK ships a helper so you never accept a forged event:

import { verifySpidraWebhook } from 'spidra';

// Express example — verify against the RAW body, not the parsed JSON
app.post('/webhooks/spidra', express.raw({ type: 'application/json' }), async (req, res) => {
	const valid = await verifySpidraWebhook(
		req.body,
		req.header('x-spidra-signature'),
		process.env.SPIDRA_WEBHOOK_SECRET!,
	);
	if (!valid) return res.status(401).end();

	const event = JSON.parse(req.body.toString());
	if (event.event === 'crawl.page') {
		console.log('New page:', event.page.url);
	}
	res.status(200).end();
});

The comparison is constant-time, and the helper works in Node, browsers, and edge runtimes. Always pass the raw request body, re-serializing parsed JSON produces different bytes and fails verification.

AI agent integration

Spidra works as a tool inside AI agent pipelines. Here is an example using the Vercel AI SDK with Claude:

import { generateText, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { SpidraClient } from 'spidra';
import { z } from 'zod';

const spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY });

const result = await generateText({
	model: anthropic('claude-opus-4-6'),
	maxSteps: 5,
	tools: {
		scrapeUrl: tool({
			description: 'Fetch and extract structured data from a URL',
			parameters: z.object({
				url: z.string().describe('The URL to scrape'),
				prompt: z.string().describe('What data to extract'),
			}),
			execute: async ({ url, prompt }) => {
				const job = await spidra.scrape.run({ urls: [{ url }], prompt });
				return JSON.stringify(job.result.content);
			},
		}),
	},
	prompt: 'What are the top 3 trending repositories on GitHub today?',
});

console.log(result.text);

SDKs Overview

Browse all official Spidra SDKs in one place.

PHP

Official PHP SDK — idiomatic helpers, typed exceptions, and configurable polling.

​Installation

​Setup

​Scraping

​Scrape a web page

​Fire-and-forget approach

​Structured JSON output

​Structured output with Zod

​Geo-targeted scraping

​Authenticated pages

​Browser actions

​forEach — loop over every element

​inline mode

​navigate mode

​click mode

​Pagination

​Per-element actions

​itemPrompt vs top-level prompt

​Controlling how long run() waits

​Batch scraping

​Scrape a list of URLs

​Submit now, check later

​Stream results as they finish

​Retry failed items

​Cancel a batch

​List past batches

​Crawling

​Crawl a site

​Raw content mode

​Structured output with schema

​Scoped crawling with path filters

​Submit now, check later

​Watch a crawl page-by-page

​Cancel a crawl

​Download the raw HTML and Markdown

​Re-extract with a different prompt

​History and stats

​Logs

​List and filter your logs

​Get one log with its full output

​Usage statistics

​Retries and reliability

​Error handling

​Verifying webhooks

​AI agent integration

SDKs Overview

PHP

Installation

Setup

Scraping

Scrape a web page

Fire-and-forget approach

Structured JSON output

Structured output with Zod

Geo-targeted scraping

Authenticated pages

Browser actions

forEach — loop over every element

inline mode

navigate mode

click mode

Pagination

Per-element actions

itemPrompt vs top-level prompt

Controlling how long run() waits

Batch scraping

Scrape a list of URLs

Submit now, check later

Stream results as they finish

Retry failed items

Cancel a batch

List past batches

Crawling

Crawl a site

Raw content mode

Structured output with schema

Scoped crawling with path filters

Submit now, check later

Watch a crawl page-by-page

Cancel a crawl

Download the raw HTML and Markdown

Re-extract with a different prompt

History and stats

Logs

List and filter your logs

Get one log with its full output

Usage statistics

Retries and reliability

Error handling

Verifying webhooks

AI agent integration