The PHP SDK wraps the Spidra API with idiomatic PHP helpers so you’re not writing raw HTTP calls and hand-rolling polling loops. It handles job submission, status polling, error mapping to typed exceptions, and everything in between.

Installation

composer require spidra/spidra-php
Requires PHP 8.1+ and Guzzle 7. Once Composer is done, you’re good to go — no extra setup.
Get your API key from app.spidra.io under Settings → API Keys. Store it as an environment variable. Never hardcode it.
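For example, on macOS or Linux you might export it from your shell profile so the SDK can read it with getenv() (the key below is a placeholder):

```shell
# Add to ~/.bashrc or ~/.zshrc, then restart your shell
export SPIDRA_API_KEY="your-api-key-here"
```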

Getting started

use Spidra\SpidraClient;

$spidra = new SpidraClient(getenv('SPIDRA_API_KEY'));
From here you access everything through $spidra->scrape, $spidra->batch, $spidra->crawl, $spidra->logs, and $spidra->usage.

Scraping

The scraper accepts up to three URLs per request and processes them in parallel. You can pass a plain extraction prompt, a full JSON schema, per-URL browser actions, or any mix of those. The simplest path is run() — it submits the job and blocks until it finishes, then returns the result:
$job = $spidra->scrape->run([
    'urls'   => [['url' => 'https://example.com/pricing']],
    'prompt' => 'Extract all pricing plans with name, price, and included features',
    'output' => 'json',
]);

print_r($job['content']);
// ['plans' => [['name' => 'Starter', 'price' => '$9/mo', ...], ...]]
If you’d rather fire and move on, submit() returns a jobId immediately. You can then call get() whenever you’re ready to check:
['jobId' => $jobId] = $spidra->scrape->submit([
    'urls'   => [['url' => 'https://example.com']],
    'prompt' => 'Extract the main headline',
]);

// Later...
$status = $spidra->scrape->get($jobId);

if ($status['status'] === 'completed') {
    echo $status['content'];
}
Job statuses move through: waiting → active → completed (or failed).
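If you want finer control than run() gives you, a manual polling loop over submit() and get() is straightforward. A sketch, assuming the $spidra client from the setup above and the statuses just listed (the sleep interval is arbitrary):

```php
['jobId' => $jobId] = $spidra->scrape->submit([
    'urls'   => [['url' => 'https://example.com']],
    'prompt' => 'Extract the main headline',
]);

// Poll until the job leaves the waiting/active states
do {
    sleep(3); // arbitrary interval; tune to your needs
    $status = $spidra->scrape->get($jobId);
} while (in_array($status['status'], ['waiting', 'active'], true));

if ($status['status'] === 'completed') {
    echo $status['content'];
} else {
    echo "Job failed\n";
}
```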

Scrape parameters

urls (array): Up to 3 URLs. Each entry is ['url' => '...', 'actions' => [...]]
prompt (string): What to extract, written in plain English
output (string): "markdown" (default) or "json"
schema (array): JSON Schema; forces a specific shape when using output: "json"
useProxy (bool): Route through a residential proxy
proxyCountry (string): Two-letter country code: "us", "de", "jp", etc.
extractContentOnly (bool): Strip nav, ads, and boilerplate before the AI sees the page
screenshot (bool): Capture a viewport screenshot
fullPageScreenshot (bool): Capture a full-page (scrolled) screenshot
cookies (string): Raw Cookie header string for pages behind a login

Enforcing an exact output shape

Without a schema, the AI extracts what it finds. With a schema, missing fields come back as null rather than guessed values — useful when the output feeds a database or a typed pipeline downstream.
$job = $spidra->scrape->run([
    'urls'   => [['url' => 'https://jobs.example.com/senior-engineer']],
    'prompt' => 'Extract the job listing details',
    'output' => 'json',
    'schema' => [
        'type'       => 'object',
        'required'   => ['title', 'company', 'remote'],
        'properties' => [
            'title'      => ['type' => 'string'],
            'company'    => ['type' => 'string'],
            'remote'     => ['type' => ['boolean', 'null']],
            'salary_min' => ['type' => ['number', 'null']],
            'skills'     => ['type' => 'array', 'items' => ['type' => 'string']],
        ],
    ],
]);

Scraping geo-restricted content

Some sites serve different prices or content depending on where you’re browsing from. Set useProxy and proxyCountry to route through a residential IP in that country:
$job = $spidra->scrape->run([
    'urls'         => [['url' => 'https://www.amazon.de/gp/bestsellers']],
    'prompt'       => 'List the top 10 products with name and price',
    'useProxy'     => true,
    'proxyCountry' => 'de',
]);
Supported country codes include us, gb, de, fr, jp, au, ca, br, in, nl, and 40+ more. Use "global" or "eu" for regional routing without pinning to a specific country.

Scraping pages behind a login

If the page requires a session, pass your cookies as a raw header string. The easiest way to get this is to log in through your browser’s devtools, then copy the Cookie header from any authenticated request.
$job = $spidra->scrape->run([
    'urls'    => [['url' => 'https://app.example.com/dashboard']],
    'prompt'  => 'Extract the monthly revenue and active user count',
    'cookies' => 'session=abc123; auth_token=xyz789',
]);

Browser actions

Sometimes you need to interact with the page before extraction — dismiss a cookie banner, type into a search box, scroll to load lazy content. Pass an actions array inside the URL entry and they’ll run in order before the AI sees the page:
$job = $spidra->scrape->run([
    'urls' => [
        [
            'url'     => 'https://example.com/products',
            'actions' => [
                ['type' => 'click', 'selector' => '#accept-cookies'],
                ['type' => 'wait',  'duration'  => 1000],
                ['type' => 'scroll', 'to'        => '80%'],
            ],
        ],
    ],
    'prompt' => 'Extract all product names and prices visible on the page',
]);
For selector you can pass a CSS selector or XPath. If you’d rather describe the element in plain English, use value — Spidra will locate it with AI.
Available actions:
click: Click any element (use selector for CSS, value for plain text)
type: Type into an input or textarea
check: Check a checkbox
uncheck: Uncheck a checkbox
wait: Pause for duration milliseconds
scroll: Scroll to a percentage of the page height (e.g. "80%")
forEach: Loop over every matched element and extract from each one
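For instance, a click action that locates its target by plain-English description rather than a selector might look like this (the button description is hypothetical; adapt it to the page you're scraping):

```php
$job = $spidra->scrape->run([
    'urls' => [
        [
            'url'     => 'https://example.com/products',
            'actions' => [
                // No CSS selector needed: 'value' lets the AI locate the element
                ['type' => 'click', 'value' => 'the "Load more" button at the bottom of the list'],
                ['type' => 'wait', 'duration' => 1500],
            ],
        ],
    ],
    'prompt' => 'Extract all product names and prices visible on the page',
]);
```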

Controlling how long run() waits

By default run() polls every 3 seconds and gives up after 120 seconds. You can override both:
$job = $spidra->scrape->run($params, [
    'pollInterval' => 5,   // seconds between checks
    'timeout'      => 60,  // throw after this many seconds if still running
]);
The same options work on batch->run() and crawl->run().

Batch scraping

When you have a list of URLs to process, batch is the right tool. You can submit up to 50 URLs in a single request and they all run in parallel. Unlike the scraper, each URL here is a plain string — there’s no per-URL actions support.
$batch = $spidra->batch->run([
    'urls' => [
        'https://shop.example.com/product/1',
        'https://shop.example.com/product/2',
        'https://shop.example.com/product/3',
    ],
    'prompt' => 'Extract product name, price, and whether it is in stock',
    'output' => 'json',
]);

echo $batch['completedCount'] . '/' . $batch['totalUrls'] . " completed\n";

foreach ($batch['items'] as $item) {
    if ($item['status'] === 'completed') {
        print_r($item['result']);
    } else {
        echo "Failed: {$item['url']} — {$item['error']}\n";
    }
}
Each item in items moves through pending → running → completed (or failed). The batch itself follows the same lifecycle, plus a cancelled state if you stop it early. If you don’t want to wait for the whole batch to finish, use submit() and get() separately:
['batchId' => $batchId] = $spidra->batch->submit([
    'urls'   => ['https://example.com/1', 'https://example.com/2'],
    'prompt' => 'Extract the page title and meta description',
]);

// Come back later
$result = $spidra->batch->get($batchId);
echo "{$result['completedCount']} of {$result['totalUrls']} done\n";
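A simple way to wait on a submitted batch is to poll get() until it reaches a terminal state. A sketch using the batch lifecycle described above (the interval is arbitrary):

```php
// Assumes $batchId from batch->submit() above
do {
    sleep(5); // arbitrary interval
    $result = $spidra->batch->get($batchId);
    echo "{$result['completedCount']} of {$result['totalUrls']} done\n";
} while (!in_array($result['status'], ['completed', 'failed', 'cancelled'], true));
```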

Retrying failures and cancelling

If some items fail (transient network errors, timeouts), you can retry just those without re-running the ones that already succeeded:
if ($batch['failedCount'] > 0) {
    $retry = $spidra->batch->retry($batchId);
    echo "Retrying {$retry['retriedCount']} failed items\n";
}
To stop a running batch and get credits back for anything that hasn’t started yet:
$result = $spidra->batch->cancel($batchId);
echo "Cancelled {$result['cancelledItems']} items — {$result['creditsRefunded']} credits refunded\n";
To look through past batches:
$page = $spidra->batch->list(1, 20); // page, limit

foreach ($page['jobs'] as $job) {
    echo "{$job['uuid']} {$job['status']} — {$job['completedCount']}/{$job['totalUrls']}\n";
}

Crawling

Crawling is different from scraping — you give it a starting URL and it discovers and processes pages on its own, following links according to your instructions. Good for indexing a docs site, monitoring a competitor’s blog, or building a structured dataset from an entire section of a site.
$job = $spidra->crawl->run([
    'baseUrl'              => 'https://competitor.com/blog',
    'crawlInstruction'     => 'Follow links to blog posts only — skip tag pages, category pages, and the homepage',
    'transformInstruction' => 'Extract the post title, author name, publish date, and a one-sentence summary',
    'maxPages'             => 30,
    'useProxy'             => true,
]);

foreach ($job['result'] as $page) {
    echo $page['url'] . "\n";
    print_r($page['data']);
}
crawlInstruction tells the crawler which links to follow. transformInstruction tells the AI what to extract from each page it visits. maxPages is a safety cap — the crawl stops once it hits that number. The same useProxy, proxyCountry, and cookies options from the scraper work here too. Just like scraping, you can fire-and-forget with submit() and poll with get():
['jobId' => $jobId] = $spidra->crawl->submit([
    'baseUrl'              => 'https://example.com/docs',
    'crawlInstruction'     => 'Follow all documentation pages',
    'transformInstruction' => 'Extract the page title and a short summary of the content',
    'maxPages'             => 50,
]);

$status = $spidra->crawl->get($jobId);
// status moves through: waiting → active → running → completed (or failed)

Downloading the raw content

Once a crawl completes, you can fetch signed URLs to download the raw HTML and Markdown for every page that was crawled. These links expire after an hour:
$result = $spidra->crawl->pages($jobId);

foreach ($result['pages'] as $page) {
    // $page['html_url']     — download the raw HTML
    // $page['markdown_url'] — download the cleaned Markdown
    echo $page['url'] . ' — ' . $page['status'] . "\n";
}
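Since the signed URLs are plain HTTPS links, you can download them with any HTTP client. A minimal sketch with file_get_contents, assuming each page entry carries the status field shown above:

```php
$result = $spidra->crawl->pages($jobId);

foreach ($result['pages'] as $i => $page) {
    if ($page['status'] !== 'completed') {
        continue;
    }
    // Signed URLs expire after an hour, so download promptly
    $markdown = file_get_contents($page['markdown_url']);
    file_put_contents("page-{$i}.md", $markdown);
}
```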

Re-extracting with a different prompt

If you crawled a site and want to pull out different information — say you originally extracted titles and summaries, but now you need prices — you don’t have to re-crawl. extract() runs a new AI pass over the already-crawled content and charges only transformation credits:
$result = $spidra->crawl->extract(
    $completedJobId,
    'Extract only product SKUs and prices as structured JSON'
);

// This creates a new job — poll it like any other
$extracted = $spidra->crawl->get($result['jobId']);

Browsing your crawl history

$history = $spidra->crawl->history(1, 10);
echo "Total crawl jobs: {$history['total']}\n";

$stats = $spidra->crawl->stats();
echo "All-time: {$stats['total']}\n";

Logs

Every scrape request your API key makes gets logged automatically. You can filter by status, URL, date range, or where it came from (API vs playground):
$result = $spidra->logs->list([
    'status'     => 'failed',
    'searchTerm' => 'amazon.com',
    'dateStart'  => '2024-01-01',
    'dateEnd'    => '2024-12-31',
    'page'       => 1,
    'limit'      => 20,
]);

foreach ($result['logs'] as $log) {
    echo $log['urls'][0]['url'] . ' — ' . $log['status'] . ' (' . $log['credits_used'] . ' credits)' . "\n";
}
To fetch the full details of a single log entry, including the AI extraction output:
$log = $spidra->logs->get($logUuid);
print_r($log['result_data']);

Usage statistics

Check how many requests and credits your account has used over a given period:
$rows = $spidra->usage->get('30d'); // "7d" | "30d" | "weekly"

foreach ($rows as $row) {
    echo "{$row['date']}: {$row['requests']} requests, {$row['credits']} credits\n";
}
"7d" gives one row per day for the last week. "30d" gives the last month. "weekly" gives one row per week for the last seven weeks.

Error handling

Every API error is mapped to a typed exception, so you can catch exactly what you care about and ignore the rest:
use Spidra\Exceptions\SpidraException;
use Spidra\Exceptions\AuthenticationException;
use Spidra\Exceptions\InsufficientCreditsException;
use Spidra\Exceptions\RateLimitException;
use Spidra\Exceptions\ServerException;

try {
    $job = $spidra->scrape->run([
        'urls'   => [['url' => 'https://example.com']],
        'prompt' => 'Extract the main headline',
    ]);
} catch (AuthenticationException $e) {
    // Bad or missing API key
} catch (InsufficientCreditsException $e) {
    // Account is out of credits — time to top up
} catch (RateLimitException $e) {
    // Slow down — you're hitting limits
} catch (ServerException $e) {
    // Something went wrong on Spidra's side — retry is usually safe
} catch (SpidraException $e) {
    // Catch-all for anything else
    echo "Error {$e->getCode()}: {$e->getMessage()}\n";
}
AuthenticationException (HTTP 401): The API key is missing or invalid
InsufficientCreditsException (HTTP 403): No credits remaining on the account
RateLimitException (HTTP 429): Too many requests; back off
ServerException (HTTP 500): Unexpected server-side error
SpidraException (any): Base class for all Spidra exceptions
All exceptions expose getCode() for the HTTP status and getMessage() for a human-readable explanation.
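Since ServerException retries are usually safe and RateLimitException means you should back off, a small retry wrapper is a common pattern. A sketch, not part of the SDK itself (the attempt count and backoff delays are arbitrary):

```php
use Spidra\Exceptions\RateLimitException;
use Spidra\Exceptions\ServerException;

function withRetries(callable $fn, int $maxAttempts = 3)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $fn();
        } catch (RateLimitException | ServerException $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // out of attempts; surface the error
            }
            sleep(2 ** $attempt); // exponential backoff: 2s, 4s, ...
        }
    }
}

$job = withRetries(fn () => $spidra->scrape->run([
    'urls'   => [['url' => 'https://example.com']],
    'prompt' => 'Extract the main headline',
]));
```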