# PHP

> Official PHP SDK for the Spidra web scraping API. Handles job submission, status polling, and error mapping. Requires PHP 8.1+ with Guzzle.

The PHP SDK wraps the Spidra API with idiomatic PHP helpers, so you don't have to write raw HTTP calls or hand-roll polling loops. It handles job submission, status polling, and mapping API errors to typed exceptions.

## Installation

```bash theme={null}
composer require spidra/spidra-php
```

Requires PHP 8.1+ and Guzzle 7. Once Composer is done, you're good to go — no extra setup.

<Note>
  Get your API key from [app.spidra.io](https://app.spidra.io) under **Settings → API Keys**.
  Store it as an environment variable. Never hardcode it.
</Note>

## Getting started

```php theme={null}
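require __DIR__ . '/vendor/autoload.php'; // Composer autoloader; adjust the path to your project layout

use Spidra\SpidraClient;

$spidra = new SpidraClient(getenv('SPIDRA_API_KEY'));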
```

From here you access everything through `$spidra->scrape`, `$spidra->batch`, `$spidra->crawl`, `$spidra->logs`, and `$spidra->usage`.

## Scraping

The scraper accepts up to three URLs per request and processes them in parallel. You can pass a plain extraction prompt, a full JSON schema, per-URL browser actions, or any mix of those.

The simplest path is `run()` — it submits the job and blocks until it finishes, then returns the result:

```php theme={null}
$job = $spidra->scrape->run([
    'urls'   => [['url' => 'https://example.com/pricing']],
    'prompt' => 'Extract all pricing plans with name, price, and included features',
    'output' => 'json',
]);

print_r($job['content']);
// ['plans' => [['name' => 'Starter', 'price' => '$9/mo', ...], ...]]
```

If you'd rather fire and move on, `submit()` returns a `jobId` immediately. You can then call `get()` whenever you're ready to check:

```php theme={null}
['jobId' => $jobId] = $spidra->scrape->submit([
    'urls'   => [['url' => 'https://example.com']],
    'prompt' => 'Extract the main headline',
]);

// Later...
$status = $spidra->scrape->get($jobId);

if ($status['status'] === 'completed') {
    echo $status['content'];
}
```

Job statuses move through: `waiting` → `active` → `completed` (or `failed`).
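
If you want to control the polling yourself, the `submit()` / `get()` pair can be wrapped in a small loop. The sketch below is not part of the SDK: `pollUntilDone` is a hypothetical helper that accepts any callable returning a status array, so it relies only on the `status` values listed above.

```php
/**
 * Poll a status callback until the job reaches a terminal state.
 * Hypothetical helper, not part of the SDK.
 *
 * @param callable $fetchStatus  Returns an array with a 'status' key
 * @param int      $intervalSecs Seconds to sleep between checks
 * @param int      $timeoutSecs  Give up after this many seconds
 */
function pollUntilDone(callable $fetchStatus, int $intervalSecs = 3, int $timeoutSecs = 120): array
{
    $deadline = microtime(true) + $timeoutSecs;

    while (true) {
        $status = $fetchStatus();

        // 'completed' and 'failed' are the terminal states listed above.
        if (in_array($status['status'] ?? null, ['completed', 'failed'], true)) {
            return $status;
        }

        if (microtime(true) >= $deadline) {
            throw new RuntimeException("Job still running after {$timeoutSecs} seconds");
        }

        sleep($intervalSecs);
    }
}

// Usage with the client from "Getting started":
// $final = pollUntilDone(fn () => $spidra->scrape->get($jobId));
```

Taking a callable instead of the client keeps the loop reusable for anything that reports a `status` field.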

### Scrape parameters

| Parameter            | Type     | Description                                                        |
| -------------------- | -------- | ------------------------------------------------------------------ |
| `urls`               | `array`  | Up to 3 URLs. Each entry is `['url' => '...', 'actions' => [...]]` |
| `prompt`             | `string` | What to extract. Written in plain English                          |
| `output`             | `string` | `"markdown"` (default) or `"json"`                                 |
| `schema`             | `array`  | JSON Schema — forces a specific shape when using `output: "json"`  |
| `useProxy`           | `bool`   | Route through a residential proxy                                  |
| `proxyCountry`       | `string` | Two-letter country code: `"us"`, `"de"`, `"jp"`, etc.              |
| `extractContentOnly` | `bool`   | Strip nav, ads, and boilerplate before the AI sees the page        |
| `screenshot`         | `bool`   | Capture a viewport screenshot                                      |
| `fullPageScreenshot` | `bool`   | Capture a full-page (scrolled) screenshot                          |
| `cookies`            | `string` | Raw `Cookie` header string for pages behind a login                |

### Enforcing an exact output shape

Without a schema, the AI extracts what it finds. With a schema, missing fields come back as `null` rather than guessed values — useful when the output feeds a database or a typed pipeline downstream.

```php theme={null}
$job = $spidra->scrape->run([
    'urls'   => [['url' => 'https://jobs.example.com/senior-engineer']],
    'prompt' => 'Extract the job listing details',
    'output' => 'json',
    'schema' => [
        'type'       => 'object',
        'required'   => ['title', 'company', 'remote'],
        'properties' => [
            'title'      => ['type' => 'string'],
            'company'    => ['type' => 'string'],
            'remote'     => ['type' => ['boolean', 'null']],
            'salary_min' => ['type' => ['number', 'null']],
            'skills'     => ['type' => 'array', 'items' => ['type' => 'string']],
        ],
    ],
]);
```
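
Because fields the page doesn't contain come back as `null`, it can be worth checking the ones you depend on before writing to a database. A minimal sketch, assuming the extracted object is what `$job['content']` holds when `output` is `json` (as in the first example); `missingRequiredFields` is a hypothetical helper, not an SDK function.

```php
/**
 * Return the names of required schema fields that are absent or null.
 * Hypothetical helper, not part of the SDK.
 */
function missingRequiredFields(array $content, array $schema): array
{
    $missing = [];

    foreach ($schema['required'] ?? [] as $field) {
        // isset() is false for both a missing key and an explicit null.
        if (!isset($content[$field])) {
            $missing[] = $field;
        }
    }

    return $missing;
}

// Sample data in the shape of the schema above (values are made up).
$schema  = ['required' => ['title', 'company', 'remote']];
$content = ['title' => 'Senior Engineer', 'company' => 'Example Inc', 'remote' => null];

print_r(missingRequiredFields($content, $schema));
// Lists 'remote', because that field came back as null
```

Whether a `null` in a required field should block the insert or just be logged is a decision for your pipeline; the helper only reports it.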

### Scraping geo-restricted content

Some sites serve different prices or content depending on where you're browsing from. Set `useProxy` and `proxyCountry` to route through a residential IP in that country:

```php theme={null}
$job = $spidra->scrape->run([
    'urls'         => [['url' => 'https://www.amazon.de/gp/bestsellers']],
    'prompt'       => 'List the top 10 products with name and price',
    'useProxy'     => true,
    'proxyCountry' => 'de',
]);
```

Supported country codes include `us`, `gb`, `de`, `fr`, `jp`, `au`, `ca`, `br`, `in`, `nl`, and [40+ more](/features/stealth-mode#country-targeting). Use `"global"` or `"eu"` for regional routing without pinning to a specific country.

### Scraping pages behind a login

If the page requires a session, pass your cookies as a raw header string. The easiest way to get it is to log in with your browser, open devtools, and copy the `Cookie` header from any authenticated request in the Network tab.

```php theme={null}
$job = $spidra->scrape->run([
    'urls'    => [['url' => 'https://app.example.com/dashboard']],
    'prompt'  => 'Extract the monthly revenue and active user count',
    'cookies' => 'session=abc123; auth_token=xyz789',
]);
```
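
If your cookies live in an array (for example, exported from a cookie jar), they need to be joined into the `name=value; name=value` form shown above. A minimal sketch; `buildCookieHeader` is a hypothetical helper and assumes the values are already safe for a cookie header (no semicolons or newlines).

```php
/** Join name => value pairs into a raw Cookie header string. Hypothetical helper. */
function buildCookieHeader(array $cookies): string
{
    $pairs = [];

    foreach ($cookies as $name => $value) {
        $pairs[] = "{$name}={$value}";
    }

    return implode('; ', $pairs);
}

echo buildCookieHeader(['session' => 'abc123', 'auth_token' => 'xyz789']);
// session=abc123; auth_token=xyz789
```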

### Browser actions

Sometimes you need to interact with the page before extraction — dismiss a cookie banner, type into a search box, scroll to load lazy content. Pass an `actions` array inside the URL entry and they'll run in order before the AI sees the page:

```php theme={null}
$job = $spidra->scrape->run([
    'urls' => [
        [
            'url'     => 'https://example.com/products',
            'actions' => [
                ['type' => 'click', 'selector' => '#accept-cookies'],
                ['type' => 'wait',  'duration'  => 1000],
                ['type' => 'scroll', 'to'        => '80%'],
            ],
        ],
    ],
    'prompt' => 'Extract all product names and prices visible on the page',
]);
```

For `selector` you can pass a CSS selector or XPath. If you'd rather describe the element in plain English, use `value` — Spidra will locate it with AI.

| Action    | What it does                                                       |
| --------- | ------------------------------------------------------------------ |
| `click`   | Click an element: `selector` (CSS/XPath), `value` (plain English)  |
| `type`    | Type into an input or textarea                                     |
| `check`   | Check a checkbox                                                   |
| `uncheck` | Uncheck a checkbox                                                 |
| `wait`    | Pause for `duration` milliseconds                                  |
| `scroll`  | Scroll to a percentage of the page height (e.g. `"80%"`)           |
| `forEach` | Loop over every matched element and extract from each one          |

### Controlling how long run() waits

By default `run()` polls every 3 seconds and gives up after 120 seconds. You can override both:

```php theme={null}
$job = $spidra->scrape->run($params, [
    'pollInterval' => 5,   // seconds between checks
    'timeout'      => 60,  // throw after this many seconds if still running
]);
```

The same options work on `batch->run()` and `crawl->run()`.

## Batch scraping

When you have a list of URLs to process, batch is the right tool. You can submit up to 50 URLs in a single request and they all run in parallel. Unlike the scraper, each URL here is a plain string — there's no per-URL actions support.

```php theme={null}
$batch = $spidra->batch->run([
    'urls' => [
        'https://shop.example.com/product/1',
        'https://shop.example.com/product/2',
        'https://shop.example.com/product/3',
    ],
    'prompt' => 'Extract product name, price, and whether it is in stock',
    'output' => 'json',
]);

echo $batch['completedCount'] . '/' . $batch['totalUrls'] . " completed\n";

foreach ($batch['items'] as $item) {
    if ($item['status'] === 'completed') {
        print_r($item['result']);
    } else {
        echo "Failed: {$item['url']} — {$item['error']}\n";
    }
}
```

Each item in `items` moves through `pending` → `running` → `completed` (or `failed`). The batch itself follows the same lifecycle, plus a `cancelled` state if you stop it early.

If you don't want to wait for the whole batch to finish, use `submit()` and `get()` separately:

```php theme={null}
['batchId' => $batchId] = $spidra->batch->submit([
    'urls'   => ['https://example.com/1', 'https://example.com/2'],
    'prompt' => 'Extract the page title and meta description',
]);

// Come back later
$result = $spidra->batch->get($batchId);
echo "{$result['completedCount']} of {$result['totalUrls']} done\n";
```
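
Since a single batch accepts at most 50 URLs, longer lists have to be split before submitting. A minimal sketch using PHP's built-in `array_chunk()`; `chunkUrls` is a hypothetical helper, and the commented loop assumes the `$spidra` client from Getting started.

```php
/** Split a URL list into batch-sized groups (50 is the limit stated above). */
function chunkUrls(array $urls, int $batchSize = 50): array
{
    // array_unique() drops duplicates; array_values() reindexes so each chunk is a plain list.
    return array_chunk(array_values(array_unique($urls)), $batchSize);
}

$urls = [];
for ($i = 1; $i <= 120; $i++) {
    $urls[] = "https://shop.example.com/product/{$i}";
}

$chunks = chunkUrls($urls);
echo count($chunks) . " batches\n"; // 3 batches (50 + 50 + 20)

// foreach ($chunks as $chunk) {
//     $spidra->batch->submit(['urls' => $chunk, 'prompt' => 'Extract product name and price']);
// }
```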

### Retrying failures and cancelling

If some items fail (transient network errors, timeouts), you can retry just those without re-running the ones that already succeeded:

```php theme={null}
if ($batch['failedCount'] > 0) {
    $retry = $spidra->batch->retry($batchId);
    echo "Retrying {$retry['retriedCount']} failed items\n";
}
```

To stop a running batch and get credits back for anything that hasn't started yet:

```php theme={null}
$result = $spidra->batch->cancel($batchId);
echo "Cancelled {$result['cancelledItems']} items — {$result['creditsRefunded']} credits refunded\n";
```

To look through past batches:

```php theme={null}
$page = $spidra->batch->list(1, 20); // page, limit

foreach ($page['jobs'] as $job) {
    echo "{$job['uuid']} {$job['status']} — {$job['completedCount']}/{$job['totalUrls']}\n";
}
```
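
To walk through every page rather than just the first, you can loop until a page comes back short. This is a sketch under an assumption: it treats a page with fewer than `limit` entries as the last one, since the response fields beyond `jobs` aren't described here. `collectAllJobs` is a hypothetical helper, not an SDK method.

```php
/**
 * Collect jobs from every page of a paginated listing.
 * Hypothetical helper: $fetchPage receives (page, limit) and returns
 * an array with a 'jobs' key, like batch->list() above.
 */
function collectAllJobs(callable $fetchPage, int $limit = 20, int $maxPages = 100): array
{
    $all = [];

    for ($page = 1; $page <= $maxPages; $page++) {
        $jobs = $fetchPage($page, $limit)['jobs'] ?? [];
        $all  = array_merge($all, $jobs);

        // Assumption: a page with fewer than $limit entries is the last one.
        if (count($jobs) < $limit) {
            break;
        }
    }

    return $all;
}

// $jobs = collectAllJobs(fn (int $page, int $limit) => $spidra->batch->list($page, $limit));
```

The `$maxPages` cap is there so a listing that never returns a short page can't loop forever.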

## Crawling

Crawling is different from scraping — you give it a starting URL and it discovers and processes pages on its own, following links according to your instructions. Good for indexing a docs site, monitoring a competitor's blog, or building a structured dataset from an entire section of a site.

```php theme={null}
$job = $spidra->crawl->run([
    'baseUrl'              => 'https://competitor.com/blog',
    'crawlInstruction'     => 'Follow links to blog posts only — skip tag pages, category pages, and the homepage',
    'transformInstruction' => 'Extract the post title, author name, publish date, and a one-sentence summary',
    'maxPages'             => 50,
    'useProxy'             => true,
]);

foreach ($job['result'] as $page) {
    echo $page['url'] . "\n";
    print_r($page['data']);
}
```

`crawlInstruction` tells the crawler which links to follow. `transformInstruction` tells the AI what to extract from each page it visits. `maxPages` is a safety cap — the crawl stops once it hits that number.

The same `useProxy`, `proxyCountry`, and `cookies` options from the scraper work here too.

Just like scraping, you can fire-and-forget with `submit()` and poll with `get()`:

```php theme={null}
['jobId' => $jobId] = $spidra->crawl->submit([
    'baseUrl'              => 'https://example.com/docs',
    'crawlInstruction'     => 'Follow all documentation pages',
    'transformInstruction' => 'Extract the page title and a short summary of the content',
    'maxPages'             => 50,
]);

$status = $spidra->crawl->get($jobId);
// status moves through: waiting → active → running → completed (or failed)
```

### Downloading the raw content

Once a crawl completes, you can fetch signed URLs to download the raw HTML and Markdown for every page that was crawled. These links expire after an hour:

```php theme={null}
$result = $spidra->crawl->pages($jobId);

foreach ($result['pages'] as $page) {
    // $page['html_url']     — download the raw HTML
    // $page['markdown_url'] — download the cleaned Markdown
    echo $page['url'] . ' — ' . $page['status'] . "\n";
}
```
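
If you want to save those files locally, each page needs a file name. A minimal sketch that derives one from the page URL; `fileNameForPage` is a hypothetical helper, and the commented loop assumes the `$result` array from the example above and that `allow_url_fopen` is enabled so `copy()` can read an HTTPS URL.

```php
/** Build a filesystem-safe file name from a crawled page URL. Hypothetical helper. */
function fileNameForPage(string $pageUrl, string $extension): string
{
    // Fall back to '/' when the URL has no path (or cannot be parsed).
    $path = parse_url($pageUrl, PHP_URL_PATH) ?: '/';

    // Collapse every run of non-alphanumeric characters into a single dash.
    $slug = trim(preg_replace('/[^A-Za-z0-9]+/', '-', $path), '-');

    return ($slug === '' ? 'index' : $slug) . '.' . $extension;
}

echo fileNameForPage('https://example.com/docs/getting-started', 'md');
// docs-getting-started.md

// foreach ($result['pages'] as $page) {
//     copy($page['markdown_url'], __DIR__ . '/' . fileNameForPage($page['url'], 'md'));
// }
```

Because the signed links expire after an hour, run the download soon after calling `pages()`.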

### Re-extracting with a different prompt

If you crawled a site and want to pull out different information — say you originally extracted titles and summaries, but now you need prices — you don't have to re-crawl. `extract()` runs a new AI pass over the already-crawled content and charges only transformation credits:

```php theme={null}
$result = $spidra->crawl->extract(
    $completedJobId,
    'Extract only product SKUs and prices as structured JSON'
);

// This creates a new job — poll it like any other
$extracted = $spidra->crawl->get($result['jobId']);
```

### Browsing your crawl history

```php theme={null}
$history = $spidra->crawl->history(1, 10);
echo "Total crawl jobs: {$history['total']}\n";

$stats = $spidra->crawl->stats();
echo "All-time: {$stats['total']}\n";
```

## Logs

Every scrape request your API key makes gets logged automatically. You can filter by status, URL, date range, or where it came from (API vs playground):

```php theme={null}
$result = $spidra->logs->list([
    'status'     => 'failed',
    'searchTerm' => 'amazon.com',
    'dateStart'  => '2024-01-01',
    'dateEnd'    => '2024-12-31',
    'page'       => 1,
    'limit'      => 20,
]);

foreach ($result['logs'] as $log) {
    echo $log['urls'][0]['url'] . ' — ' . $log['status'] . ' (' . $log['credits_used'] . ' credits)' . "\n";
}
```

To fetch the full details of a single log entry, including the AI extraction output:

```php theme={null}
$log = $spidra->logs->get($logUuid);
print_r($log['result_data']);
```
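
Log entries can also be aggregated locally, for example to see which sites cost the most credits. A minimal sketch that relies only on the `urls` and `credits_used` fields used in the list example above; `creditsByHost` is a hypothetical helper and the sample data is made up.

```php
/** Total credits per host, highest first. Hypothetical helper. */
function creditsByHost(array $logs): array
{
    $totals = [];

    foreach ($logs as $log) {
        // Group by the host of the first URL in the request.
        $host = parse_url($log['urls'][0]['url'] ?? '', PHP_URL_HOST) ?: 'unknown';
        $totals[$host] = ($totals[$host] ?? 0) + ($log['credits_used'] ?? 0);
    }

    arsort($totals); // sort by credits, descending, keeping host keys

    return $totals;
}

$logs = [
    ['urls' => [['url' => 'https://example.com/pricing']], 'credits_used' => 1],
    ['urls' => [['url' => 'https://www.amazon.com/dp/1']], 'credits_used' => 2],
    ['urls' => [['url' => 'https://www.amazon.com/dp/2']], 'credits_used' => 3],
];

print_r(creditsByHost($logs));
// www.amazon.com => 5, example.com => 1
```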

## Usage statistics

Check how many requests and credits your account has used over a given period:

```php theme={null}
$rows = $spidra->usage->get('30d'); // "7d" | "30d" | "weekly"

foreach ($rows as $row) {
    echo "{$row['date']}: {$row['requests']} requests, {$row['credits']} credits\n";
}
```

`"7d"` gives one row per day for the last week. `"30d"` gives the last month. `"weekly"` gives one row per week for the last seven weeks.
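
To turn the rows into a single total for the period, sum the columns. A minimal sketch using the `requests` and `credits` keys from the loop above; `summarizeUsage` is a hypothetical helper and the sample rows are made up.

```php
/** Sum requests and credits across usage rows. Hypothetical helper. */
function summarizeUsage(array $rows): array
{
    return [
        'requests' => array_sum(array_column($rows, 'requests')),
        'credits'  => array_sum(array_column($rows, 'credits')),
    ];
}

// Sample rows in the shape shown above (values are made up).
$rows = [
    ['date' => '2024-06-01', 'requests' => 12, 'credits' => 30],
    ['date' => '2024-06-02', 'requests' => 8,  'credits' => 21],
];

$totals = summarizeUsage($rows);
echo "{$totals['requests']} requests, {$totals['credits']} credits\n";
// 20 requests, 51 credits
```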

## Error handling

Every API error is mapped to a typed exception, so you can catch exactly what you care about and ignore the rest:

```php theme={null}
use Spidra\Exceptions\SpidraException;
use Spidra\Exceptions\AuthenticationException;
use Spidra\Exceptions\InsufficientCreditsException;
use Spidra\Exceptions\RateLimitException;
use Spidra\Exceptions\ServerException;

try {
    $job = $spidra->scrape->run([
        'urls'   => [['url' => 'https://example.com']],
        'prompt' => 'Extract the main headline',
    ]);
} catch (AuthenticationException $e) {
    // Bad or missing API key
} catch (InsufficientCreditsException $e) {
    // Account is out of credits — time to top up
} catch (RateLimitException $e) {
    // Slow down — you're hitting limits
} catch (ServerException $e) {
    // Something went wrong on Spidra's side — retry is usually safe
} catch (SpidraException $e) {
    // Catch-all for anything else
    echo "Error {$e->getCode()}: {$e->getMessage()}\n";
}
```

| Exception                      | HTTP | Meaning                              |
| ------------------------------ | ---- | ------------------------------------ |
| `AuthenticationException`      | 401  | The API key is missing or invalid    |
| `InsufficientCreditsException` | 403  | No credits remaining on the account  |
| `RateLimitException`           | 429  | Too many requests — back off         |
| `ServerException`              | 500  | Unexpected server-side error         |
| `SpidraException`              | any  | Base class for all Spidra exceptions |

All exceptions expose `getCode()` for the HTTP status and `getMessage()` for a human-readable explanation.
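
For rate limits and server errors, retrying with a growing delay is a common pattern. The sketch below is not part of the SDK: `retryWithBackoff` is a hypothetical helper that takes the operation and a "should I retry?" check as callables, and the commented usage assumes the exception classes imported above.

```php
/**
 * Run $operation, retrying with exponential backoff while $isRetryable agrees.
 * Hypothetical helper, not part of the SDK.
 */
function retryWithBackoff(
    callable $operation,
    callable $isRetryable,
    int $maxAttempts = 4,
    int $baseDelayMs = 500
) {
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $e) {
            // Give up on the last attempt or on errors that a retry won't fix.
            if ($attempt >= $maxAttempts || !$isRetryable($e)) {
                throw $e;
            }

            // Delay doubles after each failed attempt: 500 ms, 1 s, 2 s, ...
            usleep($baseDelayMs * 1000 * (2 ** ($attempt - 1)));
        }
    }
}

// $job = retryWithBackoff(
//     fn () => $spidra->scrape->run($params),
//     fn (Throwable $e) => $e instanceof RateLimitException || $e instanceof ServerException
// );
```

Authentication and credit errors are deliberately left out of the retry check in the usage example, since repeating the request won't change the outcome.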

<CardGroup cols={2}>
  <Card title="Node.js" icon="node" href="/sdks/node">
    Official Node.js / TypeScript SDK — works in Next.js, Express, Bun, and edge runtimes.
  </Card>

  <Card title="Go" icon="golang" href="/sdks/go">
    Official Go SDK — typed structs, idiomatic errors, zero external dependencies.
  </Card>
</CardGroup>