The Python SDK wraps the Spidra API so you’re not writing raw HTTP calls and polling loops yourself. It handles job submission, status polling, retry logic, and error mapping to typed exceptions. The SDK is async by design, with synchronous wrappers on every method so it works anywhere: async scripts, Django views, Flask routes, or Jupyter notebooks.
Installation
Get your API key from app.spidra.io under Settings → API Keys.
Store it as an environment variable. Never hardcode it.
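As a minimal sketch of keeping the key out of source code, the helper below reads it from the environment and fails early if it is missing. The variable name `SPIDRA_API_KEY` and the function `load_api_key` are assumptions made for illustration; the SDK may expect a different name or read the variable itself.

```python
import os


def load_api_key(var_name: str = "SPIDRA_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is unset."""
    # var_name is an assumed name; the docs only say to use an environment variable.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running.")
    return key
```

Failing at startup tends to be easier to debug than a 401 from the first request.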
Getting started
The client exposes spidra.scrape, spidra.batch, spidra.crawl, spidra.logs, and spidra.usage.
Every method is async by default. If you’re not in an async context, each method has a _sync counterpart that works anywhere:
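The _sync variants also work where an event loop is already running, such as in Jupyter notebooks, where calling asyncio.run() directly would fail.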
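To illustrate why a plain `asyncio.run()` is not enough everywhere, here is one common way a synchronous wrapper can be built. This is a sketch of the general pattern, not the SDK's actual implementation; `run_sync` and `fake_scrape` are hypothetical names.

```python
import asyncio
import threading
from typing import Any, Coroutine, Dict, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro)

    # A loop is already running (for example in a notebook). asyncio.run()
    # would raise RuntimeError here, so use a fresh loop in a worker thread.
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def fake_scrape() -> str:
    # Hypothetical stand-in for an async SDK call.
    await asyncio.sleep(0)
    return "done"
```

Calling `run_sync(fake_scrape())` gives the same result whether or not the caller is already inside an event loop, at the cost of blocking the calling thread until the coroutine finishes.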
Scraping
The scraper accepts up to three URLs per request and processes them in parallel. You can pass a plain extraction prompt, a JSON schema, per-URL browser actions, or any combination of those. The simplest path is run(), which submits the job, blocks until it finishes, and returns the result:
submit() returns a job ID immediately. You can then call get() whenever you’re ready to check:
A job moves through waiting → active → completed (or failed).
Scrape parameters
| Parameter | Type | Description |
|---|---|---|
| urls | list | Up to 3 ScrapeUrl objects. Each takes a url and optional actions |
| prompt | str | What to extract, written in plain English |
| output | str | "markdown" (default) or "json" |
| schema | dict | JSON Schema that forces a specific output shape |
| use_proxy | bool | Route through a residential proxy |
| proxy_country | str | Two-letter country code: "us", "de", "jp", etc. |
| extract_content_only | bool | Strip nav, ads, and boilerplate before the AI sees the page |
| screenshot | bool | Capture a viewport screenshot |
| full_page_screenshot | bool | Capture a full-page scrolled screenshot |
| cookies | str | Raw Cookie header string for pages behind a login |
Enforcing an exact output shape
Without a schema the AI extracts what it finds. With a schema, missing fields come back as None rather than guessed values, which matters when the output feeds a database or a typed pipeline downstream:
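As a hedged illustration of what such a schema and its downstream handling might look like, the sketch below defines a JSON Schema as a plain dict and projects a result onto it. The field names, `PRODUCT_SCHEMA`, and the `to_row` helper are hypothetical; whether the API needs nullable types declared this way is an assumption.

```python
from typing import Any, Dict

# Hypothetical schema for a product page; nullable types make "missing" explicit.
PRODUCT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": ["string", "null"]},
        "price": {"type": ["number", "null"]},
        "in_stock": {"type": ["boolean", "null"]},
    },
    "required": ["name", "price", "in_stock"],
}


def to_row(result: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a result onto the schema's properties, using None for absent keys."""
    return {field: result.get(field) for field in schema["properties"]}
```

With this shape, every row handed to a database insert has the same keys, and a missing value shows up as `None` rather than as an absent column.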
Scraping geo-restricted content
Some sites serve different prices or content depending on where you’re browsing from. Set use_proxy=True and a proxy_country code to route through a residential IP in that country:
Supported country codes include us, gb, de, fr, jp, au, ca, br, in, nl, and 40+ more. Use "global" or "eu" for regional routing without pinning to a specific country.
Scraping pages behind a login
If the page requires a session, pass your cookies as a raw header string. The easiest way to get this is to log in through your browser, open devtools, and copy the Cookie header from any authenticated request:
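If you hold the cookies as name/value pairs rather than a copied header, a small helper can produce the same raw string. This is a sketch; `build_cookie_header` is a hypothetical helper, not part of the SDK, and the cookie names are made up.

```python
from typing import Dict


def build_cookie_header(cookies: Dict[str, str]) -> str:
    """Join name/value pairs into a raw Cookie header string such as 'a=1; b=2'."""
    # Pairs are separated by "; ", the format browsers send in the Cookie header.
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


header = build_cookie_header({"session_id": "abc123", "theme": "dark"})
```

The resulting string is what the cookies parameter described above expects. Treat it like a password, since it grants access to the logged-in session.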
Browser actions
Sometimes you need to interact with the page before extraction: dismiss a cookie banner, type into a search box, scroll to load lazy content. Pass an actions list inside the ScrapeUrl and they run in order before the AI sees the page:
For selector you can pass a CSS selector or XPath. If you’d rather describe the element in plain English, use value and Spidra will locate it with AI.
| Action | What it does |
|---|---|
| click | Click any element; use selector for CSS, value for plain text |
| type | Type into an input or textarea |
| check | Check a checkbox |
| uncheck | Uncheck a checkbox |
| wait | Pause for duration milliseconds |
| scroll | Scroll to a percentage of the page height, e.g. "80%" |
| forEach | Loop over every matched element and extract from each one |
Controlling how long run() waits
By default run() polls every 3 seconds and gives up after 120 seconds. You can override both by passing a PollOptions object:
The same options apply to batch.run() and crawl.run().
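To make the interval and timeout concrete, the loop below sketches what a blocking run() plausibly does internally. It does not use the SDK: `PollSettings` is a local stand-in for PollOptions with assumed field names, and `get_status` is any coroutine that reports a job's state.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class PollSettings:
    # Hypothetical stand-in for PollOptions; the field names are assumptions.
    interval: float = 3.0   # seconds between status checks
    timeout: float = 120.0  # give up after this many seconds


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    settings: Optional[PollSettings] = None,
) -> str:
    """Call get_status() until it reports a terminal state or the timeout passes."""
    settings = settings or PollSettings()
    deadline = time.monotonic() + settings.timeout
    while True:
        status = await get_status()
        if status in ("completed", "failed"):
            return status
        # Stop if the next check would land past the deadline.
        if time.monotonic() + settings.interval > deadline:
            raise TimeoutError(f"job still {status!r} after {settings.timeout}s")
        await asyncio.sleep(settings.interval)
```

With the defaults this allows roughly 40 status checks. A longer timeout suits large jobs, while a shorter interval trades extra requests for lower latency.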
Batch scraping
When you have a list of URLs to process, batch is the right tool. You can submit up to 50 URLs in a single request and they all run in parallel. Unlike the scraper, each URL here is a plain string, so there’s no per-URL actions support in batch mode.
Each item moves through pending → running → completed (or failed). The batch itself follows the same lifecycle, plus a cancelled state if you stop it early.
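If your list is longer than the 50-URL limit, one option is to split it client-side and submit each slice as its own batch. The helper below is a sketch; `chunk_urls` and `MAX_BATCH_SIZE` are hypothetical names, not SDK features.

```python
from typing import Iterator, List

MAX_BATCH_SIZE = 50  # per-request limit stated above


def chunk_urls(urls: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/p/{i}" for i in range(120)]
batches = list(chunk_urls(urls))  # three batches: 50, 50, and 20 URLs
```

Each slice can then be passed to a separate batch submission.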
If you don’t want to wait for the whole thing to finish, use submit() and get() separately:
Retrying failures and cancelling
If some items fail due to timeouts or transient errors, you can retry just those without re-running the ones that already succeeded:
Crawling
Crawling is different from scraping. You give it a starting URL and it discovers and processes pages on its own, following links according to your instructions. Good for indexing a docs site, monitoring a competitor’s blog, or building a structured dataset from an entire section of a site.
crawl_instruction tells the crawler which links to follow. transform_instruction tells the AI what to extract from each page it visits. max_pages is a safety cap so the crawl doesn’t run indefinitely. The default timeout for crawl.run() is 120 seconds; pass a higher value for bigger crawls.
The same use_proxy, proxy_country, and cookies options from the scraper work here too.
Just like scraping, you can fire-and-forget with submit() and poll with get():
Downloading the raw content
Once a crawl completes, you can fetch signed URLs to download the raw HTML and Markdown for every page that was crawled. These links expire after an hour:
Re-extracting with a different prompt
If you crawled a site and want to pull out different information, you don’t have to re-crawl. extract() runs a new AI pass over the already-crawled content and charges only transformation credits:
Browsing your crawl history
Logs
Every scrape request your API key makes gets logged automatically. You can filter by status, URL, date range, or where it came from:
Usage statistics
Check how many requests and credits your account has used over a given period:
"7d" gives one row per day for the last week. "30d" gives the last month. "weekly" gives one row per week for the last seven weeks.
Error handling
Every API error is mapped to a typed exception class, so you can catch exactly what you care about and let the rest bubble up:
| Exception | Status | When |
|---|---|---|
| SpidraAuthenticationError | 401 | The API key is missing or invalid |
| SpidraInsufficientCreditsError | 403 | No credits remaining on the account |
| SpidraRateLimitError | 429 | Too many requests; back off |
| SpidraServerError | 500 | Unexpected server-side error |
| SpidraError | any | Base class for all Spidra exceptions |
Every exception exposes .status for the HTTP status code and .message for a human-readable explanation.
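The hierarchy above can be sketched with local stand-in classes to show how catch order works. These classes only mirror the documented names, statuses, and `.status`/`.message` attributes. They are not the SDK's own, and the constructor signature, `error_for`, and `describe` are assumptions for illustration.

```python
class SpidraError(Exception):
    """Local stand-in mirroring the documented base class."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


class SpidraAuthenticationError(SpidraError):
    pass


class SpidraInsufficientCreditsError(SpidraError):
    pass


class SpidraRateLimitError(SpidraError):
    pass


class SpidraServerError(SpidraError):
    pass


# Status-to-class mapping taken from the table above.
STATUS_TO_ERROR = {
    401: SpidraAuthenticationError,
    403: SpidraInsufficientCreditsError,
    429: SpidraRateLimitError,
    500: SpidraServerError,
}


def error_for(status: int, message: str) -> SpidraError:
    """Build the most specific exception for a status, else the base class."""
    return STATUS_TO_ERROR.get(status, SpidraError)(status, message)


def describe(exc: SpidraError) -> str:
    """Catch specific classes first and the base class last."""
    try:
        raise exc
    except SpidraRateLimitError as e:
        return f"retry later ({e.status})"
    except SpidraAuthenticationError as e:
        return f"check API key ({e.status})"
    except SpidraError as e:
        return f"unhandled Spidra error {e.status}: {e.message}"
```

Because every class derives from SpidraError, a final `except SpidraError` clause acts as a catch-all, but it has to come after the specific handlers or it will shadow them.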

