The official Elixir SDK for Spidra wraps the Spidra API so you’re not writing raw HTTP calls and polling loops yourself. It handles job submission, status polling, retry logic, and error mapping. All results come back as structured data ready to feed into your LLM pipelines or store directly.

Installation

Add spidra to your list of dependencies in mix.exs:
def deps do
  [
    {:spidra, "~> 0.1.0"}
  ]
end
Then run mix deps.get in your terminal.
Get your API key from app.spidra.io under Settings → API Keys. Store it as an environment variable — never hardcode it in source.

Requirements


Getting started

# Initialize your configuration
config = Spidra.Config.new(api_key: "spd_YOUR_API_KEY")
From here you access everything through Spidra.Scrape, Spidra.Batch, Spidra.Crawl, Spidra.Logs, and Spidra.Usage.
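If the key lives in an environment variable (as recommended above), you can read it at runtime. A minimal sketch; SPIDRA_API_KEY is just an assumed variable name:
# Build the config from an environment variable instead of a hardcoded key
config = Spidra.Config.new(api_key: System.fetch_env!("SPIDRA_API_KEY"))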

Scraping

All scrape jobs run asynchronously on the Spidra platform. Spidra.Scrape.run/3 submits a job and polls until it finishes. If you need more control, use submit/2 and get/2 directly. Up to 3 URLs can be passed per request and they are processed in parallel.

Basic scrape

Submit a scrape job and wait for results.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://example.com/pricing"}],
  prompt: "Extract all pricing plans with name, price, and included features",
  output: "json"
})

IO.inspect(job["result"]["content"])
# "{ \"plans\": [{ \"name\": \"Starter\", \"price\": \"$9/mo\", \"features\": [...] }, ...] }"
Parameters
Parameter | Type | Description
urls | list | Up to 3 URL maps. Each takes a url key and optional actions list
prompt | string | AI extraction instruction
output | string | "markdown" (default) or "json"
schema | map | JSON Schema for guaranteed output shape (use with output: "json")
use_proxy | boolean | Route through a residential proxy
proxy_country | string | Two-letter country code, e.g. "us", "de", "jp"
extract_content_only | boolean | Strip navigation, ads, and boilerplate before AI extraction
screenshot | boolean | Capture a screenshot of the page
full_page_screenshot | boolean | Capture a full-page (scrolled) screenshot
cookies | string | Raw Cookie header string for authenticated pages

Fire-and-forget approach

Use submit/2 and get/2 when you want to manage polling yourself.
# Submit a job and get the job_id immediately
{:ok, %{"jobId" => job_id}} = Spidra.Scrape.submit(config, %{
  urls: [%{url: "https://example.com"}],
  prompt: "Extract the main headline"
})

# Check status at any point
{:ok, status} = Spidra.Scrape.get(config, job_id)

case status["status"] do
  "completed" -> IO.inspect(status["result"]["content"])
  "failed"    -> IO.inspect(status["error"])
  _           -> IO.puts("Job is still pending...")
end
Job statuses: waiting · active · completed · failed
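If you want to block until the job leaves the queue without using run/3, a small recursive poller is enough. This is only a sketch built on Scrape.get/2 and the statuses above; the interval and any timeout policy are up to you:
defmodule MyApp.ScrapePoller do
  # Sketch: poll Spidra.Scrape.get/2 until the job completes or fails
  def await(config, job_id, interval_ms \\ 2_000) do
    {:ok, status} = Spidra.Scrape.get(config, job_id)

    case status["status"] do
      "completed" -> {:ok, status}
      "failed"    -> {:error, status["error"]}
      # "waiting" or "active": sleep, then check again
      _ ->
        Process.sleep(interval_ms)
        await(config, job_id, interval_ms)
    end
  end
end

{:ok, status} = MyApp.ScrapePoller.await(config, job_id)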

Structured JSON output

Pass a schema to enforce an exact output shape. Missing fields come back as null rather than hallucinated values.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://jobs.example.com/senior-engineer"}],
  prompt: "Extract the job listing details",
  output: "json",
  schema: %{
    "type" => "object",
    "required" => ["title", "company", "remote"],
    "properties" => %{
      "title"      => %{"type" => "string"},
      "company"    => %{"type" => "string"},
      "remote"     => %{"type" => ["boolean", "null"]},
      "salary_min" => %{"type" => ["number", "null"]},
      "salary_max" => %{"type" => ["number", "null"]},
      "skills"     => %{"type" => "array", "items" => %{"type" => "string"}}
    }
  }
})
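With output: "json" the extracted content arrives as a JSON string (see the basic example above), so decode it before using the fields. A sketch assuming a JSON library such as Jason is in your deps:
# Decode the JSON string returned in result.content (assumes {:jason, "~> 1.4"})
{:ok, listing} = Jason.decode(job["result"]["content"])
IO.puts("#{listing["title"]} at #{listing["company"]} (remote: #{listing["remote"]})")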

Geo-targeted scraping

Pass use_proxy: true and a proxy_country code to route the request through a specific country. Useful for geo-restricted content or localized pricing.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://www.amazon.de/gp/bestsellers"}],
  prompt: "List the top 10 products with name and price",
  use_proxy: true,
  proxy_country: "de"
})
Supported country codes include us, gb, de, fr, jp, au, ca, br, in, nl, sg, es, it, mx, and 40+ more. Use "global" or "eu" for regional routing.

Authenticated pages

Pass cookies as a string to scrape pages that require a login session.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://app.example.com/dashboard"}],
  prompt: "Extract the monthly revenue and active user count",
  cookies: "session=abc123; auth_token=xyz789"
})
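If you hold the session as name/value pairs rather than a ready-made header, joining them into the expected Cookie string is plain Elixir (no SDK involvement):
# Build a raw Cookie header string from name/value pairs
cookie_pairs = [{"session", "abc123"}, {"auth_token", "xyz789"}]
cookies = Enum.map_join(cookie_pairs, "; ", fn {name, value} -> "#{name}=#{value}" end)
# => "session=abc123; auth_token=xyz789"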

Browser actions

Actions let you interact with the page before the scrape runs. They execute in order, and the scrape happens after all actions complete.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [
    %{
      url: "https://example.com/products",
      actions: [
        %{type: "click", selector: "#accept-cookies"},
        %{type: "wait", duration: 1000},
        %{type: "scroll", to: "80%"}
      ]
    }
  ],
  prompt: "Extract all product names and prices"
})
Available actions
Action | Required fields | Description
click | selector or value | Click a button, link, or any element
type | selector, value | Type text into an input or textarea
check | selector or value | Check a checkbox
uncheck | selector or value | Uncheck a checkbox
wait | duration (ms) | Pause for a set number of milliseconds
scroll | to (0–100%) | Scroll the page to a percentage of its height
forEach | observe | Loop over every matched element and process each one
Use selector for a CSS selector or XPath. Use value for plain English — Spidra locates the element using AI.
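For instance, a search interaction can mix a CSS selector with a plain-English value locator. The page, selectors, and wording below are made up for illustration:
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [
    %{
      url: "https://example.com/search",
      actions: [
        %{type: "type", selector: "input[name=q]", value: "wireless headphones"},
        %{type: "click", value: "the search button next to the search box"},
        %{type: "wait", duration: 2000}
      ]
    }
  ],
  prompt: "Extract the names and prices of the search results"
})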

Batch scraping

Submit up to 50 URLs in a single request. All URLs are processed in parallel. Each URL is a plain string.
{:ok, batch} = Spidra.Batch.run(config, %{
  urls: [
    "https://shop.example.com/product/1",
    "https://shop.example.com/product/2",
    "https://shop.example.com/product/3"
  ],
  prompt: "Extract product name, price, and availability",
  output: "json",
  use_proxy: true
})

for item <- batch["items"] do
  case item["status"] do
    "completed" -> IO.inspect({item["url"], item["result"]})
    "failed"    -> IO.inspect({item["url"], item["error"]})
    _           -> :ok
  end
end
Item statuses: pending · running · completed · failed
Batch statuses: pending · running · completed · failed · cancelled
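To summarize a finished batch at a glance, group the items by status (plain Enum calls on the fields shown above):
# Count batch items per status, e.g. %{"completed" => 2, "failed" => 1}
summary =
  batch["items"]
  |> Enum.group_by(& &1["status"])
  |> Map.new(fn {status, items} -> {status, length(items)} end)

IO.inspect(summary)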

Batch.submit/2 + Batch.get/2

{:ok, %{"batchId" => batch_id}} = Spidra.Batch.submit(config, %{
  urls: ["https://example.com/1", "https://example.com/2"],
  prompt: "Extract the page title"
})

# Come back later
{:ok, result} = Spidra.Batch.get(config, batch_id)

Retry failed items

Re-queue only the items that failed — successful items are not re-run.
{:ok, result} = Spidra.Batch.get(config, batch_id)

if result["failedCount"] > 0 do
  {:ok, retried} = Spidra.Batch.retry(config, batch_id)
  IO.puts("Retried #{retried["retriedCount"]} items")
end

Cancel a batch

Stops all pending items and refunds credits for unprocessed work.
{:ok, response} = Spidra.Batch.cancel(config, batch_id)
IO.puts("Cancelled #{response["cancelledItems"]} items, refunded #{response["creditsRefunded"]} credits")

List past batches

{:ok, response} = Spidra.Batch.list(config, page: 1, limit: 20)

for job <- response["jobs"] do
  IO.puts("#{job["uuid"]} #{job["status"]} #{job["completedCount"]}/#{job["totalUrls"]}")
end

Crawling

Give Spidra a starting URL and instructions for which links to follow. It discovers pages automatically, extracts structured data from each one, and returns everything when the crawl is done.
{:ok, job} = Spidra.Crawl.run(config, %{
  base_url: "https://competitor.com/blog",
  crawl_instruction: "Find all blog posts published in 2024",
  transform_instruction: "Extract the title, author, publish date, and a one-sentence summary",
  max_pages: 30,
  use_proxy: true
})

for page <- job["result"] do
  IO.inspect({page["url"], page["data"]})
end
Parameters
Parameter | Type | Description
base_url | string | Starting URL for the crawl
crawl_instruction | string | Which links to follow and which to skip
transform_instruction | string | What to extract from each page
max_pages | integer | Maximum number of pages to crawl
use_proxy | boolean | Route through a residential proxy
proxy_country | string | Two-letter country code, e.g. "us"
cookies | string | Raw Cookie header string for authenticated sites

Crawl.submit/2 + Crawl.get/2

{:ok, %{"jobId" => job_id}} = Spidra.Crawl.submit(config, %{
  base_url: "https://example.com/docs",
  crawl_instruction: "Follow all documentation pages",
  transform_instruction: "Extract the page title and a short summary",
  max_pages: 50
})

# Poll manually
{:ok, status} = Spidra.Crawl.get(config, job_id)
# status: "waiting" | "active" | "running" | "completed" | "failed"

Download crawled content

Returns signed S3 URLs for the raw HTML and Markdown of each crawled page. Links expire after 1 hour.
{:ok, response} = Spidra.Crawl.pages(config, job_id)

for page <- response["pages"] do
  IO.puts("#{page["url"]} - #{page["status"]}")
  # Download raw HTML:     page["html_url"]
  # Download Markdown:     page["markdown_url"]
end
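The signed links are ordinary HTTPS URLs, so any HTTP client can download them before they expire. A sketch using the req package (an assumption; add {:req, "~> 0.5"} to your deps if you follow this route):
for page <- response["pages"] do
  # Fetch the Markdown for each crawled page and write it to a local file
  markdown = Req.get!(page["markdown_url"]).body
  File.write!("crawl_page_#{:erlang.phash2(page["url"])}.md", markdown)
end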

Re-extract without re-crawling

Apply a new AI prompt to an existing completed crawl without fetching the pages again. Only transformation credits are charged.
{:ok, queued} = Spidra.Crawl.extract(config, source_job_id, "Extract only the product SKUs and prices as a CSV")

# Poll the new extraction job
{:ok, result} = Spidra.Crawl.get(config, queued["jobId"])

History and stats

{:ok, response} = Spidra.Crawl.history(config, page: 1, limit: 10)
{:ok, stats}    = Spidra.Crawl.stats(config)

IO.puts("Total crawls: #{stats["total"]}")

Logs

Every API scrape job is logged automatically. Access your full history with optional filters.
{:ok, response} = Spidra.Logs.list(config, %{
  status:      "failed",
  search_term: "amazon.com",
  channel:     "api",
  date_start:  "2024-01-01",
  date_end:    "2024-12-31",
  page:        1,
  limit:       20
})

for log <- response["logs"] do
  IO.puts("#{hd(log["urls"])["url"]} #{log["status"]} #{log["credits_used"]}")
end
Filter parameters
Parameter | Type | Description
status | string | "success" or "failed"
search_term | string | Search by URL or prompt
channel | string | "api" or "playground"
date_start | string | ISO date — return logs on or after this date
date_end | string | ISO date — return logs on or before this date
page | integer | Page number (default: 1)
limit | integer | Results per page (default: 20)
Get a single log entry including the full AI extraction result:
{:ok, log} = Spidra.Logs.get(config, "log-uuid")
IO.inspect(log["result_data"]) # the full AI output for that job

Usage statistics

Returns credit and request usage broken down by day or week.
# Range options: "7d" | "30d" | "weekly"
{:ok, rows} = Spidra.Usage.get(config, "30d")

for row <- rows do
  IO.puts("#{row["date"]} Requests: #{row["requests"]} Credits: #{row["credits"]}")
end
Range | Description
"7d" | Last 7 days, one row per day
"30d" | Last 30 days, one row per day
"weekly" | Last 7 weeks, one row per week

Python

Official Python SDK — async-first with sync wrappers. Works in scripts, Django, Flask, and Jupyter.

.NET

Official .NET SDK — fully async, typed exceptions, JSON schema support. Requires .NET 8+.