The official Elixir SDK for Spidra wraps the Spidra API so you’re not writing raw HTTP calls and polling loops yourself. It handles job submission, status polling, retry logic, and error mapping. All results come back as structured data ready to feed into your LLM pipelines or store directly.

Installation

Add spidra to your list of dependencies in mix.exs:
def deps do
  [
    {:spidra, "~> 0.1.0"}
  ]
end
Then run mix deps.get in your terminal.
Get your API key from app.spidra.io under Settings → API Keys. Store it as an environment variable — never hardcode it in source.

Requirements


Getting started

# Initialize your configuration
config = Spidra.Config.new(api_key: "spd_YOUR_API_KEY")
From here you access everything through Spidra.Scrape, Spidra.Batch, Spidra.Crawl, Spidra.Logs, and Spidra.Usage.
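If the key lives in an environment variable (as recommended above), you can read it at runtime. A minimal sketch; SPIDRA_API_KEY is just an assumed variable name:
# Build the config from an environment variable instead of a hardcoded key
config = Spidra.Config.new(api_key: System.fetch_env!("SPIDRA_API_KEY"))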

Scraping

All scrape jobs run asynchronously on the Spidra platform. Spidra.Scrape.run/3 submits a job and polls until it finishes. If you need more control, use submit/2 and get/2 directly. Up to 3 URLs can be passed per request and they are processed in parallel.

Basic scrape

Submit a scrape job and wait for results.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://example.com/pricing"}],
  prompt: "Extract all pricing plans with name, price, and included features",
  output: "json"
})

IO.inspect(job["result"]["content"])
# "{ \"plans\": [{ \"name\": \"Starter\", \"price\": \"$9/mo\", \"features\": [...] }, ...] }"
Parameters
Parameter | Type | Description
urls | list | Up to 3 URL maps. Each takes a url key and optional actions list
prompt | string | AI extraction instruction
output | string | "markdown" (default) or "json"
schema | map | JSON Schema for guaranteed output shape (use with output: "json")
use_proxy | boolean | Route through a residential proxy
proxy_country | string | Two-letter country code, e.g. "us", "de", "jp"
extract_content_only | boolean | Strip navigation, ads, and boilerplate before AI extraction
screenshot | boolean | Capture a screenshot of the page
full_page_screenshot | boolean | Capture a full-page (scrolled) screenshot
cookies | string | Raw Cookie header string for authenticated pages

Fire-and-forget approach

Use submit/2 and get/2 when you want to manage polling yourself.
# Submit a job and get the job_id immediately
{:ok, %{"jobId" => job_id}} = Spidra.Scrape.submit(config, %{
  urls: [%{url: "https://example.com"}],
  prompt: "Extract the main headline"
})

# Check status at any point
{:ok, status} = Spidra.Scrape.get(config, job_id)

case status["status"] do
  "completed" -> IO.inspect(status["result"]["content"])
  "failed"    -> IO.inspect(status["error"])
  _           -> IO.puts("Job is still pending...")
end
Job statuses: waiting · active · completed · failed
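If you want to block until the job leaves the queue without using run/3, a small recursive poller is enough. This is only a sketch built on Scrape.get/2 and the statuses above; the interval and any timeout policy are up to you:
defmodule MyApp.ScrapePoller do
  # Sketch: poll Spidra.Scrape.get/2 until the job completes or fails
  def await(config, job_id, interval_ms \\ 2_000) do
    {:ok, status} = Spidra.Scrape.get(config, job_id)

    case status["status"] do
      "completed" -> {:ok, status}
      "failed"    -> {:error, status["error"]}
      # "waiting" or "active": sleep, then check again
      _ ->
        Process.sleep(interval_ms)
        await(config, job_id, interval_ms)
    end
  end
end

{:ok, status} = MyApp.ScrapePoller.await(config, job_id)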

Structured JSON output

Pass a schema to enforce an exact output shape. Missing fields come back as null rather than hallucinated values.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://jobs.example.com/senior-engineer"}],
  prompt: "Extract the job listing details",
  output: "json",
  schema: %{
    "type" => "object",
    "required" => ["title", "company", "remote"],
    "properties" => %{
      "title"      => %{"type" => "string"},
      "company"    => %{"type" => "string"},
      "remote"     => %{"type" => ["boolean", "null"]},
      "salary_min" => %{"type" => ["number", "null"]},
      "salary_max" => %{"type" => ["number", "null"]},
      "skills"     => %{"type" => "array", "items" => %{"type" => "string"}}
    }
  }
})
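With output: "json" the extracted content arrives as a JSON string (see the basic example above), so decode it before using the fields. A sketch assuming a JSON library such as Jason is in your deps:
# Decode the JSON string returned in result.content (assumes {:jason, "~> 1.4"})
{:ok, listing} = Jason.decode(job["result"]["content"])
IO.puts("#{listing["title"]} at #{listing["company"]} (remote: #{listing["remote"]})")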

Geo-targeted scraping

Pass use_proxy: true and a proxy_country code to route the request through a specific country. Useful for geo-restricted content or localized pricing.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://www.amazon.de/gp/bestsellers"}],
  prompt: "List the top 10 products with name and price",
  use_proxy: true,
  proxy_country: "de"
})
Supported country codes include us, gb, de, fr, jp, au, ca, br, in, nl, sg, es, it, mx, and 40+ more. Use "global" or "eu" for regional routing.

Authenticated pages

Pass cookies as a string to scrape pages that require a login session.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://app.example.com/dashboard"}],
  prompt: "Extract the monthly revenue and active user count",
  cookies: "session=abc123; auth_token=xyz789"
})
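If you hold the session as name/value pairs rather than a ready-made header, joining them into the expected Cookie string is plain Elixir (no SDK involvement):
# Build a raw Cookie header string from name/value pairs
cookie_pairs = [{"session", "abc123"}, {"auth_token", "xyz789"}]
cookies = Enum.map_join(cookie_pairs, "; ", fn {name, value} -> "#{name}=#{value}" end)
# => "session=abc123; auth_token=xyz789"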

Browser actions

Actions let you interact with the page before the scrape runs. They execute in order, and the scrape happens after all actions complete.
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [
    %{
      url: "https://example.com/products",
      actions: [
        %{type: "click", selector: "#accept-cookies"},
        %{type: "wait", duration: 1000},
        %{type: "scroll", to: "80%"}
      ]
    }
  ],
  prompt: "Extract all product names and prices"
})
Available actions
Action | Required fields | Description
click | selector or value | Click a button, link, or any element
type | selector, value | Type text into an input or textarea
check | selector or value | Check a checkbox
uncheck | selector or value | Uncheck a checkbox
wait | duration (ms) | Pause for a set number of milliseconds
scroll | to (0–100%) | Scroll the page to a percentage of its height
forEach | observe | Loop over every matched element and process each one
Use selector for a CSS selector or XPath. Use value for plain English — Spidra locates the element using AI.
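For instance, a search interaction can mix a CSS selector with a plain-English value locator. The page, selectors, and wording below are made up for illustration:
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [
    %{
      url: "https://example.com/search",
      actions: [
        %{type: "type", selector: "input[name=q]", value: "wireless headphones"},
        %{type: "click", value: "the search button next to the search box"},
        %{type: "wait", duration: 2000}
      ]
    }
  ],
  prompt: "Extract the names and prices of the search results"
})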

Batch scraping

Submit up to 50 URLs in a single request. All URLs are processed in parallel. Each URL is a plain string.
{:ok, batch} = Spidra.Batch.run(config, %{
  urls: [
    "https://shop.example.com/product/1",
    "https://shop.example.com/product/2",
    "https://shop.example.com/product/3"
  ],
  prompt: "Extract product name, price, and availability",
  output: "json",
  use_proxy: true
})

for item <- batch["items"] do
  case item["status"] do
    "completed" -> IO.inspect({item["url"], item["result"]})
    "failed"    -> IO.inspect({item["url"], item["error"]})
    _           -> :ok
  end
end
Item statuses: pending · running · completed · failed
Batch statuses: pending · running · completed · failed · cancelled
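To summarize a finished batch at a glance, group the items by status (plain Enum calls on the fields shown above):
# Count batch items per status, e.g. %{"completed" => 2, "failed" => 1}
summary =
  batch["items"]
  |> Enum.group_by(& &1["status"])
  |> Map.new(fn {status, items} -> {status, length(items)} end)

IO.inspect(summary)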

Batch.submit/2 + Batch.get/2

{:ok, %{"batchId" => batch_id}} = Spidra.Batch.submit(config, %{
  urls: ["https://example.com/1", "https://example.com/2"],
  prompt: "Extract the page title"
})

# Come back later
{:ok, result} = Spidra.Batch.get(config, batch_id)

Retry failed items

Re-queue only the items that failed — successful items are not re-run.
{:ok, result} = Spidra.Batch.get(config, batch_id)

if result["failedCount"] > 0 do
  {:ok, retried} = Spidra.Batch.retry(config, batch_id)
  IO.puts("Retried #{retried["retriedCount"]} items")
end

Cancel a batch

Stops all pending items and refunds credits for unprocessed work.
{:ok, response} = Spidra.Batch.cancel(config, batch_id)
IO.puts("Cancelled #{response["cancelledItems"]} items, refunded #{response["creditsRefunded"]} credits")

List past batches

{:ok, response} = Spidra.Batch.list(config, page: 1, limit: 20)

for job <- response["jobs"] do
  IO.puts("#{job["uuid"]} #{job["status"]} #{job["completedCount"]}/#{job["totalUrls"]}")
end

Crawling

Give Spidra a starting URL and instructions for which links to follow. It discovers pages automatically, extracts structured data from each one, and returns everything when the crawl is done.
{:ok, job} = Spidra.Crawl.run(config, %{
  base_url: "https://competitor.com/blog",
  crawl_instruction: "Find all blog posts published in 2024",
  transform_instruction: "Extract the title, author, publish date, and a one-sentence summary",
  max_pages: 30,
  use_proxy: true
})

for page <- job["result"] do
  IO.inspect({page["url"], page["data"]})
end
Parameters
Parameter | Type | Description
base_url | string | Starting URL for the crawl
crawl_instruction | string | Which links to follow and which to skip
transform_instruction | string | What to extract from each page
max_pages | integer | Maximum number of pages to crawl
use_proxy | boolean | Route through a residential proxy
proxy_country | string | Two-letter country code, e.g. "us"
cookies | string | Raw Cookie header string for authenticated sites

Crawl.submit/2 + Crawl.get/2

{:ok, %{"jobId" => job_id}} = Spidra.Crawl.submit(config, %{
  base_url: "https://example.com/docs",
  crawl_instruction: "Follow all documentation pages",
  transform_instruction: "Extract the page title and a short summary",
  max_pages: 50
})

# Poll manually
{:ok, status} = Spidra.Crawl.get(config, job_id)
# status: "waiting" | "active" | "running" | "completed" | "failed"

Download crawled content

Returns signed S3 URLs for the raw HTML and Markdown of each crawled page. Links expire after 1 hour.
{:ok, response} = Spidra.Crawl.pages(config, job_id)

for page <- response["pages"] do
  IO.puts("#{page["url"]} - #{page["status"]}")
  # Download raw HTML:     page["html_url"]
  # Download Markdown:     page["markdown_url"]
end
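The signed links are ordinary HTTPS URLs, so any HTTP client can download them before they expire. A sketch using the req package (an assumption; add {:req, "~> 0.5"} to your deps if you follow this route):
for page <- response["pages"] do
  # Fetch the Markdown for each crawled page and write it to a local file
  markdown = Req.get!(page["markdown_url"]).body
  File.write!("crawl_page_#{:erlang.phash2(page["url"])}.md", markdown)
end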

Re-extract without re-crawling

Apply a new AI prompt to an existing completed crawl without fetching the pages again. Only transformation credits are charged.
{:ok, queued} = Spidra.Crawl.extract(config, source_job_id, "Extract only the product SKUs and prices as a CSV")

# Poll the new extraction job
{:ok, result} = Spidra.Crawl.get(config, queued["jobId"])

History and stats

{:ok, response} = Spidra.Crawl.history(config, page: 1, limit: 10)
{:ok, stats}    = Spidra.Crawl.stats(config)

IO.puts("Total crawls: #{stats["total"]}")

Logs

Every API scrape job is logged automatically. Access your full history with optional filters.
{:ok, response} = Spidra.Logs.list(config, %{
  status:      "failed",
  search_term: "amazon.com",
  channel:     "api",
  date_start:  "2024-01-01",
  date_end:    "2024-12-31",
  page:        1,
  limit:       20
})

for log <- response["logs"] do
  IO.puts("#{hd(log["urls"])["url"]} #{log["status"]} #{log["credits_used"]}")
end
Filter parameters
Parameter | Type | Description
status | string | "success" or "failed"
search_term | string | Search by URL or prompt
channel | string | "api" or "playground"
date_start | string | ISO date — return logs on or after this date
date_end | string | ISO date — return logs on or before this date
page | integer | Page number (default: 1)
limit | integer | Results per page (default: 20)
Get a single log entry including the full AI extraction result:
{:ok, log} = Spidra.Logs.get(config, "log-uuid")
IO.inspect(log["result_data"]) # the full AI output for that job

Usage statistics

Returns credit and request usage broken down by day or week.
# Range options: "7d" | "30d" | "weekly"
{:ok, rows} = Spidra.Usage.get(config, "30d")

for row <- rows do
  IO.puts("#{row["date"]} Requests: #{row["requests"]} Credits: #{row["credits"]}")
end
Range | Description
"7d" | Last 7 days, one row per day
"30d" | Last 30 days, one row per day
"weekly" | Last 7 weeks, one row per week

Python

Official Python SDK — async-first with sync wrappers. Works in scripts, Django, Flask, and Jupyter.

.NET

Official .NET SDK — fully async, typed exceptions, JSON schema support. Requires .NET 8+.