# Elixir

> Official Elixir SDK for Spidra: scrape pages, run browser actions, batch-process URLs, and crawl entire sites.

The official Elixir SDK for Spidra wraps the Spidra API so you're not writing raw HTTP calls and polling loops yourself. It handles job submission, status polling, retry logic, and error mapping. All results come back as structured data ready to feed into your LLM pipelines or store directly.

## Installation

Add `spidra` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:spidra, "~> 0.1.0"}
  ]
end
```

Then run `mix deps.get` in your terminal.

Get your API key from [app.spidra.io](https://app.spidra.io) under **Settings → API Keys**. Store it as an environment variable and never hardcode it in source.

## Requirements

* Elixir 1.14 or later
* A Spidra API key ([sign up free](https://spidra.io))

***

## Getting started

```elixir
# Initialize your configuration
config = Spidra.Config.new(api_key: "spd_YOUR_API_KEY")
```

From here you access everything through `Spidra.Scrape`, `Spidra.Batch`, `Spidra.Crawl`, `Spidra.Logs`, and `Spidra.Usage`.

## Scraping

All scrape jobs run asynchronously on the Spidra platform. `Spidra.Scrape.run/3` submits a job and polls until it finishes. If you need more control, use `submit/2` and `get/2` directly.

Up to 3 URLs can be passed per request and they are processed in parallel.

### Basic scrape

Submit a scrape job and wait for results.

```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://example.com/pricing"}],
  prompt: "Extract all pricing plans with name, price, and included features",
  output: "json"
})

IO.inspect(job["result"]["content"])
# "{ \"plans\": [{ \"name\": \"Starter\", \"price\": \"$9/mo\", \"features\": [...] }, ...] }"
```

**Parameters**

| Parameter              | Type    | Description                                                          |
| ---------------------- | ------- | -------------------------------------------------------------------- |
| `urls`                 | list    | Up to 3 URL maps. Each takes a `url` key and optional `actions` list |
| `prompt`               | string  | AI extraction instruction                                            |
| `output`               | string  | `"markdown"` (default) or `"json"`                                   |
| `schema`               | map     | JSON Schema for guaranteed output shape (use with `output: "json"`)  |
| `use_proxy`            | boolean | Route through a residential proxy                                    |
| `proxy_country`        | string  | Two-letter country code, e.g. `"us"`, `"de"`, `"jp"`                 |
| `extract_content_only` | boolean | Strip navigation, ads, and boilerplate before AI extraction          |
| `screenshot`           | boolean | Capture a screenshot of the page                                     |
| `full_page_screenshot` | boolean | Capture a full-page (scrolled) screenshot                            |
| `cookies`              | string  | Raw `Cookie` header string for authenticated pages                   |

### Fire-and-forget approach

Use `submit/2` and `get/2` when you want to manage polling yourself.

```elixir
# Submit a job and get the job_id immediately
{:ok, %{"jobId" => job_id}} = Spidra.Scrape.submit(config, %{
  urls: [%{url: "https://example.com"}],
  prompt: "Extract the main headline"
})

# Check status at any point
{:ok, status} = Spidra.Scrape.get(config, job_id)

case status["status"] do
  "completed" -> IO.inspect(status["result"]["content"])
  "failed" -> IO.inspect(status["error"])
  _ -> IO.puts("Job is still pending...")
end
```

**Job statuses:** `waiting` · `active` · `completed` · `failed`

### Structured JSON output

Pass a `schema` to enforce an exact output shape. Missing fields come back as `null` rather than hallucinated values.

```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://jobs.example.com/senior-engineer"}],
  prompt: "Extract the job listing details",
  output: "json",
  schema: %{
    "type" => "object",
    "required" => ["title", "company", "remote"],
    "properties" => %{
      "title" => %{"type" => "string"},
      "company" => %{"type" => "string"},
      "remote" => %{"type" => ["boolean", "null"]},
      "salary_min" => %{"type" => ["number", "null"]},
      "salary_max" => %{"type" => ["number", "null"]},
      "skills" => %{"type" => "array", "items" => %{"type" => "string"}}
    }
  }
})
```

### Geo-targeted scraping

Pass `use_proxy: true` and a `proxy_country` code to route the request through a specific country. Useful for geo-restricted content or localized pricing.

```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://www.amazon.de/gp/bestsellers"}],
  prompt: "List the top 10 products with name and price",
  use_proxy: true,
  proxy_country: "de"
})
```

Supported country codes include `us`, `gb`, `de`, `fr`, `jp`, `au`, `ca`, `br`, `in`, `nl`, `sg`, `es`, `it`, `mx`, and [40+ more](/features/stealth-mode#country-targeting). Use `"global"` or `"eu"` for regional routing.

### Authenticated pages

Pass cookies as a string to scrape pages that require a login session.

```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [%{url: "https://app.example.com/dashboard"}],
  prompt: "Extract the monthly revenue and active user count",
  cookies: "session=abc123; auth_token=xyz789"
})
```

### Browser actions

Actions let you interact with the page before the scrape runs. They execute in order, and the scrape happens after all actions complete.

```elixir
{:ok, job} = Spidra.Scrape.run(config, %{
  urls: [
    %{
      url: "https://example.com/products",
      actions: [
        %{type: "click", selector: "#accept-cookies"},
        %{type: "wait", duration: 1000},
        %{type: "scroll", to: "80%"}
      ]
    }
  ],
  prompt: "Extract all product names and prices"
})
```

**Available actions**

| Action    | Required fields       | Description                                          |
| --------- | --------------------- | ---------------------------------------------------- |
| `click`   | `selector` or `value` | Click a button, link, or any element                 |
| `type`    | `selector`, `value`   | Type text into an input or textarea                  |
| `check`   | `selector` or `value` | Check a checkbox                                     |
| `uncheck` | `selector` or `value` | Uncheck a checkbox                                   |
| `wait`    | `duration` (ms)       | Pause for a set number of milliseconds               |
| `scroll`  | `to` (`0–100%`)       | Scroll the page to a percentage of its height        |
| `forEach` | `observe`             | Loop over every matched element and process each one |

Use `selector` for a CSS selector or XPath. Use `value` for plain English; Spidra locates the element using AI.

***

## Batch scraping

Submit up to 50 URLs in a single request. All URLs are processed in parallel. Each URL is a plain string.

```elixir
{:ok, batch} = Spidra.Batch.run(config, %{
  urls: [
    "https://shop.example.com/product/1",
    "https://shop.example.com/product/2",
    "https://shop.example.com/product/3"
  ],
  prompt: "Extract product name, price, and availability",
  output: "json",
  use_proxy: true
})

for item <- batch["items"] do
  case item["status"] do
    "completed" -> IO.inspect({item["url"], item["result"]})
    "failed" -> IO.inspect({item["url"], item["error"]})
    _ -> :ok
  end
end
```

**Item statuses:** `pending` · `running` · `completed` · `failed`

**Batch statuses:** `pending` · `running` · `completed` · `failed` · `cancelled`

### Spidra.Batch.submit/2 + Spidra.Batch.get/2

```elixir
{:ok, %{"batchId" => batch_id}} = Spidra.Batch.submit(config, %{
  urls: ["https://example.com/1", "https://example.com/2"],
  prompt: "Extract the page title"
})

# Come back later
{:ok, result} = Spidra.Batch.get(config, batch_id)
```

### Retry failed items

Re-queue only the items that failed. Successful items are not re-run.

```elixir
{:ok, result} = Spidra.Batch.get(config, batch_id)

if result["failedCount"] > 0 do
  {:ok, retried} = Spidra.Batch.retry(config, batch_id)
  IO.puts("Retried #{retried["retriedCount"]} items")
end
```

### Cancel a batch

Stops all pending items and refunds credits for unprocessed work.

```elixir
{:ok, response} = Spidra.Batch.cancel(config, batch_id)
IO.puts("Cancelled #{response["cancelledItems"]} items, refunded #{response["creditsRefunded"]} credits")
```

### List past batches

```elixir
{:ok, response} = Spidra.Batch.list(config, page: 1, limit: 20)

for job <- response["jobs"] do
  IO.puts("#{job["uuid"]} #{job["status"]} #{job["completedCount"]}/#{job["totalUrls"]}")
end
```

***

## Crawling

Give Spidra a starting URL and instructions for which links to follow. It discovers pages automatically, extracts structured data from each one, and returns everything when the crawl is done.

```elixir
{:ok, job} = Spidra.Crawl.run(config, %{
  base_url: "https://competitor.com/blog",
  crawl_instruction: "Find all blog posts published in 2024",
  transform_instruction: "Extract the title, author, publish date, and a one-sentence summary",
  max_pages: 30,
  use_proxy: true
})

for page <- job["result"] do
  IO.inspect({page["url"], page["data"]})
end
```

**Parameters**

| Parameter               | Type    | Description                                        |
| ----------------------- | ------- | -------------------------------------------------- |
| `base_url`              | string  | Starting URL for the crawl                         |
| `crawl_instruction`     | string  | Which links to follow and which to skip            |
| `transform_instruction` | string  | What to extract from each page                     |
| `max_pages`             | integer | Maximum number of pages to crawl                   |
| `use_proxy`             | boolean | Route through a residential proxy                  |
| `proxy_country`         | string  | Two-letter country code, e.g. `"us"`               |
| `cookies`               | string  | Raw `Cookie` header string for authenticated sites |

### Spidra.Crawl.submit/2 + Spidra.Crawl.get/2

```elixir
{:ok, %{"jobId" => job_id}} = Spidra.Crawl.submit(config, %{
  base_url: "https://example.com/docs",
  crawl_instruction: "Follow all documentation pages",
  transform_instruction: "Extract the page title and a short summary",
  max_pages: 50
})

# Poll manually
{:ok, status} = Spidra.Crawl.get(config, job_id)
# status: "waiting" | "active" | "running" | "completed" | "failed"
```

### Download crawled content

Returns signed S3 URLs for the raw HTML and Markdown of each crawled page. Links expire after **1 hour**.

```elixir
{:ok, response} = Spidra.Crawl.pages(config, job_id)

for page <- response["pages"] do
  IO.puts("#{page["url"]} - #{page["status"]}")
  # Download raw HTML: page["html_url"]
  # Download Markdown: page["markdown_url"]
end
```

### Re-extract without re-crawling

Apply a new AI prompt to an existing completed crawl without fetching the pages again. Only transformation credits are charged.

```elixir
{:ok, queued} = Spidra.Crawl.extract(
  config,
  source_job_id,
  "Extract only the product SKUs and prices as a CSV"
)

# Poll the new extraction job
{:ok, result} = Spidra.Crawl.get(config, queued["jobId"])
```

### History and stats

```elixir
{:ok, response} = Spidra.Crawl.history(config, page: 1, limit: 10)

{:ok, stats} = Spidra.Crawl.stats(config)
IO.puts("Total crawls: #{stats["total"]}")
```

***

## Logs

Every API scrape job is logged automatically. Access your full history with optional filters.

```elixir
{:ok, response} = Spidra.Logs.list(config, %{
  status: "failed",
  search_term: "amazon.com",
  channel: "api",
  date_start: "2024-01-01",
  date_end: "2024-12-31",
  page: 1,
  limit: 20
})

for log <- response["logs"] do
  IO.puts("#{hd(log["urls"])["url"]} #{log["status"]} #{log["credits_used"]}")
end
```

**Filter parameters**

| Parameter     | Type    | Description                                    |
| ------------- | ------- | ---------------------------------------------- |
| `status`      | string  | `"success"` or `"failed"`                      |
| `search_term` | string  | Search by URL or prompt                        |
| `channel`     | string  | `"api"` or `"playground"`                      |
| `date_start`  | string  | ISO date; returns logs on or after this date   |
| `date_end`    | string  | ISO date; returns logs on or before this date  |
| `page`        | integer | Page number (default: 1)                       |
| `limit`       | integer | Results per page (default: 20)                 |

Get a single log entry including the full AI extraction result:

```elixir
{:ok, log} = Spidra.Logs.get(config, "log-uuid")
IO.inspect(log["result_data"]) # the full AI output for that job
```

## Usage statistics

Returns credit and request usage broken down by day or week.

```elixir
# Range options: "7d" | "30d" | "weekly"
{:ok, rows} = Spidra.Usage.get(config, "30d")

for row <- rows do
  IO.puts("#{row["date"]} Requests: #{row["requests"]} Credits: #{row["credits"]}")
end
```

| Range      | Description                    |
| ---------- | ------------------------------ |
| `"7d"`     | Last 7 days, one row per day   |
| `"30d"`    | Last 30 days, one row per day  |
| `"weekly"` | Last 7 weeks, one row per week |
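
Once you have the usage rows, you will often want a total for the period rather than a line per day. The sketch below shows one way to do that with plain `Enum` functions. `UsageSummary` is a hypothetical helper, not part of the Spidra SDK, and it assumes each row is a map whose `"requests"` and `"credits"` values are numbers (or `nil`), matching the keys printed by `Spidra.Usage.get/2` in this section. The sample rows are made up for illustration.

```elixir
defmodule UsageSummary do
  @moduledoc """
  Hypothetical helper (not part of the Spidra SDK) for summarising usage rows.
  Assumes each row is a map with numeric "requests" and "credits" values.
  """

  # Sums "requests" and "credits" across all rows; missing or nil values count as 0.
  def totals(rows) when is_list(rows) do
    Enum.reduce(rows, %{requests: 0, credits: 0}, fn row, acc ->
      %{
        requests: acc.requests + (row["requests"] || 0),
        credits: acc.credits + (row["credits"] || 0)
      }
    end)
  end

  # Returns the row with the highest credit usage, or nil for an empty list.
  def busiest(rows) when is_list(rows) do
    Enum.max_by(rows, fn row -> row["credits"] || 0 end, fn -> nil end)
  end
end

# Made-up sample rows using the keys shown in this section.
rows = [
  %{"date" => "2024-06-01", "requests" => 12, "credits" => 30},
  %{"date" => "2024-06-02", "requests" => 5, "credits" => 11}
]

IO.inspect(UsageSummary.totals(rows))
IO.inspect(UsageSummary.busiest(rows))
```

In real use you would pass the `rows` value from `Spidra.Usage.get/2` instead of the sample list. If the API returns these fields as strings rather than numbers, they would need converting before being summed.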