The official Java SDK for Spidra lets you extract structured data from any website by describing what you want in plain English. JavaScript rendering, anti-bot bypass, and CAPTCHA solving are handled by the managed API, so your Java code stays focused on the data.
  • Java 17+ — uses java.net.http.HttpClient, no extra HTTP dependencies
  • Jackson for JSON (de)serialization
  • CompletableFuture<T> for all async operations
  • Builder pattern for all request parameter objects

Installation

Gradle

dependencies {
    implementation 'io.spidra:spidra-java-sdk:0.1.0'
}

Maven

<dependency>
    <groupId>io.spidra</groupId>
    <artifactId>spidra-java-sdk</artifactId>
    <version>0.1.0</version>
</dependency>
Get your API key from app.spidra.io under Settings → API Keys. Keep your key out of source control — read it from an environment variable or a secrets manager.

Getting started

All requests require an API key sent as the x-api-key header. Pass it to the client:
SpidraClient client = new SpidraClient(System.getenv("SPIDRA_API_KEY"));
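Under the hood this is an ordinary HTTPS request. A minimal sketch of the equivalent raw call with java.net.http.HttpClient, assuming a hypothetical endpoint https://api.spidra.io/api/scrape (the SDK builds these requests for you; this sketch is for illustration only):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical host and path, shown only to illustrate the x-api-key header.
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("https://api.spidra.io/api/scrape"))
    .header("x-api-key", System.getenv("SPIDRA_API_KEY"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(
        "{\"url\":\"https://example.com\",\"prompt\":\"Extract the title\"}"))
    .build();

HttpClient.newHttpClient()
    .sendAsync(request, HttpResponse.BodyHandlers.ofString())
    .thenAccept(response -> System.out.println(response.body()));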

Quick start

import io.spidra.sdk.SpidraClient;
import io.spidra.sdk.model.scrape.ScrapeParams;

SpidraClient client = new SpidraClient("your-api-key");

ScrapeParams params = ScrapeParams.builder()
    .url("https://example.com")
    .prompt("Extract the page title and main heading")
    .build();

// submit + poll until complete (non-blocking, returns CompletableFuture)
client.scrape().run(params)
    .thenAccept(job -> System.out.println(job.getResult().getContent()))
    .exceptionally(err -> { err.printStackTrace(); return null; })
    .join(); // block the main thread for this example
run() submits the job and polls until it completes. The CompletableFuture resolves with the final result.
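Because every call returns a CompletableFuture, independent jobs compose with the standard combinators. A sketch that runs two scrapes concurrently and waits for both (the URLs are placeholders):
import io.spidra.sdk.model.scrape.ScrapeJob;
import java.util.concurrent.CompletableFuture;

CompletableFuture<ScrapeJob> first = client.scrape().run(
    ScrapeParams.builder().url("https://example.com/a").prompt("Extract the title").build());
CompletableFuture<ScrapeJob> second = client.scrape().run(
    ScrapeParams.builder().url("https://example.com/b").prompt("Extract the title").build());

// Wait for both jobs, then read each result.
CompletableFuture.allOf(first, second).join();
System.out.println(first.join().getResult().getContent());
System.out.println(second.join().getResult().getContent());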

Scraping

Single-page scrape

import io.spidra.sdk.SpidraClient;
import io.spidra.sdk.model.scrape.ScrapeParams;
import io.spidra.sdk.model.scrape.ScrapeJob;

SpidraClient client = new SpidraClient(System.getenv("SPIDRA_API_KEY"));

ScrapeParams params = ScrapeParams.builder()
    .url("https://news.ycombinator.com")
    .prompt("Extract the top 10 story titles and their URLs")
    .outputFormat("json")
    .build();

ScrapeJob job = client.scrape().run(params).join();
System.out.println("Status: " + job.getStatus());
System.out.println("Content: " + job.getResult().getContent());
System.out.println("Extracted data: " + job.getResult().getData());
Job statuses: waiting · active · completed · failed
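If you fetch a job yourself (next section) you can observe any of these states. A sketch of branching on the status, assuming getStatus() returns it as a plain String (as the batch example further down does):
switch (job.getStatus()) {
    case "completed"         -> System.out.println(job.getResult().getContent());
    case "failed"            -> System.err.println("Job failed");
    case "waiting", "active" -> System.out.println("Still in progress");
    default                  -> System.out.println("Unknown status: " + job.getStatus());
}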

Submit and poll manually

If you need to track progress yourself, use submit() and get() directly:
// Step 1: submit
ScrapeJob pending = client.scrape().submit(params).join();
System.out.println("Job submitted: " + pending.getJobId());

// Step 2: poll manually (Thread.sleep throws InterruptedException,
// so the enclosing method must handle or declare it)
ScrapeJob current;
do {
    Thread.sleep(2000);
    current = client.scrape().get(pending.getJobId()).join();
    System.out.println("Status: " + current.getStatus());
} while (!current.isTerminal());
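If you would rather not block a thread between polls, the same loop can be driven by a ScheduledExecutorService. A non-blocking sketch (the 2-second interval is arbitrary):
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
CompletableFuture<ScrapeJob> done = new CompletableFuture<>();

// Poll every 2 seconds; complete the future once the job reaches a terminal state.
scheduler.scheduleAtFixedRate(() ->
    client.scrape().get(pending.getJobId()).thenAccept(job -> {
        if (job.isTerminal()) {
            done.complete(job);
        }
    }), 0, 2, TimeUnit.SECONDS);

ScrapeJob finished = done.join();
scheduler.shutdown();
System.out.println("Final status: " + finished.getStatus());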

Browser actions

import io.spidra.sdk.model.scrape.BrowserAction;
import java.util.List;

ScrapeParams params = ScrapeParams.builder()
    .url("https://example.com/login")
    .browserActions(List.of(
        BrowserAction.builder().type("type").selector("#email").value("[email protected]").build(),
        BrowserAction.builder().type("type").selector("#password").value("secret").build(),
        BrowserAction.builder().type("click").selector("button[type=submit]").build(),
        BrowserAction.builder().type("wait").waitMs(2000).build()
    ))
    .prompt("Extract the user dashboard summary")
    .build();

ScrapeJob job = client.scrape().run(params).join();
Available actions

Action     Required fields      Description
click      selector or value    Click a button, link, or any element
type       selector, value      Type text into an input or textarea
check      selector or value    Check a checkbox
uncheck    selector or value    Uncheck a checkbox
wait       waitMs (ms)          Pause for a set number of milliseconds
scroll     to (0–100%)          Scroll the page to a percentage of its height
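The scroll and wait actions combine well for pages that lazy-load content. A sketch that scrolls down in stages before extracting, assuming the builder exposes the scroll target as to(int) (the percentages and delays are arbitrary):
ScrapeParams scrollParams = ScrapeParams.builder()
    .url("https://example.com/feed")
    .browserActions(List.of(
        BrowserAction.builder().type("scroll").to(50).build(),     // scroll halfway
        BrowserAction.builder().type("wait").waitMs(1500).build(), // let content load
        BrowserAction.builder().type("scroll").to(100).build(),    // scroll to the bottom
        BrowserAction.builder().type("wait").waitMs(1500).build()
    ))
    .prompt("Extract every post title visible on the page")
    .build();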

Batch scraping

Submit up to 50 URLs in a single request. All URLs are processed in parallel.
import io.spidra.sdk.model.batch.BatchParams;
import io.spidra.sdk.model.batch.BatchJob;
import io.spidra.sdk.model.scrape.ScrapeUrl;
import java.util.List;

BatchParams params = BatchParams.builder()
    .urls(List.of(
        ScrapeUrl.builder().url("https://example.com/page1").build(),
        ScrapeUrl.builder().url("https://example.com/page2").build(),
        ScrapeUrl.builder().url("https://example.com/page3").build()
    ))
    .prompt("Extract the article title, author, and publication date")
    .outputFormat("json")
    .build();

BatchJob job = client.batch().run(params).join();

System.out.println("Completed: " + job.getCompletedCount() + "/" + job.getTotal());
job.getItems().forEach(item -> {
    System.out.println(item.getUrl() + " -> " + item.getStatus());
    if ("completed".equals(item.getStatus())) {
        System.out.println("  Data: " + item.getResult().getData());
    }
});
Item statuses: pending · running · completed · failed
Batch statuses: pending · running · completed · failed · cancelled
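Failed items keep their original URL, so retrying just those is straightforward. A sketch, assuming getItems() returns a java.util.List:
List<ScrapeUrl> failed = job.getItems().stream()
    .filter(item -> "failed".equals(item.getStatus()))
    .map(item -> ScrapeUrl.builder().url(item.getUrl()).build())
    .toList();

if (!failed.isEmpty()) {
    BatchJob retry = client.batch().run(
        BatchParams.builder()
            .urls(failed)
            .prompt("Extract the article title, author, and publication date")
            .outputFormat("json")
            .build()
    ).join();
    System.out.println("Retry completed: " + retry.getCompletedCount() + "/" + retry.getTotal());
}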

Cancel a batch

import io.spidra.sdk.model.batch.BatchCancelResult;

BatchCancelResult result = client.batch().cancel(batchId).join();
System.out.println("Cancelled: " + result.getCancelledItems() + " items");
System.out.println("Refunded: " + result.getCreditsRefunded() + " credits");

Crawling

Give Spidra a starting URL and instructions for which links to follow. It discovers pages automatically and extracts structured data from each one.
import io.spidra.sdk.model.crawl.CrawlParams;
import io.spidra.sdk.model.crawl.CrawlJob;
import io.spidra.sdk.model.crawl.CrawlPagesResult;
import java.util.List;

CrawlParams params = CrawlParams.builder()
    .url("https://example.com")
    .maxDepth(3)
    .maxPages(100)
    .includePatterns(List.of("/blog/*"))
    .excludePatterns(List.of("/tag/*", "/author/*"))
    .prompt("Extract the blog post title and summary")
    .build();

CrawlJob job = client.crawl().run(params).join();
System.out.println("Crawled " + job.getPagesCrawled() + " pages");

Retrieve all crawled pages

CrawlPagesResult pages = client.crawl().pages(job.getJobId()).join();
pages.getPages().forEach(page -> {
    System.out.println(page.getUrl() + " [depth=" + page.getDepth() + "]");
    if (page.getData() != null) {
        System.out.println("  Extracted: " + page.getData());
    }
});

Re-extract without re-crawling

Apply a new AI prompt to an existing completed crawl without fetching pages again.
Object extracted = client.crawl()
    .extract(job.getJobId(), "Summarize all blog posts into a single markdown document")
    .join();
System.out.println(extracted);
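Since the SDK uses Jackson, the returned Object can be mapped onto a type of your own with ObjectMapper.convertValue. A sketch, assuming the extraction produced a JSON object with a markdown field (the BlogSummary record is hypothetical):
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical target type; shape it to match what your prompt returns.
record BlogSummary(String markdown) {}

ObjectMapper mapper = new ObjectMapper();
BlogSummary summary = mapper.convertValue(extracted, BlogSummary.class);
System.out.println(summary.markdown());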

Logs

Every API scrape job is logged automatically.
import io.spidra.sdk.model.logs.LogsParams;
import io.spidra.sdk.model.logs.LogsResult;
import io.spidra.sdk.model.logs.ScrapeLogDetail;

LogsParams params = LogsParams.builder()
    .status("completed")
    .channel("production")
    .limit(25)
    .page(1)
    .dateStart("2024-01-01T00:00:00Z")
    .build();

LogsResult result = client.logs().list(params).join();
System.out.println("Total logs: " + result.getPagination().get("total"));

// Get full detail for a single log
ScrapeLogDetail detail = client.logs().get(result.getLogs().get(0).getUuid()).join();
System.out.println("Prompt used: " + detail.getPrompt());

Usage statistics

Returns credit and request usage broken down by day or week.
import io.spidra.sdk.model.usage.UsageStats;

UsageStats stats = client.usage().get("30d").join();
System.out.println("Plan: " + stats.getPlan());
System.out.println("Credits used (30d): " + stats.getCreditsUsed());
System.out.println("Credits remaining: " + stats.getCreditsRemaining());
Range      Description
"7d"       Last 7 days, one row per day
"30d"      Last 30 days, one row per day
"weekly"   Last 7 weeks, one row per week

Error handling

All exceptions extend SpidraException (an unchecked RuntimeException):
Exception                             HTTP Status
SpidraAuthException                   401, 403
SpidraInsufficientCreditsException    402
SpidraRateLimitException              429
SpidraServerException                 5xx
SpidraException                       any other
import io.spidra.sdk.exception.*;

client.scrape().run(params)
    .thenAccept(job -> System.out.println(job.getResult().getContent()))
    .exceptionally(throwable -> {
        Throwable cause = throwable.getCause();
        if (cause instanceof SpidraAuthException) {
            System.err.println("Invalid API key");
        } else if (cause instanceof SpidraRateLimitException) {
            System.err.println("Rate limited — back off and retry");
        } else if (cause instanceof SpidraInsufficientCreditsException) {
            System.err.println("Out of credits");
        } else if (cause instanceof SpidraException ex) {
            System.err.println("API error [" + ex.getStatusCode() + "]: " + ex.getMessage());
        }
        return null;
    });
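If you prefer blocking calls, join() wraps any failure in java.util.concurrent.CompletionException, so unwrap the cause before matching:
import java.util.concurrent.CompletionException;

try {
    ScrapeJob job = client.scrape().run(params).join();
    System.out.println(job.getResult().getContent());
} catch (CompletionException e) {
    if (e.getCause() instanceof SpidraRateLimitException) {
        System.err.println("Rate limited — back off and retry");
    } else if (e.getCause() instanceof SpidraException ex) {
        System.err.println("API error [" + ex.getStatusCode() + "]: " + ex.getMessage());
    } else {
        throw e;
    }
}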

Configuration

Custom base URL

// Point at a staging environment or local mock
SpidraClient client = new SpidraClient("your-api-key", "https://staging-api.spidra.io/api");

Environment variable pattern

import java.util.Objects;

SpidraClient client = new SpidraClient(
    Objects.requireNonNull(System.getenv("SPIDRA_API_KEY"), "SPIDRA_API_KEY env var not set")
);

Building from source

./gradlew build
./gradlew test
./gradlew javadoc

Other official SDKs

Swift — async/await native; works on iOS, macOS, tvOS, watchOS, and server-side Swift.
Rust — tokio-based async with zero-cost abstractions; every call returns a Result.