The official Java SDK for Spidra lets you extract structured data from any website by describing what you want in plain English. It handles JavaScript rendering, anti-bot bypass, and CAPTCHA solving as a managed API, so your Java code stays focused on the data.
Java 17+ — uses java.net.http.HttpClient, no extra HTTP dependencies
Jackson for JSON (de)serialization
CompletableFuture<T> for all async operations
Builder pattern for all request parameter objects
Installation
Gradle
dependencies {
    implementation 'io.spidra:spidra-java-sdk:0.1.0'
}
Maven
<dependency>
    <groupId>io.spidra</groupId>
    <artifactId>spidra-java-sdk</artifactId>
    <version>0.1.0</version>
</dependency>
Get your API key from app.spidra.io under Settings → API Keys.
Keep your key out of source control — read it from an environment variable or a secrets manager.
Getting started
All requests require an API key sent as the x-api-key header. Pass it to the client:
SpidraClient client = new SpidraClient(System.getenv("SPIDRA_API_KEY"));
Quick start
import io.spidra.sdk.SpidraClient;
import io.spidra.sdk.model.scrape.ScrapeParams;

SpidraClient client = new SpidraClient("your-api-key");
ScrapeParams params = ScrapeParams.builder()
    .url("https://example.com")
    .prompt("Extract the page title and main heading")
    .build();
// submit + poll until complete (non-blocking, returns CompletableFuture)
client.scrape().run(params)
    .thenAccept(job -> System.out.println(job.getResult().getContent()))
    .exceptionally(err -> { err.printStackTrace(); return null; })
    .join(); // block the main thread for this example
run() submits the job and polls until it completes. The CompletableFuture resolves with the final result.
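To make the submit-and-poll idea concrete, here is a minimal JDK-only sketch of how repeated status checks can be folded into a single CompletableFuture. It is not the SDK's implementation; `PollSketch`, `pollUntil`, and the fake status supplier are hypothetical names used only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PollSketch {
    /** Calls fetch every intervalMs until isDone accepts the value, then completes the future. */
    static <T> CompletableFuture<T> pollUntil(Supplier<T> fetch, Predicate<T> isDone,
                                              long intervalMs, ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Runnable attempt = new Runnable() {
            @Override
            public void run() {
                try {
                    T value = fetch.get();
                    if (isDone.test(value)) {
                        result.complete(value);
                    } else {
                        // Not finished yet: schedule the next check instead of blocking a thread.
                        scheduler.schedule(this, intervalMs, TimeUnit.MILLISECONDS);
                    }
                } catch (Throwable t) {
                    result.completeExceptionally(t);
                }
            }
        };
        scheduler.execute(attempt);
        return result;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a status lookup: reports "active" twice, then "completed".
        Supplier<String> fakeStatus = () -> calls.incrementAndGet() < 3 ? "active" : "completed";
        String finalStatus = pollUntil(fakeStatus, "completed"::equals, 10, scheduler).join();
        System.out.println(finalStatus + " after " + calls.get() + " polls");
        scheduler.shutdown();
    }
}
```

Scheduling the next check rather than sleeping keeps the calling thread free, which is the property the "non-blocking" comment in the quick start refers to.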
Scraping
Single-page scrape
import io.spidra.sdk.SpidraClient;
import io.spidra.sdk.model.scrape.ScrapeParams;
import io.spidra.sdk.model.scrape.ScrapeJob;

SpidraClient client = new SpidraClient(System.getenv("SPIDRA_API_KEY"));
ScrapeParams params = ScrapeParams.builder()
    .url("https://news.ycombinator.com")
    .prompt("Extract the top 10 story titles and their URLs")
    .outputFormat("json")
    .build();
ScrapeJob job = client.scrape().run(params).join();
System.out.println("Status: " + job.getStatus());
System.out.println("Content: " + job.getResult().getContent());
System.out.println("Extracted data: " + job.getResult().getData());
Job statuses: waiting · active · completed · failed
Submit and poll manually
If you need to track progress yourself, use submit() and get() directly:
// Step 1: submit
ScrapeJob pending = client.scrape().submit(params).join();
System.out.println("Job submitted: " + pending.getJobId());
// Step 2: poll manually
ScrapeJob current;
do {
    // Thread.sleep throws the checked InterruptedException; declare or handle it in the enclosing method
    Thread.sleep(2000);
    current = client.scrape().get(pending.getJobId()).join();
    System.out.println("Status: " + current.getStatus());
} while (!current.isTerminal());
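The loop above runs until the job reaches a terminal state, however long that takes. A bounded variant can be sketched with the JDK alone, as below. `ManualPoll` and `awaitTerminal` are hypothetical names, the status lookup is a stand-in for the real call, and treating `completed` and `failed` as the terminal statuses is an assumption based on the status list above.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class ManualPoll {
    // Assumption: "completed" and "failed" are the terminal statuses.
    static final Set<String> TERMINAL = Set.of("completed", "failed");

    /** Polls fetchStatus until it reports a terminal status or the timeout elapses. */
    static String awaitTerminal(Supplier<String> fetchStatus, Duration interval, Duration timeout)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String status = fetchStatus.get();
            if (TERMINAL.contains(status)) {
                return status;
            }
            // Overflow-safe comparison against the monotonic clock.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Still '" + status + "' after " + timeout);
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for repeated status lookups against a real job.
        Iterator<String> fake = List.of("waiting", "active", "active", "completed").iterator();
        String last = awaitTerminal(fake::next, Duration.ofMillis(50), Duration.ofSeconds(5));
        System.out.println("Finished with status: " + last);
    }
}
```

A deadline turns a stuck job into a visible `TimeoutException` instead of a loop that never exits.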
Browser actions
import io.spidra.sdk.model.scrape.BrowserAction;
import java.util.List;

ScrapeParams params = ScrapeParams.builder()
    .url("https://example.com/login")
    .browserActions(List.of(
        BrowserAction.builder().type("type").selector("#email").value("[email protected]").build(),
        BrowserAction.builder().type("type").selector("#password").value("secret").build(),
        BrowserAction.builder().type("click").selector("button[type=submit]").build(),
        BrowserAction.builder().type("wait").waitMs(2000).build()
    ))
    .prompt("Extract the user dashboard summary")
    .build();
ScrapeJob job = client.scrape().run(params).join();
Available actions
| Action | Required fields | Description |
| --- | --- | --- |
| click | selector or value | Click a button, link, or any element |
| type | selector, value | Type text into an input or textarea |
| check | selector or value | Check a checkbox |
| uncheck | selector or value | Uncheck a checkbox |
| wait | waitMs (ms) | Pause for a set number of milliseconds |
| scroll | to (0–100%) | Scroll the page to a percentage of its height |
Batch scraping
Submit up to 50 URLs in a single request. All URLs are processed in parallel.
import io.spidra.sdk.model.batch.BatchParams;
import io.spidra.sdk.model.batch.BatchJob;
import io.spidra.sdk.model.scrape.ScrapeUrl;
import java.util.List;

BatchParams params = BatchParams.builder()
    .urls(List.of(
        ScrapeUrl.builder().url("https://example.com/page1").build(),
        ScrapeUrl.builder().url("https://example.com/page2").build(),
        ScrapeUrl.builder().url("https://example.com/page3").build()
    ))
    .prompt("Extract the article title, author, and publication date")
    .outputFormat("json")
    .build();
BatchJob job = client.batch().run(params).join();
System.out.println("Completed: " + job.getCompletedCount() + "/" + job.getTotal());
job.getItems().forEach(item -> {
    System.out.println(item.getUrl() + " -> " + item.getStatus());
    if ("completed".equals(item.getStatus())) {
        System.out.println("  Data: " + item.getResult().getData());
    }
});
Item statuses: pending · running · completed · failed
Batch statuses: pending · running · completed · failed · cancelled
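Because a single batch request accepts at most 50 URLs, a longer list has to be split before submission. The following is a minimal sketch of that splitting step using only the JDK; `BatchChunks` and `chunk` are hypothetical helper names, and each resulting sublist would then be submitted as its own batch.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchChunks {
    // Per-request limit stated in the batch scraping section.
    static final int MAX_URLS_PER_BATCH = 50;

    /** Splits items into consecutive sublists of at most size elements each. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            int end = Math.min(start + size, items.size());
            // Copy so each chunk is independent of the source list.
            chunks.add(List.copyOf(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= 120; i++) {
            urls.add("https://example.com/page" + i);
        }
        for (List<String> batch : chunk(urls, MAX_URLS_PER_BATCH)) {
            System.out.println("Batch of " + batch.size() + " URLs, first: " + batch.get(0));
        }
    }
}
```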
Cancel a batch
BatchCancelResult result = client.batch().cancel(batchId).join();
System.out.println("Cancelled: " + result.getCancelledItems() + " items");
System.out.println("Refunded: " + result.getCreditsRefunded() + " credits");
Crawling
Give Spidra a starting URL and instructions for which links to follow. It discovers pages automatically and extracts structured data from each one.
import io.spidra.sdk.model.crawl.CrawlParams;
import io.spidra.sdk.model.crawl.CrawlJob;
import io.spidra.sdk.model.crawl.CrawlPagesResult;
import java.util.List;

CrawlParams params = CrawlParams.builder()
    .url("https://example.com")
    .maxDepth(3)
    .maxPages(100)
    .includePatterns(List.of("/blog/*"))
    .excludePatterns(List.of("/tag/*", "/author/*"))
    .prompt("Extract the blog post title and summary")
    .build();
CrawlJob job = client.crawl().run(params).join();
System.out.println("Crawled " + job.getPagesCrawled() + " pages");
Retrieve all crawled pages
CrawlPagesResult pages = client.crawl().pages(job.getJobId()).join();
pages.getPages().forEach(page -> {
    System.out.println(page.getUrl() + " [depth=" + page.getDepth() + "]");
    if (page.getData() != null) {
        System.out.println("  Extracted: " + page.getData());
    }
});
Apply a new AI prompt to an existing completed crawl without fetching pages again.
Object extracted = client.crawl()
    .extract(jobId, "Summarize all blog posts into a single markdown document")
    .join();
System.out.println(extracted);
Logs
Every API scrape job is logged automatically.
import io.spidra.sdk.model.logs.LogsParams;
import io.spidra.sdk.model.logs.LogsResult;
import io.spidra.sdk.model.logs.ScrapeLogDetail;

LogsParams params = LogsParams.builder()
    .status("completed")
    .channel("production")
    .limit(25)
    .page(1)
    .dateStart("2024-01-01T00:00:00Z")
    .build();
LogsResult result = client.logs().list(params).join();
System.out.println("Total logs: " + result.getPagination().get("total"));
// Get full detail for a single log
ScrapeLogDetail detail = client.logs().get(result.getLogs().get(0).getUuid()).join();
System.out.println("Prompt used: " + detail.getPrompt());
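The listing above is paged through `limit` and `page`. The following JDK-only sketch shows one way to walk every page. It assumes pages are 1-based, as in the example above, and that a page shorter than `limit` is the last one; neither is confirmed here. `PageWalker`, `fetchAll`, and `totalPages` are hypothetical names, and the page-fetching function is a stand-in for the real listing call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

final class PageWalker {
    /** Collects pages 1, 2, 3, ... until a page comes back with fewer than limit items. */
    static <T> List<T> fetchAll(IntFunction<List<T>> fetchPage, int limit) {
        List<T> all = new ArrayList<>();
        for (int page = 1; ; page++) {
            List<T> items = fetchPage.apply(page);
            all.addAll(items);
            if (items.size() < limit) {
                return all;
            }
        }
    }

    /** Number of pages needed for total items at limit items per page (ceiling division). */
    static int totalPages(long total, int limit) {
        return (int) ((total + limit - 1) / limit);
    }

    public static void main(String[] args) {
        int limit = 25;
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 60; i++) {
            data.add(i);
        }
        // Stand-in for a paged listing call: returns the slice for a 1-based page number.
        IntFunction<List<Integer>> fakePage = page -> {
            int from = Math.min((page - 1) * limit, data.size());
            int to = Math.min(page * limit, data.size());
            return data.subList(from, to);
        };
        List<Integer> all = fetchAll(fakePage, limit);
        System.out.println(all.size() + " items across " + totalPages(data.size(), limit) + " pages");
    }
}
```

When the reported total is available, `totalPages` gives the page count up front. The short-page check works without it, at the cost of one extra request when the total is an exact multiple of `limit`.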
Usage statistics
Returns credit and request usage broken down by day or week.
import io.spidra.sdk.model.usage.UsageStats;

UsageStats stats = client.usage().get("30d").join();
System.out.println("Plan: " + stats.getPlan());
System.out.println("Credits used (30d): " + stats.getCreditsUsed());
System.out.println("Credits remaining: " + stats.getCreditsRemaining());
| Range | Description |
| --- | --- |
| "7d" | Last 7 days, one row per day |
| "30d" | Last 30 days, one row per day |
| "weekly" | Last 7 weeks, one row per week |
Error handling
All exceptions extend SpidraException (an unchecked RuntimeException):
| Exception | HTTP Status |
| --- | --- |
| SpidraAuthException | 401, 403 |
| SpidraInsufficientCreditsException | 402 |
| SpidraRateLimitException | 429 |
| SpidraServerException | 5xx |
| SpidraException | any other |
import io.spidra.sdk.exception.*;
import java.util.concurrent.CompletionException;

client.scrape().run(params)
    .thenAccept(job -> System.out.println(job.getResult().getContent()))
    .exceptionally(throwable -> {
        // Dependent stages wrap failures in CompletionException; unwrap it when present
        Throwable cause = (throwable instanceof CompletionException && throwable.getCause() != null)
            ? throwable.getCause()
            : throwable;
        if (cause instanceof SpidraAuthException) {
            System.err.println("Invalid API key");
        } else if (cause instanceof SpidraRateLimitException) {
            System.err.println("Rate limited; back off and retry");
        } else if (cause instanceof SpidraInsufficientCreditsException) {
            System.err.println("Out of credits");
        } else if (cause instanceof SpidraException ex) {
            System.err.println("API error [" + ex.getStatusCode() + "]: " + ex.getMessage());
        }
        return null;
    });
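The rate-limit branch above suggests backing off and retrying. Here is a minimal sketch of that idea for any call that returns a CompletableFuture, using only the JDK. `RetrySketch`, `withRetry`, and `unwrap` are hypothetical names, and the nested `RateLimited` exception is a stand-in for `SpidraRateLimitException` so the example runs without the SDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class RetrySketch {
    /** Stand-in for a rate-limit exception; hypothetical, for illustration only. */
    static final class RateLimited extends RuntimeException {}

    /** Returns the wrapped cause of a CompletionException, or the throwable itself. */
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    /** Retries call up to maxAttempts times, doubling the delay after each retryable failure. */
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> call,
                                              Predicate<Throwable> retryable,
                                              int maxAttempts, long baseDelayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(call, retryable, maxAttempts, baseDelayMs, 1, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> call, Predicate<Throwable> retryable,
                                    int maxAttempts, long baseDelayMs, int attemptNo,
                                    CompletableFuture<T> result) {
        CompletableFuture<T> inFlight;
        try {
            inFlight = call.get();
        } catch (Throwable t) {
            result.completeExceptionally(t);
            return;
        }
        inFlight.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            Throwable cause = unwrap(error);
            if (attemptNo >= maxAttempts || !retryable.test(cause)) {
                result.completeExceptionally(cause);
                return;
            }
            long delayMs = baseDelayMs << (attemptNo - 1); // 1x, 2x, 4x, ...
            CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS)
                .execute(() -> attempt(call, retryable, maxAttempts, baseDelayMs, attemptNo + 1, result));
        });
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for an API call: fails twice with RateLimited, then succeeds.
        Supplier<CompletableFuture<String>> flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                return CompletableFuture.failedFuture(new RateLimited());
            }
            return CompletableFuture.completedFuture("ok");
        };
        String value = withRetry(flaky, e -> e instanceof RateLimited, 5, 100).join();
        System.out.println(value + " after " + calls.get() + " attempts");
    }
}
```

Only failures accepted by the predicate are retried, so errors such as an invalid key or exhausted credits surface immediately instead of being retried.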
Configuration
Custom base URL
// Point at a staging environment or local mock
SpidraClient client = new SpidraClient("your-api-key", "https://staging-api.spidra.io/api");
Environment variable pattern
import java.util.Objects;

SpidraClient client = new SpidraClient(
    Objects.requireNonNull(System.getenv("SPIDRA_API_KEY"), "SPIDRA_API_KEY env var not set")
);
Building from source
./gradlew build
./gradlew test
./gradlew javadoc