# Java

> Official Java SDK for Spidra — AI-powered web scraping with proxy rotation and CAPTCHA handling.

The official Java SDK for Spidra lets you extract structured data from any website by describing what you want in plain English. It handles JavaScript rendering, anti-bot bypass, and CAPTCHA solving as a managed API, so your Java code stays focused on the data.

* **Java 17+** — uses `java.net.http.HttpClient`, no extra HTTP dependencies
* **Jackson** for JSON (de)serialization
* **`CompletableFuture<T>`** for all async operations
* **Builder pattern** for all request parameter objects

## Installation

### Gradle

```groovy theme={null}
dependencies {
    implementation 'io.spidra:spidra-java-sdk:0.1.0'
}
```

### Maven

```xml theme={null}
<dependency>
    <groupId>io.spidra</groupId>
    <artifactId>spidra-java-sdk</artifactId>
    <version>0.1.0</version>
</dependency>
```

<Note>
  Get your API key from [app.spidra.io](https://app.spidra.io) under **Settings → API Keys**.
  Keep your key out of source control — read it from an environment variable or a secrets manager.
</Note>

***

## Getting started

All requests require an API key sent as the `x-api-key` header. Pass it to the client:

```java theme={null}
SpidraClient client = new SpidraClient(System.getenv("SPIDRA_API_KEY"));
```
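To make the header requirement concrete, here is a minimal sketch of a request that carries the key, built with the JDK's `java.net.http.HttpRequest` and never sent. The endpoint URL and JSON body are assumptions for illustration only; the SDK builds its own requests and its internals are not documented on this page.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ApiKeyHeaderSketch {
    // Hypothetical endpoint, used only for illustration.
    static final String ASSUMED_ENDPOINT = "https://api.spidra.io/api/scrape";

    /** Builds a POST request that carries the API key in the x-api-key header; nothing is sent. */
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ASSUMED_ENDPOINT))
            .header("x-api-key", apiKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) {
        // The body shape is an assumption, not the documented wire format.
        HttpRequest request = buildRequest("demo-key", "{\"url\":\"https://example.com\"}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println("x-api-key present: "
            + request.headers().firstValue("x-api-key").isPresent());
    }
}
```

In normal use you never build this request yourself; passing the key to `SpidraClient` is enough.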

## Quick start

```java theme={null}
import io.spidra.sdk.SpidraClient;
import io.spidra.sdk.model.scrape.ScrapeParams;

SpidraClient client = new SpidraClient(System.getenv("SPIDRA_API_KEY"));

ScrapeParams params = ScrapeParams.builder()
    .url("https://example.com")
    .prompt("Extract the page title and main heading")
    .build();

// submit + poll until complete (non-blocking, returns CompletableFuture)
client.scrape().run(params)
    .thenAccept(job -> System.out.println(job.getResult().getContent()))
    .exceptionally(err -> { err.printStackTrace(); return null; })
    .join(); // block the main thread for this example
```

`run()` submits the job and polls until it completes. The `CompletableFuture` resolves with the final result.
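Because the result is a plain `CompletableFuture`, the standard JDK combinators apply. The sketch below bounds the wait with `orTimeout` (Java 9+). It uses a stand-in future rather than a real Spidra call, and `joinWithin` is a hypothetical helper, not part of the SDK. Whether the SDK has its own timeout settings is not covered here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RunTimeoutSketch {
    /**
     * Waits for the future and returns its value, or returns fallback if it does not
     * finish within the limit. Note that orTimeout fails the passed future itself.
     */
    static <T> T joinWithin(CompletableFuture<T> future, long timeout, TimeUnit unit, T fallback) {
        try {
            return future.orTimeout(timeout, unit).join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return fallback;
            }
            throw e; // any other failure is propagated unchanged
        }
    }

    public static void main(String[] args) {
        // Stand-in for a job that never finishes.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        System.out.println(joinWithin(neverCompletes, 100, TimeUnit.MILLISECONDS, "timed out"));

        // Stand-in for a job that has already finished.
        CompletableFuture<String> finished = CompletableFuture.completedFuture("done");
        System.out.println(joinWithin(finished, 100, TimeUnit.MILLISECONDS, "timed out"));
    }
}
```

With the SDK, the same helper could wrap `client.scrape().run(params)`.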

***

## Scraping

### Single-page scrape

```java theme={null}
import io.spidra.sdk.SpidraClient;
import io.spidra.sdk.model.scrape.ScrapeParams;
import io.spidra.sdk.model.scrape.ScrapeJob;

SpidraClient client = new SpidraClient(System.getenv("SPIDRA_API_KEY"));

ScrapeParams params = ScrapeParams.builder()
    .url("https://news.ycombinator.com")
    .prompt("Extract the top 10 story titles and their URLs")
    .outputFormat("json")
    .build();

ScrapeJob job = client.scrape().run(params).join();
System.out.println("Status: " + job.getStatus());
System.out.println("Content: " + job.getResult().getContent());
System.out.println("Extracted data: " + job.getResult().getData());
```

**Job statuses:** `waiting` · `active` · `completed` · `failed`

### Submit and poll manually

If you need to track progress yourself, use `submit()` and `get()` directly:

```java theme={null}
// Step 1: submit
ScrapeJob pending = client.scrape().submit(params).join();
System.out.println("Job submitted: " + pending.getJobId());

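// Step 2: poll manually until the job reaches a terminal status.
// Thread.sleep throws the checked InterruptedException, so declare or handle it in the enclosing method.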
ScrapeJob current;
do {
    Thread.sleep(2000);
    current = client.scrape().get(pending.getJobId()).join();
    System.out.println("Status: " + current.getStatus());
} while (!current.isTerminal());
```
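The loop above blocks the calling thread between polls. If you would rather keep polling asynchronous and cap the number of attempts, one option is a small helper built on `CompletableFuture.delayedExecutor` (Java 9+). This is a sketch under assumptions: `pollUntil` is a hypothetical helper, and the `fetch` supplier stands in for `client.scrape().get(jobId)`.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PollingSketch {
    /**
     * Calls fetch repeatedly, waiting interval between attempts, until isTerminal accepts
     * the value. Fails with IllegalStateException once maxAttempts fetches have been made.
     */
    static <T> CompletableFuture<T> pollUntil(Supplier<CompletableFuture<T>> fetch,
                                              Predicate<T> isTerminal,
                                              Duration interval,
                                              int maxAttempts) {
        return fetch.get().thenCompose(value -> {
            if (isTerminal.test(value)) {
                return CompletableFuture.completedFuture(value);
            }
            if (maxAttempts <= 1) {
                return CompletableFuture.<T>failedFuture(
                    new IllegalStateException("Job not terminal after max attempts"));
            }
            // Schedule the next attempt after the interval without blocking a thread.
            Executor delayed =
                CompletableFuture.delayedExecutor(interval.toMillis(), TimeUnit.MILLISECONDS);
            return CompletableFuture.runAsync(() -> {}, delayed)
                .thenCompose(ignored -> pollUntil(fetch, isTerminal, interval, maxAttempts - 1));
        });
    }

    public static void main(String[] args) {
        // Stand-in for client.scrape().get(jobId): reports "completed" on the third call.
        String[] statuses = {"waiting", "active", "completed"};
        AtomicInteger calls = new AtomicInteger();
        Supplier<CompletableFuture<String>> fetch = () ->
            CompletableFuture.completedFuture(statuses[Math.min(calls.getAndIncrement(), 2)]);

        String last = pollUntil(fetch,
            status -> status.equals("completed") || status.equals("failed"),
            Duration.ofMillis(50), 10).join();
        System.out.println(last + " after " + calls.get() + " calls");
    }
}
```

With the SDK, the predicate could simply be `ScrapeJob::isTerminal`.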

### Browser actions

```java theme={null}
import io.spidra.sdk.model.scrape.BrowserAction;
import java.util.List;

ScrapeParams params = ScrapeParams.builder()
    .url("https://example.com/login")
    .browserActions(List.of(
        BrowserAction.builder().type("type").selector("#email").value("user@example.com").build(),
        BrowserAction.builder().type("type").selector("#password").value("secret").build(),
        BrowserAction.builder().type("click").selector("button[type=submit]").build(),
        BrowserAction.builder().type("wait").waitMs(2000).build()
    ))
    .prompt("Extract the user dashboard summary")
    .build();

ScrapeJob job = client.scrape().run(params).join();
```

**Available actions**

| Action    | Required fields       | Description                                   |
| --------- | --------------------- | --------------------------------------------- |
| `click`   | `selector` or `value` | Click a button, link, or any element          |
| `type`    | `selector`, `value`   | Type text into an input or textarea           |
| `check`   | `selector` or `value` | Check a checkbox                              |
| `uncheck` | `selector` or `value` | Uncheck a checkbox                            |
| `wait`    | `waitMs` (ms)         | Pause for a set number of milliseconds        |
| `scroll`  | `to` (`0–100%`)       | Scroll the page to a percentage of its height |

***

## Batch scraping

Submit up to 50 URLs in a single request. All URLs are processed in parallel.

```java theme={null}
import io.spidra.sdk.model.batch.BatchParams;
import io.spidra.sdk.model.batch.BatchJob;
import io.spidra.sdk.model.scrape.ScrapeUrl;
import java.util.List;

BatchParams params = BatchParams.builder()
    .urls(List.of(
        ScrapeUrl.builder().url("https://example.com/page1").build(),
        ScrapeUrl.builder().url("https://example.com/page2").build(),
        ScrapeUrl.builder().url("https://example.com/page3").build()
    ))
    .prompt("Extract the article title, author, and publication date")
    .outputFormat("json")
    .build();

BatchJob job = client.batch().run(params).join();

System.out.println("Completed: " + job.getCompletedCount() + "/" + job.getTotal());
job.getItems().forEach(item -> {
    System.out.println(item.getUrl() + " -> " + item.getStatus());
    if ("completed".equals(item.getStatus())) {
        System.out.println("  Data: " + item.getResult().getData());
    }
});
```

**Item statuses:** `pending` · `running` · `completed` · `failed`

**Batch statuses:** `pending` · `running` · `completed` · `failed` · `cancelled`
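If you have more than 50 URLs, you need to split them across several batch requests yourself. A minimal sketch of that chunking step using only the JDK follows; `chunk` is a hypothetical helper, not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchChunkingSketch {
    // Per-request limit stated in this section.
    static final int MAX_URLS_PER_BATCH = 50;

    /** Splits items into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(List.copyOf(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= 120; i++) {
            urls.add("https://example.com/page" + i);
        }
        // Each sublist would become one BatchParams and one client.batch().run(...) call.
        for (List<String> batch : chunk(urls, MAX_URLS_PER_BATCH)) {
            System.out.println(batch.size() + " URLs, first: " + batch.get(0));
        }
    }
}
```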

### Cancel a batch

```java theme={null}
// batchId is the ID of a batch you submitted earlier
BatchCancelResult result = client.batch().cancel(batchId).join();
System.out.println("Cancelled: " + result.getCancelledItems() + " items");
System.out.println("Refunded: " + result.getCreditsRefunded() + " credits");
```

***

## Crawling

Give Spidra a starting URL and instructions for which links to follow. It discovers pages automatically and extracts structured data from each one.

```java theme={null}
import io.spidra.sdk.model.crawl.CrawlParams;
import io.spidra.sdk.model.crawl.CrawlJob;
import io.spidra.sdk.model.crawl.CrawlPagesResult;
import java.util.List;

CrawlParams params = CrawlParams.builder()
    .url("https://example.com")
    .maxDepth(3)
    .maxPages(100)
    .includePatterns(List.of("/blog/*"))
    .excludePatterns(List.of("/tag/*", "/author/*"))
    .prompt("Extract the blog post title and summary")
    .build();

CrawlJob job = client.crawl().run(params).join();
System.out.println("Crawled " + job.getPagesCrawled() + " pages");
```

### Retrieve all crawled pages

```java theme={null}
CrawlPagesResult pages = client.crawl().pages(job.getJobId()).join();
pages.getPages().forEach(page -> {
    System.out.println(page.getUrl() + " [depth=" + page.getDepth() + "]");
    if (page.getData() != null) {
        System.out.println("  Extracted: " + page.getData());
    }
});
```

### Re-extract without re-crawling

Apply a new AI prompt to an existing completed crawl without fetching pages again.

```java theme={null}
// jobId is the ID of a completed crawl, e.g. job.getJobId() from the crawl above
Object extracted = client.crawl()
    .extract(jobId, "Summarize all blog posts into a single markdown document")
    .join();
System.out.println(extracted);
```

***

## Logs

Every API scrape job is logged automatically. List logs with filters, then fetch the full detail of a single entry by its UUID.

```java theme={null}
import io.spidra.sdk.model.logs.LogsParams;
import io.spidra.sdk.model.logs.LogsResult;
import io.spidra.sdk.model.logs.ScrapeLogDetail;

LogsParams params = LogsParams.builder()
    .status("completed")
    .channel("production")
    .limit(25)
    .page(1)
    .dateStart("2024-01-01T00:00:00Z")
    .build();

LogsResult result = client.logs().list(params).join();
System.out.println("Total logs: " + result.getPagination().get("total"));

// Get full detail for a single log
ScrapeLogDetail detail = client.logs().get(result.getLogs().get(0).getUuid()).join();
System.out.println("Prompt used: " + detail.getPrompt());
```

***

## Usage statistics

Returns credit and request usage broken down by day or week.

```java theme={null}
import io.spidra.sdk.model.usage.UsageStats;

UsageStats stats = client.usage().get("30d").join();
System.out.println("Plan: " + stats.getPlan());
System.out.println("Credits used (30d): " + stats.getCreditsUsed());
System.out.println("Credits remaining: " + stats.getCreditsRemaining());
```

| Range      | Description                    |
| ---------- | ------------------------------ |
| `"7d"`     | Last 7 days, one row per day   |
| `"30d"`    | Last 30 days, one row per day  |
| `"weekly"` | Last 7 weeks, one row per week |

## Error handling

All exceptions extend `SpidraException` (an unchecked `RuntimeException`):

| Exception                            | HTTP Status |
| ------------------------------------ | ----------- |
| `SpidraAuthException`                | 401, 403    |
| `SpidraInsufficientCreditsException` | 402         |
| `SpidraRateLimitException`           | 429         |
| `SpidraServerException`              | 5xx         |
| `SpidraException`                    | any other   |

```java theme={null}
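import io.spidra.sdk.exception.*;
import java.util.concurrent.CompletionException;

client.scrape().run(params)
    .thenAccept(job -> System.out.println(job.getResult().getContent()))
    .exceptionally(throwable -> {
        // Failures from an upstream stage arrive wrapped in CompletionException; unwrap when present
        Throwable cause = (throwable instanceof CompletionException && throwable.getCause() != null)
            ? throwable.getCause()
            : throwable;
        if (cause instanceof SpidraAuthException) {
            System.err.println("Invalid API key");
        } else if (cause instanceof SpidraRateLimitException) {
            System.err.println("Rate limited: back off and retry");
        } else if (cause instanceof SpidraInsufficientCreditsException) {
            System.err.println("Out of credits");
        } else if (cause instanceof SpidraException ex) {
            System.err.println("API error [" + ex.getStatusCode() + "]: " + ex.getMessage());
        } else {
            // Anything that is not a SpidraException, e.g. a bug in the callback above
            System.err.println("Unexpected error: " + cause);
        }
        return null;
    });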
```
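For rate limits, the usual remedy is to retry with a growing delay. The sketch below shows one way to do that with plain `CompletableFuture` composition. It rests on assumptions: `withRetry` and `unwrap` are hypothetical helpers, `RateLimited` is a stand-in for `SpidraRateLimitException` so the example compiles without the SDK, and the page does not say whether the SDK retries on its own.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class RetrySketch {
    /** Hypothetical stand-in for SpidraRateLimitException. */
    static final class RateLimited extends RuntimeException {
        RateLimited() { super("429 Too Many Requests"); }
    }

    /** Strips CompletionException wrappers to reach the underlying failure. */
    static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Retries call, doubling delayMs each time, while retryable accepts the failure. */
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> call,
                                              Predicate<Throwable> retryable,
                                              int maxAttempts,
                                              long delayMs) {
        return call.get()
            .<CompletableFuture<T>>handle((value, error) -> {
                if (error == null) {
                    return CompletableFuture.completedFuture(value);
                }
                Throwable cause = unwrap(error);
                if (maxAttempts <= 1 || !retryable.test(cause)) {
                    return CompletableFuture.<T>failedFuture(cause);
                }
                // Wait without blocking a thread, then try again with a doubled delay.
                Executor delayed = CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
                return CompletableFuture.runAsync(() -> {}, delayed)
                    .thenCompose(ignored -> withRetry(call, retryable, maxAttempts - 1, delayMs * 2));
            })
            .thenCompose(next -> next);
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // Stand-in for client.scrape().run(params): rate limited twice, then succeeds.
        Supplier<CompletableFuture<String>> call = () -> attempts.incrementAndGet() < 3
            ? CompletableFuture.<String>failedFuture(new RateLimited())
            : CompletableFuture.completedFuture("scraped content");

        String result = withRetry(call, t -> t instanceof RateLimited, 5, 50).join();
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

With the SDK, the predicate would be `t -> t instanceof SpidraRateLimitException`. Authentication and credit errors are deliberately not retried, since repeating the call cannot fix them.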

***

## Configuration

### Custom base URL

```java theme={null}
// Point at a staging environment or local mock
SpidraClient client = new SpidraClient("your-api-key", "https://staging-api.spidra.io/api");
```

### Environment variable pattern

```java theme={null}
import java.util.Objects;

SpidraClient client = new SpidraClient(
    Objects.requireNonNull(System.getenv("SPIDRA_API_KEY"), "SPIDRA_API_KEY env var not set")
);
```
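`Objects.requireNonNull` catches a missing variable but not one that is set to an empty or whitespace-only string. A slightly stricter lookup might look like the sketch below. `requireKey` is a hypothetical helper, and the lookup is passed in as a function so the example runs without touching the real environment.

```java
import java.util.function.Function;

final class ApiKeyLookupSketch {
    /** Returns the trimmed value of name from lookup, or throws if it is missing or blank. */
    static String requireKey(Function<String, String> lookup, String name) {
        String value = lookup.apply(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(name + " is not set or is blank");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // In application code: requireKey(System::getenv, "SPIDRA_API_KEY")
        String key = requireKey(name -> "  demo-key  ", "SPIDRA_API_KEY");
        System.out.println("[" + key + "]");
    }
}
```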

## Building from source

```bash theme={null}
./gradlew build
./gradlew test
./gradlew javadoc
```

<CardGroup cols={2}>
  <Card title="Swift" icon="swift" href="/sdks/swift">
    Official Swift SDK — async/await native, works on iOS, macOS, tvOS, watchOS, and server-side Swift.
  </Card>

  <Card title="Rust" icon="rust" href="/sdks/rust">
    Official Rust SDK — tokio-based async, zero-cost abstractions, returns Result on every call.
  </Card>
</CardGroup>
