POST /crawl/{jobId}/extract

Re-Extract from Existing Crawl
curl --request POST \
  --url https://api.spidra.io/api/crawl/{jobId}/extract \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "transformInstruction": "Extract only the product price and availability status"
}
'
{
  "status": "queued",
  "jobId": "new-job-uuid",
  "message": "Extraction job queued. Poll /api/crawl/new-job-uuid for results."
}
This does not re-crawl the website. Spidra reads the HTML and markdown already saved from the original crawl.

Prerequisites

The source crawl job must have a completed status before you call this endpoint. Calling /extract on a job that is still running, pending, or failed returns a 400 Bad Request. Poll GET /crawl/{jobId} and wait for "status": "completed" before proceeding.
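A minimal polling sketch using only Python's standard library, assuming the base URL and x-api-key header from the curl example above; the function names (`job_is_ready`, `wait_for_completion`) are illustrative, not part of the API.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.spidra.io/api"  # from the curl example above

def job_is_ready(status: str) -> bool:
    # Only a "completed" source job can be re-extracted.
    return status == "completed"

def fetch_status(job_id: str, api_key: str) -> str:
    # GET /crawl/{jobId} and return the job's status field.
    req = urllib.request.Request(
        f"{BASE_URL}/crawl/{job_id}",
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["status"]

def wait_for_completion(job_id: str, api_key: str, interval: float = 5.0) -> None:
    # Poll until the source crawl completes; a failed job can never be re-extracted.
    while True:
        status = fetch_status(job_id, api_key)
        if job_is_ready(status):
            return
        if status == "failed":
            raise RuntimeError(f"Source job {job_id} failed; cannot re-extract")
        time.sleep(interval)
```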

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| transformInstruction | string | Yes | The extraction prompt to apply to every page from the source crawl. Maximum 5,000 characters. |
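Because the API rejects missing or over-length prompts with a 400, it can be worth validating the body client-side before sending. `build_extract_body` below is a hypothetical helper, not part of the API:

```python
import json

MAX_INSTRUCTION_LENGTH = 5000  # documented limit for transformInstruction

def build_extract_body(transform_instruction: str) -> str:
    # Validate and serialize the body for POST /crawl/{jobId}/extract,
    # mirroring the 400 errors the server would return.
    if not transform_instruction:
        raise ValueError("Missing required field: transformInstruction")
    if len(transform_instruction) > MAX_INSTRUCTION_LENGTH:
        raise ValueError("transformInstruction must be 5000 characters or fewer")
    return json.dumps({"transformInstruction": transform_instruction})
```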

How It Works

  1. Pass the jobId of a completed crawl job. If you ran the crawl previously, this is the id field shown in your crawl history.
  2. Provide a transformInstruction describing what you want to extract.
  3. Spidra loads the saved content for each page and runs your prompt against it.
  4. A new crawl job is created with the results, which you can poll and download the same way as any other job.
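The steps above can be sketched end to end with Python's standard library; the response parsing matches the queued-job example at the top of this page, while the function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://api.spidra.io/api"  # from the curl example above

def parse_new_job_id(response_body: str) -> str:
    # Pull the new jobId out of the queued-extraction response.
    payload = json.loads(response_body)
    if payload.get("status") != "queued":
        raise RuntimeError(f"Unexpected status: {payload.get('status')}")
    return payload["jobId"]

def re_extract(job_id: str, instruction: str, api_key: str) -> str:
    # POST /crawl/{jobId}/extract and return the new job's ID for polling.
    body = json.dumps({"transformInstruction": instruction}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/crawl/{job_id}/extract",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_new_job_id(resp.read().decode())
```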

When to Use This

  • You want to extract different fields from pages you already crawled
  • Your first extraction prompt wasn’t quite right and you want to try again
  • You need the same pages in two different formats, like JSON and CSV

Polling Results

The response returns a new jobId. Use the standard crawl endpoints to check progress and get results:
| Endpoint | Purpose |
| --- | --- |
| GET /crawl/{jobId} | Poll job status |
| GET /crawl/{jobId}/pages | Get extracted data per page |
| GET /crawl/{jobId}/download | Download results as ZIP |
| POST /crawl/{jobId}/retry/{pageId} | Retry a specific page |
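All of these follow-up endpoints hang off the new jobId, so a small helper keeps the paths consistent; `crawl_endpoint` is a hypothetical convenience, not part of the API.

```python
def crawl_endpoint(job_id: str, suffix: str = "") -> str:
    # Build a path like /crawl/{jobId}, /crawl/{jobId}/pages,
    # or /crawl/{jobId}/download for the new extraction job.
    path = f"/crawl/{job_id}"
    return f"{path}/{suffix}" if suffix else path
```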

Common Errors

| Status | Error message | Cause |
| --- | --- | --- |
| 400 | Source crawl job has not completed successfully | You called /extract before the source job finished. Wait for status: "completed". |
| 400 | Missing required field: transformInstruction | The request body is missing the transformInstruction field. |
| 400 | transformInstruction must be 5000 characters or fewer | Your prompt exceeds the 5,000 character limit. |
| 403 | You have exceeded your monthly credit limit. | Not enough credits remaining. Check your usage at GET /usage. |
| 404 | Source crawl job not found | The jobId does not exist or does not belong to your account. |
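A client can map these documented statuses to a suggested next step; `describe_extract_error` is an illustrative helper that summarizes the table above, not an API response field.

```python
def describe_extract_error(status_code: int, message: str = "") -> str:
    # Suggest a next step for the documented error statuses of
    # POST /crawl/{jobId}/extract.
    if status_code == 400:
        return ("Fix the request: wait for the source job to complete, "
                "or correct transformInstruction.")
    if status_code == 403:
        return "Out of credits: check remaining usage via GET /usage."
    if status_code == 404:
        return "Job not found: verify the jobId belongs to your account."
    return f"Unexpected error {status_code}: {message}"
```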

Authorizations

x-api-key
string
header
required

Path Parameters

jobId
string
required

The ID of the completed source crawl job to extract from

Body

application/json
transformInstruction
string
required

Extraction prompt to apply to all pages from the source crawl. Maximum 5,000 characters.

Maximum string length: 5000

Response

Extraction job queued

status
string
Example:

"queued"

jobId
string

New job ID to poll for results

message
string