When you scrape with a prompt, the AI reads the page and returns whatever JSON it decides makes sense. The shape can vary between runs. A field might appear with a different name. A field might be missing if the AI was not confident. If you are saving results to a database or processing them in code, this inconsistency is a problem.

Structured output solves this. You add a schema to your request that describes the exact shape you want. The AI must return JSON that matches that shape exactly, with the field names, types, and nesting you defined. If the AI cannot find a value for a field, it writes null instead of skipping the field.

Without a schema, asking for job details might give you:
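An illustrative sketch of the problem (these field names are invented for illustration; the inconsistency between runs is the point). One run might return:

```json
{ "job_title": "Software Engineer", "employer": "Acme", "pay": "$140k - $180k" }
```

while the next run returns:

```json
{ "title": "Software Engineer", "company": "Acme" }
```

Same page, two different shapes. A schema pins the shape down.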
This is the most important thing to understand about how structured data works.

Fields listed in required are always in the output. If the AI cannot find a value, it writes null. The field is never missing.

Fields not in required may be omitted. If the AI has no evidence for an optional field, it leaves it out of the response entirely rather than guessing.

A concrete example. Say your schema has these properties:
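A properties block for this example could look like the following (a sketch reconstructed from the fields used below; the exact types are an assumption):

```json
{
  "title": { "type": "string" },
  "company": { "type": "string" },
  "salary": { "type": ["string", "null"] },
  "benefits": { "type": ["string", "null"] }
}
```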
If required is ["title", "company"], and the page has no salary or benefits info:
```json
{ "title": "Engineer", "company": "Acme" }
```
The optional fields are completely absent.

If required is ["title", "company", "salary", "benefits"], and the page still has no salary or benefits info:
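With every field required, the same page would produce:

```json
{ "title": "Engineer", "company": "Acme", "salary": null, "benefits": null }
```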
The fields are there, just null.

Rule of thumb: put a field in required when you need it to always be present in your output, even as null. Leave it out of required when you are fine with it being absent if there is nothing to extract.
The AI will pick the closest matching option from your list. If nothing fits, it uses null.
Be careful with required enum fields. If the field is in required and the AI cannot find clear evidence for any of your enum values, it must still write something. It will either pick the closest match or write null if you included it in the enum. Always include null in your enum when the field might not always appear on the page.
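A nullable enum field that tolerates missing data can be declared like this (the same pattern as the employment_type field in the full request example on this page):

```json
{
  "employment_type": {
    "type": ["string", "null"],
    "enum": ["full_time", "part_time", "contract", null]
  }
}
```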
prompt and schema are designed to work together. The schema controls the output shape. The prompt guides how the AI interprets and normalizes the page content before filling in the schema.

Use prompt to give the AI instructions about normalization, what to look for, or what to ignore:
```json
{
  "urls": [{ "url": "https://jobs.example.com/engineer" }],
  "prompt": "Extract the job data. Normalize salary to a plain number in USD (drop symbols and commas). For employment_type, map contract-based and freelance roles to 'contract'. If the page shows salary as a range like '$140k - $180k', split into salary_min and salary_max.",
  "schema": {
    "type": "object",
    "required": ["title", "company", "salary_min", "salary_max", "employment_type"],
    "properties": {
      "title": { "type": "string" },
      "company": { "type": "string" },
      "salary_min": { "type": ["number", "null"] },
      "salary_max": { "type": ["number", "null"] },
      "employment_type": {
        "type": ["string", "null"],
        "enum": ["full_time", "part_time", "contract", null]
      }
    }
  }
}
```
Think of the prompt as instructions to the AI and the schema as the contract for the output. Both are optional on their own, but they are most powerful together.
Spidra accepts standard JSON Schema. You can write that JSON by hand, or you can use a schema validation library in your own code to generate it.

Zod (JavaScript / TypeScript)
When your scrape job completes, poll GET /scrape/{jobId} as usual. The structured data appears in result.content as a parsed JSON object, not a string.
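A completed job might look like this (a sketch assuming the schema from the request example above; note content is an object, not a string):

```json
{
  "status": "completed",
  "result": {
    "content": {
      "title": "Software Engineer",
      "company": "Acme",
      "salary_min": 140000,
      "salary_max": 180000,
      "employment_type": "full_time"
    }
  }
}
```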
If the scrape job includes a schema, the job will fail rather than fall back to raw markdown when AI extraction cannot complete. This is intentional. When you pass a schema, you are expecting a specific shape, and returning unstructured markdown would be silently wrong.
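A hypothetical sketch of how that failure could surface (the exact status value and error field name are assumptions; check the job status reference for the real shape):

```json
{
  "status": "failed",
  "error": "AI extraction could not complete for the provided schema"
}
```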
Some JSON Schema keywords are not supported by the AI model. If your schema includes them, Spidra strips them before processing and returns a schema_warnings list in the job status response so you know what was ignored.
```json
{
  "status": "completed",
  "schema_warnings": [
    "Property 'title': keyword 'minLength' is not supported and will be ignored",
    "Property 'salary': keyword '$ref' is not supported and will be ignored"
  ],
  "result": { ... }
}
```
Warnings are non-fatal. The job still runs. But you should remove or replace the flagged keywords to make sure the AI is enforcing what you intended.

Supported keywords: type, properties, required, items, enum, nullable, description

Not supported: $ref, anyOf, oneOf, allOf, if/then/else, minLength, maxLength, minimum, maximum, pattern, additionalProperties
If your schema has a structural problem, the API returns a 422 error before the job is queued. No credits are used.
```json
{
  "status": "error",
  "message": "Invalid schema. Fix the errors below and try again.",
  "errors": [
    "Root schema must be type 'object'",
    "Schema exceeds maximum nesting depth of 5"
  ]
}
```
Root schema must be type 'object'
Your top-level schema must be { "type": "object", "properties": { ... } }. Passing an array or a plain string type at the root is not allowed.

Schema exceeds maximum nesting depth of 5

Your schema has more than 5 levels of nested objects. Flatten the structure or move deeply nested data into a string field that the AI formats itself.

Schema exceeds maximum size

The schema JSON is over 10KB. Remove unused fields or descriptions to bring it under the limit.
schema (object)
JSON Schema object describing the output shape. Root must be type: "object". When provided, output is automatically set to "json".

prompt (string, optional)

Extraction and normalization instructions. Works alongside schema to guide how the AI reads and maps the page content.

output (string)
You do not need to set this when using a schema. It is automatically forced to "json".