Extract
Extract is the core feature of Stract Data. It uses AI models to turn HTML documents into structured data that matches an output schema you define, so you do not have to write manual HTML parsing code or maintain fragile selectors.
Features
- Extract structured data from HTML documents
- Process and normalize the extracted data
- Scale your extraction needs automatically
- AI-powered understanding of content context and structure
API Endpoint
POST /extract
Extract structured data from an HTML document based on your specified output schema.
Endpoint: https://api.stractdata.com/v1/extract
Headers:
Content-Type: multipart/form-data
api-key: API_KEY
Parameters:
file: The HTML file to process (multipart/form-data)
outputSchema: JSON schema defining the structure of the output data
Example Request
Given the following JSON schema:
{
  "type": "object",
  "properties": {
    "title": {
      "type": "string"
    },
    "price": {
      "type": "number"
    },
    "description": {
      "type": "string"
    },
    "features": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": [
    "title",
    "price",
    "description",
    "features"
  ]
}
You can use the following request to extract the structured data from an HTML document called input.html:
curl -X POST https://api.stractdata.com/v1/extract \
-H 'Content-Type: multipart/form-data' \
-H 'api-key: API_KEY' \
-F file=@/input.html \
-F 'outputSchema={"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"},"description":{"type":"string"},"features":{"type":"array","items":{"type":"string"}}},"required":["title","price","description","features"]}'
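The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names build_multipart and extract are hypothetical, the text/html content type on the file part is an assumption, and the response is assumed to be JSON as in the example in this section.

```python
import json
import urllib.request
import uuid

EXTRACT_URL = "https://api.stractdata.com/v1/extract"


def build_multipart(html: bytes, schema: dict, filename: str = "input.html"):
    """Encode the `file` and `outputSchema` fields as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    # First part: the HTML file (the text/html part type is an assumption).
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: text/html\r\n\r\n"
    ).encode("utf-8")
    # Second part: the JSON schema as a plain text field, then the closing boundary.
    schema_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="outputSchema"\r\n\r\n'
        f"{json.dumps(schema)}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return file_head + html + schema_part, f"multipart/form-data; boundary={boundary}"


def extract(html_path: str, schema: dict, api_key: str) -> dict:
    """POST the HTML file and schema to /extract and return the parsed JSON response."""
    with open(html_path, "rb") as f:
        body, content_type = build_multipart(f.read(), schema)
    request = urllib.request.Request(
        EXTRACT_URL,
        data=body,
        headers={"Content-Type": content_type, "api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as extract("input.html", schema, "API_KEY"), with schema holding the JSON schema above as a Python dict, would send the same two form fields as the curl command.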
The response will look like this:
{
  "result": {
    "title": "Example Product",
    "price": 99.99,
    "description": "A great product description",
    "features": ["Feature 1", "Feature 2", "Feature 3"]
  },
  "metadata": {
    "tokens": {
      "input": 150,
      "output": 75,
      "total": 225
    },
    "timeSpent": 1234
  }
}
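Before using a response, it can be useful to confirm that every field listed under "required" in your schema is present in "result". The sketch below assumes the response shape shown in the example above; missing_required is a hypothetical helper, and it performs only a shallow key check, not full JSON Schema validation.

```python
def missing_required(response: dict, schema: dict) -> list:
    """Return the schema's required keys that are absent from response["result"]."""
    result = response.get("result", {})
    return [key for key in schema.get("required", []) if key not in result]
```

An empty list means all required top-level fields were returned. Types and nested structures are not checked here.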
Billing and Usage Tracking
Stract Data tracks your usage based on tokens processed during extraction:
- Input Tokens: Number of tokens in the input HTML content
- Output Tokens: Number of tokens in the extracted structured data
- Total Tokens: Sum of input and output tokens
Only the output tokens are charged.
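To estimate charged usage on the client side, you can sum the output token counts reported in each response's metadata. This is a minimal sketch that assumes the "metadata.tokens" structure from the example response; billable_tokens is a hypothetical helper, and the dashboard remains the authoritative record of usage.

```python
def billable_tokens(responses: list) -> int:
    """Sum the output tokens across responses, since only output tokens are charged."""
    total = 0
    for response in responses:
        tokens = response["metadata"]["tokens"]
        total += tokens["output"]
    return total
```

For the example response in this section, the input count (150) is tracked but not charged, so the billable amount is the 75 output tokens.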
Usage History
You can review your extraction history in the extract section of the dashboard, which shows the following for each extraction:
- Date of extraction
- Number of tokens used
- Time spent processing
This information helps you monitor your usage and optimize your extraction processes.