Extract

The Extract feature is the core of Stract Data. It uses AI models to pull structured data out of HTML documents according to an output schema you define. This removes the need for manual HTML parsing and fragile selectors, which tend to break whenever a page's markup changes.

Features

  • Extract structured data from HTML documents
  • Process and normalize the extracted data
  • Scale automatically with your extraction volume
  • Use AI to understand the context and structure of the content

API Endpoint

POST /extract

Extract structured data from an HTML document based on your specified output schema.

Endpoint: https://api.stractdata.com/v1/extract

Headers:

  • Content-Type: multipart/form-data
  • api-key: API_KEY (replace API_KEY with your Stract Data API key)

Parameters:

  • file: the HTML file to process, sent as a file part of the multipart/form-data body
  • outputSchema: a JSON schema, sent as a string, that defines the structure of the output data

Example Request

Given the following JSON schema:

{
  "type": "object",
  "properties": {
    "title": {
      "type": "string"
    },
    "price": {
      "type": "number" 
    },
    "description": {
      "type": "string"
    },
    "features": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": [
    "title",
    "price", 
    "description",
    "features"
  ]
}
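The outputSchema form field carries the schema as a single JSON string. Rather than hand-writing that one-line form, you can serialize the schema programmatically. The Python sketch below is an illustration, not an official client: json.dumps with compact separators produces exactly the string passed to outputSchema in the curl request that follows.

```python
import json

# The example output schema as a Python dict (key order is preserved).
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
        "features": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "price", "description", "features"],
}

# separators=(",", ":") drops the spaces json.dumps inserts by default,
# giving the compact one-line form used in -F 'outputSchema=...'.
output_schema = json.dumps(schema, separators=(",", ":"))
print(output_schema)
```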

The following request extracts data matching this schema from a local HTML file named input.html:

curl -X POST https://api.stractdata.com/v1/extract \
  -H 'Content-Type: multipart/form-data' \
  -H 'api-key: API_KEY' \
  -F file=@input.html \
  -F 'outputSchema={"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"},"description":{"type":"string"},"features":{"type":"array","items":{"type":"string"}}},"required":["title","price","description","features"]}'
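For clients that do not shell out to curl, the same request can be assembled with Python's standard library. This is a minimal sketch, assuming the endpoint accepts a standard multipart/form-data body containing the file and outputSchema fields listed above. build_extract_request and _form_part are hypothetical helper names, and labeling the file part text/html is an assumption. The helper only builds the request; it does not send it.

```python
import json
import uuid
import urllib.request
from typing import Optional

ENDPOINT = "https://api.stractdata.com/v1/extract"


def _form_part(boundary: str, disposition: str, payload: bytes,
               content_type: Optional[str] = None) -> bytes:
    """Encode one multipart/form-data part: delimiter, headers, blank line, payload."""
    lines = [f"--{boundary}", f"Content-Disposition: {disposition}"]
    if content_type:
        lines.append(f"Content-Type: {content_type}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("utf-8") + payload + b"\r\n"


def build_extract_request(html: bytes, schema: dict, api_key: str,
                          filename: str = "input.html") -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST /extract request."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the payload
    body = b"".join([
        # The HTML document as a file part, like curl's -F file=@input.html.
        _form_part(boundary,
                   f'form-data; name="file"; filename="{filename}"',
                   html, content_type="text/html"),
        # The schema as a compact JSON string in a plain form field.
        _form_part(boundary,
                   'form-data; name="outputSchema"',
                   json.dumps(schema, separators=(",", ":")).encode("utf-8")),
        f"--{boundary}--\r\n".encode("utf-8"),  # closing delimiter
    ])
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            # The boundary must be declared so the server can split the parts.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "api-key": api_key,
        },
    )
```

To send it, pass the request to urllib.request.urlopen(req) and decode the reply with json.load; a non-2xx status raises urllib.error.HTTPError.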

The response looks like this (the values depend on the input document):

{
  "result": {
    "title": "Example Product",
    "price": 99.99,
    "description": "A great product description",
    "features": ["Feature 1", "Feature 2", "Feature 3"]
  },
  "metadata": {
    "tokens": {
      "input": 150,
      "output": 75,
      "total": 225
    },
    "timeSpent": 1234
  }
}
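A client typically reads result and metadata from the JSON body. The sketch below assumes the response shape shown above; it parses the body and checks that every field listed under required in the schema is present. parse_extract_response is a hypothetical helper, not part of an official SDK.

```python
import json


def parse_extract_response(body: str, schema: dict) -> dict:
    """Hypothetical helper: return the extracted data, checking required fields."""
    payload = json.loads(body)
    result = payload["result"]
    # Compare against the schema's "required" list (empty if none is given).
    missing = [key for key in schema.get("required", []) if key not in result]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return result
```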

Billing and Usage Tracking

Stract Data tracks your usage based on tokens processed during extraction:

  • Input Tokens: Number of tokens in the input HTML content
  • Output Tokens: Number of tokens in the extracted structured data
  • Total Tokens: Sum of input and output tokens

Only the output tokens are charged.
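Because only output tokens are charged, the billable count for a batch of extractions is the sum of metadata.tokens.output across the responses. A minimal sketch, assuming parsed responses shaped like the example above; no per-token price is assumed here.

```python
from collections.abc import Iterable


def billable_tokens(responses: Iterable[dict]) -> int:
    """Sum the output tokens (the only billed tokens) across parsed responses."""
    return sum(r["metadata"]["tokens"]["output"] for r in responses)


def total_tokens_consistent(response: dict) -> bool:
    """Check that total equals input + output, as described above."""
    t = response["metadata"]["tokens"]
    return t["total"] == t["input"] + t["output"]
```

For the example response, 150 input tokens plus 75 output tokens give 225 total tokens, of which only the 75 output tokens are billed.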

Usage History

You can review your extraction history in the Extract section of the dashboard. Each entry includes:

  • Date of extraction
  • Number of tokens used
  • Time spent processing

This information helps you monitor your usage and optimize your extraction processes.