# Split dataset

> Performs an in-place split of the project's dataset into "training", "testing", and optional "validation" sets. The split is deterministic: it is based on the hash of the name of the data. Split balancing can use the label, one or more metadata keys, or both as a composite grouping signal. Related samples that share a metadata value can also be kept together in the same split. Returns immediately on small datasets, or starts a job on larger datasets.

For example:

    {
      "trainingSplitRatio": 0.8,
      "testingSplitRatio": 0.1,
      "validationSplitRatio": 0.1,
      "excludeDisabledSamples": false,
      "stratifyBy": { "label": true, "metadataKeys": ["site", "scanner"] },
      "keepTogetherMetadataKeys": ["capture_group"]
    }

With these options, `label`, `site`, and `scanner` are used to balance the split, while samples sharing the same `capture_group` value stay in the same split bucket.
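
To make the "deterministic" and "keep together" ideas concrete, here is a minimal Python sketch of hash-based split assignment. It is an illustration only, not Edge Impulse's actual algorithm: the choice of SHA-256, the sample dictionary shape (`name`, `metadata`), and the helper names `assign_split` and `split_samples` are all assumptions, and the sketch does not attempt the balancing controlled by `stratifyBy`.

```python
import hashlib


def assign_split(key: str, ratios: dict[str, float]) -> str:
    """Map a string key to a split name; the same key always gives the same split."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into the unit interval.
    position = int.from_bytes(digest[:8], "big") / 2**64
    total = sum(ratios.values())
    cumulative = 0.0
    for split, ratio in ratios.items():
        cumulative += ratio / total
        if position < cumulative:
            return split
    # Fallback for floating-point rounding at the very top of the interval.
    return list(ratios)[-1]


def split_samples(samples, ratios, keep_together_keys=()):
    """Assign every sample to a split (hypothetical helper, for illustration)."""
    assignments = {}
    for sample in samples:
        metadata = sample.get("metadata", {})
        group = [f"{k}={metadata[k]}" for k in keep_together_keys if k in metadata]
        # Samples with identical keep-together values share one hash key,
        # so they necessarily land in the same split.
        key = "|".join(group) if group else sample["name"]
        assignments[sample["name"]] = assign_split(key, ratios)
    return assignments


samples = [
    {"name": "a.wav", "metadata": {"capture_group": "g1"}},
    {"name": "b.wav", "metadata": {"capture_group": "g1"}},
    {"name": "c.wav", "metadata": {}},
]
ratios = {"training": 0.8, "testing": 0.1, "validation": 0.1}
result = split_samples(samples, ratios, keep_together_keys=["capture_group"])
```

Because the assignment depends only on the hashed key, rerunning the split on the same data gives the same result, and `a.wav` and `b.wav` always end up in the same split since they share a `capture_group` value.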




## OpenAPI

````yaml /.assets/openapi.yaml post /api/{projectId}/split
openapi: 3.0.0
info:
  title: Edge Impulse API
  version: 1.0.0
servers:
  - url: https://studio.edgeimpulse.com/v1
security:
  - ApiKeyAuthentication: []
  - JWTAuthentication: []
  - JWTHttpHeaderAuthentication: []
  - OAuth2: []
paths:
  /api/{projectId}/split:
    post:
      tags:
        - Raw data
      summary: Split dataset
      description: >
        Performs an in-place split of the project's dataset into "training",
        "testing", and optional "validation" sets. The split is deterministic:
        it is based on the hash of the name of the data. Split balancing can
        use the label, one or more metadata keys, or both as a composite
        grouping signal. Related samples that share a metadata value can also
        be kept together in the same split. Returns immediately on small
        datasets, or starts a job on larger datasets.

        For example:
            { "trainingSplitRatio": 0.8, "testingSplitRatio": 0.1, "validationSplitRatio": 0.1, "excludeDisabledSamples": false, "stratifyBy": { "label": true, "metadataKeys": ["site", "scanner"] }, "keepTogetherMetadataKeys": ["capture_group"] }
            With these options, label/site/scanner are used to balance the split, while samples sharing the same capture_group value stay in the same split bucket.
      operationId: splitDataset
      parameters:
        - $ref: '#/components/parameters/ProjectIdParameter'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DatasetSplitOptions'
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                anyOf:
                  - $ref: '#/components/schemas/GenericApiResponse'
                  - $ref: '#/components/schemas/StartJobResponse'
components:
  parameters:
    ProjectIdParameter:
      name: projectId
      in: path
      required: true
      description: Project ID
      schema:
        type: integer
  schemas:
    DatasetSplitOptions:
      type: object
      required:
        - trainingSplitRatio
        - testingSplitRatio
      properties:
        trainingSplitRatio:
          type: number
          description: Proportion of the dataset to use for training.
        testingSplitRatio:
          type: number
          description: Proportion of the dataset to use for testing.
        validationSplitRatio:
          type: number
          description: >-
            Proportion of the dataset to use for validation. This is
            experimental and may change in the future.
        excludeDisabledSamples:
          type: boolean
          description: Whether to exclude samples that are marked as disabled.
          default: false
        stratifyBy:
          type: object
          description: Optional balancing targets for the split.
          properties:
            label:
              type: boolean
              description: Whether to stratify by label.
            metadataKeys:
              type: array
              items:
                type: string
              description: >-
                Metadata keys to use as balancing targets. If more than one is
                selected, they are combined into composite assignment buckets.
        keepTogetherMetadataKeys:
          type: array
          items:
            type: string
          description: >
            List of metadata keys whose matching values must stay together in a
            single split. This is useful for leakage prevention across train,
            validation, and test.
    GenericApiResponse:
      type: object
      required:
        - success
      properties:
        success:
          type: boolean
          description: Whether the operation succeeded
        error:
          type: string
          description: Optional error description (set if 'success' was false)
    StartJobResponse:
      allOf:
        - $ref: '#/components/schemas/GenericApiResponse'
        - type: object
          required:
            - id
          properties:
            id:
              type: integer
              description: Job identifier. Status updates will include this identifier.
              example: 12873488112
  securitySchemes:
    ApiKeyAuthentication:
      type: apiKey
      in: header
      name: x-api-key
    JWTAuthentication:
      type: apiKey
      in: cookie
      name: jwt
    JWTHttpHeaderAuthentication:
      type: apiKey
      in: header
      name: x-jwt-token
    OAuth2:
      type: oauth2
      flows:
        authorizationCode:
          authorizationUrl: /v1/oauth/authorize
          tokenUrl: /v1/oauth/token
          scopes:
            openid: Access to basic profile information
            email: Access to email address
            profile: Access to full profile information
        implicit:
          authorizationUrl: /v1/oauth/authorize
          scopes:
            openid: Access to basic profile information
            email: Access to email address
            profile: Access to full profile information
        password:
          tokenUrl: /v1/oauth/token
          scopes:
            openid: Access to basic profile information
            email: Access to email address
            profile: Access to full profile information
        clientCredentials:
          tokenUrl: /v1/oauth/token
          scopes:
            openid: Access to basic profile information
            email: Access to email address
            profile: Access to full profile information

````
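
As a rough illustration of calling this endpoint, the Python sketch below builds the `POST` request from the pieces in the spec above (server URL, path, `x-api-key` header, and the two required ratio fields) and distinguishes the two possible `200` responses. The helper names `build_split_request`, `interpret_response`, and `split_dataset` are hypothetical, and the error handling assumes that a failed call is reported through `success: false` in the response body.

```python
import json
import urllib.request

# Taken from the `servers` entry in the spec.
BASE_URL = "https://studio.edgeimpulse.com/v1"


def build_split_request(project_id: int, api_key: str, options: dict) -> urllib.request.Request:
    """Build (but do not send) the request for POST /api/{projectId}/split."""
    # The schema marks these two fields as required.
    for field in ("trainingSplitRatio", "testingSplitRatio"):
        if field not in options:
            raise ValueError(f"missing required field: {field}")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/{project_id}/split",
        data=json.dumps(options).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def interpret_response(body: dict) -> str:
    """Tell an immediate result (GenericApiResponse) from a started job (StartJobResponse)."""
    if not body.get("success"):
        raise RuntimeError(body.get("error", "unknown error"))
    if "id" in body:
        # StartJobResponse: the split runs as a job with this identifier.
        return f"job started: {body['id']}"
    return "split completed immediately"


def split_dataset(project_id: int, api_key: str, options: dict) -> str:
    """Send the request and interpret the JSON response."""
    request = build_split_request(project_id, api_key, options)
    with urllib.request.urlopen(request) as response:
        return interpret_response(json.load(response))
```

Calling `split_dataset(project_id, api_key, {"trainingSplitRatio": 0.8, "testingSplitRatio": 0.2})` with a real project ID and API key would perform the split; building the request separately makes it possible to inspect the payload without touching the dataset.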