Data sources
The data sources page is actually much more than a way to add data from external sources. It lets you create complete automated data pipelines so you can work on your active learning strategies.
From there, you can import datasets from existing cloud storage buckets, automate and schedule the imports, and trigger actions such as exploring and labeling your new data, retraining your model, automatically building a new deployment task, and more.

Click on + Add new data source and select where your data lives. You can use:
- AWS S3 buckets
- Google Cloud Storage
- Any S3-compatible bucket
- Don't import data (if you just need to create a pipeline)

Add new data source
Click on Next and provide your credentials:

Provide your credentials
Click on Verify credentials:

Automatically label your data
Here, you have several options to automatically label your data:
In the example above, the structure of the folder is the following:
.
├── cars
│   ├── cars.01741.jpg
│   ├── cars.01743.jpg
│   ├── cars.01745.jpg
│   └── ... (400 items)
├── unknown
│   ├── unknown.test_2547.jpg
│   ├── unknown.test_2548.jpg
│   ├── unknown.test_2549.jpg
│   └── ... (400 items)
└── unlabeled
    ├── cars.02066.jpg
    ├── cars.02067.jpg
    ├── cars.02068.jpg
    └── ... (14 items)
3 directories, 814 files
The labels will be picked from the folder name, and the samples will be split between your training and testing sets using an 80/20 ratio. The samples present in an unlabeled/ folder will be kept unlabeled in Edge Impulse Studio.
Alternatively, you can organize your folder using the following structure to automatically split your dataset between training and testing sets:
.
├── testing
│   ├── cars
│   │   ├── cars.00012.jpg
│   │   ├── cars.00031.jpg
│   │   ├── cars.00035.jpg
│   │   └── ... (~150 items)
│   └── unknown
│       ├── unknown.test_1012.jpg
│       ├── unknown.test_1026.jpg
│       ├── unknown.test_1027.jpg
│       └── ... (~150 items)
├── training
│   ├── cars
│   │   ├── cars.00006.jpg
│   │   ├── cars.00025.jpg
│   │   ├── cars.00065.jpg
│   │   └── ... (~600 items)
│   └── unknown
│       ├── unknown.test_1002.jpg
│       ├── unknown.test_1005.jpg
│       ├── unknown.test_46.jpg
│       └── ... (~600 items)
└── unlabeled
    ├── cars.02066.jpg
    ├── cars.02067.jpg
    ├── cars.02068.jpg
    └── ... (14 items)
7 directories, 1512 files
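The folder-based labeling rules described above can be sketched in Python to make them concrete. This is a minimal illustration, not Edge Impulse's implementation: the helper name assign_labels, the per-label shuffled 80/20 split, and the "unlabeled" category returned for samples under unlabeled/ are all assumptions made for this example. It works on relative object keys such as cars/cars.01741.jpg, so no bucket access is needed.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple

SPLIT_FOLDERS = {"training", "testing"}
UNLABELED_FOLDER = "unlabeled"

# (object key, label or None, category) -- hypothetical result shape
Sample = Tuple[str, Optional[str], str]


def assign_labels(paths: List[str], train_ratio: float = 0.8, seed: int = 0) -> List[Sample]:
    """Hypothetical helper: infer a label and a training/testing category per object key."""
    result: List[Sample] = []
    by_label: Dict[str, List[str]] = defaultdict(list)

    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) < 2:
            continue  # files at the bucket root carry no folder label
        if parts[0] == UNLABELED_FOLDER:
            # Samples under unlabeled/ keep no label.
            result.append((path, None, UNLABELED_FOLDER))
        elif parts[0] in SPLIT_FOLDERS:
            # training/<label>/<file> or testing/<label>/<file>: the split is explicit.
            if len(parts) >= 3:
                result.append((path, parts[1], parts[0]))
        else:
            # <label>/<file>: the split is decided below.
            by_label[parts[0]].append(path)

    # Assumption: shuffle each label's files with a fixed seed, then split 80/20.
    rng = random.Random(seed)
    for label, files in sorted(by_label.items()):
        files = sorted(files)
        rng.shuffle(files)
        n_train = round(len(files) * train_ratio)
        for i, path in enumerate(files):
            category = "training" if i < n_train else "testing"
            result.append((path, label, category))
    return result
```

With 400 images in cars/, this sketch puts 320 of them in training and 80 in testing, while training/ and testing/ folders bypass the random split entirely.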
If you choose to infer the labels from the file name instead, only the file name is taken into account: the part before the first . will be used to set the label. E.g. cars.01741.jpg will set the label to cars.
If you choose to keep your data unlabeled, all the data samples will be unlabeled, and you will need to label them manually before using them.
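The file-name rule described above (the label is the part of the file name before the first .) fits in a few lines. This is a sketch under that stated rule only; label_from_filename is a hypothetical name, and the code does not model how Studio treats files in other folders.

```python
from pathlib import PurePosixPath


def label_from_filename(path: str) -> str:
    """Hypothetical helper: return the part of the file name before the first '.'."""
    name = PurePosixPath(path).name  # ignore any folder prefix; only the file name counts
    return name.split(".", 1)[0]
```

For example, label_from_filename("cars.01741.jpg") and label_from_filename("training/cars/cars.00006.jpg") both return "cars", because the folder part of the key is discarded before splitting.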
Finally, click on Next, post-sync actions.

Trigger actions
From this view, you can automate several actions:
- Recreate data explorer: The data explorer gives you a one-look view of your dataset, letting you quickly label unknown data. If you enable this, you'll also get an email with a screenshot of the data explorer whenever there's new data.
- Retrain model: If needed, retrains your model with the same impulse. If you enable this, you'll also get an email with the new validation and test set accuracy. Note: you will need to have trained your project at least once.
- Create new version: Stores all data, configuration, intermediate results and final models.
- Create new deployment: Builds a new library or binary with your updated model. Requires 'Retrain model' to also be enabled.
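The two dependencies stated in the list of actions can be expressed as a small validation check. The sketch below is hypothetical: PostSyncActions and validate are illustrative names, not part of an Edge Impulse API, and the check encodes only the two rules given above ('Create new deployment' needs 'Retrain model'; 'Retrain model' needs a project that was trained at least once).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PostSyncActions:
    """Hypothetical mirror of the post-sync action toggles (field names are illustrative)."""
    recreate_data_explorer: bool = False
    retrain_model: bool = False
    create_new_version: bool = False
    create_new_deployment: bool = False


def validate(actions: PostSyncActions, project_trained_once: bool) -> List[str]:
    """Return the rule violations; an empty list means the combination is allowed."""
    problems: List[str] = []
    if actions.create_new_deployment and not actions.retrain_model:
        problems.append("'Create new deployment' requires 'Retrain model' to be enabled")
    if actions.retrain_model and not project_trained_once:
        problems.append("'Retrain model' requires the project to have been trained at least once")
    return problems
```

For instance, validate(PostSyncActions(create_new_deployment=True), True) reports one problem, while enabling both retrain_model and create_new_deployment on a project that has already been trained returns an empty list.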
Once your pipeline is set up, you can run it directly from the UI, trigger it from external sources, or schedule it.

Run your pipeline
To run your pipeline from Edge Impulse Studio, click on the