How to preprocess data for CreateML training

Learn how to preprocess image data before training a computer vision model.

Overview

Before you train a model, you may want to apply various preprocessing steps to your dataset. For example, you may want to resize your images to a specific resolution, or apply tiling.

Adding preprocessing steps ensures your data is consistent before it is used in training.
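To see why resizing has to be applied consistently, here is a minimal sketch of what a stretch resize does to a bounding box. It assumes center-based boxes (`x`, `y`, `width`, `height`), the convention used by Create ML object detection annotations. The function name and sample values are hypothetical, and this is an illustration of the arithmetic, not Roboflow's implementation.

```python
def resize_box_stretch(box, old_size, new_size):
    """Scale a center-format box when an image is stretched to new_size.

    box: dict with "x", "y" (box center) and "width", "height" in pixels.
    old_size, new_size: (width, height) tuples in pixels.
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    # A stretch resize scales each axis independently.
    scale_x = new_w / old_w
    scale_y = new_h / old_h
    return {
        "x": box["x"] * scale_x,
        "y": box["y"] * scale_y,
        "width": box["width"] * scale_x,
        "height": box["height"] * scale_y,
    }


# Hypothetical example: a 1600x800 image stretched to 400x400.
box = {"x": 800, "y": 400, "width": 400, "height": 200}
resized = resize_box_stretch(box, old_size=(1600, 800), new_size=(400, 400))
print(resized)
```

Because the two axes scale by different factors here, the box changes shape as well as size, which is why images and annotations need to be transformed together.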

In this guide, we are going to show how to preprocess data for CreateML models using Roboflow.

To generate preprocessing steps for a CreateML model, you will:

1. Import data into Roboflow
2. Open the Versions tab
3. Select the preprocessing steps you want to apply
4. Generate your dataset
5. (Optional) Train a model or export your data

Let's get started!

Import data into Roboflow Annotate

First, create a free Roboflow account. Then, create a new project from the Roboflow dashboard.

Once you have created a project, you will be taken to a page where you can upload your images. Drag and drop your images into the upload box.

You can also drag in annotation files if you want to view or amend annotations in Roboflow Annotate.

When you have uploaded your files, click "Save and Continue".

Your images will be uploaded to Roboflow.

After you have uploaded all of your images, you can label them using Roboflow Annotate.

Create a Dataset Version

Once you have labeled all of your images, you are ready to generate a dataset version. A dataset version is a frozen-in-time snapshot of a dataset. You can use versions to track different changes to your dataset over time. If you train models using Roboflow Train, you can compare performance with different augmentations all from the Versions tab.

To create a new dataset version, click "Versions" in the left sidebar of your project.

This will show a page that lets you apply preprocessing and augmentation steps to your dataset.

Select Preprocessing Steps

Roboflow automatically selects a few preprocessing steps that we recommend for most projects. If you need to, however, you can remove the default steps.

To add more preprocessing steps to your dataset, click on the "Preprocessing" section of the dataset generation page. Then, click the "Add Preprocessing Step" button. A pop-up will appear showing all of the options available. You can add as many preprocessing steps as you would like.

There are several preprocessing steps available, including:

  • Auto-orient
  • Resize images
  • Isolate objects
  • Static crop
  • Dynamic crop
  • Grayscale
  • Auto-adjust contrast
  • Tile

You can also filter your dataset to only include images that are not marked as null in your Roboflow dataset, or include only images that match specific tags that you have added in Roboflow.
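The filtering described above can be sketched in plain Python. This is a minimal illustration, not Roboflow's implementation: the `ImageRecord` class and `filter_images` function are hypothetical, an image with no annotations is treated as "null", and a tag filter is assumed to keep images that carry at least one of the requested tags.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    # Hypothetical in-memory record for one image in a dataset.
    filename: str
    annotations: list = field(default_factory=list)
    tags: set = field(default_factory=set)


def filter_images(records, drop_null=False, require_tags=None):
    """Keep records that pass the null filter and the tag filter."""
    wanted = set(require_tags or [])
    kept = []
    for record in records:
        # Assumption: "null" means the image has no annotations.
        if drop_null and not record.annotations:
            continue
        # Assumption: an image matches if it has any of the wanted tags.
        if wanted and not (wanted & record.tags):
            continue
        kept.append(record)
    return kept


records = [
    ImageRecord("a.jpg", annotations=["car"], tags={"day"}),
    ImageRecord("b.jpg", annotations=[], tags={"night"}),
    ImageRecord("c.jpg", annotations=["bus"], tags={"night"}),
]

non_null = filter_images(records, drop_null=True)
night_only = filter_images(records, require_tags={"night"})
print([r.filename for r in non_null])
print([r.filename for r in night_only])
```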

For example, you can apply a Tile preprocessing step to a dataset.

To add the preprocessing step, click "Apply". The change will appear in the list of preprocessing steps to apply when generating your dataset version.
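To make tiling concrete, here is a minimal sketch of how an image can be split into a grid of tiles. The `tile_boxes` function and the 2x2 grid are hypothetical choices for illustration. The sketch only computes pixel boundaries and does not reflect how Roboflow handles annotations that cross tile edges.

```python
def tile_boxes(width, height, rows, cols):
    """Return (left, top, right, bottom) pixel boxes for a rows x cols grid.

    Integer division keeps tile edges on whole pixels. Tiles cover the
    full image without overlap, even when the size does not divide evenly.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical example: split a 1000x800 image into a 2x2 grid.
tiles = tile_boxes(1000, 800, rows=2, cols=2)
for tile in tiles:
    print(tile)
```

Each tile becomes its own, smaller image, which can help when the objects of interest are small relative to the full frame.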

Once you have added all of the preprocessing and augmentation steps you want to apply, click "Generate" at the bottom of the page to generate your dataset.

You can then use your dataset for training a model on Roboflow. Or, you can export your dataset for use in a custom training process.

Train a Model or Export Data

With your dataset version generated, you are now ready to train a model on Roboflow or export your data elsewhere. To train a model in Roboflow with your data, follow our Roboflow Train guide.

Alternatively, you can export your data into over 30 different formats, depending on the needs of your project.
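If you plan to work with exported data in a custom pipeline, it helps to know the shape of the annotations. The sketch below builds one entry in the style of Create ML object detection JSON, where each box is stored by its center point plus width and height. The `to_createml_entry` helper, the file name, and the sample box are hypothetical, so check the files in your own export before relying on this layout.

```python
import json


def to_createml_entry(image_name, boxes):
    """Build one annotation entry from corner-format boxes.

    boxes: iterable of (label, x_min, y_min, x_max, y_max) in pixels.
    """
    annotations = []
    for label, x_min, y_min, x_max, y_max in boxes:
        width = x_max - x_min
        height = y_max - y_min
        annotations.append({
            "label": label,
            "coordinates": {
                # x and y are the center of the box, not its corner.
                "x": x_min + width / 2,
                "y": y_min + height / 2,
                "width": width,
                "height": height,
            },
        })
    return {"image": image_name, "annotations": annotations}


# Hypothetical example: one car box in one image.
entry = to_createml_entry("street.jpg", [("car", 100, 50, 300, 150)])
createml_json = json.dumps([entry], indent=2)
print(createml_json)
```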