How to split YOLO datasets

Before you train a computer vision model, you should split your data into train, validation, and test sets. This ensures that the data you use to evaluate your model (the validation and test sets) is separate from the data on which your model is trained.

You can split YOLO datasets in a few lines of code. In this guide, we will show how to split your datasets with the supervision Python package.

We will:

1. Install supervision
2. Load data into a supervision DetectionDataset() object
3. Split data using the DetectionDataset().split() method

Without further ado, let's get started!

Step #1: Install supervision

First, install the supervision pip package:

pip install supervision

Once you have installed supervision, you are ready to load and split your data.
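If you want to confirm the install before moving on, a small check like the one below may help. It is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of supervision.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if supervision is installed, otherwise None.
print(installed_version("supervision"))
```

If this prints None, the package was most likely installed into a different Python environment than the one running the script.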

Step #2: Load and Split Data

First, we are going to load our dataset into a supervision.DetectionDataset() object. This object will contain the images in a dataset and their annotations. You can load datasets from several annotation formats, including YOLO, Pascal VOC, and COCO. For this guide, we will use the YOLO data loader.
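The loading code in this guide reads from three locations: an images folder, a labels folder, and a data.yaml file. As a rough pre-flight check, the sketch below reports which of those paths are missing. The helper name missing_yolo_paths is hypothetical, and the dataset/ root is assumed to match the paths used in this guide; adjust it to your own layout.

```python
from pathlib import Path
from typing import List


def missing_yolo_paths(root: str) -> List[str]:
    """Return the expected dataset paths that do not exist under root."""
    base = Path(root)
    expected = [
        base / "train" / "images",  # image files
        base / "train" / "labels",  # one .txt annotation file per image
        base / "data.yaml",         # class names
    ]
    return [str(p) for p in expected if not p.exists()]


missing = missing_yolo_paths("dataset")
print("Missing paths:", missing if missing else "none")
```

Running this first can turn a confusing loader error into a clear message about which path is wrong.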

We will then split the data into a train and test set using the .split() method.

You can load and split data using the following code:

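import supervision as sv

# Load a YOLO-format dataset: images, label files, and the data.yaml class list
ds = sv.DetectionDataset.from_yolo(
    images_directory_path="dataset/train/images",
    annotations_directory_path="dataset/train/labels",
    data_yaml_path="dataset/data.yaml"
)

# Shuffle with a fixed seed, then send 70% of images to train and 30% to test
train_ds, test_ds = ds.split(split_ratio=0.7,
                             random_state=42, shuffle=True)
print(len(train_ds), len(test_ds))
# 700 300 (for a dataset of 1,000 images)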
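To make the split_ratio, random_state, and shuffle arguments more concrete, here is a pure-Python sketch of a seeded ratio split over a list of file names. It is an illustrative stand-in, not supervision's implementation: the ratio_split function and the generated file names are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def ratio_split(
    items: Sequence[str], split_ratio: float, random_state: int, shuffle: bool = True
) -> Tuple[List[str], List[str]]:
    """Illustrative stand-in for a seeded ratio split (not supervision's code)."""
    items = list(items)
    if shuffle:
        # A fixed seed makes the shuffle, and so the split, reproducible.
        random.Random(random_state).shuffle(items)
    cut = round(len(items) * split_ratio)  # number of items in the first part
    return items[:cut], items[cut:]


names = [f"image_{i:04d}.jpg" for i in range(1000)]
train, test = ratio_split(names, split_ratio=0.7, random_state=42)
print(len(train), len(test))  # 700 300
```

Because the seed is fixed, running the split twice gives the same two lists, which keeps evaluation results comparable between training runs.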

You can also call .split() a second time on the test set, for example with an 80/20 ratio, to generate a validation set.
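One way to chain the two splits is sketched below. The helper three_way_split is hypothetical; it assumes only that the dataset object exposes the split(split_ratio, random_state, shuffle) call used in this guide. Which of the two held-out parts you treat as validation is your choice; this sketch uses the larger one.

```python
def three_way_split(ds, train_ratio=0.7, val_ratio=0.8, random_state=42):
    """Split ds into train, validation, and test sets with two .split() calls.

    ds is assumed to expose split(split_ratio, random_state, shuffle) and to
    return two objects of the same kind, as the dataset in this guide does.
    """
    train_ds, rest_ds = ds.split(
        split_ratio=train_ratio, random_state=random_state, shuffle=True
    )
    # Split the held-out part again: val_ratio of it becomes validation data.
    val_ds, test_ds = rest_ds.split(
        split_ratio=val_ratio, random_state=random_state, shuffle=True
    )
    return train_ds, val_ds, test_ds
```

With the dataset loaded above, you would call train_ds, val_ds, test_ds = three_way_split(ds). For 1,000 images and these ratios, that works out to roughly 700 training, 240 validation, and 60 test images.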

Next steps

supervision provides an extensive range of functionalities for working with computer vision models. With supervision, you can:

1. Process and filter detections and segmentation masks from a range of popular models (YOLOv5, Ultralytics YOLOv8, MMDetection, and more).
2. Process and filter classifications.
3. Compute confusion matrices.

And more! To learn about the full range of functionality in supervision, check out the supervision documentation.