The annotation format originally created for the PASCAL Visual Object Classes (VOC) challenge has become a common interchange format for object detection labels. It's well-specified and can be exported from many labeling tools, including CVAT, VoTT, and RectLabel.
Unfortunately, most models don't consume VOC XML labels directly. That's where Roboflow comes in: it's a universal computer vision format converter that can convert Pascal VOC XML into any other format so your data is ready to train in a jiffy.
Below, learn the structure of Pascal VOC XML.
<annotation>
    <folder></folder>
    <filename>000001.jpg</filename>
    <path>000001.jpg</path>
    <source>
        <database>roboflow.ai</database>
    </source>
    <size>
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>helmet</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>179</xmin>
            <xmax>231</xmax>
            <ymin>85</ymin>
            <ymax>144</ymax>
        </bndbox>
    </object>
    <object>
        <name>helmet</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>112</xmin>
            <xmax>135</xmax>
            <ymin>145</ymin>
            <ymax>175</ymax>
        </bndbox>
    </object>
</annotation>
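To see how these fields are read in practice, here is a minimal parsing sketch using Python's standard xml.etree.ElementTree module; the annotation file name is a placeholder.

import xml.etree.ElementTree as ET

# Parse a single Pascal VOC XML annotation file (placeholder path)
tree = ET.parse("000001.xml")
root = tree.getroot()

# Image-level metadata lives under <size>
size = root.find("size")
width = int(size.find("width").text)
height = int(size.find("height").text)

# Each labeled box is an <object> with a <name> and a <bndbox>
for obj in root.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin = int(box.find("xmin").text)
    ymin = int(box.find("ymin").text)
    xmax = int(box.find("xmax").text)
    ymax = int(box.find("ymax").text)
    print(f"{name}: ({xmin}, {ymin}) to ({xmax}, {ymax}) in a {width}x{height} image")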
With Roboflow supervision, an open source Python package with utilities for common computer vision tasks, you can merge and split Pascal VOC XML detection datasets. Read our dedicated guides to learn how to merge and split Pascal VOC XML detection datasets.
import supervision as sv

# Load a Pascal VOC XML dataset from images and annotations on disk
ds = sv.DetectionDataset.from_pascal_voc(
    images_directory_path="dataset/train/images",
    annotations_directory_path="dataset/train/labels"
)

# Split the dataset 70/30 into train and test subsets
train_ds, test_ds = ds.split(split_ratio=0.7, random_state=42, shuffle=True)

len(train_ds), len(test_ds)
# (700, 300)
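The split happens at the image level, so each image's annotations stay together in one subset, and passing shuffle=True with a fixed random_state keeps the 70/30 split reproducible across runs.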
import supervision as sv

# Load the first Pascal VOC XML dataset
ds_1 = sv.DetectionDataset.from_pascal_voc(
    images_directory_path="dataset1/train/images",
    annotations_directory_path="dataset1/train/labels"
)
len(ds_1)
# 100
ds_1.classes
# ['dog', 'person']

# Load the second Pascal VOC XML dataset
ds_2 = sv.DetectionDataset.from_pascal_voc(
    images_directory_path="dataset2/train/images",
    annotations_directory_path="dataset2/train/labels"
)
len(ds_2)
# 200
ds_2.classes
# ['cat']

# Merge both datasets into a single dataset with a combined class list
ds_merged = sv.DetectionDataset.merge([ds_1, ds_2])
len(ds_merged)
# 300
ds_merged.classes
# ['cat', 'dog', 'person']
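Note from the output above that merging combines the class lists and remaps class IDs accordingly. A loaded DetectionDataset can also be written back out in other formats; below is a minimal sketch, with placeholder output paths, that exports the merged dataset as YOLO labels using supervision's as_yolo method (as_pascal_voc and as_coco work the same way).

# Export the merged dataset as YOLO-format labels (placeholder paths)
ds_merged.as_yolo(
    images_directory_path="export/images",
    annotations_directory_path="export/labels",
    data_yaml_path="export/data.yaml"
)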
Below, see model architectures that require data in the Pascal VOC XML format when training a new model.
On each page below, you can find links to our guides that show how to plot predictions from the model and how to complete other common tasks, like detecting small objects.
With Roboflow, you can deploy a computer vision model without having to build your own infrastructure. Below, we show how to convert data to and from Pascal VOC XML, and we list popular models that use the Pascal VOC XML data format. Our conversion tools are free to use.
Models such as YOLOX use the Pascal VOC XML data format.