Use the widget below to experiment with OneFormer. You can detect COCO classes such as people, vehicles, animals, and household items.
OneFormer is a new segmentation model that earned five state-of-the-art badges on Papers with Code. It beat the former SOTA solutions, MaskFormer and Mask2Former, and is now ranked number one in instance, semantic, and panoptic segmentation. OneFormer is based on transformers and built using Detectron2.
OneFormer is the first multi-task image segmentation framework. This means the model only needs to be trained once, with a universal architecture and a single dataset. Previously, even if a model scored well on all three segmentation tasks, it had to be trained separately on semantic, instance, and panoptic datasets.
OneFormer introduces a task-conditional joint training strategy. The model uniformly samples training examples from the different ground-truth domains (semantic, instance, and panoptic). As a result, the model architecture is task-guided for training and task-dynamic for inference, all with a single model.
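To see the task-dynamic behavior in practice, here is a minimal sketch using the Hugging Face transformers port of OneFormer (OneFormerProcessor and OneFormerForUniversalSegmentation). The shi-labs/oneformer_ade20k_swin_tiny checkpoint and the sample image URL are illustrative choices, not part of this page; the key point is that the same weights serve all three tasks and only the task token changes.

```python
import requests
import torch
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# One checkpoint serves semantic, instance, and panoptic segmentation.
checkpoint = "shi-labs/oneformer_ade20k_swin_tiny"
processor = OneFormerProcessor.from_pretrained(checkpoint)
model = OneFormerForUniversalSegmentation.from_pretrained(checkpoint)

# Sample image (illustrative URL).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Task-dynamic inference: switch tasks by changing the task token only.
for task in ["semantic", "instance", "panoptic"]:
    inputs = processor(images=image, task_inputs=[task], return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

# Post-process the last (panoptic) prediction into a segmentation map.
panoptic = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
print(panoptic["segmentation"].shape)  # (height, width) map of segment ids
```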
OneFormer is licensed under an MIT license.
ADE20K val results:

Method | Backbone | Crop Size | PQ | AP | mIoU (s.s.) | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|---|---|
OneFormer | Swin-L† | 640×640 | 48.6 | 35.9 | 57.0 | 57.7 | 219M | config | model |
OneFormer | Swin-L† | 896×896 | 50.2 | 37.6 | 57.4 | 58.3 | 219M | config | model |
OneFormer | ConvNeXt-L† | 640×640 | 48.7 | 36.2 | 56.6 | 57.4 | 220M | config | model |
OneFormer | DiNAT-L† | 640×640 | 49.1 | 36.0 | 57.8 | 58.4 | 223M | config | model |
OneFormer | DiNAT-L† | 896×896 | 50.0 | 36.8 | 58.1 | 58.6 | 223M | config | model |
OneFormer | ConvNeXt-XL† | 640×640 | 48.9 | 36.3 | 57.4 | 58.8 | 372M | config | model |
Cityscapes val results:

Method | Backbone | PQ | AP | mIoU (s.s.) | mIoU (ms+flip) | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|---|
OneFormer | Swin-L† | 67.2 | 45.6 | 83.0 | 84.4 | 219M | config | model |
OneFormer | ConvNeXt-L† | 68.5 | 46.5 | 83.0 | 84.0 | 220M | config | model |
OneFormer | DiNAT-L† | 67.6 | 45.6 | 83.1 | 84.0 | 223M | config | model |
OneFormer | ConvNeXt-XL† | 68.4 | 46.7 | 83.6 | 84.6 | 372M | config | model |
COCO val results:

Method | Backbone | PQ | PQ (thing) | PQ (stuff) | AP | mIoU | #params | config | Checkpoint |
---|---|---|---|---|---|---|---|---|---|
OneFormer | Swin-L† | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | config | model |
OneFormer | DiNAT-L† | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | config | model |
You can use Roboflow Inference to deploy a OneFormer API on your hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.
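As a hedged sketch of what querying such a deployment can look like, the snippet below uses the inference_sdk HTTP client against a locally running Roboflow Inference server. The server URL, API key, image path, and model ID are placeholders, not values from this page; follow the instructions below for your exact setup.

```python
# A minimal sketch of querying a locally running Roboflow Inference
# server (started, e.g., with `inference server start` after
# `pip install inference-cli`). All credentials and IDs are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # default local inference server address
    api_key="YOUR_ROBOFLOW_API_KEY",  # placeholder API key
)

# Run inference on a local image with your deployed model.
result = client.infer("image.jpg", model_id="your-project/1")  # placeholder model ID
print(result)
```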