
SegFormer

SegFormer is a computer vision framework used in semantic segmentation tasks, implemented with transformers.


Overview

With ViT as a backbone showing great promise, various papers began to build on the idea and innovate to address issues of low resolution and high computational cost. And, while performance continued to improve with each new method, these papers seemed to focus solely on the design of the transformer encoder and neglected the decoder. Enter SegFormer. SegFormer sets itself apart with:

a new "positional-encoding-free and hierarchical Transformer encoder"
a "lightweight All-MLP decoder design"

Because it uses no positional encoding, the novel encoder is able to operate at arbitrary resolutions without impacting performance. Additionally, in contrast to ViT, which produces a single low-resolution feature map, the encoder generates both high-resolution and low-resolution features. The decoder combines both local and global attention to produce high-quality representations at low cost.
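The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the official implementation: it assumes the encoder emits four feature maps at 1/4, 1/8, 1/16, and 1/32 of the input resolution, and it shows only the decoder side (project each map to a common width, upsample to 1/4 resolution, concatenate, fuse, classify). The function names, channel widths, random weights, nearest-neighbour upsampling, and the ReLU after fusion are all assumptions chosen for brevity.

```python
import numpy as np

STRIDES = (4, 8, 16, 32)  # assumed downsampling factor of each encoder stage


def linear(x, weight, bias):
    """Per-pixel linear layer: x is (H, W, C_in), weight is (C_in, C_out)."""
    return x @ weight + bias


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) map by an integer factor."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def make_features(height, width, channels, rng):
    """Stand-in for the hierarchical encoder: random multi-scale feature maps.

    height and width must be divisible by 32 in this sketch.
    """
    return [
        rng.standard_normal((height // s, width // s, c))
        for s, c in zip(STRIDES, channels)
    ]


def init_params(channels, embed_dim, num_classes, rng):
    """Random (untrained) weights for the three groups of linear layers."""
    def layer(c_in, c_out):
        return rng.standard_normal((c_in, c_out)) * 0.02, np.zeros(c_out)

    return {
        "proj": [layer(c, embed_dim) for c in channels],
        "fuse": layer(len(channels) * embed_dim, embed_dim),
        "classify": layer(embed_dim, num_classes),
    }


def all_mlp_decoder(features, params):
    """Fuse multi-scale features into per-pixel class logits at 1/4 resolution."""
    target_h = features[0].shape[0]
    unified = []
    for feat, (w, b) in zip(features, params["proj"]):
        y = linear(feat, w, b)              # unify channel width
        factor = target_h // feat.shape[0]  # 1, 2, 4, 8 for the four stages
        unified.append(upsample_nearest(y, factor))
    fused = np.concatenate(unified, axis=-1)                 # (H/4, W/4, 4 * embed_dim)
    fused = np.maximum(linear(fused, *params["fuse"]), 0.0)  # ReLU is an assumption
    return linear(fused, *params["classify"])                # (H/4, W/4, num_classes)


rng = np.random.default_rng(0)
channels = (32, 64, 160, 256)  # illustrative stage widths
params = init_params(channels, embed_dim=64, num_classes=5, rng=rng)

features = make_features(64, 96, channels, rng)
logits = all_mlp_decoder(features, params)
mask = logits.argmax(axis=-1)  # per-pixel class index
print(logits.shape, mask.shape)
```

Nothing in the decoder depends on a fixed input size, so the same weights can be applied to feature maps from a 128x32 input or any other resolution divisible by 32. The resulting mask is at 1/4 of the input resolution and would still need to be resized to the original image size.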

With these improvements, the SegFormer authors reported state-of-the-art results at the time of publication on the ADE20K, Cityscapes, and COCO-Stuff semantic segmentation datasets.

SegFormer License

SegFormer is licensed under an NVIDIA Source Code license.


Deploy a SegFormer API

You can use Roboflow Inference to deploy a SegFormer API on your hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).

Below are instructions on how to deploy your own model API.

Label Data Automatically with SegFormer

You can automatically label a dataset using SegFormer with help from Autodistill, an open source package for training computer vision models. You can label a folder of images automatically with only a few lines of code. Below, see our tutorials that demonstrate how to use SegFormer to train a computer vision model.
