YOLOS splits an image into patches to form "patch tokens", which take the place of the wordpiece tokens used in NLP. Alongside the patch tokens, the model uses 100 detection tokens, which are learnable embeddings that feed into potential detections.
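To make the token layout concrete, here is a minimal sketch that counts how many tokens the transformer would see for one image. The function name `yolos_sequence_length` is hypothetical, and the 16-pixel patch size is an assumption (a common ViT setting), not something stated above. Only the 100 detection tokens come from the description.

```python
def yolos_sequence_length(height, width, patch_size=16, num_det_tokens=100):
    """Count patch tokens and total tokens for one image.

    patch_size=16 is an assumed value; num_det_tokens=100 follows the text.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    # One patch token per non-overlapping patch_size x patch_size tile.
    num_patch_tokens = (height // patch_size) * (width // patch_size)
    # Detection tokens are added to the patch tokens in the same sequence.
    return num_patch_tokens, num_patch_tokens + num_det_tokens


patches, total = yolos_sequence_length(512, 512)
print(patches, total)  # prints "1024 1124"
```

Under these assumptions, a 512 x 512 input yields 1024 patch tokens plus the 100 detection tokens, so the sequence length grows with image area while the number of candidate detections stays fixed.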
Compared to CNN-based YOLO models, YOLOS benefits from the rise of transformers in computer vision and runs inference without non-maximum suppression (NMS), a tedious post-processing step that makes other YOLO models harder and slower to deploy.
Its design allows it to generalize well across different datasets and tasks, making it an impressive choice for research and experimental applications.
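For context on the NMS step mentioned above, here is a minimal sketch of greedy non-maximum suppression in plain Python. It illustrates the kind of post-processing that CNN-based detectors typically run and that YOLOS skips. It is not code from YOLOS or from any particular YOLO release. The helper names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # prints "[0, 2]"
```

The pairwise overlap checks and the tuned threshold are the parts that add latency and deployment complexity. A model that does not need this step can return its predictions after a simple confidence filter.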
Train YOLOS on your own dataset here.
YOLOS is licensed under an MIT license.
Model | Pre-train Epochs | ViT (DeiT) Weight / Log | Fine-tune Epochs | Eval Size | YOLOS Checkpoint / Log | AP @ COCO val |
---|---|---|---|---|---|---|
YOLOS-Ti | 300 | FB | 300 | 512 | Baidu Drive, Google Drive / Log | 28.7 |
YOLOS-S | 200 | Baidu Drive, Google Drive / Log | 150 | 800 | Baidu Drive, Google Drive / Log | 36.1 |
YOLOS-S | 300 | FB | 150 | 800 | Baidu Drive, Google Drive / Log | 36.1 |
YOLOS-S (dWr) | 300 | Baidu Drive, Google Drive / Log | 150 | 800 | Baidu Drive, Google Drive / Log | 37.6 |
YOLOS-B | 1000 | FB | 150 | 800 | Baidu Drive, Google Drive / Log | 42.0 |
You can use Roboflow Inference to deploy a YOLOS API on your hardware. You can deploy the model on CPU devices (e.g. Raspberry Pi, AI PCs) and GPU devices (e.g. NVIDIA Jetson, NVIDIA T4).
Below are instructions on how to deploy your own model API.