In January 2021, OpenAI released CLIP (Contrastive Language-Image Pre-training), a model that performs zero-shot image classification: it leverages its knowledge of the English language to classify images without being trained on any task-specific dataset. It applies recent advances in large-scale transformers, like GPT-3, to computer vision.
The results are extremely impressive; we have put together a CLIP tutorial and a CLIP Colab notebook so you can experiment with the model on your own images. We've made slight modifications to make prompt engineering easier by extracting the prompts into a configuration file, and we have automatically generated starter prompts for all of our public datasets. You can use Roboflow to generate this config file and try your own classification or object detection datasets with CLIP.
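Under the hood, CLIP's zero-shot classification is simple: embed the image and one text prompt per class, L2-normalize, take cosine similarities, and softmax them into class probabilities. Here is a minimal NumPy sketch of that scoring step; the random vectors are hypothetical stand-ins for the embeddings CLIP's image and text encoders would produce.

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs):
    """CLIP-style zero-shot scoring: cosine similarity of a single
    image embedding against one text embedding per prompt, softmaxed
    into class probabilities."""
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb          # one cosine similarity per prompt
    exp = np.exp(sims - sims.max())       # numerically stable softmax
    return exp / exp.sum()

# Toy embeddings standing in for CLIP's encoders (hypothetical values).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(2, 512))    # e.g. prompts "a photo of a worker
                                         # wearing a hard hat" vs "... without one"
probs = zero_shot_probs(image_emb, text_embs)
```

Swapping in real embeddings from CLIP's encoders (e.g. via OpenAI's open-source `clip` package) turns this into the actual classifier; the prompts in the config file simply define the rows of `text_embs`.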
Example pictures from the Hard Hat dataset: individuals and groups of workers, some wearing hard hats and some not.