GPT-4o is OpenAI’s third major iteration of its popular large multimodal model, GPT-4, and expands on the capabilities of GPT-4 with Vision. The newly released model can talk, see, and interact with the user in a more integrated and seamless way than previous versions of the ChatGPT interface.
Learn more in our comprehensive overview and evaluation.
GPT-4o’s newest improvements are exciting advancements for people building AI applications: it is twice as fast and 50% cheaper, with a 5x higher rate limit, a 128K-token context window, and a single model that handles all modalities. More and more use cases become suitable to solve with AI, and the multiple input types allow for a seamless interface.
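To make the "single multimodal model" point concrete, here is a minimal sketch of how a request mixing text and image input can be composed for GPT-4o, following the OpenAI chat completions message format. The function name, prompt, and image URL are illustrative placeholders; actually sending the request requires the `openai` package and an API key (via `client.chat.completions.create(...)`), so only payload construction is shown here.

```python
def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Compose one chat request that mixes text and image input
    for a single multimodal model (no separate vision endpoint)."""
    return {
        "model": "gpt-4o",
        "max_tokens": 300,
        "messages": [
            {
                "role": "user",
                # One message can carry multiple content parts:
                # plain text alongside an image reference.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "What does this chart show?",
    "https://example.com/chart.png",  # placeholder image URL
)
print(request["model"])  # gpt-4o
```

Because text and images travel in the same message, an application no longer needs to route vision inputs to a separate model.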
In comparison to other multimodal models, GPT-4o is by far the strongest in terms of vision. It outperforms Google Gemini and Anthropic Claude on math, charts, documents, and more. However, GPT-4o still has a while to go before it matches or outperforms humans, with average human performance on evaluations like MMMU being 88%.