Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind. The models are built specifically for multimodality and work with visual, audio, and text inputs.
Gemini is the name of both the underlying LMM and the consumer chatbot interface, formerly named Bard, that uses the Gemini models.
Gemini comes in three sizes: Ultra, Pro, and Nano. Ultra is the largest and “most capable” model, Pro is the middle option, and Nano is the smallest, capable of on-device inference.
Gemini 1.5 Pro is best known for its context window of 1,000,000 tokens (roughly 400,000-750,000 words).
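To make the token-to-word range concrete, here is a minimal Python sketch of the arithmetic. The ratios of 0.40 and 0.75 words per token are simply what the quoted range implies for 1,000,000 tokens; they are rough assumptions, not official tokenizer figures, and the function names are hypothetical helpers for illustration.

```python
# Rough words-per-token ratios implied by the 400,000-750,000 word range
# quoted for a 1,000,000-token window. These are assumptions, not
# official tokenizer statistics; real ratios vary by language and content.
CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN_LOW = 0.40
WORDS_PER_TOKEN_HIGH = 0.75


def estimate_word_range(tokens):
    """Return a (low, high) estimate of how many words fit in `tokens` tokens."""
    low = round(tokens * WORDS_PER_TOKEN_LOW)
    high = round(tokens * WORDS_PER_TOKEN_HIGH)
    return low, high


def fits_in_window(word_count, window_tokens=CONTEXT_WINDOW_TOKENS):
    """Conservative check: only accept texts that fit under the low estimate."""
    low, _ = estimate_word_range(window_tokens)
    return word_count <= low


low, high = estimate_word_range(CONTEXT_WINDOW_TOKENS)
print(f"{CONTEXT_WINDOW_TOKENS:,} tokens is roughly {low:,} to {high:,} words")
```

Under these assumptions, a 300,000-word document would pass the conservative check, while a 500,000-word document might or might not fit, depending on how it tokenizes.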
Gemini's three models are benchmarked on a variety of tasks. Across most tests, Gemini 1.0 Ultra and Gemini 1.5 Pro both perform well on tasks such as math problems, science diagrams, and video captioning.