A JSONL format for multimodal datasets (e.g., VQA).
To use this format, you need a directory of images that contains a JSONL file with the same name as each image (e.g., "image.png" should have a corresponding annotation file called "image.png.jsonl"). You can then drag and drop each image and its corresponding JSONL annotation file into a multimodal Roboflow project.
There should be one annotation per line in the JSONL file.
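To illustrate the layout described above, here is a minimal Python sketch that writes a per-image annotation file. The "images" directory name and the annotation values are hypothetical placeholders, not part of the format specification.

import json
from pathlib import Path

# Assumed directory that holds the images and their JSONL annotation files.
images_dir = Path("images")
images_dir.mkdir(exist_ok=True)

# One annotation for a hypothetical image called "image.png".
annotation = {
    "image": "image.png",
    "prefix": "What's in this image?",
    "suffix": "Beer cans on production line",
}

# Write one JSON object per line to "image.png.jsonl".
with open(images_dir / "image.png.jsonl", "w") as f:
    f.write(json.dumps(annotation) + "\n")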
Below, learn the structure of Multimodal JSONL.
{"image":"Beer-Can-Loading_mp4-21_jpg.rf.53ac14e905aa1f13529df4d1191c590e.jpg","prefix":"What's in this image?","suffix":"Beer cans on production line"}
{"image":"Beer-Can-Loading_mp4-23_jpg.rf.b0ecce950abd8bce4d4ca7b979b40cb2.jpg","prefix":"What's in this image?","suffix":"Beer cans on production line"}
With Roboflow supervision, an open-source Python package with utilities for common computer vision tasks, you can merge and split detections in Multimodal JSONL. Read our dedicated guides to learn how to merge and split Multimodal JSONL detections.
Below, see model architectures that require data in the Multimodal JSONL format when training a new model.
On each page below, you can find links to our guides that show how to plot predictions from each model and complete other common tasks, such as detecting small objects.