ROBIN Competition

Datasets, Ground Truths and Metrics
For the evaluation of
object recognition and image categorisation


Dataset #1 : Multi-class object detection with view point changes

This dataset will be provided to evaluate the performance of object detection algorithms under viewpoint changes. Two kinds of images will be provided: images from a static camera and images from a moving vehicle.
Objects of interest will belong to several categories: civilian vehicles such as passenger cars (4 different vehicles), motorbikes, 4-wheeled vehicles (2 different), people (walking, standing, running, standing at a window, etc.) and various kinds of obstacles or buildings.

Image size will be 768x576 pixels. Two different cameras will be used: a color camera and an IR camera (8-10 micron). Images will be taken from short video clips (2 seconds), and ground truth will be provided for every image of these sequences.

Three different datasets with their ground truths will be available for training (1000 images), validation (4000 images) and testing (10000 images).

This dataset will be produced by Bertin Technologies and Cybernetix.

Dataset #2 : Generic Object classification in satellite images

This dataset will contain panchromatic SPOT 5 satellite images (1 pixel = 2.5 m). Each image covers a 60x60 km area representing different regions.

The dataset will contain 10000 regions of interest (128x128 pixels) belonging to one of these 10 categories :

1. Roundabouts
2. Crossroads
3. Highways and trunk roads
4. Secondary roads
5. Minor roads
6. Tracks and ways
7. Isolated buildings
8. Suburban Area, house gathering
9. Bridges (PT: Ponts)
10. Railways (VF : Voies Ferrées)
"Background" ROI will also be given.

Ground truth will consist of the category name, a binary mask, and the corresponding multi-spectral ROI (at a lower resolution).
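
To give a sense of scale, the pixel size, scene coverage and ROI size stated above can be converted into pixel and ground dimensions. The following is a minimal sketch that uses only the figures from the text; the function and constant names are illustrative and not part of any ROBIN tool.

```python
# Ground-footprint arithmetic for Dataset #2, using figures stated in the text.
PIXEL_SIZE_M = 2.5    # 1 pixel = 2.5 m (panchromatic SPOT 5)
SCENE_SIDE_KM = 60    # each image covers a 60x60 km area
ROI_SIDE_PX = 128     # regions of interest are 128x128 pixels


def scene_side_pixels(scene_side_km, pixel_size_m):
    """Number of pixels along one side of a full scene."""
    return int(round(scene_side_km * 1000 / pixel_size_m))


def roi_side_metres(roi_side_px, pixel_size_m):
    """Ground length covered by one side of a region of interest."""
    return roi_side_px * pixel_size_m


if __name__ == "__main__":
    side_px = scene_side_pixels(SCENE_SIDE_KM, PIXEL_SIZE_M)
    side_m = roi_side_metres(ROI_SIDE_PX, PIXEL_SIZE_M)
    print(f"Full scene: {side_px} x {side_px} pixels")     # 24000 x 24000
    print(f"ROI footprint: {side_m:.0f} m x {side_m:.0f} m")  # 320 m x 320 m
```

Each 128x128 ROI therefore covers a ground square of about 320 m per side, which is small compared with the 60 km scene it is cut from.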

The overall dataset will be split into 3 sets for learning, validation and testing.

This dataset will be produced by CNES.

Dataset #3 : Multi-class object detection in Aerial images

This dataset will contain aerial images obtained with a MWIR Matis camera (384x256 pixels). Images will be organized in short video clips (10 seconds) at a rate of 50 frames/second.
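
The clip duration, frame rate and image size given above determine how much data a single clip holds. A minimal sketch of that arithmetic follows; the names are illustrative, and only the numbers come from the text.

```python
# Per-clip volume for Dataset #3, using figures stated in the text.
CLIP_DURATION_S = 10                       # short video clips of 10 seconds
FRAME_RATE_FPS = 50                        # 50 frames/second
FRAME_WIDTH_PX, FRAME_HEIGHT_PX = 384, 256  # MWIR camera image size


def frames_per_clip(duration_s, fps):
    """Number of frames in one clip."""
    return duration_s * fps


def pixels_per_clip(duration_s, fps, width_px, height_px):
    """Total number of pixels across all frames of one clip."""
    return frames_per_clip(duration_s, fps) * width_px * height_px


if __name__ == "__main__":
    print(frames_per_clip(CLIP_DURATION_S, FRAME_RATE_FPS))  # 500
    print(pixels_per_clip(CLIP_DURATION_S, FRAME_RATE_FPS,
                          FRAME_WIDTH_PX, FRAME_HEIGHT_PX))  # 49152000
```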

The training dataset will be made of ROIs of the objects. Two kinds of objects will be used: vehicles (6 categories) and infrastructure elements (8 categories). Each object will be represented by sets of images taken from different viewpoints and under different lighting conditions (1400 images).

Validation and test datasets will also be produced (5000 still images and 1000 short video clips).

This dataset will be produced by SAGEM.

Dataset #4 : Object detection in aerial images

EADS will produce this dataset by "hybridizing" high-resolution aerial images with computer-synthesized objects. This technique makes it possible to produce highly realistic images at low cost.

20 different objects will be used to generate about 10000 test images, including different viewpoints and lighting conditions.

Learning and validation sets will also be provided with ground truths.

This dataset will be produced by EADS.

For more information about the dataset and the scenarios, please refer to this presentation.

Dataset #5 : Robustness of detection algorithms

This dataset will be made of computer-generated images. Such images make it easier to produce data with very different sensor models and lighting conditions. Once the scenes are modelled, image series with different levels of difficulty (noise, occlusions, weather conditions, viewpoint changes, etc.) can be generated at low cost.

12 different object categories will be used. A learning data set will include up to 15000 samples (1300 per class) representing the different views of the different objects. A validation set (2400 images) and a test set (15000 images), as well as their associated ground truths, will also be provided.
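
The learning-set figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers from the paragraph; the function names are illustrative. Note that 12 classes at 1300 samples each give 15600, slightly more than the stated ceiling of 15000, so one of the two figures is presumably rounded (an assumption; the source does not say which).

```python
# Consistency check on the Dataset #5 learning-set figures stated in the text.
N_CATEGORIES = 12           # 12 different object categories
SAMPLES_PER_CLASS = 1300    # "1300 per class"
STATED_MAX_SAMPLES = 15000  # "up to 15000 samples"


def total_samples(n_categories, samples_per_class):
    """Total sample count implied by the per-class figure."""
    return n_categories * samples_per_class


def per_class_cap(max_samples, n_categories):
    """Per-class count implied by the stated total, assuming equal classes."""
    return max_samples // n_categories


if __name__ == "__main__":
    print(total_samples(N_CATEGORIES, SAMPLES_PER_CLASS))   # 15600
    print(per_class_cap(STATED_MAX_SAMPLES, N_CATEGORIES))  # 1250
```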

Some of these images will be organized in short video sequences.

This dataset will be produced by MBDA.

For more information about the dataset and the scenarios, please refer to this presentation.

Dataset #6 : Image categorisation

The dataset will contain multi-sensor aerial images taken from six hours of video recorded from a helicopter at different altitudes, in different contexts (urban, suburban, expressway, rural or water-coast) and under different conditions (night and day).
The sensors comprise 3 infrared sensors (bolometer, InSb, QWIP) together with a high-resolution visible fisheye sensor.
Images are organized in more than 20 video clips (1 to 2 minutes each).

The dataset will consist of 1500 images and 5000 annotated objects, classified into 6 main categories of cars, trucks, buses and boats, and 13 subcategories.
Image annotations comprise: date, type of environment, altitude, sensor characteristics, image size, the type, location and subclass of each object in the image, and pixel resolution (between 8 and 30 cm).
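
The annotation fields listed above map naturally onto a small record type. The sketch below is one possible in-memory representation, not the official ROBIN annotation format: all class names, field names, units and the coordinate convention are assumptions made for illustration, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    object_type: str           # main category, e.g. "car" or "boat"
    subclass: str              # one of the subcategories (names not given in the text)
    location: Tuple[int, int]  # assumed to be pixel coordinates (x, y)


@dataclass
class ImageAnnotation:
    date: str                    # recording date; string format is an assumption
    environment: str             # urban, suburban, expressway, rural or water-coast
    altitude_m: float            # unit (metres) is an assumption
    sensor: str                  # e.g. "bolometer", "InSb", "QWIP", "visible"
    image_size: Tuple[int, int]  # (width, height) in pixels
    pixel_resolution_cm: float   # stated range: between 8 and 30 cm
    objects: List[ObjectAnnotation] = field(default_factory=list)

    def resolution_in_stated_range(self) -> bool:
        """Check the pixel resolution against the 8-30 cm range in the text."""
        return 8 <= self.pixel_resolution_cm <= 30


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    ann = ImageAnnotation(
        date="unknown", environment="urban", altitude_m=300.0,
        sensor="QWIP", image_size=(640, 512), pixel_resolution_cm=15.0,
        objects=[ObjectAnnotation("car", "unspecified", (120, 85))],
    )
    print(len(ann.objects), ann.resolution_in_stated_range())
```

Keeping the objects as a list inside each image record mirrors the text, where the type, location and subclass are given per object while the remaining fields describe the image as a whole.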

This dataset will be produced by THALES.