Last Modified: October 21, 2019

jzolfaghari@cvc.uab.es
gvillalonga@cvc.uab.es
http://synthia-dataset.net

SYNTHIA: The SYNTHetic collection of Imagery and Annotations

This package contains the SYNTHIA_AL subset, a video stream generated at
25 FPS. The classes covered are those defined in the table below. Instance
segmentation ground truth, 2D bounding boxes, 3D bounding boxes and depth
information are provided!

If you use this data for research purposes, please consider citing our
ICCVW 2019 paper:

@article{bengar2019temporal,
  title={Temporal Coherence for Active Learning in Videos},
  author={Zolfaghari Bengar, Javad and Gonzalez-Garcia, Abel and Villalonga, Gabriel and Raducanu, Bogdan and Aghdam, Hamed H and Mozerov, Mikhail and Lopez, Antonio M and van de Weijer, Joost},
  journal={arXiv preprint arXiv:1908.11757},
  year={2019}
}

DESCRIPTION:

The package contains the following data:

labels_kitti
    Folder containing txt files (one per image). The content of each file is:
        <2D bounding box in the format [xmin, ymin, xmax, ymax]>
        <3D bounding box in the format [xpos, ypos, zpos, width, height, length, angle]>

SemSeg
    Folder containing png files (one per image). Annotations are given in
    three channels. The first channel contains the class of each pixel (see
    the table below). The other two channels contain the unique instance ID
    for dynamic objects (cars, pedestrians, etc.).

RGB
    Folder containing standard 640x480 RGB images.

modified_labels_kitti
    Folder containing txt files (one per image). Only annotations for objects
    that appear with more than 50 pixels are included. The content of each
    file is:
        <2D bounding box in the format [xmin, ymin, xmax, ymax]>
        <3D bounding box in the format [xpos, ypos, zpos, width, height, length, angle]>

calib_kitti
    Calibration file in KITTI format. P0, P1, P2 and P3 all correspond to the
    same intrinsic camera matrix.

information
    Raw data in JSON format.
    You can find all the objects and scene information in a Python
    dictionary variable.

Depth
    Folder containing 640x480 8-bit images. Depth is encoded in the three
    channels using the following formula:

        Depth = 5000 * (R + G*256 + B*256*256) / (256*256*256 - 1)

Class            R    G    B   ID
---------------------------------
Void             0    0    0    0
Sky            128  128  128    1
Building       128    0    0    2
Road           128   64  128    3
Sidewalk         0    0  192    4
Fence           64   64  128    5
Vegetation     128  128    0    6
Pole           192  192  128    7
Car             64    0  128    8
Traffic Sign   192  128  128    9
Pedestrian      64   64    0   10
Bicycle          0  128  192   11
Lanemarking      0  172    0   12
Reserved         -    -    -   13
Reserved         -    -    -   14
Traffic Light    0  128  128   15
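As a minimal sketch of reading the labels_kitti txt format described above, the snippet below parses one line. It assumes (this is not verified against the data) that each line holds exactly the eleven documented numeric fields, whitespace-separated; the real files may carry additional fields, such as a class name, that are not listed in this README.

```python
# Minimal sketch: parse one line of a labels_kitti txt file.
# Assumed field order (from the format description above):
#   xmin ymin xmax ymax  xpos ypos zpos width height length angle
def parse_label_line(line):
    vals = [float(v) for v in line.split()]
    return {
        "bbox_2d": vals[0:4],   # [xmin, ymin, xmax, ymax]
        "bbox_3d": vals[4:11],  # [xpos, ypos, zpos, width, height, length, angle]
    }
```

In practice you would call this once per line of each per-image txt file.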
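The depth encoding above can be inverted per pixel. A small sketch, with image loading and pixel access left to the reader's image library of choice:

```python
# Decode a SYNTHIA depth value from the R, G, B channel bytes (0-255) of a
# depth image pixel, using the formula given above:
#   Depth = 5000 * (R + G*256 + B*256*256) / (256*256*256 - 1)
def decode_depth(r, g, b):
    return 5000.0 * (r + g * 256 + b * 256 * 256) / (256 ** 3 - 1)

# A fully saturated pixel maps to the maximum representable depth:
decode_depth(255, 255, 255)  # -> 5000.0
```

Note that B is the most significant byte, so depth resolution is finest near the camera.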