MULTICLOD

The MULTICLOD project addresses recognition in stereoscopic video of various traffic environments. We study local and global image representations based on cues extracted by combining classification and reconstruction approaches. Our goal is to devise novel multi-class and weakly supervised recognition models that can contribute to applications in smart vehicles and intelligent transportation systems.

Methodology

We study calibrated stereoscopy as an affordable and powerful perceptual modality with clear potential for mass production in traffic and transportation applications. We address the fundamental problem of simultaneously detecting and localizing objects of multiple classes. We approach this problem by combining cues arising from object appearance and from the reconstructed structure of the scene. In both cases we leverage the capacity of deep convolutional models trained end to end.
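As a brief illustration of how a calibrated stereo pair yields scene structure: for a rectified pair, metric depth follows from disparity as Z = f * B / d. The sketch below only states this textbook relation; the function name and the numeric values are hypothetical and are not taken from the project.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Return metric depth Z = f * B / d for a rectified stereo pair.

    disparity_px: horizontal pixel offset between the left and right view
    focal_px:     focal length in pixels (assumed known from calibration)
    baseline_m:   distance between the two camera centres in metres
    """
    if disparity_px <= 0:
        # Zero or negative disparity: point at infinity or an invalid match.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical example: f = 1000 px, B = 0.5 m, d = 10 px.
depth_m = depth_from_disparity(10, 1000, 0.5)
```

Since depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy for distant objects than for near ones.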

Overview of the results

Most of our research has been performed in the context of semantic segmentation, where we densely predict the class membership of each image pixel. We have shown that reconstructed structure can be exploited to assemble a scale-invariant image representation which is independent of the object size in pixels. In this setup, reconstruction aids recognition by reducing intra-class variability and making better use of the training data. However, the resulting memory requirements preclude applying this idea at high resolutions. We have therefore designed a lightweight architecture which introduces ladder-style lateral connections into a modified DenseNet classifier. The resulting model achieves 74.3% mIoU on the Cityscapes dataset and performs the forward pass on 2 MPixel images at 7.5 Hz. To summarize, our main research results are as follows (more details can be found here):

ladder DenseNets for semantic segmentation
convolutional scale-invariance based on stereo
correcting the stereoscopic calibration bias
classification on a representation budget
block-sparse regularizers for weakly supervised models
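For readers unfamiliar with the mIoU figure quoted above, the sketch below shows the usual definition: per-class intersection over union computed from a confusion matrix, then averaged over classes. It is a minimal illustration under that assumption, not the project's evaluation code; the function name and the toy matrix are hypothetical.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]                               # correctly labelled pixels
        fn = sum(confusion[c]) - tp                        # class c pixels that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp   # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:
            # Skip classes absent from both ground truth and prediction.
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example with hypothetical pixel counts.
toy_score = mean_iou([[3, 1], [1, 3]])
```

Because every class contributes equally to the average, small or rare classes weigh as much as large ones in the final score.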

Time frame

Start date: 1st October 2014.

Duration: 36 months

Funding

The project budget is 704 766,91 kn, which includes one postdoc salary for the duration of the project.

The project has also been granted one additional PhD position for 2+2 years.

The project has been fully funded by the Croatian Science Foundation under contract I-2433-2014.

MULTICLOD: Multiclass object detection

Computer vision for smart cars and safer roads
