The depth sensing technology in existing RGB-D sensors (e.g. Kinect) is now capable of capturing reliable 3D information about our world in real time. The availability of depth alongside RGB information has led several researchers to demonstrate the usefulness of this multimodality for a range of computer vision tasks: object recognition, categorization, detection and pose estimation.
This talk will focus on the problem of object categorization, where the goal is to predict the category of a never-before-seen object instance. Our recent work has shown how an efficient non-parametric classifier, Naive Bayes Nearest Neighbor (NBNN), can compete with sophisticated learning-based approaches. In that work, local image descriptors and local 3D surface descriptors were used to exploit the RGB and depth channels, respectively. Experimental results on a large-scale object dataset (51 classes) will be discussed with regard to the performance of each descriptor type. The talk will also address issues such as imbalanced training sets, feature combination, scalability and segmentation, and how the proposed approach can be easily extended to live data.
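For readers unfamiliar with Naive Bayes Nearest Neighbor (NBNN), the decision rule can be sketched roughly as follows: each local descriptor of the query object is matched to its nearest neighbor in the descriptor pool of every class, and the class with the smallest total squared distance wins. This is only a minimal illustration, not the implementation used in the work discussed. The function name `nbnn_classify`, the class labels and the 2-D toy descriptors are hypothetical, the nearest-neighbor search is brute force, and the extraction of RGB or depth descriptors is omitted.

```python
import numpy as np


def nbnn_classify(query_descriptors, class_descriptors):
    """Return the label whose descriptor pool best explains the query.

    query_descriptors: array of shape (n, d), local descriptors of one object.
    class_descriptors: dict mapping label -> array of shape (m_c, d), the pooled
        training descriptors of that class (hypothetical layout).
    """
    best_label, best_cost = None, np.inf
    for label, pool in class_descriptors.items():
        # Squared Euclidean distance from every query descriptor to every
        # descriptor in this class pool (brute force, for illustration only).
        diffs = query_descriptors[:, None, :] - pool[None, :, :]
        sq_dists = np.sum(diffs ** 2, axis=2)
        # NBNN cost: sum over query descriptors of the distance to the
        # nearest neighbor within the class pool.
        cost = sq_dists.min(axis=1).sum()
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


# Toy data (made up): two classes with two 2-D descriptors each.
train = {
    "mug": np.array([[0.0, 0.0], [0.0, 1.0]]),
    "ball": np.array([[5.0, 5.0], [6.0, 5.0]]),
}
query = np.array([[0.1, 0.2], [5.2, 5.1], [0.0, 0.9]])
label = nbnn_classify(query, train)
```

Because there is no training step beyond pooling descriptors per class, adding a new class only requires adding a new entry to the dictionary. In practice an approximate nearest-neighbor index would presumably replace the brute-force search for large descriptor pools.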