Objects, Actions, Places

Authors: Stephen J. McKenna, Jesse Hoey, Emanuele Trucco

Publish Date: 2014/01/23

Volume: 106, Issue: 3, Pages: 235-236

Abstract

Computer vision has evolved dramatically in the last 20 years or so, especially in the domains of classification and recognition. With the advent of a usable internet, algorithms dealing with hundreds or thousands of images had to be transformed or replaced by ones dealing with millions or tens of millions. The internet has made crowd sourcing possible, dramatically reducing the time and cost needed for some semi-automatic tasks such as annotating large numbers of images. However, annotation remains a bottleneck for many applications, a topic addressed by a paper in this issue using active learning (Haines and Xiang 2012). Simultaneously, greater computing power has become available at ever lower prices, both through hardware solutions (GPUs, for example) and through access to low-cost or even free high-throughput computing facilities (cloud computing, for example). There has never been such a convergence of huge opportunities and steep challenges for vision algorithm development.

It is safe to say that the computer vision community has risen to these challenges and embraced the opportunities. The eight papers in this issue offer an intriguing cross-section of the state of the art. Each of them has been carefully selected after multiple review cycles. Between them, they deal with important issues in learning and feature representation, propose novel methods for the recognition of objects, actions and places, and cast new light on time-honoured algorithms such as the Hough transform (Woodford et al. 2014).

In "Object and Action Classification with Latent Window Parameters", Bilen et al. (2014) propose a technique using latent support vector machines to learn suboptimal adaptive spatial divisions for object categorisation and action recognition. Several splitting models are considered. The technique does not need bounding boxes in the training data set, and experiments demonstrate good performance when compared with spatial pyramid matching. The goal of efficient object classification is also addressed by Lehmann et al. (2014), who propose a branch-and-rank method. Efficiency is achieved by partitioning the search space: the method learns a ranking function that compares candidate sets of windows and locates the most promising set to explore first. As the method requires only a few classifier runs, the authors can use nonlinear kernels to improve performance and robustness.

Recognition under occlusion is tackled by Ren et al. (2014) in "Regressing Local to Global Shape Properties for Online Segmentation and Tracking". They propose a method for shape recovery under occlusion using a training set without occlusion. A 2D discrete cosine transform is used to estimate occluded low-frequency shapes from high-frequency harmonics that are not occluded. A locally weighted projection regression learns the mapping and has the advantage of being online and incremental.

In "Detecting People Looking at Each Other in Videos", Marín-Jiménez et al. (2014) use estimates of head pose to derive gaze volumes and thus determine whether people's eyelines match. They use Gaussian process regression models based on histograms of oriented gradients (HOG) to infer pitch and yaw estimates along with their uncertainty. Three measures based on these estimates are compared on a TV human interactions dataset. Annotations specifying which shots contain people looking at each other, as well as the trained head detector used in their experiments, are made available for other researchers to use.

The problem of recognizing outdoor places under changing conditions is tackled by Johns and Yang in their paper "Generative Methods for Long-Term Place Recognition in Dynamic Scenes" (Johns and Yang 2014). For example, a building may look significantly different after a renovation, whilst a green space may look different in summer and in winter. Spatio-temporal properties of each landmark are incrementally learned over time, making scene models robust to local changes. A new bag-of-words filtering approach is used along with a geometric verification scheme.

When building classifiers for recognition, obtaining class labels for training and validation is often the most labour-intensive step. Active learning, in which examples are automatically selected for labelling during learning, offers one way to ease this bottleneck. In "Active Rare Class Discovery and Classification using Dirichlet Processes", Haines and Xiang (2012) consider the use of active learning with datasets that are highly imbalanced and contain examples of as yet unknown rare classes. Their contribution is a criterion for active learning that balances the goals of obtaining good classification and discovering these rare classes. This is achieved using a Dirichlet process assumption, which enables the probability of membership of each known class to be estimated, as well as the probability of belonging to a new, unknown class. The probability that an example will be misclassified is computed and used to select the next example for labelling. They test their method, which is relatively simple to implement, on a wide range of machine learning and computer vision data sets.

Woodford et al. (2014) approach one of the evergreens of computer vision, the Hough transform. Their paper "Demisting the Hough Transform for 3D Shape Recognition and Registration" proposes some new and interesting extensions leading to a competitive algorithm. They achieve linear complexity by assuming that the Hough space is sparse, in which case only some regions need sampling. They also observe that only one vote per feature is actually correct, which allows them to minimize the entropy of the Hough space.

Finally, Liu et al. (2014) take a principled analytical approach to rotation-invariant feature extraction using HOG-like features. The main idea is to treat gradient histograms as continuous functions, using polar coordinates in 2D or spherical harmonics in 3D, and to represent them using a Fourier basis. The formulation for 3D volumetric images is of particular importance because alternative approaches to rotation invariance based on pose normalisation or sampling become unattractive in 3D, where three angles are needed to specify pose. The practical utility of the approach is evidenced by experiments on three applications: car detection in aerial images, 3D shape retrieval, and voxel labelling in plant root images.
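The Fourier-basis idea described for Liu et al. (2014) can be illustrated in 2D with a minimal sketch. It is not their descriptor: it assumes a single magnitude-weighted gradient-orientation distribution h(θ) over a whole patch, and it omits both the spatial (polar) part of their formulation and the 3D spherical-harmonic case. The helper name `orientation_fourier_invariants` and its `max_order` parameter are hypothetical. The key property is that rotating a patch by α shifts every gradient orientation by α, which multiplies each Fourier coefficient c_m of h(θ) by the unit-modulus factor e^{-imα}, so the moduli |c_m| are unchanged.

```python
import numpy as np

def orientation_fourier_invariants(patch, max_order=4):
    """Rotation-invariant moduli of the Fourier coefficients of a patch's
    magnitude-weighted gradient-orientation distribution (hypothetical
    helper; 2D only, no spatial pooling)."""
    patch = np.asarray(patch, dtype=float)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx)
    # Treat the orientation histogram as a continuous function h(theta) and
    # project it onto the Fourier basis exp(-i m theta), m = 0..max_order.
    orders = np.arange(max_order + 1)
    coeffs = np.array([np.sum(magnitude * np.exp(-1j * m * theta)) for m in orders])
    # Shifting every orientation by alpha multiplies coefficient m by
    # exp(-i m alpha); taking the modulus removes that phase factor.
    return np.abs(coeffs)

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
a = orientation_fourier_invariants(patch)
b = orientation_fourier_invariants(np.rot90(patch))
print(np.allclose(a, b))  # True
```

The check uses a 90° rotation because `np.rot90` only permutes pixels, so the invariance holds up to floating-point error; for arbitrary angles, resampling the pixel grid makes it approximate. Keeping only the moduli also discards the relative phases between orders, so this sketch is less discriminative than a full descriptor that retains more of that information.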
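The active-learning criterion summarised for Haines and Xiang (2012) can be illustrated with a heavily simplified sketch; it is not their model. It assumes isotropic Gaussian likelihoods with a fixed variance for each known class, a broad Gaussian standing in for the unknown class, and a Chinese restaurant process (the sequential form of a Dirichlet process prior) with concentration `alpha`, which gives a known class with n_k of N labelled examples prior weight n_k/(N + alpha) and a new class weight alpha/(N + alpha). The functions `class_posteriors` and `select_query`, and all parameter values, are hypothetical. Because a classifier that knows only the existing classes must mislabel any example from a new class, the misclassification probability is taken here as one minus the largest known-class posterior, which folds the new-class probability into the query score.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of an isotropic Gaussian; x has shape (n, d)."""
    d = x.shape[1]
    sq = np.sum((x - mean) ** 2, axis=1)
    return -0.5 * (sq / var + d * np.log(2 * np.pi * var))

def class_posteriors(x, labelled_x, labelled_y, alpha=1.0,
                     class_var=1.0, new_var=25.0):
    """Posterior over the known classes plus a final 'new class' column.

    Hypothetical model: each known class is a Gaussian centred on the mean
    of its labelled examples; the unknown class is a broad Gaussian centred
    on the mean of all labelled data. Priors follow the Chinese restaurant
    process: n_k / (N + alpha) for class k, alpha / (N + alpha) for a new one.
    """
    classes = np.unique(labelled_y)
    n_total = len(labelled_y)
    log_terms = []
    for k in classes:
        members = labelled_x[labelled_y == k]
        prior = len(members) / (n_total + alpha)
        log_terms.append(np.log(prior)
                         + gaussian_logpdf(x, members.mean(axis=0), class_var))
    prior_new = alpha / (n_total + alpha)
    log_terms.append(np.log(prior_new)
                     + gaussian_logpdf(x, labelled_x.mean(axis=0), new_var))
    log_terms = np.stack(log_terms, axis=1)
    # Normalise in log space for numerical stability.
    log_terms -= log_terms.max(axis=1, keepdims=True)
    post = np.exp(log_terms)
    return post / post.sum(axis=1, keepdims=True)

def select_query(pool_x, labelled_x, labelled_y, **kwargs):
    """Return the index of the pool example most likely to be misclassified
    by a classifier restricted to the known classes, plus all the scores."""
    post = class_posteriors(pool_x, labelled_x, labelled_y, **kwargs)
    p_misclassified = 1.0 - post[:, :-1].max(axis=1)
    return int(np.argmax(p_misclassified)), p_misclassified

labelled_x = np.array([[0, 0], [0.5, 0], [0, 0.5], [5, 0], [5.5, 0], [5, 0.5]])
labelled_y = np.array([0, 0, 0, 1, 1, 1])
pool_x = np.array([[0.2, 0.1], [5.1, 0.2], [20.0, 20.0]])
idx, p_mis = select_query(pool_x, labelled_x, labelled_y)
print(idx)  # 2
```

In this toy pool, the two points near known class centres score about 0.01, while the distant point, which the model attributes almost entirely to a new class, scores close to 1 and is queried. A point midway between the two known classes would also score well above the near-centre points, reflecting the trade-off between classification and discovery that the criterion is meant to strike.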
