Can computer vision problems benefit from structur

Authors: Thomas Hoyoux, Antonio J. Rodríguez-Sánchez, Justus H. Piater

Publish Date: 2016/05/06

Volume: 27, Issue: 8, Pages: 1299-1312

Abstract

Research in the field of supervised classification has mostly focused on the standard, so-called “flat” classification approach, where the problem classes live in a trivial, one-level semantic space. There is, however, an increasing interest in the hierarchical classification approach, where a performance gain is expected by incorporating prior taxonomic knowledge about the classes into the learning process. Intuitively, the hierarchical approach should be beneficial in general for the classification of visual content, as suggested by the fact that humans seem to organize objects into hierarchies based on visually perceived similarities. In this paper, we provide an analysis that aims to determine the conditions under which the hierarchical approach can consistently give better performance than the flat approach for the classification of visual content. In particular, we (1) show how hierarchical methods can fail to outperform flat methods when applied to real vision-based classification problems, and (2) investigate the underlying reasons for the lack of improvement by applying the same methods to synthetic datasets in a simulation. Our conclusion is that the use of high-level hierarchical feature representations is crucial for obtaining a performance gain with the hierarchical approach, and that poorly chosen prior taxonomies hinder this gain even when proper high-level features are used.

The research leading to these results has received funding from the European Community’s Seventh Framework Programme FP7/2007-2013 (Specific Programme Cooperation, Theme 3, Information and Communication Technologies) under Grant Agreements No. 270273 (Xperience) and No. 600918 (PaCMan).

Most of the theoretical work and applications in the field of supervised classification have been dedicated to the standard classification approach, where the problem classes are considered to be equally different from each other in a semantic sense [28]. In this standard approach, also known as “flat” classification, a classifier is learned from class-labeled data instances without any explicit information given about the high-level semantic relationships between the classes. A standard multi-class problem formulation will, for example, consider a bee, an ant and a hammer to be different to the same degree; they belong to different classes in a flat sense, because the only available semantic information comes from the same, unique semantic level. However, one could consider that ants and bees are part of a superclass of insects, while hammers belong to another superclass of tools, and it is intuitive that such hierarchical knowledge about the classes can help improve classification performance. Based upon this realization, a new approach has emerged for dealing more efficiently with the classification of content deemed to be inherently semantically hierarchical, i.e., the hierarchical classification approach [28]. The attention given to the hierarchical approach was also sustained by the advances made in machine learning generalized to arbitrary output spaces, i.e., the structured classification approach (e.g., [31]), of which the hierarchical approach is actually a special case.

The a priori hierarchical organization of classes has been shown to constitute a key prior to classification problems in several application domains, including text categorization [23], protein function prediction [7], and music genre classification [14]. As for classification based on visual features, a hierarchical prior intuitively seems especially appropriate, as it reflects the natural way in which humans organize and recognize the objects they see, which is also supported by neurophysiological studies of the visual cortex [2, 15, 35]. In practice, some results have shown that there is indeed a gain in performance with the hierarchical approach in the visual-based application domain, e.g., for 3D object shape classification [3] and annotation of medical images [6]. A quite active and closely related line of work consists of the supervised construction of class hierarchies from images with multiple tag labels. The motivation is to reduce the complexity of visual recognition problems that have a very large number of instances. To build useful taxonomies, the proposed methods exploit either purely the semantic tag labels [19, 29], or purely the visual information [10, 18], or both, as in [16], where the authors propose a way to learn a “semantivisual” hierarchy that is both semantically meaningful and close to the visual content.

In this paper, we are interested in determining the conditions under which the hierarchical approach can consistently give better performance than the flat approach for the classification of visual content. This paper is an extended version of the work published in [11], where we applied three hierarchical classification methods and their flat counterparts to two inherently hierarchical vision-based classification problems: facial expression recognition and 3D shape classification. Using evaluation measures designed for hierarchical classification, we showed in [11] that, for the considered methods and problems, the hierarchical approach provided no or only marginal improvement over the standard approach. We here extend our previous work by designing a simulation framework and conducting the comparative evaluation of the hierarchical and flat methods used in [11], this time applied to artificial problems generated with this simulation framework. Specifically, we generate completely synthetic datasets for which we can control the complexity through the manipulation of key aspects, such as the underlying hierarchical phenomenon at the origin of the data measurements, the amount of noise in the extraction of the features from the measurements, and the amount of knowledge about the underlying hierarchical phenomenon. Our goal with these simulation experiments is to draw useful insights to explain why the hierarchical approach did not outperform the flat approach when applied to our real vision-based classification problems.

The remainder of this paper is organized as follows. Section 2 describes the hierarchical framework and terminology we adopted for our previous and present work, and provides the details of the hierarchical methods used. Section 3 shows the experimental evaluation first presented in [11], where the hierarchical and flat methods were applied to real computer vision problems. Section 4 presents our simulation framework, as well as the experimental results obtained for artificial problems generated with this simulation framework. In light of the additional simulation results, we provide a discussion in Sect. 5 and draw a conclusion in Sect. 6.

Recently, a necessary effort to unify the hierarchical classification framework has been made [28]. We follow their terminology, which is summarized next. A class taxonomy consists of a finite set of semantic concepts $\mathcal{C} = \{c_i \mid i = 1, \ldots, n\}$ with a partial order relationship $\prec$ organizing these concepts either in a tree or in a directed acyclic graph (DAG). A classification problem defined over such a taxonomy is hierarchical: its classes and superclasses correspond to the leaf and interior nodes of the tree or DAG, respectively. A flat classification problem only considers the leaf nodes of such a taxonomy as its classes, and has no superclasses. A hierarchical classification problem deals with either single or multiple path labeling, i.e., whether or not a single data instance can be labeled with more than one path, and either full or partial depth labeling, i.e., whether or not any path in a label must cover all hierarchy levels. In all cases, an indicator vector representation for the taxonomic label $\mathbf{y}$ of a data instance can be used, i.e., $\mathbf{y} \in \mathcal{Y} \subset \{0, 1\}^n$, where the $i^{\text{th}}$ component of $\mathbf{y}$ takes value 1 if the data instance belongs to the (super)class $c_i \in \mathcal{C}$, and 0 otherwise.

The real-world and simulation problems considered in this work are defined using tree taxonomies with full depth labeling. For the facial expression recognition problem, we define multiple path labeling (see Sect. 3.1.1), whereas for the 3D shape classification problem and for our simulation problems, we define single path labeling (see Sects. 3.2.1 and 4.2).
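
The indicator vector representation described above can be illustrated with a minimal sketch. It assumes a toy tree taxonomy built from the bee/ant/hammer example in the text; the concept ordering, the `PARENT` dictionary, and the helper names are hypothetical choices made for illustration and are not taken from the paper.

```python
# Toy tree taxonomy (illustrative assumption): each concept maps to its parent.
# Concepts with parent None sit directly below an implicit root.
PARENT = {
    "insect": None,
    "tool": None,
    "bee": "insect",
    "ant": "insect",
    "hammer": "tool",
}

# Fixed ordering c_1, ..., c_n of the concepts (alphabetical, by assumption).
CONCEPTS = sorted(PARENT)


def path_to_top(concept):
    """Return the concept followed by all of its ancestors in the tree."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def indicator_vector(leaves):
    """Encode a label given as one or more leaf classes.

    Component i is 1 if the instance belongs to (super)class c_i, else 0.
    One leaf gives single path labeling; several leaves give multiple path
    labeling. Walking each path up to the top gives full depth labeling.
    """
    active = set()
    for leaf in leaves:
        active.update(path_to_top(leaf))
    return [1 if c in active else 0 for c in CONCEPTS]


def leaf_classes():
    """Classes of the corresponding flat problem: the leaf nodes only."""
    parents = set(PARENT.values())
    return [c for c in CONCEPTS if c not in parents]


if __name__ == "__main__":
    print(CONCEPTS)
    print(indicator_vector(["bee"]))            # single path label
    print(indicator_vector(["ant", "hammer"]))  # multiple path label
    print(leaf_classes())
```

In this sketch, a flat problem would use only the entries returned by `leaf_classes()`, whereas the hierarchical label also switches on the components of every superclass along the path.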
