Abstract
Object classification in digital images remains one of the most challenging tasks in computer vision. Advances in the last decade have produced methods to repeatably extract and describe
characteristic local features in natural images. In order to apply machine learning techniques
in computer vision systems, a representation based on these features is needed.
A set of local features is the most popular representation and is often used in conjunction
with Support Vector Machines for classification problems. In this work, we examine current
approaches based on set representations and identify their shortcomings.
To overcome these shortcomings, we argue for extending the set representation into a
graph representation, encoding more relevant information. Attributes associated with the
edges of the graph encode the geometric relationships between individual features by making
use of the metadata of each feature, such as the position, scale, orientation and shape of the
feature region. At the same time, all invariances provided by the original feature extraction
method are retained.
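
As an illustration of the idea, the following minimal sketch builds a fully connected graph over local features and attaches geometric attributes to each directed edge. The `Feature` type, the function names, and the four edge attributes are assumptions chosen for illustration and are not necessarily the attributes used in this work. Each attribute is expressed relative to the source feature's own scale and orientation, so it should remain unchanged under translation, rotation and uniform scaling of the image, provided the feature detector supplies those invariances.

```python
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Feature:
    """Hypothetical local feature: metadata only, descriptor omitted."""
    x: float
    y: float
    scale: float        # characteristic scale of the feature region
    orientation: float  # dominant orientation in radians


def wrap_angle(a):
    """Map an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def edge_attributes(a, b):
    """Geometric relation of feature b as seen from feature a."""
    dx, dy = b.x - a.x, b.y - a.y
    return {
        # distance measured in units of a's scale
        "distance": math.hypot(dx, dy) / a.scale,
        # direction towards b, relative to a's orientation
        "bearing": wrap_angle(math.atan2(dy, dx) - a.orientation),
        # relative size of the two feature regions
        "log_scale_ratio": math.log(b.scale / a.scale),
        # relative orientation of the two feature regions
        "rel_orientation": wrap_angle(b.orientation - a.orientation),
    }


def build_feature_graph(features):
    """Return {(i, j): attributes} for every ordered pair of features."""
    return {
        (i, j): edge_attributes(features[i], features[j])
        for i, j in permutations(range(len(features)), 2)
    }
```

A fully connected graph is used here only for brevity; a practical system would likely restrict edges to neighbouring features to keep the graph sparse.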
To validate the novel approach, we use a standard subset of the ETH-80 classification
benchmark.