Keywords:
-
Abstract:
The maximum entropy (MaxEnt) framework has been studied extensively in supervised
learning. Here, the goal is to find a distribution p that maximizes an entropy function
while enforcing data constraints, so that the expected values of some (pre-defined) features
with respect to p approximately match their empirical counterparts. Using different
entropy measures, different model spaces for p, and different approximation criteria
for the data constraints yields a family of discriminative supervised learning methods
(e.g., logistic regression, conditional random fields, least squares, and boosting). This
framework is known as the generalized maximum entropy framework.
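As a minimal sketch of this formulation (the notation below is assumed for illustration and not taken from the thesis: H is the chosen entropy measure, the f_j are the pre-defined features, the empirical expectations are computed on the labeled data, and the epsilon_j are approximation tolerances), the generalized MaxEnt problem can be written as

\[
\max_{p \in \mathcal{P}} \; H(p)
\qquad \text{subject to} \qquad
\bigl| \mathbb{E}_{p}[f_j] - \tilde{\mathbb{E}}[f_j] \bigr| \le \epsilon_j,
\quad j = 1, \dots, m,
\]

where \mathcal{P} is the chosen model space; particular choices of H, \mathcal{P}, and the constraint criterion recover the methods listed above.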
Semi-supervised learning (SSL) has emerged in the last decade as a promising field
that combines unlabeled data with labeled data to increase the accuracy and
robustness of inference algorithms. However, most SSL algorithms to date have involved
trade-offs, e.g., in terms of scalability or applicability to multi-categorical data. We
extend the generalized MaxEnt framework to develop a family of novel SSL algorithms.
Extensive empirical evaluation on benchmark data sets that are widely used in
the literature demonstrates the validity and competitiveness of the proposed algorithms.