Authors: TieYun Qian, Bing Liu, Qing Li, Jianfeng Si
Publish Date: 2015/01/21
Volume: 30, Issue: 1, Pages: 200-213
Abstract
Authorship attribution, also known as authorship classification, is the problem of identifying the authors (reviewers) of a set of documents (reviews). The common approach is to build a classifier using supervised learning. This approach has several issues that hurt its applicability. First, supervised learning needs a large set of documents from each author to serve as the training data. This can be difficult in practice. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are not enough to serve as the training data. Second, the learned classifier cannot be applied to authors whose documents have not been used in training. In this article, we propose a novel solution to deal with the two problems. The core idea is that instead of learning in the original document space, we transform it to a similarity space. In the similarity space, the learning is able to naturally tackle the issues. Our experimental results based on online reviews and reviewers show that the proposed method significantly outperforms the state-of-the-art supervised and unsupervised baseline methods.
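The abstract does not say how the similarity space is built, so the following is only a rough sketch of the general idea under assumptions of our own, not the authors' method. It assumes each training example is a pair of documents described by a few similarity scores (here cosine and Jaccard over bag-of-words counts) and labeled by whether the two documents share an author. The function names `similarity_vector` and `to_similarity_space` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased whitespace tokens with counts (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Jaccard overlap between the two vocabularies."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def similarity_vector(doc_a, doc_b):
    """Describe a document pair by similarity scores instead of raw terms."""
    a, b = bag_of_words(doc_a), bag_of_words(doc_b)
    return [cosine(a, b), jaccard(a, b)]


def to_similarity_space(labeled_docs):
    """Turn (author, text) items into (features, same_author) pair examples.

    The label is 1 when both documents share an author, else 0. This
    pairwise labeling is an assumption made for illustration.
    """
    examples = []
    for (auth_a, text_a), (auth_b, text_b) in combinations(labeled_docs, 2):
        label = int(auth_a == auth_b)
        examples.append((similarity_vector(text_a, text_b), label))
    return examples
```

Under these assumptions, even a few reviews per author yield many labeled pairs, and a classifier trained on such pairs judges "same author or not" from similarity scores alone, so it could in principle be applied to authors absent from training.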
Keywords: