Authors: Wentao Fan, Hassen Sallay, Nizar Bouguila, Sami Bourouis
Publish Date: 2014/12/16
Volume: 20, Issue: 3, Pages: 979-990
Abstract
Data clustering is a fundamental unsupervised learning task in several domains, such as data mining, computer vision, information retrieval, and pattern recognition. In this paper, we propose and analyze a new clustering approach based on both hierarchical Dirichlet processes and the generalized Dirichlet distribution, which leads to an interesting statistical framework for data analysis and modelling. Our approach can be viewed as a hierarchical extension of the infinite generalized Dirichlet mixture model previously proposed in Bouguila and Ziou (IEEE Trans Neural Netw 21(1):107–122, 2010). The proposed clustering approach tackles the problem of modelling grouped data, where observations are organized into groups that we allow to remain statistically linked by sharing mixture components. The resulting clustering model is learned using a principled variational Bayes inference-based algorithm that we have developed. Extensive experiments and simulations based on two challenging applications, namely image categorization and web service intrusion detection, demonstrate the usefulness and merits of our model.
The second author would like to thank King Abdulaziz City for Science and Technology (KACST), Kingdom of Saudi Arabia, for their funding support under grant number 11INF178708. The authors would like to thank the anonymous referees and the associate editor for their comments.
Keywords: