A cluster ensemble first generates a large library of different clustering solutions and then combines them into a more accurate consensus clustering. It is commonly accepted that, for a cluster ensemble to work well, the member partitions should differ from one another while the quality of each partition remains at an acceptable level. Many strategies have been used to generate diverse base partitions for cluster ensembles. This talk focuses on the diversity and quality of the partitions generated by different strategies. We evaluate the performance of the k-means ensemble, a typical random feature subspace method, and a random sampling method. Further, to incorporate prior background knowledge, we discuss constraint-based clustering ensemble selection. We formalize the problem as a combinatorial optimization problem in terms of consistency under the constraints, the diversity among ensemble members, and the overall quality of the ensemble.
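As a rough illustration of the pipeline described in the abstract, the sketch below generates base partitions in two ways (k-means with random initialization and k-means on a random feature subspace), measures their diversity, and combines them into a consensus. It is not the speaker's method. Every function name here is hypothetical, diversity is assumed to be one minus the Rand index averaged over all pairs of partitions, and the consensus step is assumed to be a simple threshold on the co-association matrix. The random sampling strategy and the constraint-based selection are omitted.

```python
import random
from itertools import combinations

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members; keep the old center if empty.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def subspace_partition(data, k, n_features, rng):
    """Cluster the data using only a random subset of its features."""
    dims = rng.sample(range(len(data[0])), n_features)
    projected = [tuple(row[d] for d in dims) for row in data]
    return kmeans(projected, k, rng)

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same cluster vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def co_association(partitions):
    """Entry (i, j) is the fraction of partitions that put points i and j together."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)] for i in range(n)]

def consensus(coassoc, threshold=0.5):
    """Assumed consensus rule: link pairs above the threshold, return connected components."""
    n = len(coassoc)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coassoc[i][j] > threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    relabel = {r: idx for idx, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]

# Toy data: two groups separated on the first two features, plus one noise feature.
rng = random.Random(0)
data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3), rng.gauss(0, 1)) for _ in range(10)]
data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3), rng.gauss(0, 1)) for _ in range(10)]

partitions = [kmeans(data, 2, rng) for _ in range(5)]
partitions += [subspace_partition(data, 2, 2, rng) for _ in range(5)]

pair_scores = [1 - rand_index(a, b) for a, b in combinations(partitions, 2)]
diversity = sum(pair_scores) / len(pair_scores)
coassoc = co_association(partitions)
labels = consensus(coassoc)
print("diversity:", round(diversity, 3), "consensus clusters:", len(set(labels)))
```

The co-association matrix is used here because it does not require matching cluster labels across base partitions, which is why it is a common starting point for consensus functions.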
Fan Yang is an assistant professor at the School of Information Science and Engineering, Xiamen University, China. He received his B.S. degree (Engineering) in 2003, M.S. degree (Engineering) in 2006, and Ph.D. degree (Engineering) in 2009, all from Xiamen University. His research interests include data mining, machine learning, and their applications to bioinformatics.
Last updated on 22 Feb 2015 by Yi Chen - Page created on 13 Feb 2015 by Yi Chen