Identity Recognition of Complex Entity across Heterogeneous Sources

Wei Yu1,2, Shijun Li1,*, Weiyi Meng2, Sha Yang1, Kai Wang1,2, Lin Gan1,2
1 Computer School, Wuhan University, Wuhan 430072, China
2 Department of Computer Science, Binghamton University of SUNY, Binghamton 13902, USA

ABSTRACT: As part of Web big data, information about the same entity may be widely distributed across multiple heterogeneous sources, which challenges traditional entity recognition and clustering methods. To address the inconsistency and irrelevance of data across heterogeneous sources, in this paper we propose a joint iterative method for entity recognition based on object similarity measurement and characteristic relevance analysis. We first construct a non-linear similarity measurement model and propose a method for optimizing the multidimensional weight parameters used to measure the similarity between objects; we then establish an iterative model that optimizes object relevance, expands the training set, and analyzes characteristic relevance. We also propose a method to estimate the weights and parameters of unknown characteristic data (characteristics that do not appear in the training data), ultimately achieving joint identity recognition on data across heterogeneous sources. We experiment on both homogeneous and heterogeneous datasets and compare our method with three state-of-the-art methods. The results show that our method achieves better accuracy and adaptability.
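The abstract describes the method only at a high level. As a rough illustration of two of its ideas, a non-linear combination of weighted per-attribute similarities and an iterative loop that expands the training set with confidently classified pairs, here is a minimal Python sketch. It is not the authors' model: the token-level Jaccard similarity, the sigmoid combination fitted by logistic-regression gradient descent, the 0.9/0.1 confidence thresholds, and the rule that an attribute unseen in training gets the mean learned weight are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Record = Dict[str, str]                   # attribute name -> raw attribute value
Pair = Tuple[Record, Record]
LabeledPair = Tuple[Record, Record, int]  # 1 = same entity, 0 = different


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two attribute values (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def features(x: Record, y: Record) -> Dict[str, float]:
    """Per-attribute similarities, computed only on attributes both records have."""
    return {k: jaccard(x[k], y[k]) for k in x.keys() & y.keys()}


def score(f: Dict[str, float], w: Dict[str, float], b: float) -> float:
    """Sigmoid of the weighted attribute similarities (the non-linear step).

    Hypothetical rule: an attribute absent from training gets the mean learned weight.
    """
    default = sum(w.values()) / len(w) if w else 0.0
    z = sum(w.get(k, default) * v for k, v in f.items()) + b
    z = max(-50.0, min(50.0, z))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs: List[LabeledPair], epochs: int = 500, lr: float = 0.5):
    """Fit per-attribute weights and a bias with logistic-regression SGD."""
    w: Dict[str, float] = {}
    b = 0.0
    for _ in range(epochs):
        for x, y, label in pairs:
            f = features(x, y)
            for k in f:
                w.setdefault(k, 0.0)
            err = score(f, w, b) - label  # gradient of the log-loss w.r.t. z
            for k, v in f.items():
                w[k] -= lr * err * v
            b -= lr * err
    return w, b


def self_train(labeled: List[LabeledPair], unlabeled: List[Pair],
               rounds: int = 3, hi: float = 0.9, lo: float = 0.1):
    """Iteratively move confidently classified pairs into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        w, b = train(labeled)
        undecided = []
        for x, y in pool:
            p = score(features(x, y), w, b)
            if p >= hi:
                labeled.append((x, y, 1))
            elif p <= lo:
                labeled.append((x, y, 0))
            else:
                undecided.append((x, y))
        if len(undecided) == len(pool):  # nothing new was added; stop early
            break
        pool = undecided
    return train(labeled)
```

The sigmoid keeps every pair's score in (0, 1), which is what lets fixed confidence thresholds decide which unlabeled pairs are promoted into the training set on each round; a plain weighted sum would be unbounded and harder to threshold.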

(1) Homogeneous datasets (mobile applications): download
(2) Heterogeneous datasets (e-commerce): jd, yhd, suning


Copyright: Web Data Mining Lab, Wuhan University © 13986190968 (Prof. Li)