Identity Recognition of Complex Entity across Heterogeneous Sources

Wei Yu1,2, Shijun Li1,*, Weiyi Meng2, Sha Yang1, Kai Wang1,2, Lin Gan1,2
1 Computer School, Wuhan University, Wuhan 430072, China
2 Department of Computer Science, Binghamton University of SUNY, Binghamton 13902, USA


ABSTRACT: As part of Web big data, information about the same entity may be widely distributed across multiple heterogeneous sources, which challenges traditional entity recognition and clustering methods. To address the inconsistency and irrelevance of data across heterogeneous sources, in this paper we propose a joint iterative method for entity recognition based on object similarity measurement and characteristic relevance analysis. We first construct a non-linear similarity measurement model and propose a method for optimizing multidimensional weight parameters to measure the similarity between objects; we then establish an iterative model that optimizes object relevance, expands the training set, and analyzes characteristic relevance. We also propose a method to estimate the weights and parameters of unknown characteristic data (characteristics that do not appear in the training data), ultimately achieving joint identity recognition on data across heterogeneous sources. We experiment on both homogeneous and heterogeneous datasets and compare our method with three state-of-the-art methods. The results demonstrate the better accuracy and adaptability of our method.
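The abstract combines a weighted, non-linear similarity over object characteristics with an iterative loop that grows the set of accepted matches. The following is a minimal sketch of that general idea under stated assumptions; it is not the authors' model. It assumes records are Python dicts mapping characteristic names to strings, uses bigram Jaccard overlap as the per-characteristic similarity and a logistic function as the non-linearity, gives characteristics unseen in training the mean of the known weights, and re-estimates each weight as that characteristic's mean similarity over accepted matches. All function names (`bigrams`, `attr_sim`, `similarity`, `iterative_match`) and defaults (`scale=6.0`, `bias=-3.0`, `threshold=0.5`) are hypothetical.

```python
import math
from itertools import product


def bigrams(text: str) -> set[str]:
    """Character bigrams of a lower-cased string with whitespace removed."""
    s = "".join(text.lower().split())
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def attr_sim(a: str, b: str) -> float:
    """Per-characteristic similarity: Jaccard overlap of bigram sets, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def similarity(r1: dict, r2: dict, weights: dict,
               scale: float = 6.0, bias: float = -3.0) -> float:
    """Hypothetical non-linear similarity: a logistic function of the weighted
    mean of per-characteristic similarities over characteristics both records
    share. Characteristics missing from `weights` (unseen in training) get the
    mean known weight, a placeholder for the paper's estimation method."""
    default_w = sum(weights.values()) / len(weights) if weights else 1.0
    shared = [k for k in r1 if k in r2]
    ws = [weights.get(k, default_w) for k in shared]
    total = sum(ws)
    if total <= 0:  # no shared characteristics (or all weights zero)
        return 0.0
    score = sum(w * attr_sim(r1[k], r2[k]) for w, k in zip(ws, shared)) / total
    return 1.0 / (1.0 + math.exp(-(scale * score + bias)))


def iterative_match(src_a: list, src_b: list, weights: dict,
                    threshold: float = 0.5, max_iter: int = 5):
    """Hypothetical self-training loop: accept cross-source pairs scoring at or
    above `threshold`, then re-estimate each shared characteristic's weight as
    its mean similarity over the accepted pairs; stop once matches are stable."""
    weights = dict(weights)
    matches: set = set()
    for _ in range(max_iter):
        new = {(i, j)
               for (i, a), (j, b) in product(enumerate(src_a), enumerate(src_b))
               if similarity(a, b, weights) >= threshold}
        if new == matches:
            break
        matches = new
        keys = {k for i, j in matches for k in src_a[i] if k in src_b[j]}
        for k in keys:
            sims = [attr_sim(src_a[i][k], src_b[j][k])
                    for i, j in matches if k in src_a[i] and k in src_b[j]]
            weights[k] = sum(sims) / len(sims)
    return matches, weights
```

For example, with `a = [{"title": "apple iphone"}, {"title": "huawei mate"}]` and `b = [{"title": "huawei mate"}, {"title": "apple iphone"}]`, calling `iterative_match(a, b, {"title": 1.0})` pairs record 0 of `a` with record 1 of `b` and record 1 of `a` with record 0 of `b`. The logistic squashing is one simple way to make the score non-linear in the weighted evidence; the paper's actual model and weight optimization may differ.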

Dataset:
(1) Homogeneous datasets (mobile applications): download
(2) Heterogeneous datasets (e-commerce): jd, yhd, suning

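Notice: The datasets were collected from the Internet and are provided for educational and research purposes only; they must not be used for commercial purposes. This laboratory accepts no responsibility for any legal disputes arising from redistribution by downloaders.
For more related materials, or for questions regarding data ownership, please contact weiyu@binghamton.edu.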

Copyright: © Web Data Mining Lab, Wuhan University. 13986190968 (Prof. Li), shjli@whu.edu.cn