Entity matching

Authors:

Johan Dahlin

Publish date: 2012-03-13

Report number: FOI-R--3265--SE

Pages: 104

Written in: English

Keywords:

Record matching
duplicate entry detection
entity resolution
vertex similarity
ensemble classification
data fusion
information fusion

Abstract

This report reviews and surveys earlier work in the field of entity matching, as well as current software implementations in this area. Entity matching uses string-matching methods, known as field metrics, to find similar text strings that could correspond to similar names or addresses. The outputs of these field metrics are often passed to classification methods that determine whether the strings (or the entire entries the strings belong to) are matching or non-matching. These classification methods include both supervised and unsupervised methods originating in statistics and machine learning. This report proposes using additional classifiers, including vertex similarity and text-mining methods, to generate further evidence that two entities match. Vertex similarity is studied in network analysis and aims to identify nodes that share a large fraction of their neighbors, indicating that the entities have similar social or communication networks. Text-mining methods are useful for finding similar documents and other longer written texts, indicating that two entities use language in the same way or deal with the same topics. Small experimental evaluations using citation data from two different sources are presented to test these two methods of finding similar entities. Furthermore, the report proposes methods based on data fusion to combine these classifiers with the traditional field metrics into an ensemble.
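The abstract names several kinds of match evidence without fixing particular formulas. The sketch below shows one way the pieces could fit together. It assumes a normalized Levenshtein edit distance as the field metric, Jaccard overlap of neighbor sets as the vertex similarity, cosine similarity of term-frequency vectors as the text-mining measure, and a weighted average as the data-fusion rule. These are illustrative stand-ins, not necessarily the methods evaluated in the report. All function names, weights, the decision threshold and the example records are hypothetical.

```python
# Illustrative sketch only: the concrete metrics and the fusion rule below are
# assumptions chosen for clarity, not the exact methods evaluated in the report.
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str) -> float:
    """Field metric: normalized edit similarity in [0, 1]; 1.0 means identical."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def vertex_similarity(graph: dict, u: str, v: str) -> float:
    """Vertex similarity: Jaccard overlap |N(u) & N(v)| / |N(u) | N(v)|."""
    nu, nv = graph.get(u, set()), graph.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def text_similarity(doc_a: str, doc_b: str) -> float:
    """Text-mining evidence: cosine similarity of term-frequency vectors."""
    ta, tb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(c * c for c in ta.values()))
    norm_b = math.sqrt(sum(c * c for c in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_score(scores: dict, weights: dict) -> float:
    """Data fusion: weighted average of the individual similarity scores."""
    total = sum(weights[k] for k in scores)
    return sum(weights[k] * scores[k] for k in scores) / total


# Hypothetical example: two author records taken from different citation sources,
# each linked to the set of co-authors it appears with.
coauthors = {
    "J. Smith": {"A", "B", "C"},
    "John Smith": {"A", "B", "D"},
}
scores = {
    "field": field_similarity("J. Smith", "John Smith"),
    "vertex": vertex_similarity(coauthors, "J. Smith", "John Smith"),
    "text": text_similarity("entity matching with field metrics",
                            "record matching with field metrics"),
}
weights = {"field": 0.5, "vertex": 0.25, "text": 0.25}  # hypothetical weights
combined = fused_score(scores, weights)
is_match = combined >= 0.5  # hypothetical decision threshold
print(scores, round(combined, 3), is_match)
```

A fixed weighted average is the simplest possible fusion rule. In practice the weights, or a learned combiner such as a supervised classifier trained on labeled match/non-match pairs, would be tuned on data.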
