Comparative validity of methods to select appropriate cutoff weight for probabilistic linkage without unique personal identifiers

Ying Zhu, Chih Ying Chen, Yutaka Matsuyama, Yasuo Ohashi, Jessica M. Franklin, Soko Setoguchi Iwata

Publication Date: 04/01/2016

Purpose: Record linkage can enhance the data quality of observational database studies. Probabilistic linkage, a method that allows partial matches on linkage variables, overcomes disagreements arising from errors and omissions in data entry but can also produce false-positive links. This study aimed to assess the validity of probabilistic linkage in the absence of unique personal identifiers (UPI) and to compare methods of cutoff weight selection. Methods: We linked an implantable cardioverter defibrillator placement registry to 1 year of Medicare inpatient files using anonymous nonunique variables and assessed the validity of three methods of cutoff selection against an internally derived gold standard based on UPI. Results: Of the 64,890 registry records, for which the expected linkage rate was 55% to 65%, 55% were linked at cutoffs associated with a positive predictive value (PPV) of ≥90%. Histogram inspection suggested an approximate range of optimal cutoffs. The duplicate method gave accurate estimates of cutoff and PPV when the method’s assumption was met. With adjusted estimates of the number of true matches and the size of the searched files, the odds formula method gave relatively accurate estimates of cutoff and PPV. Conclusions: Probabilistic linkage without UPI generated valid linkages when an optimal cutoff was chosen. Cutoff selection remains challenging; however, histogram inspection, the duplicate method, and the odds formula method can be used in conjunction when a gold standard is not available.
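As an illustration of how a cutoff can be judged against a gold standard, the following minimal Python sketch computes PPV among candidate links at or above a given match weight and finds the lowest cutoff that reaches a target PPV such as 90%. It is not the authors' procedure: the function names, the pair representation, and the toy weights are all hypothetical, and the sketch assumes that each candidate pair already carries a match weight and a gold-standard true/false label.

```python
from typing import List, Optional, Tuple

# Each candidate pair is (match_weight, is_true_match_per_gold_standard).
# This representation is an assumption made for illustration only.
Pair = Tuple[float, bool]


def ppv_at_cutoff(pairs: List[Pair], cutoff: float) -> Optional[float]:
    """PPV among pairs accepted as links (weight >= cutoff); None if no pair is accepted."""
    accepted = [is_true for weight, is_true in pairs if weight >= cutoff]
    if not accepted:
        return None
    return sum(accepted) / len(accepted)


def lowest_cutoff_meeting_ppv(pairs: List[Pair], target_ppv: float = 0.90) -> Optional[float]:
    """Scan observed weights in ascending order; return the first cutoff whose PPV meets the target."""
    for cutoff in sorted({weight for weight, _ in pairs}):
        ppv = ppv_at_cutoff(pairs, cutoff)
        if ppv is not None and ppv >= target_ppv:
            return cutoff
    return None


# Toy data, invented for this example (not taken from the study).
toy_pairs: List[Pair] = [
    (2.0, False), (5.0, False), (8.0, True), (11.0, False),
    (14.0, True), (17.0, True), (20.0, True),
]

if __name__ == "__main__":
    print(lowest_cutoff_meeting_ppv(toy_pairs, target_ppv=0.90))
```

A lower cutoff accepts more links and so raises the linkage rate, usually at the cost of PPV. The methods compared in the abstract are intended to estimate this trade-off when no gold-standard labels are available.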