Validity of claims-based algorithms for selected cancers in Japan: Results from the VALIDATE-J study

Cynthia de Luise, Naonobu Sugiyama, Toshitaka Morishima, Takakazu Higuchi, Kayoko Katayama, Sho Nakamura, Haoqian Chen, Edward Nonnenmacher, Ryota Hase, Sadao Jinno, Mitsuyo Kinjo, Daisuke Suzuki, Yoshiya Tanaka, Soko Setoguchi

Published: 05/07/2021

Purpose: Real-world data from large administrative claims databases in Japan have recently become available, but limited evidence exists to support their validity. VALIDATE-J validated claims-based algorithms for selected cancers in Japan.

Methods: VALIDATE-J was a multicenter, cross-sectional, retrospective study. Disease-identifying algorithms were applied to claims data from two hospitals in Japan to identify cancers diagnosed between January or March 2012 and December 2016. Positive predictive values (PPVs), specificity, and sensitivity were calculated for prevalent cases (regardless of baseline cancer-free period) and incident cases (12-month cancer-free period; with claims and registry periods in the same month), using hospital cancer registry data as the gold standard.

Results: A total of 22 108 cancers were identified in the hospital claims databases. PPVs (number of registry cases) for prevalent/incident cases were: any malignancy 79.0% (25 934)/73.1% (18 119); colorectal 84.4% (3519)/65.6% (2340); gastric 87.4% (3534)/76.8% (2279); lung 88.1% (2066)/79.9% (1636); breast 86.4% (4959)/59.9% (3185); pancreatic 87.1% (582)/80.4% (508); melanoma 48.7% (46)/42.9% (36); and lymphoma 83.6% (1457)/77.8% (1035). Specificity ranged from 98.3% to 100% (prevalent)/99.5% to 100% (incident); sensitivity ranged from 39.1% to 67.6% (prevalent)/12.5% to 31.4% (incident). PPVs of claims-based algorithms for several cancers in patients ≥66 years of age were slightly higher than those in a US Medicare population.

Conclusions: VALIDATE-J demonstrated high specificity and modest-to-moderate sensitivity for claims-based algorithms of most malignancies using Japanese claims data. Claims-based algorithms will enable identification of patient populations from claims databases while avoiding direct patient identification. Further research is needed to confirm the generalizability of these results and their applicability to specific patient subgroups.
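As an illustration of the three validity measures named in the Methods, the following minimal Python sketch computes PPV, sensitivity, and specificity from a 2×2 comparison of algorithm results against a gold-standard registry. It uses the standard textbook definitions; the class name `ConfusionCounts`, the function names, and the counts are hypothetical and are not taken from the VALIDATE-J data or analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts comparing a claims-based algorithm with a registry (gold standard)."""
    true_pos: int   # algorithm positive, registry positive
    false_pos: int  # algorithm positive, registry negative
    false_neg: int  # algorithm negative, registry positive
    true_neg: int   # algorithm negative, registry negative


def ppv(c: ConfusionCounts) -> float:
    """Share of algorithm-identified cases that the registry confirms."""
    return c.true_pos / (c.true_pos + c.false_pos)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of registry cases that the algorithm identifies."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Share of registry non-cases that the algorithm correctly leaves out."""
    return c.true_neg / (c.true_neg + c.false_pos)


# Hypothetical counts chosen only for illustration (not study data).
counts = ConfusionCounts(true_pos=80, false_pos=20, false_neg=120, true_neg=9780)

print(f"PPV:         {ppv(counts):.1%}")          # 80.0%
print(f"Sensitivity: {sensitivity(counts):.1%}")  # 40.0%
print(f"Specificity: {specificity(counts):.1%}")  # 99.8%
```

With these made-up counts, the algorithm misses most registry cases (low sensitivity) yet rarely flags a non-case (high specificity), so most flagged cases are still true cases (high PPV). This is the same qualitative pattern the abstract reports, although the actual study values depend on the real counts.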