Data Science

Nature Genetics Paper Published by XY Scientists

Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes.

We analysed a large health insurance dataset to assess the genetic and environmental contributions to 560 disease-related phenotypes in 56,396 twin pairs and 724,513 sibling pairs out of 44,859,462 individuals who live in the United States. We estimated the contribution of environmental risk factors (socioeconomic status (SES), air pollution and climate) to each phenotype. Mean heritability (h² = 0.311) and shared environmental variance (c² = 0.088) were higher than the variance attributed to specific environmental factors such as zip-code-level SES (var_SES = 0.002), daily air quality (var_AQI = 0.0004), and average temperature (var_temp = 0.001), both overall and for individual phenotypes. We found significant heritability and shared environment for the number of comorbidities (h² = 0.433, c² = 0.241) and average monthly cost (h² = 0.290, c² = 0.302). All results are available through our Claims Analysis of Twin Correlation and Heritability (CaTCH) web application.
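
As a rough illustration of how to read these figures: assuming the standard ACE twin-model decomposition (an assumption, not stated in the summary above), the additive genetic, shared environmental, and unique environmental shares of variance sum to one. The residual share then follows from the reported estimates by subtraction. The function name and dictionary below are hypothetical and only restate numbers quoted in the abstract.

```python
def unique_environment(h2: float, c2: float) -> float:
    """Residual variance share e2, ASSUMING an ACE model where h2 + c2 + e2 = 1."""
    if h2 < 0 or c2 < 0 or h2 + c2 > 1:
        raise ValueError("h2 and c2 must be non-negative and sum to at most 1")
    return round(1.0 - h2 - c2, 3)


# (h2, c2) pairs exactly as quoted in the abstract.
reported = {
    "mean across 560 phenotypes": (0.311, 0.088),
    "number of comorbidities": (0.433, 0.241),
    "average monthly cost": (0.290, 0.302),
}

# Residual share implied by the assumed ACE decomposition.
residuals = {name: unique_environment(h2, c2) for name, (h2, c2) in reported.items()}

for name, e2 in residuals.items():
    print(f"{name}: e2 = {e2:.3f}")
```

Under this assumption, roughly 60% of the variance in the average phenotype is attributed neither to additive genetics nor to the shared environment. In the usual reading of the ACE model, that residual also absorbs measurement error.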

Written by Chirag M. Lakhani, Braden T. Tierney, Arjun K. Manrai, Jian Yang, Peter M. Visscher & Chirag J. Patel on January 14, 2019.