Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation

Citation

Carey, Caitlin E.; Shafee, Rebecca; Wedow, Robbee; Elliott, Amanda; Palmer, Duncan S.; Compitello, John; Kanai, Masahiro; Abbott, Liam; Schultz, Patrick; Karczewski, Konrad J.; et al. (2024). Principled distillation of UK Biobank phenotype data reveals underlying structure in human variation. Nature Human Behaviour.

Abstract

Data within biobanks capture broad yet detailed indices of human variation, but biobank-wide insights can be difficult to extract due to complexity and scale. Here, using large-scale factor analysis, we distill hundreds of variables (diagnoses, assessments and survey items) into 35 latent constructs, using data from unrelated individuals with predominantly estimated European genetic ancestry in UK Biobank. These factors recapitulate known disease classifications, disentangle elements of socioeconomic status, highlight the relevance of psychiatric constructs to health and improve measurement of pro-health behaviours. We go on to demonstrate the power of this approach to clarify genetic signal, enhance discovery and identify associations between underlying phenotypic structure and health outcomes. In building a deeper understanding of ways in which constructs such as socioeconomic status, trauma, or physical activity are structured in the dataset, we emphasize the importance of considering the interwoven nature of the human phenome when evaluating public health patterns.

URL

https://doi.org/10.1038/s41562-024-01909-5

Keyword(s)

Data integration

Reference Type

Journal Article

Journal Title

Nature Human Behaviour

Author(s)

Carey, Caitlin E.
Shafee, Rebecca
Wedow, Robbee
Elliott, Amanda
Palmer, Duncan S.
Compitello, John
Kanai, Masahiro
Abbott, Liam
Schultz, Patrick
Karczewski, Konrad J.
Bryant, Samuel C.
Cusick, Caroline M.
Churchhouse, Claire
Howrigan, Daniel P.
King, Daniel
Davey Smith, George
Neale, Benjamin M.
Walters, Raymond K.
Robinson, Elise B.

Year Published

2024

ISSN/ISBN

2397-3374

DOI

10.1038/s41562-024-01909-5

Reference ID

10435