Subset selection in network-linked data

Citation

Gao, Mingyu & Wen, Canhong (2022). Subset selection in network-linked data. Journal of Statistical Computation and Simulation. pp. 1-22

Abstract

As a tool for producing meaningful and interpretable results, subset or variable selection has been well studied in modern statistics. However, most existing methods focus on independent data and cannot be directly extended to network-linked data, where samples are connected with each other. To this end, we propose a subset selection method for the linear regression model that incorporates the network information into the intercept term, achieving automatic subset selection and good network structural interpretability simultaneously. Based on this, we develop an efficient algorithm to recover the true subset as well as to determine subgroups. Simulation studies demonstrate that the proposal outperforms state-of-the-art methods in estimation and selection accuracy. We also apply the proposed method to data from the National Longitudinal Study of Adolescent Health and show the superiority of selecting variables along a network, with a smaller model size and more accurate prediction.

URL

https://doi.org/10.1080/00949655.2022.2029444

Keyword(s)

High-dimensional data

Reference Type

Journal Article

Journal Title

Journal of Statistical Computation and Simulation

Author(s)

Gao, Mingyu
Wen, Canhong

Year Published

2022

Pages

1-22

DOI

10.1080/00949655.2022.2029444

Reference ID

9610