Citation
Gao, Mingyu & Wen, Canhong (2022). Subset selection in network-linked data. Journal of Statistical Computation and Simulation. pp. 1-22
Abstract
As a tool for producing meaningful and interpretable results, subset or variable selection has been well studied in modern statistics. However, most existing methods focus on independent data and cannot be directly extended to network-linked data, where samples are connected with each other. To this end, we propose a subset selection method in the linear regression model that incorporates the network information into the intercept term, achieving automatic subset selection and good network structural interpretability simultaneously. Based on this, we develop an efficient algorithm to recover the true subset, as well as to determine subgroups. Simulation studies demonstrate that the proposal outperforms state-of-the-art methods in estimation and selection accuracy. We also apply the proposed method to data from the National Longitudinal Study of Adolescent Health and show the superiority of selecting variables along a network in terms of a smaller model size and more accurate prediction.
URL
https://doi.org/10.1080/00949655.2022.2029444
Keyword(s)
High-dimensional data
Reference Type
Journal Article
Journal Title
Journal of Statistical Computation and Simulation
Author(s)
Gao, Mingyu; Wen, Canhong