CaltechAUTHORS
  A Caltech Library Service

FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features

Xu, Yuancheng and Zafirov, Athanasse and Alvarez, R. Michael and Kojis, Dan and Tan, Min and Ramirez, Christina M. (2020) FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features. (Unpublished) https://resolver.caltech.edu/CaltechAUTHORS:20210510-140844977

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20210510-140844977

Abstract

This paper proposes FREEtree, a tree-based method for high dimensional longitudinal data with correlated features. Popular machine learning approaches commonly used for variable selection, such as Random Forests, do not perform well when features are correlated and do not account for data observed over time. FREEtree handles longitudinal data by using a piecewise random effects model. It also exploits the network structure of the features by first clustering them using weighted correlation network analysis (WGCNA). It then conducts a screening step within each cluster of features and a selection step among the surviving features, which provides a relatively unbiased way to select features. By using dominant principal components as regression variables at each leaf and the original features as splitting variables at splitting nodes, FREEtree maintains its interpretability and improves its computational efficiency. The simulation results show that FREEtree outperforms other tree-based methods in prediction accuracy, feature selection accuracy, and the ability to recover the underlying structure.
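
The cluster, screen, and select sequence described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: thresholded absolute correlation with connected components stands in for WGCNA, absolute correlation with the response stands in for a tree-based importance score, and the longitudinal (random effects) part is omitted entirely. All function names, parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def cluster_features(columns, threshold=0.7):
    """Group features linked by |correlation| > threshold (connected components).

    ASSUMPTION: a crude stand-in for WGCNA, which is not implemented here.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) > threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def screen_then_select(columns, y, clusters, keep_per_cluster=1, final_k=2):
    """Screen within each cluster, then select among the survivors.

    ASSUMPTION: |correlation with y| replaces a tree-based importance score.
    """
    score = {name: abs(pearson(col, y)) for name, col in columns.items()}
    survivors = []
    for cluster in clusters:
        ranked = sorted(cluster, key=score.get, reverse=True)
        survivors.extend(ranked[:keep_per_cluster])  # screening step
    return sorted(survivors, key=score.get, reverse=True)[:final_k]  # selection step


def make_toy_data(n=200, seed=0):
    """Hypothetical data: two correlated feature groups plus one pure-noise feature."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]

    def noisy(base):
        return [v + 0.1 * rng.gauss(0, 1) for v in base]

    columns = {
        "a1": noisy(z1), "a2": noisy(z1), "a3": noisy(z1),
        "b1": noisy(z2), "b2": noisy(z2),
        "c1": [rng.gauss(0, 1) for _ in range(n)],
    }
    y = [2 * u + v + 0.1 * rng.gauss(0, 1) for u, v in zip(z1, z2)]
    return columns, y


columns, y = make_toy_data()
clusters = cluster_features(columns)
print("clusters:", clusters)
print("selected:", screen_then_select(columns, y, clusters))
```

Screening inside each cluster first keeps a group of near-duplicate features from crowding out weaker but distinct signals in the final ranking, which is the motivation the abstract gives for the two-stage design.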


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
http://arxiv.org/abs/2006.09693 | arXiv | Discussion Paper
https://github.com/adzafirov/FREETree | Related Item | Code
ORCID:
Author | ORCID
Alvarez, R. Michael | 0000-0002-8113-4451
Ramirez, Christina M. | 0000-0002-8435-0416
Subject Keywords: longitudinal data, random effects, regression trees, variable selection, machine learning interpretability
Record Number: CaltechAUTHORS:20210510-140844977
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20210510-140844977
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 109040
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 10 May 2021 21:22
Last Modified: 10 May 2021 21:22