Articles, Abstracts, and Reports

Using Autoencoders for Imputing Missing Data in eGFR Decline Trajectories of Patients with CKD

Davina J. Zamanzadeh
Panayiotis Petousis
Tyler Austin Davis
Andres Olav Garlid
Xiaoyan Wang
Keith C. Norris
Obidiugwu Duru
Katherine Tuttle, Providence St. Joseph Health
Alex Bui
Susanne B. Nicholas
CURE-CKD Registry Study Team

Document Type

Abstract

Publication Date

10-22-2020

Publication Title

ASN Kidney Week

Abstract

BACKGROUND

Using machine learning (ML) approaches to impute missing data has not been explored in CKD progression. We investigated the utility of a data-driven imputation to improve downstream classifier prediction of rapid eGFR decline in the CURE-CKD registry.

METHODS

We analyzed CKD patients at UCLA (N=13,206) over a 2-year period. We used: 1) the dataset with missing data; and 2) a censored subset with no missing data. We introduced 33% and 66% missingness by removing values under one of three mechanisms: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). We included: eGFR, hemoglobin A1c (HbA1c), systolic blood pressure (SBP), number of ambulatory and inpatient visits, age, sex, ethnicity, rurality status, diagnosis of hypertension, diabetes mellitus (DM), pre-DM, and use of renin-angiotensin-aldosterone system inhibitors. We introduced missingness on SBP and HbA1c to mirror the original dataset. We imputed missing values using an autoencoder ML model. To predict a 40% eGFR decline over 2 years, we developed random forest models using the full and resultant imputed datasets.
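The three missingness mechanisms named above can be illustrated with a minimal sketch. The abstract does not describe how values were selected for removal, so everything here is an assumption made for illustration: the function names `missing_mask` and `_rank_weights`, the synthetic `age` and `sbp` columns, the choice of age as the variable driving MAR, and the rule that larger values are more likely to be removed.

```python
import random


def _rank_weights(values):
    """Give each position a weight equal to the rank of its value (1 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    weights = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        weights[i] = float(rank)
    return weights


def missing_mask(rows, target, rate, mechanism, driver=None, seed=0):
    """Return a list of booleans, True where `target` would be set to missing.

    MCAR: every row is equally likely to lose its value.
    MAR:  removal probability rises with another, observed column (`driver`).
    MNAR: removal probability rises with the target value itself.
    """
    rng = random.Random(seed)
    n = len(rows)
    k = round(rate * n)
    if mechanism == "MCAR":
        weights = [1.0] * n
    elif mechanism == "MAR":
        weights = _rank_weights([row[driver] for row in rows])
    elif mechanism == "MNAR":
        weights = _rank_weights([row[target] for row in rows])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Weighted sampling without replacement: draw a key u**(1/w) per row
    # and keep the k rows with the largest keys.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    chosen = sorted(range(n), key=lambda i: keys[i], reverse=True)[:k]
    mask = [False] * n
    for i in chosen:
        mask[i] = True
    return mask


# Synthetic data for illustration only; not drawn from the registry.
demo_rng = random.Random(42)
rows = [
    {"age": demo_rng.uniform(30, 90), "sbp": demo_rng.gauss(130, 15)}
    for _ in range(300)
]
masks = {
    "MCAR": missing_mask(rows, "sbp", 0.33, "MCAR"),
    "MAR": missing_mask(rows, "sbp", 0.33, "MAR", driver="age"),
    "MNAR": missing_mask(rows, "sbp", 0.33, "MNAR"),
}
```

Each mask marks the same number of SBP values for removal; only the rule that decides which rows are affected differs between mechanisms.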

RESULTS

On the full subset, the MNAR imputation method achieved a root mean squared error (RMSE) of 0. The MAR method achieved an RMSE of 3.8 at 33% missingness and 5.4 at 66%. MCAR achieved an RMSE of 38.5 at 33% missingness and 56.4 at 66%. Using the random forest model to predict rapid decline on the fully observed subset, without removing and imputing data, achieved a mean receiver operating characteristic (ROC) area under the curve (AUC) of 80.8%±1.1 and a mean precision/recall (PR)-AUC of 23.9%±1.5. These values are the same as those obtained with our methodology on MNAR, which is explained by the RMSE of 0, shown in Table 1.
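For readers unfamiliar with the metric, the following is a minimal sketch of an RMSE restricted to the entries that were removed and then imputed. The abstract does not state which entries entered the RMSE calculation, so scoring only the masked positions is an assumption, as are the function name `masked_rmse` and the example numbers.

```python
import math


def masked_rmse(true_values, imputed_values, mask):
    """RMSE over the positions where mask is True (the removed entries)."""
    squared_errors = [
        (t - p) ** 2
        for t, p, m in zip(true_values, imputed_values, mask)
        if m
    ]
    if not squared_errors:
        raise ValueError("mask selects no entries")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical SBP values; the second and fourth were removed and imputed.
true_sbp = [120.0, 130.0, 140.0, 150.0]
imputed_sbp = [120.0, 133.0, 140.0, 146.0]
removed = [False, True, False, True]

rmse = masked_rmse(true_sbp, imputed_sbp, removed)        # errors of 3 and 4
perfect = masked_rmse(true_sbp, true_sbp, removed)        # exact recovery
```

Under this definition, an RMSE of 0 means every removed value was recovered exactly, in which case a downstream classifier sees the same inputs as it would on the fully observed data.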

CONCLUSION

Our method accurately imputes clinical data values while accounting for uncertainty caused by missing values.

Clinical Institute

Kidney & Diabetes

Department

Endocrinology

Department

Nephrology

Recommended Citation

Zamanzadeh, Davina J.; Petousis, Panayiotis; Davis, Tyler Austin; Garlid, Andres Olav; Wang, Xiaoyan; Norris, Keith C.; Duru, Obidiugwu; Tuttle, Katherine; Bui, Alex; Nicholas, Susanne B.; and CURE-CKD Registry Study Team, "Using Autoencoders for Imputing Missing Data in eGFR Decline Trajectories of Patients with CKD" (2020). Articles, Abstracts, and Reports. 3954.
https://digitalcommons.providence.org/publications/3954
