Abstract 219: Use of Machine Learning Models to Identify Atherosclerotic Cardiovascular Disease Patients at Very High Risk for Future Events in a Multi-state Health Care System

Publication Title

Circulation: Cardiovascular Quality and Outcomes




Background: The 2018 AHA/ACC Blood Cholesterol Guideline recommends classifying ASCVD patients as very high-risk (VHR) or not-VHR (NVHR) to guide treatment decisions, which has important implications for ezetimibe and PCSK9 inhibitor eligibility. We aimed to develop a tool based on machine learning (ML) techniques that could make it easier to identify VHR patients. This approach offers a powerful alternative to conventional methods such as logistic regression: it requires few modeling assumptions and can identify potential interactions among risk factors while incorporating the hierarchy of interaction among variables.

Method: We used EHR-derived ICD-10 codes to identify patients within our health system with ASCVD. VHR was defined by ≥2 major ASCVD events (ACS ≤12 months, history of MI >12 months, ischemic stroke, or symptomatic PAD) or 1 major ASCVD event and ≥2 high-risk conditions (age ≥65, diabetes, hypertension, smoking, heterozygous familial hypercholesterolemia, CKD, CHF, persistently elevated LDL-C ≥100 mg/dl, or prior CABG/PCI). Patients not meeting these criteria were classified as NVHR. We randomly assigned patients to a training set and a testing set. Classification and regression tree (CART) modeling was performed on the training set and validated on the testing set. The results were compared with those of a random forest model. Variables in both models included age, sex, race, ethnicity, and each of the VHR criteria above. The primary outcome for both models was VHR classification. Performance of the two models was compared using area under the curve (AUC).
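The VHR definition above is a counting rule, which can be sketched in a few lines. The following is a minimal illustration only, not the study's code: the string labels (for example `recent_acs` and `hefh`) and the function name `classify_vhr` are hypothetical, and mapping ICD-10 codes to these labels is assumed to have happened upstream.

```python
from typing import Iterable

# Major ASCVD events and high-risk conditions as listed in the Method text.
# The labels are illustrative names chosen for this sketch, not study variables.
MAJOR_EVENTS = frozenset({
    "recent_acs",        # ACS within the past 12 months
    "history_mi",        # history of MI more than 12 months ago
    "ischemic_stroke",
    "symptomatic_pad",
})
HIGH_RISK_CONDITIONS = frozenset({
    "age_65_plus", "diabetes", "hypertension", "smoking",
    "hefh",              # heterozygous familial hypercholesterolemia
    "ckd", "chf",
    "ldl_c_100_plus",    # persistently elevated LDL-C >= 100 mg/dl
    "prior_cabg_pci",
})


def classify_vhr(findings: Iterable[str]) -> str:
    """Return "VHR" or "NVHR" for one patient's set of findings."""
    found = set(findings)
    n_major = len(found & MAJOR_EVENTS)
    n_high_risk = len(found & HIGH_RISK_CONDITIONS)
    # VHR: at least 2 major events, or 1 major event plus at least 2 high-risk conditions.
    if n_major >= 2 or (n_major == 1 and n_high_risk >= 2):
        return "VHR"
    return "NVHR"
```

Under this sketch, a patient with a history of MI plus diabetes and hypertension is labeled VHR, while a patient with high-risk conditions but no major event is labeled NVHR.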

Result: A total of 180,669 ASCVD patients were identified in 2018: 104,123 (58%) were VHR and 76,546 (42%) were NVHR. Mean age was 73.1±11.9 years in the VHR group (55% male) and 70.1±13.4 years in the NVHR group (54% male). Half the population was randomly selected as the training dataset (n=90,334) and the other half was used as the testing dataset (n=90,335). Both CART and random forest models identified recent ACS, ischemic stroke, hypertension, PAD, and history of MI as the top five predictors of VHR status. Ninety-six percent of patients with recent ACS were classified as VHR. Among patients with no recent ACS, 95% were classified as VHR if they had a stroke and hypertension. Among patients with no ACS or stroke, 89% were classified as VHR if they had PAD. Finally, among patients with no ACS, stroke, or PAD, 90% were classified as VHR if they had a history of MI. The misclassification rate of the CART model on the testing set was 4.3%. The AUCs for the CART and random forest models were 0.949 and 0.968, respectively.
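The AUC used to compare the two models can be read as the probability that a randomly chosen VHR patient receives a higher model score than a randomly chosen NVHR patient. The sketch below computes it directly from that definition. The labels and scores are made-up toy values for illustration, not study data, and the function name `auc` and the two score lists are assumptions of this example.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for VHR, 0 for NVHR. Tied scores count as half a correct pair.
    This pairwise form is quadratic in sample size and is meant only to
    illustrate the definition, not to process a large cohort.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: six hypothetical patients scored by two hypothetical models.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # one VHR patient ranked below an NVHR patient
toy_scores_b = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]  # every VHR patient ranked above every NVHR patient
auc_a = auc(toy_labels, toy_scores_a)
auc_b = auc(toy_labels, toy_scores_b)
```

In this toy data the second model ranks every pair correctly and so has the higher AUC, which mirrors how the comparison between the CART and random forest models is read.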

Conclusion: Both ML methods were highly predictive of VHR status among those with ASCVD. Use of this approach affords a simplified means to drive clinical decision making at the point of care.

Clinical Institute

Cardiovascular (Heart)


Center for Cardiovascular Analytics, Research + Data Science (CARDS)