News: Machine learning algorithms using EHR data could help lung cancer patients, study shows

CDI Strategies - Volume 15, Issue 29

How can effective documentation, coding, and electronic health records help save lives? A new study published in the Journal of the American Medical Association (JAMA) found that machine learning algorithms could be used with EHR data to effectively estimate overall survival rates for patients with non-small cell lung cancer.

For the study, patients with lung cancer were identified among 76,643 patients with at least one lung cancer diagnostic code in their EHR over a 30-year period. These patients were identified using a “semisupervised machine learning algorithm, for which clinical information was extracted from structured and unstructured data via natural language processing tools.” Researchers compared the extracted data against the Boston Lung Cancer Study for completeness and accuracy.

Among the 76,643 patients in the study, 42,069 were identified as having lung cancer, with a positive predictive value of 94.4%. The final study cohort was made up of 35,375 patients after excluding those with a history of lung cancer and those with less than 14 days of follow-up after the initial diagnosis.
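For readers less familiar with the metric, positive predictive value (PPV) is the share of flagged patients who truly have the condition. The Python sketch below illustrates the arithmetic only. The function name and the chart-review counts (472 and 28) are hypothetical and are not taken from the study, and the final extrapolation assumes the reported 94.4% would hold across all 42,069 flagged patients.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the share of flagged patients who truly have the condition."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged")
    return true_positives / flagged


# Hypothetical chart-review sample (NOT data from the study):
# 500 flagged patients reviewed, 472 confirmed as lung cancer.
ppv = positive_predictive_value(true_positives=472, false_positives=28)

# Rough extrapolation, assuming the reported 94.4% PPV applies to
# every one of the 42,069 patients the algorithm identified.
flagged_patients = 42_069
expected_true_cases = round(flagged_patients * 0.944)

print(f"PPV in the hypothetical sample: {ppv:.3f}")
print(f"Expected true cases among flagged patients: {expected_true_cases}")
```

Under that assumption, roughly 39,700 of the identified patients would be true lung cancer cases.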

“Our primary goal was to build a large and reliable lung cancer EHR cohort that could be used for studying lung cancer progression with a set of generalizable approaches. To this end, we combined structured data and unstructured data to identify patients with lung cancer and extract clinical variables. We evaluated the completeness and accuracy of the extracted data,” the authors wrote.

The median age at diagnosis was 66.7 years. The areas under the receiver operating characteristic curve (AUCs) of the prognostic model for overall survival with non-small cell lung cancer were:

  • 0.828 (95% CI, 0.815-0.842) for 1-year prediction
  • 0.825 (95% CI, 0.812-0.836) for 2-year prediction
  • 0.814 (95% CI, 0.800-0.826) for 3-year prediction
  • 0.814 (95% CI, 0.799-0.828) for 4-year prediction
  • 0.812 (95% CI, 0.798-0.825) for 5-year prediction
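Values like these are typically read as areas under the ROC curve (AUC): for a given time horizon, the probability that a randomly chosen patient who died within that horizon received a higher predicted risk than a randomly chosen patient who survived past it. The Python sketch below shows that pairwise definition on made-up numbers. The function name, risk scores, and outcomes are all hypothetical, and the sketch ignores censoring, which a real survival analysis such as the study's would need to handle.

```python
def auc_at_horizon(risk_scores, died_within_horizon):
    """Pairwise AUC: fraction of (case, control) pairs in which the case
    has the higher risk score; ties count as half a concordant pair."""
    cases = [s for s, d in zip(risk_scores, died_within_horizon) if d]
    controls = [s for s, d in zip(risk_scores, died_within_horizon) if not d]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and 1-year outcomes (NOT study data).
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
died = [True, True, True, False, False, False]

auc = auc_at_horizon(risk, died)
print(f"AUC at the 1-year horizon: {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so values above 0.8 at every horizon indicate that the model separates higher-risk from lower-risk patients well.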

“These findings suggest the feasibility of assembling a large-scale EHR-based lung cancer cohort with detailed longitudinal clinical measurements and that EHR data may be applied in cancer progression with a set of generalizable approaches,” the authors said.

Editor’s note: The JAMA-published study can be found here.