Tissues and Organs
[Submitted on 29 Aug 2025]
Evaluating Attention-Based Learning of Patient Diagnosis Representations with Present On Admission Status for In-Hospital Mortality and Prolonged Length of Stay Prediction
Abstract: Predicting in-hospital outcomes such as mortality and prolonged length of stay using administrative hospital discharge records is crucial for risk stratification and resource management, requiring effective methods to leverage complex clinical information like diagnosis codes and their Present On Admission (POA) status. We developed a novel deep learning approach utilizing a Transformer encoder to learn contextualized patient representations from their set of diagnosis codes, where each diagnosis input token explicitly encodes both the diagnosis identity (truncated ICD-10-CM) and its associated POA status, including a distinct category for missing POA information. This learned patient embedding was then concatenated with other admission-time features including demographics, admission type, and an engineered count of diagnoses present on admission. Using data from the 2018 Texas Hospital Inpatient Discharge Public Use Data File, we trained and evaluated Logistic Regression and Gradient Boosting models on these combined features for predicting in-hospital mortality and prolonged length of stay, comparing performance against baseline models using only non-diagnostic features or simpler, explicit diagnosis encodings. While the attention-based encoder learned representations that captured some predictive signal in a proxy task, final prediction models incorporating these embeddings did not outperform baseline models, particularly those utilizing a simpler encoding of top diagnosis codes alongside other features, for either outcome. The number of diagnoses present on admission was consistently identified as a highly influential predictor across models. 
These findings suggest that while complex deep learning methods can learn representations from diagnosis-POA sequences, their effectiveness is highly dependent on sufficient training data (limited in this study by data subsampling for the Transformer) and careful integration with other relevant clinical features; simpler feature engineering approaches can provide strong performance baselines.
| Subjects: | q-bio.TO; cs.LG |
| Cite as: | PX:2508.00048 |
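
The abstract describes diagnosis tokens that jointly encode a truncated ICD-10-CM code and its POA status, with a separate category for missing POA, plus an engineered count of diagnoses present on admission. The following is a minimal sketch of how such an input could be built before being passed to an encoder; it is not the authors' implementation. The three-character truncation, the `CODE|STATUS` token format, the `MISSING` label, the `<PAD>`/`<UNK>` entries, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Diagnosis = Tuple[str, Optional[str]]  # (ICD-10-CM code, POA flag or None)


def make_token(icd_code: str, poa: Optional[str], prefix_len: int = 3) -> str:
    """Combine a truncated diagnosis code and its POA status into one token.

    prefix_len=3 is an assumed truncation length; the abstract only says
    the codes are truncated.
    """
    code = icd_code.replace(".", "").upper()[:prefix_len]
    # Blank or absent POA values get their own category (assumed label).
    status = (poa or "").strip().upper() or "MISSING"
    return f"{code}|{status}"


def build_vocab(patients: Sequence[Sequence[Diagnosis]]) -> Dict[str, int]:
    """Assign an integer id to every code/POA token seen in the data."""
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for diagnoses in patients:
        for code, poa in diagnoses:
            vocab.setdefault(make_token(code, poa), len(vocab))
    return vocab


def encode(diagnoses: Sequence[Diagnosis], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map one patient's diagnoses to a fixed-length list of token ids."""
    ids = [vocab.get(make_token(c, p), vocab["<UNK>"]) for c, p in diagnoses]
    ids = ids[:max_len]
    return ids + [vocab["<PAD>"]] * (max_len - len(ids))


def count_poa(diagnoses: Sequence[Diagnosis]) -> int:
    """Engineered feature: number of diagnoses flagged present on admission."""
    return sum(1 for _, p in diagnoses if (p or "").strip().upper() == "Y")


# Hypothetical example records, not taken from the study data.
patients = [
    [("I50.9", "Y"), ("N17.9", "N"), ("E11.9", None)],
    [("I50.1", "Y")],
]
vocab = build_vocab(patients)
token_ids = encode(patients[0], vocab, max_len=5)
poa_count = count_poa(patients[0])
```

In this sketch the same truncated code with different POA flags maps to different token ids, so an embedding layer can learn separate vectors for, say, heart failure present on admission versus heart failure acquired in hospital. The padded id lists would feed the Transformer encoder, while `poa_count` would join the demographic and admission-type features in the downstream model.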