Optimizing Large Language Models (LLMs) for ICU Readmission Prediction
Leveraging structured and unstructured healthcare data to train personalized machine learning models — PubMedBERT embeddings and ensemble approaches — for accurate and equitable ICU readmission prediction across underrepresented populations.
This work serves as a foundational step toward fair and personalized medical AI. Code File · Presentation Slides
Audience
Over 45 industry and academic leaders across:
Bristol Myers Squibb Pharmaceutical,
Montclair State University &
Rutgers University
Overview:
This project applied state-of-the-art natural language processing and machine learning methods to predict ICU readmissions. My primary goal was to personalize care for underrepresented groups—especially Black, Hispanic, and White patients—by fine-tuning clinical language models on real-world EHR data.
Technical Stack:
Embeddings: PubMedBERT
Models: Logistic Regression, LightGBM, Multilayer Perceptron
Pipeline: Data preprocessing → Demographic segmentation → Fine-tuned embeddings → Model training and evaluation
Metrics: Accuracy, AUC, Δ performance vs baseline
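The pipeline above can be sketched as follows. This is a minimal, illustrative version: synthetic feature vectors stand in for the PubMedBERT note embeddings, and the helper `make_cohort` is a hypothetical placeholder, not part of the project code (LightGBM is omitted here to keep the sketch dependency-light).

```python
# Sketch: demographic segmentation -> per-group model training -> AUC evaluation.
# Synthetic vectors stand in for fine-tuned PubMedBERT embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n, dim=32):
    """Placeholder for (PubMedBERT embedding, readmission label) pairs."""
    X = rng.normal(size=(n, dim))
    # Toy label rule so the sketch has learnable signal.
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

results = {}
for group in ["Black", "Hispanic", "White"]:  # demographic segmentation
    X, y = make_cohort(600)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    for name, model in {
        "LogReg": LogisticRegression(max_iter=1000),
        "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                             random_state=0),
    }.items():
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        results[(group, name)] = auc

for key, auc in sorted(results.items()):
    print(key, round(auc, 3))
```

In the real pipeline, `make_cohort` would be replaced by extracting fine-tuned PubMedBERT embeddings from each subgroup's clinical notes before training.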
Key Contributions:
Fine-tuned LLM embeddings by race/ethnicity to detect clinical nuances often overlooked in generic models.
MLP consistently outperformed other models when paired with fine-tuned embeddings, revealing deeper pattern recognition from subgroup-specific signals.
Built a robust evaluation framework comparing baseline models against equity-optimized alternatives across racial demographics.
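A minimal sketch of the baseline-vs-equity comparison: given per-group AUCs for a generic baseline and a subgroup-fine-tuned variant, report the Δ AUC per group. The numbers below are illustrative placeholders, not the project's reported results.

```python
# Hypothetical per-group AUCs (placeholders, not actual project results).
baseline_auc = {"Black": 0.71, "Hispanic": 0.69, "White": 0.73}
finetuned_auc = {"Black": 0.76, "Hispanic": 0.67, "White": 0.75}

def auc_delta(baseline, tuned):
    """Delta AUC (tuned minus baseline) for each demographic group."""
    return {g: round(tuned[g] - baseline[g], 3) for g in baseline}

deltas = auc_delta(baseline_auc, finetuned_auc)
print(deltas)  # positive delta = fine-tuning helped that group
```

A negative delta (as in the Hispanic placeholder row) is the kind of signal the Results section flags for further investigation.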
Results:
🧠 Insight: Fine-tuning embeddings for Black patients significantly improved performance—especially for neural architectures like MLPs—revealing the benefits of tailored representation learning.
⚠️ Insight: Hispanic-focused fine-tuning reduced model performance in most cases, suggesting possible data scarcity or noisy representation—a key area for further investigation.
✅ Insight: White-specific embeddings enhanced AUC across all models, showing stable improvements with well-represented groups.