Optimizing Large Language Models (LLMs) for ICU Readmission Prediction

Leveraging structured and unstructured healthcare data to train personalized machine learning models —PubMedBERT, and ensemble approaches — for accurate and equitable ICU Readmission predictions across underrepresented populations.

Serves as a foundational step toward fair and personalized medical AI. Code file, Presentation Slides

Audience
Over 45 industry and academic leaders, across;

  • Bristol Myers Squibb Pharmaceutical,

  • Montclair State University &

  • Rutgers University

Overview:
This project applied state-of-the-art natural language processing and machine learning methods to predict ICU readmissions. My primary goal was to personalize care for underrepresented groups—especially Black, Hispanic, and White patients—by fine-tuning clinical language models on real-world EHR data.

Technical Stack:

  • Embeddings: PubMedBERT

  • Models: Logistic Regression, LightGBM, Multilayer Perceptron

  • Pipeline: Data preprocessing → Demographic segmentation → Fine-tuned embeddings → Model training and evaluation

  • Metrics: Accuracy, AUC, Δ performance vs baseline

Key Contributions:

  • Fine-tuned LLM embeddings by race/ethnicity to detect clinical nuances often overlooked in generic models.

  • MLP consistently outperformed other models when paired with fine-tuned embeddings, revealing deeper pattern recognition from subgroup-specific signals.

  • Created a robust evaluation framework comparing baseline models to equity-optimized alternatives across race demographics.

Results:

  • 🧠 Insight: Fine-tuning embeddings for Black patients significantly improved performance—especially for neural architectures like MLPs—revealing the benefits of tailored representation learning.

  • ⚠️ Insight: Hispanic-focused fine-tuning reduced model performance in most cases, suggesting possible data scarcity or noisy representation—a key area for further investigation.

  • ✅ Insight: White-specific embeddings enhanced AUC across all models, showing stable improvements with well-represented groups.

PubMedBERT excelled at understanding complex medical language. It demonstrated superior accuracy in predicting outcomes and personalizing recommendations, particularly for underrepresented demographic groups.

Takeaway: Employing advanced machine learning techniques in healthcare predictive modeling provides richer insights, enabling more accurate and equitable outcomes. These approaches enhance the model’s ability to identify critical patterns and relationships in complex medical data.

Industry Importance: Applying advanced LLM techniques holds transformative potential in healthcare. It can drive equitable treatment recommendations, improve decision-making for diverse populations, and support the development of personalized, data-driven healthcare solutions, making them indispensable tools for advancing healthcare equity and innovation.