Optimizing Large Language Models (LLMs) for ICU Readmission Prediction

Leveraging structured and unstructured healthcare data to train personalized machine learning models —PubMedBERT, and ensemble approaches — for accurate and equitable ICU Readmission predictions across underrepresented populations.

Serves as a foundational step toward fair and personalized medical AI. Code file, Presentation Slides

Audience
Over 45 industry and academic leaders, across;

Bristol Myers Squibb Pharmaceutical,
Montclair State University &
Rutgers University

Overview:
This project applied state-of-the-art natural language processing and machine learning methods to predict ICU readmissions. My primary goal was to personalize care for underrepresented groups—especially Black, Hispanic, and White patients—by fine-tuning clinical language models on real-world EHR data.

Technical Stack:

Embeddings: PubMedBERT
Models: Logistic Regression, LightGBM, Multilayer Perceptron
Pipeline: Data preprocessing → Demographic segmentation → Fine-tuned embeddings → Model training and evaluation
Metrics: Accuracy, AUC, Δ performance vs baseline

Key Contributions:

Fine-tuned LLM embeddings by race/ethnicity to detect clinical nuances often overlooked in generic models.
MLP consistently outperformed other models when paired with fine-tuned embeddings, revealing deeper pattern recognition from subgroup-specific signals.
Created a robust evaluation framework comparing baseline models to equity-optimized alternatives across race demographics.

Results:

🧠 Insight: Fine-tuning embeddings for Black patients significantly improved performance—especially for neural architectures like MLPs—revealing the benefits of tailored representation learning.
⚠️ Insight: Hispanic-focused fine-tuning reduced model performance in most cases, suggesting possible data scarcity or noisy representation—a key area for further investigation.
✅ Insight: White-specific embeddings enhanced AUC across all models, showing stable improvements with well-represented groups.

PubMedBERT excelled at understanding complex medical language. It demonstrated superior accuracy in predicting outcomes and personalizing recommendations, particularly for underrepresented demographic groups.

Takeaway: Employing advanced machine learning techniques in healthcare predictive modeling provides richer insights, enabling more accurate and equitable outcomes. These approaches enhance the model’s ability to identify critical patterns and relationships in complex medical data.

Industry Importance: Applying advanced LLM techniques holds transformative potential in healthcare. It can drive equitable treatment recommendations, improve decision-making for diverse populations, and support the development of personalized, data-driven healthcare solutions, making them indispensable tools for advancing healthcare equity and innovation.

Optimizing Large Language Models (LLMs) for ICU Readmission Prediction

Technical Stack:

Embeddings: PubMedBERT

Models: Logistic Regression, LightGBM, Multilayer Perceptron

Pipeline: Data preprocessing → Demographic segmentation → Fine-tuned embeddings → Model training and evaluation

Metrics: Accuracy, AUC, Δ performance vs baseline

Key Contributions:

Fine-tuned LLM embeddings by race/ethnicity to detect clinical nuances often overlooked in generic models.

MLP consistently outperformed other models when paired with fine-tuned embeddings, revealing deeper pattern recognition from subgroup-specific signals.

Created a robust evaluation framework comparing baseline models to equity-optimized alternatives across race demographics.

Results:

🧠 Insight: Fine-tuning embeddings for Black patients significantly improved performance—especially for neural architectures like MLPs—revealing the benefits of tailored representation learning.

⚠️ Insight: Hispanic-focused fine-tuning reduced model performance in most cases, suggesting possible data scarcity or noisy representation—a key area for further investigation.

✅ Insight: White-specific embeddings enhanced AUC across all models, showing stable improvements with well-represented groups.

PubMedBERT excelled at understanding complex medical language. It demonstrated superior accuracy in predicting outcomes and personalizing recommendations, particularly for underrepresented demographic groups.

Takeaway: Employing advanced machine learning techniques in healthcare predictive modeling provides richer insights, enabling more accurate and equitable outcomes. These approaches enhance the model’s ability to identify critical patterns and relationships in complex medical data.

Ernest