Artificial Intelligence for Early Pancreatic Cancer Risk Prediction and Personalized Surveillance

Overview

We are building a multimodal artificial intelligence system to provide early estimates of pancreatic cancer risk, enable early detection, facilitate cost-effective surveillance, and support timely intervention.

Pancreatic cancer is among the deadliest malignancies, largely because it is typically diagnosed at an advanced stage when curative treatments are no longer possible. In this project, we are building an AI system that integrates multimodal and longitudinal data to predict a patient’s near- and long-term risk of pancreatic cancer, guide personalized surveillance, and inform management strategies.

To enable such a system, we are constructing a comprehensive longitudinal database that combines imaging (CT and MRI), demographics, family and social history, comorbidities, and laboratory results. The AI model will learn to dynamically estimate an individual’s risk of developing pancreatic cancer by analyzing trajectories of subtle tissue and ductal changes on imaging alongside evolving clinical variables.

A personalized system for early prediction of pancreatic cancer risk stands to support clinical care in several ways: by optimizing the frequency and modality of imaging surveillance, identifying when invasive diagnostic workups are warranted, and guiding surgical decisions in borderline cases. Furthermore, the system we envision has the potential to reveal imaging biomarkers that precede the onset of cancer by years and to offer new insights into the biological mechanisms of malignant transformation.

Keywords

Pancreas
AI
CT

**Figure 1.** We developed and evaluated an LLM-based system for the automatic extraction and risk categorization of pancreatic cystic lesions (PCLs) from radiology reports. The system processes free-text reports to extract clinically relevant PCL features, which are then mapped to risk categories based on established guidelines. We evaluated both proprietary (GPT-4o) and open-source (LLaMA, DeepSeek) models, including fine-tuned variants, across five dimensions: PCL feature extraction accuracy, PCL risk categorization performance, error and hallucination analyses, radiologist agreement assessment, and cost-efficiency. Fine-tuned models achieved feature extraction accuracy comparable to GPT-4o (LLAMA-FT: 97% [95% CI: 97–98%], LLAMA-FT-CoT: 97% [97–98%], DeepSeek-FT-CoT: 98% [97–98%], GPT-CoT: 97% [97–98%]). Risk categorization F1 scores were similarly high (LLAMA-FT: 0.95 [0.91–0.98], LLAMA-FT-CoT: 0.93 [0.89–0.97], DeepSeek-FT-CoT: 0.94 [0.90–0.98], GPT-CoT: 0.97 [0.93–0.99]). The radiologist agreement assessment showed strong agreement between models and expert radiologists (Fleiss’ kappa: radiologists alone = 0.888; radiologists + DeepSeek-FT-CoT = 0.893; radiologists + GPT-CoT = 0.897), indicating LLM-derived outputs are on par with expert interpretation.
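
The agreement statistic reported in Figure 1 can be illustrated with a minimal sketch of Fleiss' kappa, which compares the observed agreement among a fixed number of raters with the agreement expected by chance. The function name `fleiss_kappa` and the toy risk labels below are hypothetical and are not taken from the study. The sketch assumes every report is labeled by the same number of raters, with a model counted as one additional rater.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of rater labels.

    Assumption: every item is labeled by the same number of raters (>= 2).
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("ratings must contain at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: fraction of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    p_chance = sum(p * p for p in proportions)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy example (hypothetical labels): three raters, two reports.
toy = [["low", "low", "low"], ["high", "high", "high"]]
print(fleiss_kappa(toy))
```

Appending a model's label to each report's list of radiologist labels and recomputing kappa gives the kind of "radiologists + model" comparison described in the caption.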

**Figure 2. Workflow of the AI system.** Data Collection: CT studies with multiple series were used in this study. Severity labels were assigned based on the Revised Atlanta Classification (RAC). Image Preprocessing: During training, MedSAM, an automated deep learning segmentation model, was used to generate pancreas masks and identify relevant slices. These were combined with randomly sampled non-pancreas slices to create preprocessed input series. Segmentation is not required at inference time. Model Development: A transformer-based backbone (pretrained via self-supervised learning) extracted slice-level features. Slice scores derived from saliency maps were aggregated using attention (ATTN) to generate severity predictions, optimized with binary cross entropy (BCE), Dice, and L1 losses. Model Evaluation: Study-level predictions were generated by using the maximum series-level score. Retrospective Triage Simulation was performed on both internal and external test sets.
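
The aggregation steps in Figure 2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's implementation: the caption does not specify the exact form of the attention mechanism, so a softmax-weighted average of slice scores is assumed here. The names `attention_pool` and `study_score` are hypothetical, and in a trained model the attention logits would be learned rather than supplied by hand.

```python
import math


def attention_pool(slice_scores, attention_logits):
    """Combine slice-level scores into one series-level score.

    Assumption: attention is a softmax over per-slice logits, and the
    series score is the resulting weighted average of slice scores.
    """
    if not slice_scores or len(slice_scores) != len(attention_logits):
        raise ValueError("need one attention logit per slice score")
    # Subtract the maximum logit for numerical stability before exponentiating.
    peak = max(attention_logits)
    exps = [math.exp(a - peak) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * s for w, s in zip(weights, slice_scores))


def study_score(series_scores):
    """Study-level prediction: the maximum series-level score, as in the caption."""
    if not series_scores:
        raise ValueError("a study needs at least one series")
    return max(series_scores)


# Hypothetical study with two series; all values are made up for illustration.
series_a = attention_pool([0.2, 0.8], [0.0, 0.0])            # equal attention
series_b = attention_pool([0.1, 0.3, 0.2], [1.0, 0.5, 0.0])
print(study_score([series_a, series_b]))
```

Taking the maximum across series means a single high-scoring series is enough to flag the study, which suits a triage setting where missing a severe case is costlier than a false alarm.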

Project Team

External Collaborators

Chenchan Huang, MD (Project Lead), NYU Langone Health
Tamas A. Gonda, MD, NYU Langone Health
Ebrahim Rasromani, NYU Center for Data Science
Stella Kang, MD, MS, Columbia University
Peter Yu, NYU Langone Health
Kenneth Csehak, MD, NYU Langone Health
Michael D. Kluger, MD, MPH, NYU Langone Health
Wenqing "Wendy" Cao, MD, NYU Langone Health

Publications

Huang C, Thakore NL, Shen Y, et al. Patient and lesion characteristics associated with follow-up completion for pancreatic cystic lesions detected on MRI. Abdom Radiol (NY). 2026;51(5):2428-2438. doi:10.1007/s00261-025-05230-1
Rasromani E, Kang SK, Xu Y, Liu B, Luhadia G, Chui WF, Pasadyn FL, Hung YC, An JY, Mathieu E, Gu Z. Leveraging fine-tuned large language models for interpretable pancreatic cystic lesion feature extraction and risk categorization. arXiv. Published July 26, 2025. doi:10.48550/arXiv.2507.19973
Xu Y, Teutsch B, Zeng W, Hu Y, Rastogi S, Hu EY, DeGregorio I, Chui WF, Richter BI, Cummings R, Goldberg JE. Development and international validation of a deep learning model for predicting acute pancreatitis severity from CT scans. medRxiv. Published July 7, 2025. doi:10.1101/2025.07.07.25331406
Huang C, Shen Y, Galgano SJ, et al. Advancements in early detection of pancreatic cancer: the role of artificial intelligence and novel imaging techniques. Abdom Radiol (NY). 2025;50(4):1731-1743. doi:10.1007/s00261-024-04644-7

Acknowledgements

We acknowledge support from the following grants: Society of Abdominal Radiology Research Award 2025, RSNA 2024 Research / Education Grant, and Hirshberg Foundation Seed Grant 2024.