
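PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains | Literature Express: Applications of the Transformer Architecture in Medical Image Analysis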



PathoDuet: Foundation models for pathological slide analysis of H&E and IHC stains










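We also introduce a pretext token mechanism to unify the two pretraining tasks. Each task requires a different form of auxiliary input: a smaller image patch or a stain hint. Rather than designing a separate network to process the extra input, we feed an additional token carrying the auxiliary information into the Vision Transformer (ViT), and the training process combines the cross-scale or cross-stain information with the original representation. A carefully designed module, the task raiser, then explicitly relates the two forms of input. In a lightweight way, this mechanism strengthens the model's ability to discover and exploit the intrinsic relations between tasks and stain patterns.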
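The token mechanism described above can be pictured as one extra entry in the ViT input sequence. The following is a minimal sketch, not the authors' implementation: the function name `build_vit_input`, the position of the pretext token right after the class token, the embedding size, and the use of a zero vector as the placeholder are all assumptions made for illustration.

```python
from typing import List, Optional

Token = List[float]


def build_vit_input(
    cls_token: Token,
    patch_tokens: List[Token],
    pretext_token: Optional[Token] = None,
) -> List[Token]:
    """Assemble the sequence [class token, pretext token, patch tokens...].

    When no auxiliary input is available, a placeholder fills the pretext
    slot. A zero vector is used here purely as an assumption.
    """
    dim = len(cls_token)
    if pretext_token is None:
        pretext_token = [0.0] * dim  # hypothetical placeholder
    if len(pretext_token) != dim or any(len(p) != dim for p in patch_tokens):
        raise ValueError("all tokens must share the embedding dimension")
    return [list(cls_token), list(pretext_token)] + [list(p) for p in patch_tokens]


# Toy 4-dimensional embeddings (made-up values).
cls_token = [1.0, 0.0, 0.0, 0.0]
patches = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

# Without auxiliary input: the pretext slot holds the placeholder.
plain = build_vit_input(cls_token, patches)

# With auxiliary input: the slot holds an embedding of the smaller patch
# or the stain hint (here a made-up vector).
with_aux = build_vit_input(cls_token, patches, pretext_token=[0.9, 0.9, 0.9, 0.9])

print(len(plain), len(with_aux))
```

Because both cases produce a sequence of the same length, the same ViT backbone can process inputs with or without auxiliary information, which is what makes the approach lightweight compared with adding a second network.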



Large amounts of digitized histopathological data display a promising future for developing pathological foundation models via self-supervised learning methods. Foundation models pretrained with these methods serve as a good basis for downstream tasks. However, the gap between natural and histopathological images hinders the direct application of existing methods. In this work, we present PathoDuet, a series of pretrained models on histopathological images, and a new self-supervised learning framework in histopathology. The framework is featured by a newly-introduced pretext token and later task raisers to explicitly utilize certain relations between images, like multiple magnifications and multiple stains. Based on this, two pretext tasks, cross-scale positioning and cross-stain transferring, are designed to pretrain the model on Hematoxylin and Eosin (H&E) images and transfer the model to immunohistochemistry (IHC) images, respectively. To validate the efficacy of our models, we evaluate the performance over a wide variety of downstream tasks, including patch-level colorectal cancer subtyping and whole slide image (WSI)-level classification in H&E field, together with expression level prediction of IHC marker, tumor identification and slide-level qualitative analysis in IHC field. The experimental results show the superiority of our models over most tasks and the efficacy of proposed pretext tasks.




In this section, we first describe the introduction of a pretext token and subsequent task raiser module to unify the proposed two pretext tasks. The details of the tasks are discussed in the following subsections, including the real-world inspiration and the imitation with the contrastive learning framework.




We introduce PathoDuet, a series of foundation models on computational pathology, covering both H&E and IHC images, and propose a new self-supervised learning framework with two pretext tasks in pathology. The key to this framework is the introduction of a pretext token and following task raisers. It consists of both a model pretraining task, cross-scale positioning, and a model adaptation task, cross-stain transferring. In cross-scale positioning, we bridge the local and global representations of H&E patches to enhance pathological image understanding in various magnifications. In cross-stain transferring, we utilize adaptive instance normalized H&E features to provide pseudo IHC features injected with structural information. The original H&E model is therefore transferred to an interpreter of IHC images. We evaluate the performance of our models over a wide variety of downstream tasks, and the experimental results show the efficacy of our models on most tasks. Besides, we also investigate the downstream data requirements and comparison with giant pathological models, to discover the power of data and delicately designed SSL methods tailored to pathological images. PathoDuet highlights the importance of training strategy, while the giants, UNI and Virchow, point out the advantage of preparing sufficient training data. Hence, we will take all efforts to collect more data to iterate and upgrade our models in the future.
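The "adaptive instance normalized H&E features" mentioned above rely on adaptive instance normalization (AdaIN), which re-scales each channel of one feature map so that its mean and standard deviation match those of another. The sketch below shows the standard AdaIN formula on plain Python lists. It assumes that the H&E features play the content role (supplying structure) and the IHC features supply the target statistics. The function names and toy values are hypothetical, and the paper's actual transferer module may differ.

```python
import math
from typing import List, Tuple

Channel = List[float]


def mean_std(values: Channel, eps: float = 1e-5) -> Tuple[float, float]:
    """Return the mean and the eps-stabilised standard deviation of a channel."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var + eps)


def adain(content: List[Channel], style: List[Channel], eps: float = 1e-5) -> List[Channel]:
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y), per channel.

    Each channel is a flat list of spatial activations.
    """
    if len(content) != len(style):
        raise ValueError("content and style need the same number of channels")
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mu, c_sigma = mean_std(c_chan, eps)
        s_mu, s_sigma = mean_std(s_chan, eps)
        out.append([s_sigma * (v - c_mu) / c_sigma + s_mu for v in c_chan])
    return out


# One-channel toy features (made-up values, not from the paper).
he_features = [[1.0, 2.0, 3.0, 4.0]]       # structure comes from H&E
ihc_features = [[10.0, 10.0, 30.0, 30.0]]  # statistics come from IHC

pseudo_ihc = adain(he_features, ihc_features)
print(pseudo_ihc[0])
```

The output keeps the ordering (spatial structure) of the H&E channel while taking on the mean and spread of the IHC channel, which is the sense in which the pseudo IHC features are "injected with structural information".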





In Table 1, we evaluate our H&E model using the linear probing method under different amounts of data. From the result, we can see that our model performs well across various amounts of training data compared with other pretrained models. Meanwhile, it can be observed that a generally consistent increasing trend exists with the growth of amounts of training data, but the difference is relatively small for most models. A further study is conducted in Section 5.2 on the training data requirements of foundation models. Notably, the giant UNI shows a dominant performance when the training data is extremely limited, which demonstrates its general interpretability of pathological images.

In Table 2, we present the evaluation of models' performance under different training strategies using the whole NCT-CRC-HE dataset. The results demonstrate that the proposed model is a good interpreter of H&E images under both a quick linear transferring manner and a thorough full fine-tuning protocol. The performance gain can be attributed to the cross-scale positioning task, which enhances the model's understanding under a broader view. To verify the assumption, an ablation study is discussed in Section 5.1. UNI also provides decent performance, which shows its great understanding in pathology and powerful ViT-Large architecture.
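Linear probing, the evaluation protocol used for Table 1, keeps the pretrained encoder frozen and trains only a linear classifier on its output features. The sketch below illustrates the idea with a tiny logistic-regression probe written in plain Python. The two-dimensional "frozen features", the labels, and the hyperparameters are made-up stand-ins, not data or settings from the paper, and a real probe would handle the multi-class NCT-CRC-HE labels with a softmax layer.

```python
import math
from typing import List, Tuple


def train_linear_probe(
    features: List[List[float]],
    labels: List[int],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit a binary logistic-regression probe with full-batch gradient descent.

    Only the probe weights are updated; the features are treated as fixed
    outputs of a frozen encoder.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return class 1 when the linear score is positive, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical frozen-encoder features for six patches (two classes).
frozen_features = [
    [2.0, 0.1], [1.8, 0.3], [2.2, -0.2],
    [-2.0, 0.2], [-1.9, -0.1], [-2.1, 0.0],
]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_linear_probe(frozen_features, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(frozen_features, labels)) / len(labels)
print(accuracy)
```

Full fine-tuning, the second strategy in Table 2, differs in that the encoder weights are updated together with the classifier, so the features themselves change during training.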



Fig. 1. An overview of PathoDuet. Left: two pretext tasks, cross-scale positioning and cross-stain transferring, are designed to develop H&E and IHC models. Right: a series of downstream tasks, covering both H&E and IHC ones, are used to evaluate models' performance in application.


Fig. 2. An overall performance visualization. Each task is named as "training dataset, (special settings,) evaluating metric". H&E tasks are colored purple, and IHC ones are yellow.


Fig. 3. Detailed networks of two pretext tasks. The flow of pretext token is represented by the black arrows, 𝜖 is the placeholder, and the task raisers (positioner and transferer) are presented in the white blocks. (a) Three-branch cross-scale positioning network. (b) Two-branch cross-stain transferring network and the transferer module.


Fig. 4. Data requirement study on NCT-CRC-HE dataset. PathoDuet is compared with ImageSup, and the performance with the full dataset as an upper bound is represented by the dotted line with the same color.


Fig. 5. Data requirement study on different datasets using PathoDuet.



Table 1. Linear evaluation results on NCT-CRC-HE dataset with different amounts of training data. The best performance in each column is bold, and the second best is underlined.


Table 2. Results on NCT-CRC-HE dataset for 2 different strategies: linear evaluation and full fine-tuning. The best performance in each column is bold, and the second best is underlined.


Table 3. Results of weakly-supervised WSI classification on three public datasets. The best performance in each column is bold, and the second best is underlined.


Table 4. Results of PD-L1 expression level assessment. The best performance in each column is bold, and the second best is underlined.


Table 5. Results of patch-level tumor identification in IHC images. The best performance in each column is bold, and the second best is underlined.


Table 6. Results of slide-level prediction of CD5. The best performance in each column is bold, and the second best is underlined.


Table 7. Results of slide-level prediction of CD10. The best performance in each column is bold, and the second best is underlined.


Table 8. Results of slide-level prediction of CD21. The best performance in each column is bold, and the second best is underlined.


Table 9. Ablation study: performance on WSI classification.


Table 10. Ablation study: performance on H&E patch classification.


Table 11. Ablation study: performance on PD-L1 expression level assessment.


Table 12. Comparative study on NCT-CRC-HE and NCT-CRC-HE-NONORM dataset. SwinT* means a hybrid model of CNN and Swin Transformer. The best performance in each column is bold, and the second best is underlined.


