CAMMA-public/camma_dataset_overlaps


The surgical data science community is rapidly growing, with researchers increasingly combining different surgical datasets to develop more powerful models. This repository aims to provide a comprehensive analysis of video overlaps across different splits of the Cholec80, CholecT50, and Endoscapes datasets. The goal is to support the community in making informed decisions when selecting dataset splits, thereby helping to prevent evaluation bias and contamination.

This repository provides:

  • A summary of overlapping videos across different splits of the Cholec80, CholecT50, and Endoscapes datasets and recommendations on how to appropriately combine these datasets to ensure fair and reliable evaluation.
  • A recommendation on using the M2CAI challenge dataset for surgical workflow analysis.

To reproduce the overlap analysis yourself, run the script:

python overlap_analysis.py

The following table summarizes the video overlaps among the Cholec80, CholecT50, and Endoscapes datasets:

Summary

| Dataset A | Dataset B | Overlap Count | Video IDs in Dataset B | Video IDs in Dataset A (if applicable) |
|---|---|---|---|---|
| Cholec80-train | CholecT50-train | 16 | [1, 2, 4, 5, 13, 15, 18, 22, 23, 25, 26, 27, 31, 35, 36, 40] | - |
| Cholec80-train | CholecT50-val | 3 | [8, 12, 29] | - |
| Cholec80-train | CholecT50-test | 4 | [6, 10, 14, 32] | - |
| Cholec80-val | CholecT50-train | 3 | [43, 47, 48] | - |
| Cholec80-val | CholecT50-val | 0 | - | - |
| Cholec80-val | CholecT50-test | 1 | [42] | - |
| Cholec80-test | CholecT50-train | 12 | [49, 52, 56, 57, 60, 62, 65, 66, 68, 70, 75, 79] | - |
| Cholec80-test | CholecT50-val | 2 | [50, 78] | - |
| Cholec80-test | CholecT50-test | 4 | [51, 73, 74, 80] | - |
| Endoscapes-train | Cholec80-train | 0 | - | - |
| Endoscapes-train | Cholec80-val | 0 | - | - |
| Endoscapes-train | Cholec80-test | 5 | [67, 68, 70, 71, 72] | [9606, 9624, 9674, 9680, 9762] |
| Endoscapes-val | Cholec80-train | 0 | - | - |
| Endoscapes-val | Cholec80-val | 0 | - | - |
| Endoscapes-val | Cholec80-test | 1 | [66] | [9559] |
| Endoscapes-test | Cholec80-train | 0 | - | - |
| Endoscapes-test | Cholec80-val | 0 | - | - |
| Endoscapes-test | Cholec80-test | 0 | - | - |
| Endoscapes-train | CholecT50-train | 4 | [68, 70, 96, 110] | [9624, 9674, 10981, 11488] |
| Endoscapes-train | CholecT50-val | 0 | - | - |
| Endoscapes-train | CholecT50-test | 0 | - | - |
| Endoscapes-val | CholecT50-train | 2 | [66, 103] | [9559, 11132] |
| Endoscapes-val | CholecT50-val | 0 | - | - |
| Endoscapes-val | CholecT50-test | 0 | - | - |
| Endoscapes-test | CholecT50-train | 0 | - | - |
| Endoscapes-test | CholecT50-val | 0 | - | - |
| Endoscapes-test | CholecT50-test | 0 | - | - |
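
Each row of the table boils down to a set intersection between two splits, after translating IDs when the two datasets number their videos differently. The sketch below illustrates that idea only; it is not the repository's `overlap_analysis.py`. The function `find_overlaps`, the ID map, and the toy Endoscapes subset are hypothetical names introduced for illustration. The ID pairs are read positionally from the Endoscapes-train / Cholec80-test and Endoscapes-val / Cholec80-test rows, and the Cholec80 test range 49-80 is an assumption.

```python
from typing import Dict, Iterable, List, Optional


def find_overlaps(
    split_a: Iterable[int],
    split_b: Iterable[int],
    a_to_b: Optional[Dict[int, int]] = None,
) -> List[int]:
    """Return the sorted Dataset B IDs that also appear in split_a.

    a_to_b translates Dataset A IDs into Dataset B numbering. When it is
    None, both datasets are assumed to share the same video numbering.
    """
    if a_to_b is None:
        a_in_b_numbering = set(split_a)
    else:
        # IDs without a known counterpart in Dataset B cannot overlap.
        a_in_b_numbering = {a_to_b[v] for v in split_a if v in a_to_b}
    return sorted(a_in_b_numbering & set(split_b))


# Assumption: the Cholec80 test split is videos 49..80.
cholec80_test = range(49, 81)

# ID pairs taken positionally from the table (Endoscapes ID -> Cholec80 ID).
endoscapes_to_cholec80 = {
    9606: 67, 9624: 68, 9674: 70, 9680: 71, 9762: 72, 9559: 66,
}

# Toy subset of Endoscapes-train: the five IDs listed in the table plus one
# hypothetical ID (99999) that has no Cholec80 counterpart.
endoscapes_train_subset = [9606, 9624, 9674, 9680, 9762, 99999]

overlap = find_overlaps(endoscapes_train_subset, cholec80_test, endoscapes_to_cholec80)
print(overlap)  # [67, 68, 70, 71, 72]
```

With real split files, the same function would be called once per pair of splits to fill in one table row each.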

Recommendations

1. Dataset Combination Strategy: Cholec80, CholecT50, and Endoscapes

To create a combined dataset from Cholec80, CholecT50, and Endoscapes while maintaining test set integrity, we recommend keeping the Endoscapes and CholecT50 datasets complete and selectively adjusting the Cholec80 splits to prevent contamination. Endoscapes and CholecT50 are kept intact because they represent substantial investments in annotation effort and clinical expertise.

Recommended Cholec80 Adjustments

Training Set Modifications

Remove 4 videos overlapping with CholecT50 test set:

  • Videos: 6, 10, 14, 32
  • Result: 36 videos remain in Cholec80 training set

Validation Set Modifications

Remove 1 video overlapping with CholecT50 test set:

  • Video: 42
  • Result: 7 videos remain in Cholec80 validation set

Test Set Modifications

Remove 17 videos to prevent multiple contamination sources:

Overlap with CholecT50 validation set (2 videos):

  • Videos: 50, 78

Overlap with the CholecT50 training set and/or the Endoscapes training set (15 videos):

  • Videos: 49, 52, 56, 57, 60, 62, 65, 66, 67, 68, 70, 71, 72, 75, 79

Result: 15 videos remain in Cholec80 test set
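
The adjustments above can be checked with a few lines of set arithmetic. This is a minimal sketch that assumes the Cholec80 split of videos 1-40 (train), 41-48 (validation), and 49-80 (test), which matches the remaining-video counts stated above. The variable names are illustrative and do not come from the repository.

```python
# Assumption: Cholec80 split is 1-40 (train), 41-48 (val), 49-80 (test).
cholec80 = {
    "train": set(range(1, 41)),
    "val": set(range(41, 49)),
    "test": set(range(49, 81)),
}

# Videos to remove from each Cholec80 split, as listed above.
to_remove = {
    "train": {6, 10, 14, 32},  # overlap with CholecT50 test
    "val": {42},               # overlap with CholecT50 test
    "test": (
        {50, 78}               # overlap with CholecT50 val
        # overlap with CholecT50 train and/or Endoscapes train
        | {49, 52, 56, 57, 60, 62, 65, 66, 67, 68, 70, 71, 72, 75, 79}
    ),
}

adjusted = {}
for split, ids in cholec80.items():
    # Every removed video must actually belong to the split it is removed from.
    assert to_remove[split] <= ids
    adjusted[split] = sorted(ids - to_remove[split])

for split, ids in adjusted.items():
    print(split, len(ids))
# train 36
# val 7
# test 15
```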


Final Dataset Composition

| Dataset Component | Training Videos | Validation Videos | Test Videos |
|---|---|---|---|
| Endoscapes | 120 | 41 | 40 |
| CholecT50 | 35 | 5 | 10 |
| Cholec80 (Adjusted) | 36 | 7 | 15 |
| Total Combined | 191 | 53 | 65 |
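
As a quick sanity check, the totals row can be recomputed from the per-dataset counts. The snippet below only restates the numbers from the table; the dictionary layout is an illustrative choice.

```python
# (train, val, test) video counts per dataset, copied from the table above.
composition = {
    "Endoscapes": (120, 41, 40),
    "CholecT50": (35, 5, 10),
    "Cholec80 (Adjusted)": (36, 7, 15),
}

# Sum each column across the three datasets.
totals = tuple(sum(column) for column in zip(*composition.values()))
print(totals)  # (191, 53, 65)
```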

The dataset combination strategy described above has been implemented in the following recent publication:

📄 Soham Walimbe, Britty Baby, Vinkle Srivastav, and Nicolas Padoy (2025)

"Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision"
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2025

  • 📖 arXiv: https://arxiv.org/pdf/2507.05020
  • 💻 Code: https://github.com/CAMMA-public/MML-SurgAdapt

2. M2CAI

We have observed that the surgical data science community often relies on the M2CAI challenge dataset for evaluating surgical workflow recognition approaches. The M2CAI dataset consists of:

  • Strasbourg center videos (with phase and tool annotations)
  • TUM (Munich) center videos (with phase annotations only)

It is important to emphasize that the Cholec80 dataset was released to extend and replace the Strasbourg videos of the M2CAI dataset. Therefore:

  • When using Cholec80, only the M2CAI-Munich subset should be included, as done, for example, in the FedCy paper.
  • The M2CAI-Strasbourg subset is best excluded, as it may overlap with the Cholec80 test set.
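
In practice, this recommendation amounts to filtering M2CAI videos by their center of origin before merging them with Cholec80. The sketch below assumes each video carries a center label; the `Video` record, the function name, and the placeholder video IDs are all hypothetical and do not reflect the actual M2CAI file naming.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Video:
    video_id: str  # placeholder identifier, not a real M2CAI file name
    center: str    # "strasbourg" or "munich"


def m2cai_subset_for_cholec80(videos: List[Video]) -> List[Video]:
    """Keep only Munich videos; Strasbourg videos may overlap with Cholec80."""
    return [v for v in videos if v.center == "munich"]


# Hypothetical example records.
m2cai = [
    Video("example_strasbourg_01", "strasbourg"),
    Video("example_munich_01", "munich"),
    Video("example_munich_02", "munich"),
]

kept = m2cai_subset_for_cholec80(m2cai)
print([v.video_id for v in kept])  # ['example_munich_01', 'example_munich_02']
```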

Citation

@article{twinanda2016endonet,
  title={Endonet: a deep architecture for recognition tasks on laparoscopic videos},
  author={Twinanda, Andru P and Shehata, Sherif and Mutter, Didier and Marescaux, Jacques and De Mathelin, Michel and Padoy, Nicolas},
  journal={IEEE transactions on medical imaging},
  volume={36},
  number={1},
  pages={86--97},
  year={2016},
  publisher={IEEE}
}
@article{nwoye2022rendezvous,
  title={Rendezvous: Attention mechanisms for the recognition of surgical action triplets in endoscopic videos},
  author={Nwoye, Chinedu Innocent and Yu, Tong and Gonzalez, Cristians and Seeliger, Barbara and Mascagni, Pietro and Mutter, Didier and Marescaux, Jacques and Padoy, Nicolas},
  journal={Medical Image Analysis},
  volume={78},
  pages={102433},
  year={2022},
  publisher={Elsevier}
}

@article{murali2023endoscapes,
  title={The endoscapes dataset for surgical scene segmentation, object detection, and critical view of safety assessment: Official splits and benchmark},
  author={Murali, Aditya and Alapatt, Deepak and Mascagni, Pietro and Vardazaryan, Armine and Garcia, Alain and Okamoto, Nariaki and Costamagna, Guido and Mutter, Didier and Marescaux, Jacques and Dallemagne, Bernard and others},
  journal={arXiv preprint arXiv:2312.12429},
  year={2023}
}

@article{murali2023latent,
  title={Latent Graph Representations for Critical View of Safety Assessment},
  author={Murali, Aditya and Alapatt, Deepak and Mascagni, Pietro and Vardazaryan, Armine and Garcia, Alain and Okamoto, Nariaki and Mutter, Didier and Padoy, Nicolas},
  journal={IEEE Transactions on Medical Imaging},
  year={2023},
  pages={1-1},
  doi={10.1109/TMI.2023.3333034}
}

License

The code, models, and datasets are available for non-commercial scientific research purposes as defined in the CC BY-NC-SA 4.0 license. By downloading and using this code, you agree to the terms in the LICENSE.
