The surgical data science community is rapidly growing, with researchers increasingly combining different surgical datasets to develop more powerful models. This repository aims to provide a comprehensive analysis of video overlaps across different splits of the Cholec80, CholecT50, and Endoscapes datasets. The goal is to support the community in making informed decisions when selecting dataset splits, thereby helping to prevent evaluation bias and contamination.
This repository provides:
- A summary of overlapping videos across different splits of the Cholec80, CholecT50, and Endoscapes datasets and recommendations on how to appropriately combine these datasets to ensure fair and reliable evaluation.
- A recommendation on how to use the M2CAI challenge dataset for surgical workflow analysis.
To perform the overlap analysis yourself, run the provided script:

```bash
python overlap_analysis.py
```

The following table summarizes the video overlaps among the Cholec80, CholecT50, and Endoscapes datasets:
| Dataset A | Dataset B | Overlap Count | Video IDs in Dataset B | Video IDs in Dataset A (if applicable) |
|---|---|---|---|---|
| Cholec80-train | CholecT50-train | 16 | [1, 2, 4, 5, 13, 15, 18, 22, 23, 25, 26, 27, 31, 35, 36, 40] | - |
| Cholec80-train | CholecT50-val | 3 | [8, 12, 29] | - |
| Cholec80-train | CholecT50-test | 4 | [6, 10, 14, 32] | - |
| Cholec80-val | CholecT50-train | 3 | [43, 47, 48] | - |
| Cholec80-val | CholecT50-val | 0 | - | - |
| Cholec80-val | CholecT50-test | 1 | [42] | - |
| Cholec80-test | CholecT50-train | 12 | [49, 52, 56, 57, 60, 62, 65, 66, 68, 70, 75, 79] | - |
| Cholec80-test | CholecT50-val | 2 | [50, 78] | - |
| Cholec80-test | CholecT50-test | 4 | [51, 73, 74, 80] | - |
| Endoscapes-train | Cholec80-train | 0 | - | - |
| Endoscapes-train | Cholec80-val | 0 | - | - |
| Endoscapes-train | Cholec80-test | 5 | [67, 68, 70, 71, 72] | [9606, 9624, 9674, 9680, 9762] |
| Endoscapes-val | Cholec80-train | 0 | - | - |
| Endoscapes-val | Cholec80-val | 0 | - | - |
| Endoscapes-val | Cholec80-test | 1 | [66] | [9559] |
| Endoscapes-test | Cholec80-train | 0 | - | - |
| Endoscapes-test | Cholec80-val | 0 | - | - |
| Endoscapes-test | Cholec80-test | 0 | - | - |
| Endoscapes-train | CholecT50-train | 4 | [68, 70, 96, 110] | [9624, 9674, 10981, 11488] |
| Endoscapes-train | CholecT50-val | 0 | - | - |
| Endoscapes-train | CholecT50-test | 0 | - | - |
| Endoscapes-val | CholecT50-train | 2 | [66, 103] | [9559, 11132] |
| Endoscapes-val | CholecT50-val | 0 | - | - |
| Endoscapes-val | CholecT50-test | 0 | - | - |
| Endoscapes-test | CholecT50-train | 0 | - | - |
| Endoscapes-test | CholecT50-val | 0 | - | - |
| Endoscapes-test | CholecT50-test | 0 | - | - |
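The pairwise checks behind this table reduce to set intersections over the Cholec80 video IDs assigned to each split. The snippet below is a minimal sketch of that idea; the split dictionary here is an illustrative stand-in, and the full video-ID lists live in `overlap_analysis.py`:

```python
# Sketch of the pairwise overlap check. The splits below are illustrative
# stand-ins keyed as "<dataset>-<split>" -> set of Cholec80 video IDs;
# see overlap_analysis.py for the actual lists.
splits = {
    "Cholec80-train": {1, 2, 6, 10, 14, 32},
    "CholecT50-test": {6, 10, 14, 32, 51},
}

def overlap(a, b, splits):
    """Return the sorted video IDs shared by two splits."""
    return sorted(splits[a] & splits[b])

print(overlap("Cholec80-train", "CholecT50-test", splits))
# -> [6, 10, 14, 32]
```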
To create a combined dataset from Cholec80, CholecT50, and Endoscapes while maintaining test set integrity, we recommend keeping the Endoscapes and CholecT50 datasets intact and selectively adjusting the Cholec80 splits to prevent contamination, since Endoscapes and CholecT50 represent substantial investments in annotation effort and clinical expertise.
From the Cholec80 training set, remove the 4 videos overlapping with the CholecT50 test set:
- Videos: 6, 10, 14, 32
- Result: 36 videos remain in Cholec80 training set
From the Cholec80 validation set, remove the 1 video overlapping with the CholecT50 test set:
- Video: 42
- Result: 7 videos remain in Cholec80 validation set
From the Cholec80 test set, remove 17 videos to eliminate multiple contamination sources:
Overlap with CholecT50 validation set (2 videos):
- Videos: 50, 78
Overlap with the CholecT50 training set and the Endoscapes training/validation sets (15 videos):
- Videos: 49, 52, 56, 57, 60, 62, 65, 66, 67, 68, 70, 71, 72, 75, 79
- Result: 15 videos remain in Cholec80 test set
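The adjustment above can be applied with a simple filter. The sketch below assumes the 40/8/32 Cholec80 split by consecutive video number that the overlap tables imply (train 1-40, val 41-48, test 49-80); the removal IDs are the ones listed above:

```python
# Minimal sketch of the recommended Cholec80 adjustment. The video-ID
# ranges below follow the 40/8/32 split implied by the overlap tables.
REMOVE = {
    "train": {6, 10, 14, 32},
    "val": {42},
    "test": {49, 50, 52, 56, 57, 60, 62, 65, 66, 67,
             68, 70, 71, 72, 75, 78, 79},
}

def adjust_split(video_ids, split_name):
    """Drop contaminated videos from a Cholec80 split."""
    return [v for v in video_ids if v not in REMOVE[split_name]]

train = adjust_split(list(range(1, 41)), "train")   # 36 videos remain
val = adjust_split(list(range(41, 49)), "val")      # 7 videos remain
test = adjust_split(list(range(49, 81)), "test")    # 15 videos remain
print(len(train), len(val), len(test))  # -> 36 7 15
```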
| Dataset Component | Training Videos | Validation Videos | Test Videos |
|---|---|---|---|
| Endoscapes | 120 | 41 | 40 |
| CholecT50 | 35 | 5 | 10 |
| Cholec80 (Adjusted) | 36 | 7 | 15 |
| Total Combined | 191 | 53 | 65 |
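As a quick sanity check, the per-dataset counts in the table sum to the combined totals:

```python
# Verify the combined-dataset totals (numbers taken from the table above,
# ordered as train/val/test videos per dataset).
counts = {
    "Endoscapes": (120, 41, 40),
    "CholecT50": (35, 5, 10),
    "Cholec80 (Adjusted)": (36, 7, 15),
}
totals = tuple(sum(c[i] for c in counts.values()) for i in range(3))
print(totals)  # -> (191, 53, 65)
```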
The dataset combination strategy described above has been implemented in the following recent publication:
"Adaptation of Multi-modal Representation Models for Multi-task Surgical Computer Vision"
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2025
- 📖 arXiv: https://arxiv.org/pdf/2507.05020
- 💻 Code: https://github.com/CAMMA-public/MML-SurgAdapt
We have observed that the surgical data science community often relies on the M2CAI challenge dataset for evaluating surgical workflow recognition approaches. The M2CAI dataset consists of:
- Strasbourg center videos (with phase and tool annotations)
- TUM center videos (with phase annotations only)
It is important to emphasize that the Cholec80 dataset was released to extend and replace the Strasbourg videos of the M2CAI dataset. Therefore:
- When using Cholec80, only the M2CAI-Munich subset should be included, as done, for example, in the FedCy paper.
- The M2CAI-Strasbourg subset is best excluded, as it may overlap with the Cholec80 test set.
@article{twinanda2016endonet,
title={Endonet: a deep architecture for recognition tasks on laparoscopic videos},
author={Twinanda, Andru P and Shehata, Sherif and Mutter, Didier and Marescaux, Jacques and De Mathelin, Michel and Padoy, Nicolas},
journal={IEEE transactions on medical imaging},
volume={36},
number={1},
pages={86--97},
year={2016},
publisher={IEEE}
}
@article{nwoye2022rendezvous,
title={Rendezvous: Attention mechanisms for the recognition of surgical action triplets in endoscopic videos},
author={Nwoye, Chinedu Innocent and Yu, Tong and Gonzalez, Cristians and Seeliger, Barbara and Mascagni, Pietro and Mutter, Didier and Marescaux, Jacques and Padoy, Nicolas},
journal={Medical Image Analysis},
volume={78},
pages={102433},
year={2022},
publisher={Elsevier}
}
@article{murali2023endoscapes,
title={The endoscapes dataset for surgical scene segmentation, object detection, and critical view of safety assessment: Official splits and benchmark},
author={Murali, Aditya and Alapatt, Deepak and Mascagni, Pietro and Vardazaryan, Armine and Garcia, Alain and Okamoto, Nariaki and Costamagna, Guido and Mutter, Didier and Marescaux, Jacques and Dallemagne, Bernard and others},
journal={arXiv preprint arXiv:2312.12429},
year={2023}
}
@article{murali2023latent,
author={Murali, Aditya and Alapatt, Deepak and Mascagni, Pietro and Vardazaryan, Armine and Garcia, Alain and Okamoto, Nariaki and Mutter, Didier and Padoy, Nicolas},
journal={IEEE Transactions on Medical Imaging},
title={Latent Graph Representations for Critical View of Safety Assessment},
year={2023},
volume={},
number={},
pages={1-1},
doi={10.1109/TMI.2023.3333034}
}
The code, models, and datasets are available for non-commercial scientific research purposes, as defined in the CC BY-NC-SA 4.0 license. By downloading and using this code, you agree to the terms in the LICENSE.