Bug: 'traccc_seq_example' fails to read official ODD data with 'Unsupported data format' error #1227

@KaranSinghDev

Greetings Developers,

I've been working on benchmarking the GPU capabilities of the traccc framework and have encountered a reproducible bug where the main example executables fail to process the official ODD dataset. After a thorough investigation, I believe I have isolated the root cause.

Environment Details

  • OS: Windows 11 with WSL2 (Ubuntu 22.04)
  • Compiler: g++ 13 (installed via Conda from conda-forge)
  • CUDA: 12.6 (installed via Conda from conda-forge)
  • traccc Version: Cloned from the main branch on January 8, 2026

The Problem

The high-level executables (traccc_seq_example, traccc_seq_example_cuda, and traccc_throughput_mt) consistently crash with a file format error when run on the ODD dataset downloaded via the official data/traccc_data_get_files.sh script.

Error Message: std::invalid_argument (what(): Unsupported data format) or std::__ios_failure (iostream error)
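The two exception types usually point at different stages of a read. In libstdc++, std::__ios_failure is an internal type derived from std::ios_base::failure, so it indicates a stream-level problem (for example a file that could not be opened or read), whereas std::invalid_argument is thrown deliberately by application code. The following is a minimal sketch of that distinction, not traccc code: classify_read_failure is a hypothetical helper, and the comma check is only a stand-in for whatever format validation the real reader performs.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of traccc): reports which kind of failure
// occurs when trying to read the first line of a CSV file.
std::string classify_read_failure(const std::string& path) {
    try {
        std::ifstream in;
        // Ask the stream to throw instead of silently setting error flags.
        in.exceptions(std::ifstream::failbit | std::ifstream::badbit);
        in.open(path);             // throws std::ios_base::failure if the open fails
        std::string header;
        std::getline(in, header);  // throws if no characters could be read
        // Stand-in format check (assumption): a CSV header contains a comma.
        if (header.find(',') == std::string::npos) {
            throw std::invalid_argument("Unsupported data format");
        }
        return "ok";
    } catch (const std::ios_base::failure&) {
        return "stream failure";   // file missing, unreadable, or empty
    } catch (const std::invalid_argument&) {
        return "format rejected";  // file was read, but its content was refused
    }
}
```

Calling the helper on a path that does not exist returns "stream failure", which is the category the std::__ios_failure message would fall into.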

Steps to Reproduce

1. Build the project from the main branch

A minimal build command that reproduces the issue:

cmake -B build -S . -DTRACCC_BUILD_CUDA=ON -DTRACCC_BUILD_EXAMPLES=ON -DTRACCC_USE_ROOT=OFF
cmake --build build -j4

2. Download the ODD data

./data/traccc_data_get_files.sh

3. Run the sequential CPU example on the downloaded data

./build/bin/traccc_seq_example --input-directory="odd/geant4_10muon_10GeV/" --input-events=1

Expected vs. Actual Behavior

Expected: The program should process one event and exit gracefully.

Actual: The program prints a "duplicate cells" warning and then aborts with the "Unsupported data format" exception.

Root Cause Analysis

I investigated the source code and the data files and found the following:

  1. The stack trace points to a failure within the io::read_cells function
  2. Source code analysis of examples/run/common/throughput_mt.ipp and examples/run/cpu/seq_example.cpp confirms these executables are hardcoded to call io::read_cells
  3. This function is hardcoded to open files ending in -cells.csv
  4. The CSV header in event files is: geometry_id,measurement_id,channel0,channel1,timestamp,value
  5. Analysis of the C++ parser in io/src/csv/read_cells.cpp shows it is not designed to handle the measurement_id column in that position, causing the parser to fail
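Points 4 and 5 can be checked independently of traccc by comparing the header of a cells file with the column order a parser expects. The sketch below is not taken from the traccc sources: split_header and first_mismatch are hypothetical helpers, and the expected column list has to be supplied by the reader from the actual parser definition. It also strips a trailing carriage return, since files handled on Windows may carry CRLF line endings.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a CSV header line on commas (no quoting support; enough for
// plain column names such as those in the cells files).
std::vector<std::string> split_header(const std::string& line) {
    std::vector<std::string> columns;
    std::stringstream ss(line);
    std::string name;
    while (std::getline(ss, name, ',')) {
        // Drop a trailing carriage return left by Windows line endings.
        if (!name.empty() && name.back() == '\r') {
            name.pop_back();
        }
        columns.push_back(name);
    }
    return columns;
}

// Return the index of the first column that differs from the expected
// layout, or -1 if both layouts are identical.
int first_mismatch(const std::vector<std::string>& found,
                   const std::vector<std::string>& expected) {
    const std::size_t n = std::min(found.size(), expected.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (found[i] != expected[i]) {
            return static_cast<int>(i);
        }
    }
    // One layout is a strict prefix of the other.
    if (found.size() != expected.size()) {
        return static_cast<int>(n);
    }
    return -1;
}
```

Passing the header quoted in point 4 to split_header yields six column names. If first_mismatch returns -1 against the column order declared by the parser, the header layout is not the cause of the failure and the investigation should move elsewhere.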

Workaround / Additional Information

I found an alternative way to get my work done, but I am not sure whether it is the intended one:

  1. I ran the crashing traccc_seq_example with an --output-directory flag, which successfully generated all intermediate data files (including spacepoints.csv) before crashing
  2. I then ran traccc_seeding_example_cuda, which is hardcoded to call io::read_spacepoints
  3. This second executable ran to completion successfully, performing the full seeding, track finding (CKF), and fitting on the GPU

Conclusion: The data itself is valid, but the specific C++ parser for cells.csv in the higher-level executables is out of sync with the data format provided by the official download script.

Summary

This detailed report indicates that the issue is a parser incompatibility rather than invalid data. The io::read_cells function requires updating to handle the measurement_id column position in the official ODD dataset format.

I hope this detailed report is helpful for understanding the case. Thank you for your work on this great project.

Labels: ai slop (Low-quality content created by AI)