Add vessel graph normalization step to emit node/edge parquet tables with Dask-distributed processing #168
Agent-Logs-Url: https://github.com/khanlab/SPIMquant/sessions/024cd4d7-fd0e-4734-a3e2-af63ea407349 Co-authored-by: akhanf <11492701+akhanf@users.noreply.github.com>
@copilot: Peak memory consumption for this script is excessive (greater than 128 GB). Can a more memory-efficient alternative be implemented? Maybe process nodes first, then edges.
Agent-Logs-Url: https://github.com/khanlab/SPIMquant/sessions/6a3c1a5f-afe2-48f2-bb15-a0283215514e Co-authored-by: akhanf <11492701+akhanf@users.noreply.github.com>
Implemented a low-memory two-pass conversion in |
@copilot: This still uses over 128 GB of memory. Implement a solution using Dask distributed to avoid the memory load.
Agent-Logs-Url: https://github.com/khanlab/SPIMquant/sessions/3f0f16fe-2737-4b54-9a36-f4a66307d4ea Co-authored-by: akhanf <11492701+akhanf@users.noreply.github.com>
Implemented in |
Works now, running with 128 cores and 256 GB of memory.
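For readers following the thread, the "nodes then edges" two-pass idea can be sketched in plain Python. This is only an illustration of the logic, not the code merged in this PR: it uses no Dask or parquet I/O, the input row layout (`EdgeRow`), the helper names, and the `voxel_size` parameter are all assumptions, and only the output column names are taken from the `nodes` and `edges` schemas in the PR description. Note that the node index still lives in memory here, which is the kind of pressure the Dask-distributed version was asked to avoid.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Voxel = Tuple[int, int, int]
# Hypothetical input row: (channel, src_voxel, dst_voxel, src_radius, dst_radius)
EdgeRow = Tuple[str, Voxel, Voxel, float, float]
NodeKey = Tuple[str, Voxel]


def to_physical(vox: Voxel, voxel_size: Sequence[float]) -> List[float]:
    """Scale voxel indices to physical coordinates (assumed per-axis spacing)."""
    return [v * s for v, s in zip(vox, voxel_size)]


def build_nodes(
    chunks: Sequence[Sequence[EdgeRow]], voxel_size: Sequence[float]
) -> Tuple[Dict[NodeKey, int], List[dict]]:
    """Pass 1: assign one node_id per unique (channel, voxel) endpoint."""
    index: Dict[NodeKey, int] = {}
    nodes: List[dict] = []
    for chunk in chunks:
        for channel, src, dst, src_r, dst_r in chunk:
            for vox, radius in ((src, src_r), (dst, dst_r)):
                key = (channel, vox)
                if key in index:
                    continue  # endpoint already seen; the first radius is kept
                x, y, z = to_physical(vox, voxel_size)
                index[key] = len(nodes)
                nodes.append({
                    "node_id": index[key], "channel": channel,
                    "vox_x": vox[0], "vox_y": vox[1], "vox_z": vox[2],
                    "x": x, "y": y, "z": z, "radius": radius,
                })
    return index, nodes


def build_edges(
    chunks: Sequence[Sequence[EdgeRow]],
    index: Dict[NodeKey, int],
    voxel_size: Sequence[float],
) -> Iterator[dict]:
    """Pass 2: stream edges again, replacing endpoints with node ids."""
    edge_id = 0
    for chunk in chunks:
        for channel, src, dst, _src_r, _dst_r in chunk:
            yield {
                "edge_id": edge_id, "channel": channel,
                "src_node_id": index[(channel, src)],
                "dst_node_id": index[(channel, dst)],
                # Euclidean distance between endpoints in physical units
                "edge_length": math.dist(
                    to_physical(src, voxel_size), to_physical(dst, voxel_size)
                ),
                "src_vox_x": src[0], "src_vox_y": src[1], "src_vox_z": src[2],
                "dst_vox_x": dst[0], "dst_vox_y": dst[1], "dst_vox_z": dst[2],
            }
            edge_id += 1


# Tiny made-up example: two chunks, one edge each, sharing one endpoint.
chunks = [
    [("ch1", (0, 0, 0), (1, 0, 0), 2.0, 3.0)],
    [("ch1", (1, 0, 0), (1, 2, 0), 3.0, 1.5)],
]
voxel_size = (2.0, 2.0, 2.0)
index, nodes = build_nodes(chunks, voxel_size)
edges = list(build_edges(chunks, index, voxel_size))
```

Because the second pass only looks up ids and yields one row at a time, the edge table never has to be held in memory alongside the node table; in a real pipeline each yielded batch would be written out as it is produced.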
Adds vessel graph normalization that materializes explicit nodes and edges parquet tables from the skeleton edge-list parquet, and updates the conversion path to use Dask-distributed processing to reduce memory pressure on large inputs.

Changes Made
- ...+skeleton_nodes.parquet
- ...+skeleton_edges.parquet
- nodes: node_id, channel, vox_x, vox_y, vox_z, x, y, z, radius
- edges: edge_id, channel, src_node_id, dst_node_id, edge_length, src_vox_*, dst_vox_*

Validation