Hi Xavier--
It's me, the 16,000 modules guy again 🙂. Claude debugged this problem and gave me the text I'm copying below. Thanks! --Myles
Environment
- Environment Modules version: 5.6.1 (likely affects earlier versions)
- Tcl version: 8.6.8
- OS: RHEL 8.10 (x86_64)
- Terminal width: 129 columns
- MODULEPATH contents: ~16,250 modulefiles across multiple directories
Description
module avail takes ~45 seconds to complete when MODULEPATH contains ~16,000 modules, while module avail --terse completes in ~2 seconds. The bottleneck is pure CPU time in the column layout optimizer inside displayElementList, not NFS I/O, pager startup, or terminal rendering.
Root Cause
The last_round optimization loop in displayElementList (modulecmd.tcl, around line 4540) exhibits O(N²) behavior. When the optimizer finds that cur_cols columns don't fit, it enters last_round mode and increments cur_rows one at a time from ceil(N/cur_cols) up to rows (the row count from the previous successful column count). Each increment re-scans all N elements.
With our module set:
| Parameter |
Value |
| N (module count) |
16,250 |
| max module name length |
60 chars (+2 suffix = 62) |
| tty_cols |
129 |
Initial cur_cols |
floor(129/62) = 2 |
rows after cur_cols=2 succeeds |
8,125 |
cur_rows when trying cur_cols=3 |
ceil(16250/3) = 5,417 |
last_round increments needed |
8,125 − 5,417 = 2,708 |
| Inner iterations per increment |
~5,417 |
| Total inner loop iterations |
~14.7 million |
At Tcl's interpreted throughput of ~500K iterations/sec for this workload, this takes ~30s of pure user CPU time.
How to Reproduce
# Create a directory with ~16,000 modulefiles with long names
mkdir -p /tmp/modules_perf_test/toolname
for i in $(seq 1 16000); do
printf '#%%Module\n' > "/tmp/modules_perf_test/toolname/some_long_module_name_prefix_${i}"
done
# Slow (~30-45s depending on terminal width and CPU)
MODULEPATH=/tmp/modules_perf_test time modulecmd.tcl bash avail --no-pager
# Fast (~1s) — bypasses the column optimizer entirely
MODULEPATH=/tmp/modules_perf_test time modulecmd.tcl bash avail --terse
Suggested Fix
In the last_round branch, replace the linear incr cur_rows with a direct jump to rows:
if {!$restart_loop} {
if {$last_round} {
- incr cur_rows
+ set cur_rows $rows
} else {
set cur_rows [expr {int(ceil(double($elt_cnt) / $cur_cols))}]
}
Why this works: The purpose of last_round is to determine whether cur_cols columns can fit if given more rows. By jumping cur_rows directly to rows (the previous successful row count — the most generous possible value), the very next overflow triggers the cur_rows == rows bailout and immediately decrements cur_cols. This reduces the last_round phase from O(N) outer iterations to O(1).
Tradeoff: In rare cases the chosen layout may have 1–2 more rows than the theoretical minimum for that column count. This is invisible in practice (e.g., 8,125 rows vs 8,123 rows for 16,000 modules).
Measured improvement:
| Metric |
Before |
After |
| Inner loop iterations (N=16,250) |
>14,700,000 |
~43,000 |
| Wall time |
~45s |
~2s |
| Computational complexity |
O(N²) |
O(N) |
Workarounds
For users who cannot patch modulecmd.tcl:
export MODULES_TERM_WIDTH=1 — forces single-column output, bypasses the optimizer
module avail --terse — uses join instead of column formatting
export MODULES_TERM_WIDTH=61 — any value less than max_name_len forces single-column
Hi Xavier--
It's me, the 16,000 modules guy again 🙂. Claude debugged this problem and gave me the text I'm copying below. Thanks! --Myles
Environment
Description
module availtakes ~45 seconds to complete when MODULEPATH contains ~16,000 modules, whilemodule avail --tersecompletes in ~2 seconds. The bottleneck is pure CPU time in the column layout optimizer insidedisplayElementList, not NFS I/O, pager startup, or terminal rendering.Root Cause
The
last_roundoptimization loop indisplayElementList(modulecmd.tcl, around line 4540) exhibits O(N²) behavior. When the optimizer finds thatcur_colscolumns don't fit, it enterslast_roundmode and incrementscur_rowsone at a time fromceil(N/cur_cols)up torows(the row count from the previous successful column count). Each increment re-scans all N elements.With our module set:
cur_colsfloor(129/62)= 2rowsaftercur_cols=2succeedscur_rowswhen tryingcur_cols=3ceil(16250/3)= 5,417last_roundincrements neededAt Tcl's interpreted throughput of ~500K iterations/sec for this workload, this takes ~30s of pure user CPU time.
How to Reproduce
Suggested Fix
In the
last_roundbranch, replace the linearincr cur_rowswith a direct jump torows:if {!$restart_loop} { if {$last_round} { - incr cur_rows + set cur_rows $rows } else { set cur_rows [expr {int(ceil(double($elt_cnt) / $cur_cols))}] }Why this works: The purpose of
last_roundis to determine whethercur_colscolumns can fit if given more rows. By jumpingcur_rowsdirectly torows(the previous successful row count — the most generous possible value), the very next overflow triggers thecur_rows == rowsbailout and immediately decrementscur_cols. This reduces thelast_roundphase from O(N) outer iterations to O(1).Tradeoff: In rare cases the chosen layout may have 1–2 more rows than the theoretical minimum for that column count. This is invisible in practice (e.g., 8,125 rows vs 8,123 rows for 16,000 modules).
Measured improvement:
Workarounds
For users who cannot patch
modulecmd.tcl:export MODULES_TERM_WIDTH=1— forces single-column output, bypasses the optimizermodule avail --terse— usesjoininstead of column formattingexport MODULES_TERM_WIDTH=61— any value less thanmax_name_lenforces single-column