There seem to be huge differences in the memory allocations CodSpeed reports between repeated runs of the same benchmark. That makes the regression checks completely useless. For example:
vs.
for two runs of 1-conn/1-100mb-req (a.k.a. Upload).
Do we have any idea why that is?