Making a new issue for this at the request of @jeromekelleher since #3306 was merged and closed. Tagging @petrelharp since he'll be a client of this API too, in pyslim.
Copy-paste from #3306:
One more thing that occurs to me @benjeffery: it would perhaps be nice to provide a memory alignment guarantee for the binary portion (assuming that the metadata chunk as a whole is at an aligned address, which is guaranteed by malloc). On my end, when writing out the metadata, I could of course insert null bytes in between the JSON and the binary to align the binary portion; but this gets tricky because those null bytes would have to be described in the binary schema, and the number of null bytes is variable since the length of the JSON string is variable, so it makes everything a bit of a mess. (I guess SLiM would have eight different possible binary schemas that it uses, with from 0 to 7 null bytes at the start of the binary data as needed, eww.) I could put spaces at the end of the JSON string instead to produce alignment, but I'm worried that that would make the JSON string non-canonical and make tskit produce an error, so I think (?) that solution is out.
Having an alignment guarantee (I'd suggest 8-byte) would be nice because SLiM will be writing and reading lots of data directly to/from the binary blob, so if it's unaligned it just makes for additional headaches and lower performance in the read/write code. If this alignment guarantee were built into the format expected by the json+struct codec, it would smooth the path. Thoughts?
Addendum now:
I've implemented my own alignment bytes for the binary blob in SLiM, and it was indeed a pain in the tuchus. It requires eight different versions of the binary schema, specifying the number of alignment bytes used as being from 0 to 7 (to produce 8-byte alignment). Lots of extra faffing about at both write and read time, which pyslim and anybody else who cares will have to duplicate. I strongly recommend that automatic 8-byte alignment be added to this codec so clients don't have to worry about all this.
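For anyone following along, here is a minimal sketch of the padding arithmetic being discussed. It is not tskit's or SLiM's actual implementation: the layout (JSON bytes, then 0 to 7 null bytes, then the binary blob, with the chunk as a whole starting at an aligned address) is an assumption for illustration, and the names `padding_needed` and `pack_aligned` are hypothetical.

```python
import json
import struct

ALIGNMENT = 8  # the 8-byte alignment suggested in this issue


def padding_needed(offset: int, alignment: int = ALIGNMENT) -> int:
    """Return how many pad bytes move `offset` up to a multiple of `alignment`."""
    return (-offset) % alignment


def pack_aligned(json_obj, binary: bytes):
    """Hypothetical layout: [JSON bytes][0..7 null bytes][binary blob].

    Returns the packed bytes and the offset at which the binary blob starts.
    Assumes the chunk as a whole begins at an aligned address.
    """
    json_bytes = json.dumps(json_obj, separators=(",", ":")).encode("utf-8")
    pad = padding_needed(len(json_bytes))
    binary_offset = len(json_bytes) + pad
    return json_bytes + b"\x00" * pad + binary, binary_offset


# Usage: the binary portion always starts at a multiple of 8,
# whatever the length of the JSON string happens to be.
blob, offset = pack_aligned({"a": 1}, struct.pack("<d", 1.5))
value = struct.unpack_from("<d", blob, offset)[0]
```

If the codec computed this padding itself on both write and read, the pad bytes would not need to appear in the binary schema at all, which is what would remove the need for eight schema variants.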