Add the ability to specify frame alignment #20

Open
asayers wants to merge 3 commits into rorosen:main from asayers:alignment

asayers commented Oct 26, 2025

This PR is a proof of concept that adds an --align=<n> flag, which forces every frame to start on an n-byte boundary in the resulting file. It does this by inserting skippable frames as padding.
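
As a rough illustration of the padding step (a sketch, not the code in this PR): the zstd format defines a skippable frame as a 4-byte magic number in the range 0x184D2A50..=0x184D2A5F, a 4-byte little-endian payload size, and then the payload. A padding frame can therefore never be shorter than 8 bytes, so a gap smaller than that has to be extended to the next boundary. The names `padding_frame_len` and `skippable_frame` are hypothetical and chosen only for this example.

```rust
/// Size of a skippable frame header: 4-byte magic + 4-byte payload length.
const SKIPPABLE_HEADER_LEN: u64 = 8;
/// First of the sixteen skippable-frame magic numbers in the zstd format.
const SKIPPABLE_MAGIC: u32 = 0x184D_2A50;

/// Total length of the padding frame needed so that the next frame
/// starts on an `align`-byte boundary. Returns 0 if `offset` is already aligned.
/// (Hypothetical helper, not taken from the PR.)
fn padding_frame_len(offset: u64, align: u64) -> u64 {
    assert!(align > 0, "alignment must be non-zero");
    let rem = offset % align;
    if rem == 0 {
        return 0;
    }
    let mut gap = align - rem;
    // A skippable frame cannot be shorter than its 8-byte header,
    // so skip ahead to a later boundary when the gap is too small.
    while gap < SKIPPABLE_HEADER_LEN {
        gap += align;
    }
    gap
}

/// Build a skippable frame of exactly `total_len` bytes with a zeroed payload.
/// (Hypothetical helper, not taken from the PR.)
fn skippable_frame(total_len: u64) -> Vec<u8> {
    assert!(total_len >= SKIPPABLE_HEADER_LEN, "frame shorter than its header");
    let payload_len = u32::try_from(total_len - SKIPPABLE_HEADER_LEN)
        .expect("skippable frame payload must fit in 32 bits");
    let total = total_len as usize;
    let mut frame = Vec::with_capacity(total);
    frame.extend_from_slice(&SKIPPABLE_MAGIC.to_le_bytes());
    frame.extend_from_slice(&payload_len.to_le_bytes());
    frame.resize(total, 0); // zero-filled payload
    frame
}
```

For instance, with a 4096-byte alignment and a frame ending at offset 4090, the 6-byte gap is too small for a header, so the padding frame would be 4102 bytes and the next frame would start at offset 8192.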

Demo

Frame Index     Compressed    Uncompressed    Compressed Offset    Uncompressed Offset 
          0    1008.69 KiB        5.75 MiB                  0 B                    0 B                 
          1      15.31 KiB             0 B          1008.69 KiB               5.75 MiB            
          2    1015.99 KiB        3.75 MiB             1.00 MiB               5.75 MiB            
          3       8.01 KiB             0 B             1.99 MiB               9.50 MiB            
          4     998.34 KiB        3.62 MiB             2.00 MiB               9.50 MiB            

This is the result of running with --align=1M. The even-numbered frames (0, 2, 4) are the regular ones (the ones you'd want to read from). Notice how they all start on 1 MiB boundaries. The odd-numbered frames (1, 3) are skippable "padding" frames, which is why their uncompressed size is 0 B.

Motivation

In theory, setting --align=4K should reduce read amplification and pagecache usage when seeking. The effect would be more noticeable with smaller frames. E.g. if you wanted a perfectly seekable file, you'd want 4 KiB frames. In that case, a misaligned frame would straddle two 4 KiB pages, so reading it would pull in 8 KiB: an amplification of 100%. I haven't measured anything though, so I don't really know how big the effect is in practice.
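
The arithmetic behind that 100% figure can be sketched as follows. This is a back-of-the-envelope model only: it assumes the kernel reads whole pages and ignores readahead and caching, and the function names are hypothetical.

```rust
/// Number of `page`-sized pages that a read of `len` bytes starting at
/// `offset` has to touch. (Hypothetical helper for illustration.)
fn pages_touched(offset: u64, len: u64, page: u64) -> u64 {
    assert!(page > 0, "page size must be non-zero");
    if len == 0 {
        return 0;
    }
    let first = offset / page;
    let last = (offset + len - 1) / page;
    last - first + 1
}

/// Extra data read, as a percentage of the bytes actually wanted,
/// under the simplified whole-page model described above.
fn amplification_percent(offset: u64, len: u64, page: u64) -> f64 {
    assert!(len > 0, "length must be non-zero");
    let bytes_read = pages_touched(offset, len, page) * page;
    (bytes_read as f64 / len as f64 - 1.0) * 100.0
}
```

Under this model, a 4 KiB frame at offset 0 touches one page (0% extra), while the same frame at offset 100 touches two pages (100% extra).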

Author

asayers commented Oct 26, 2025

I wouldn't be surprised if there are bugs in this implementation. This PR is meant to be more of a conversation-starter than something that can be merged as-is.

Owner

rorosen commented Nov 12, 2025

Very interesting, thanks.

Do you have a use case for aligned frames? I like the idea but suspect that it will rarely be useful in practice.

Author

asayers commented Nov 13, 2025

The idea is based on the fact that files are not really random-access: the kernel reads whole pages, so you can only really read from a multiple of 4k. So if the frame you want isn't 4k-aligned then the kernel will end up reading some useless data from the end of the previous frame. This doesn't matter much if you seek rarely, so --align is meant to be a performance optimisation for the case where you do a lot of small random reads. That's the theory, but I still haven't benchmarked anything, so I still have no idea if it's effective!
