TarReader: implement GNU sparse format 1.0 (PAX) #125283
…rse.name and GNU.sparse.realsize Co-authored-by: lewing <24063+lewing@users.noreply.github.com>
…sertions for data stream integrity Co-authored-by: lewing <24063+lewing@users.noreply.github.com>
@copilot What happens when we attempt to extract such archives? If we are to transparently handle the real name and size, then we should also expand the contents when reading the entry's DataStream.
Extracting GNU sparse format 1.0 PAX entries with the current code produces a file with the correct name (from our fix) but incorrect content. Here's what happens: the
When Properly expanding the DataStream would require:
This is significantly more complex than the current fix. The current PR is scoped to fix the metadata issues ( I can revert the |
Tagging subscribers to this area: @dotnet/area-system-formats-tar |
@copilot Simply fixing the exposed metadata does not address the originally reported issue. You need to implement complete handling of the PAX 1.0 sparse format (only for reading; there is no need to produce such archives). I think it is okay to skip the older sparse file formats, though.
…SparseStream Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com>
…ed offsets, fix CanSeek order, fix async allocation Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com>
Done in commit
Code review fixes applied: |
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarReader.cs
src/libraries/System.Formats.Tar/tests/TarReader/TarReader.GetNextEntry.Tests.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuSparseStream.cs
@copilot Address the code review comments. For the segment binary search, preserve the behavior that we don't perform unnecessary searches during a sequential read of the entire entry.
…tial scan for forward reads; fix overflow check Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com>
Addressed in
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuSparseStream.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs
@copilot Address code review feedback. Throw if sparse size is negative and add a test for it. |
…tream, add test Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/b144a839-fb32-4a5b-a189-ccc44b7a928b
Done in |
src/libraries/System.Formats.Tar/src/System/Formats/Tar/GnuSparseStream.cs
src/libraries/System.Formats.Tar/tests/TarReader/TarReader.SparseFile.Tests.cs
@copilot please address the code review comment about the helpers in tests, also check if there is any other duplication that we could remove. There also seem to be some failing tests introduced by this PR, please take a look |
Addressed in the latest commit. Extracted a
…or archive construction Co-authored-by: rzikm <32671551+rzikm@users.noreply.github.com> Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/4d462bd2-ae06-4bb8-bfdd-6b7866781e1f
alinpahontu2912
left a comment
LGTM, but maybe @iremyux might want to take a look too
`TarReader` was not handling GNU sparse format 1.0 PAX entries, causing ~46% of entries from bsdtar-created archives (e.g., .NET SDK tarballs built on macOS/APFS) to expose internal placeholder paths like `GNUSparseFile.0/real-file.dll`, incorrect sizes, and corrupted extracted content.

Changes
Added read-only support for GNU sparse format 1.0 (PAX). When `TarReader` encounters PAX extended attributes `GNU.sparse.major=1` and `GNU.sparse.minor=0`, it resolves the real file name from `GNU.sparse.name`, reports the expanded size from `GNU.sparse.realsize`, and wraps the raw data stream with `GnuSparseStream`, which presents the expanded virtual file content (zeros for holes, packed data at correct offsets).

The sparse map embedded in the data section is parsed lazily on first `Read`, so `_dataStream` remains unconsumed during entry construction. This allows `TarWriter.WriteEntry` to round-trip the condensed sparse data correctly for both seekable and non-seekable source archives.

Older GNU sparse formats (0.0, 0.1) and write support are not addressed.
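To make the data layout concrete, here is a minimal sketch of reading a format 1.0 sparse map and expanding the packed data. It assumes the layout documented for GNU tar: the data section starts with newline-delimited decimal numbers (a segment count, then offset/length pairs), NUL-padded to the next 512-byte boundary, followed by the packed segment data. `SparseMapSketch`, `ReadMap`, and `Expand` are hypothetical names for illustration and are not the PR's `GnuSparseStream`. Unlike the PR, this sketch expands eagerly into memory instead of parsing lazily on first `Read`.

```csharp
using System.Collections.Generic;
using System.IO;

// Illustrative only: a hypothetical helper, not the GnuSparseStream added by the PR.
public static class SparseMapSketch
{
    private const int BlockSize = 512; // tar record size

    // Reads one newline-terminated decimal number and counts the bytes consumed.
    private static long ReadDecimalLine(Stream data, ref long consumed)
    {
        long value = 0;
        int digits = 0;
        while (true)
        {
            int b = data.ReadByte();
            if (b < 0) throw new InvalidDataException("Sparse map ended unexpectedly.");
            consumed++;
            if (b == '\n') break;
            if (b < '0' || b > '9') throw new InvalidDataException("Sparse map contains a non-digit.");
            value = checked(value * 10 + (b - '0')); // throws OverflowException on absurd values
            digits++;
        }
        if (digits == 0) throw new InvalidDataException("Sparse map contains an empty line.");
        return value;
    }

    // Parses "<count>\n" followed by count "<offset>\n<length>\n" pairs,
    // then skips the NUL padding up to the next 512-byte boundary.
    public static List<(long Offset, long Length)> ReadMap(Stream data)
    {
        long consumed = 0;
        long count = ReadDecimalLine(data, ref consumed);
        var segments = new List<(long Offset, long Length)>();
        for (long i = 0; i < count; i++)
        {
            long offset = ReadDecimalLine(data, ref consumed);
            long length = ReadDecimalLine(data, ref consumed);
            segments.Add((offset, length));
        }
        long padding = (BlockSize - consumed % BlockSize) % BlockSize;
        for (long i = 0; i < padding; i++)
        {
            if (data.ReadByte() < 0) throw new InvalidDataException("Sparse map padding is truncated.");
        }
        return segments;
    }

    // Expands the whole entry into memory; bytes not covered by a segment stay zero (holes).
    public static byte[] Expand(Stream data, int realSize)
    {
        var result = new byte[realSize];
        foreach (var (offset, length) in ReadMap(data))
        {
            if (offset > realSize || length > realSize - offset)
                throw new InvalidDataException("Sparse segment exceeds the real size.");
            int start = (int)offset, remaining = (int)length;
            while (remaining > 0)
            {
                int read = data.Read(result, start, remaining);
                if (read == 0) throw new InvalidDataException("Packed data is truncated.");
                start += read;
                remaining -= read;
            }
        }
        return result;
    }
}
```

For example, a map of `2\n0\n3\n8\n2\n` followed by the packed bytes `ABCDE` and a real size of 12 describes a file with `ABC` at offset 0, `DE` at offset 8, and zeros everywhere else.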
Additional correctness and robustness improvements based on code review:

- `GnuSparseStream` now overrides `DisposeAsync` to properly await async disposal of the underlying raw stream.
- `TarHeader.Read` now throws `InvalidDataException` if `GNU.sparse.realsize` is negative, consistent with validation of the regular `_size` field.
- Fixed the overflow check for sparse segments (`offset > _realSize || length > _realSize - offset`).
- `FindSegmentFromCurrent` uses binary search (O(log n)) for backward seeks, preserving the O(1) amortized forward scan for the common sequential-read case.

Testing
All existing tests pass. New `TarReader.SparseFile.Tests.cs` covers:

- `copyData` × sync/async
- Negative `GNU.sparse.realsize` value throws `InvalidDataException` (sync and async)
- Archives `pax-nil-sparse-data`, `pax-nil-sparse-hole`, `pax-sparse-big`
- `AdvancePastEntry_DoesNotCorruptNextEntry` and `CopySparseEntryToNewArchive_PreservesExpandedContent` now share archive construction helpers with the rest of the test suite
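The overflow-safe bounds check and the segment lookup strategy listed among the review fixes can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `SegmentLocatorSketch` and `Find` are hypothetical names, and the segments are assumed to be sorted by offset and non-overlapping. Writing the check as `length > realSize - offset` avoids computing `offset + length`, which could overflow for hostile input.

```csharp
using System.IO;

// Illustrative only: a hypothetical stand-in for the lookup done by FindSegmentFromCurrent.
public sealed class SegmentLocatorSketch
{
    private readonly (long Offset, long Length)[] _segments; // assumed sorted, non-overlapping
    private int _current; // first segment that may still contain upcoming positions

    public SegmentLocatorSketch((long Offset, long Length)[] segments, long realSize)
    {
        foreach (var (offset, length) in segments)
        {
            // Overflow-safe: never computes offset + length before it is known to fit.
            if (offset < 0 || length < 0 || offset > realSize || length > realSize - offset)
                throw new InvalidDataException("Sparse segment exceeds the real size.");
        }
        _segments = segments;
    }

    private long End(int i) => _segments[i].Offset + _segments[i].Length;

    // Returns the index of the segment containing 'position', or -1 if it lies in a hole.
    public int Find(long position)
    {
        if (_current > 0 && End(_current - 1) > position)
        {
            // Backward seek: binary search for the first segment ending after 'position'.
            int lo = 0, hi = _current - 1;
            while (lo < hi)
            {
                int mid = lo + (hi - lo) / 2;
                if (End(mid) > position) hi = mid; else lo = mid + 1;
            }
            _current = lo;
        }
        else
        {
            // Forward read: each segment is skipped at most once, so a full
            // sequential pass costs O(1) amortized per lookup.
            while (_current < _segments.Length && End(_current) <= position) _current++;
        }
        return _current < _segments.Length && _segments[_current].Offset <= position ? _current : -1;
    }
}
```

With segments `(0, 3)`, `(8, 2)`, `(20, 5)`, a sequential reader only ever takes the forward branch; the binary search runs only when a caller seeks back to an earlier position.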