Feature/zip archive forward read #126646
Adds a new ForwardRead mode to ZipArchive that enables forward-only sequential reading of ZIP entries from non-seekable streams, using local file headers instead of the central directory.

Changes:
- ZipArchiveMode.cs: Add ForwardRead = 3 enum value
- ZipCustomStreams.cs: Add BoundedReadOnlyStream and ReadAheadStream helper stream classes
- ZipArchive.cs: Add GetNextEntry()/GetNextEntryAsync() methods, ForwardRead constructor case, ValidateMode/DecideArchiveStream support, data descriptor parsing, and property guards
- ZipArchive.Async.cs: Add ForwardRead cases to CreateAsync and DisposeAsyncCore
- ZipArchiveEntry.cs: Add forward-read constructor, ForwardReadDataStream property, UpdateFromDataDescriptor method, OpenInForwardReadMode, and property setter guards
- Strings.resx: Add ForwardRead error message strings
- ref/System.IO.Compression.cs: Add public API surface
- Tests: Add comprehensive zip_ForwardReadTests covering deflate, stored, data descriptors, non-seekable streams, empty archives, partial reads, error cases, and async operations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR introduces an experimental forward-only streaming read mode for System.IO.Compression.ZipArchive to support sequential iteration over ZIP entries via local file headers (instead of loading the central directory up-front), including async iteration and new unit tests.
Changes:
- Add `ZipArchiveMode.ForwardRead` plus public `ZipArchive.GetNextEntry()`/`GetNextEntryAsync(...)` APIs for forward-only entry iteration.
- Implement forward-read parsing and stream wrappers (`ReadAheadStream`, `BoundedReadOnlyStream`) to enable non-seekable streaming scenarios.
- Add new unit tests validating basic ForwardRead behavior and unsupported operations.
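
As a rough illustration of what iterating by local file header involves, here is a minimal sketch (not the PR's implementation) that reads a single local file header using forward reads only. `LocalHeaderInfo` and `LocalHeaderSketch` are hypothetical names introduced for this example. The field offsets follow the ZIP local file header layout, and the code assumes .NET 7 or later (for `ReadAtLeast`/`ReadExactly`) and UTF-8 or ASCII entry names.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical type for illustration only; not part of this PR.
public readonly record struct LocalHeaderInfo(
    string Name, ushort Flags, ushort Method, uint Crc32,
    uint CompressedSize, uint UncompressedSize)
{
    // Bit 3 of the general-purpose flags: CRC and sizes are stored
    // in a data descriptor that follows the entry data.
    public bool HasDataDescriptor => (Flags & 0x0008) != 0;
}

// Hypothetical helper for illustration only; not part of this PR.
public static class LocalHeaderSketch
{
    private const uint LocalFileHeaderSignature = 0x04034b50; // bytes "PK\x03\x04"
    private const int FixedHeaderSize = 30;

    // Reads one local file header from the current position without seeking.
    // Returns null if the stream ends early or the next record is not a
    // local file header (for example, the central directory was reached).
    public static LocalHeaderInfo? TryReadLocalHeader(Stream stream)
    {
        Span<byte> fixedPart = stackalloc byte[FixedHeaderSize];
        if (stream.ReadAtLeast(fixedPart, FixedHeaderSize, throwOnEndOfStream: false) < FixedHeaderSize)
            return null;
        if (BinaryPrimitives.ReadUInt32LittleEndian(fixedPart) != LocalFileHeaderSignature)
            return null;

        ushort flags = BinaryPrimitives.ReadUInt16LittleEndian(fixedPart.Slice(6));
        ushort method = BinaryPrimitives.ReadUInt16LittleEndian(fixedPart.Slice(8));
        uint crc32 = BinaryPrimitives.ReadUInt32LittleEndian(fixedPart.Slice(14));
        uint compressedSize = BinaryPrimitives.ReadUInt32LittleEndian(fixedPart.Slice(18));
        uint uncompressedSize = BinaryPrimitives.ReadUInt32LittleEndian(fixedPart.Slice(22));
        ushort nameLength = BinaryPrimitives.ReadUInt16LittleEndian(fixedPart.Slice(26));
        ushort extraLength = BinaryPrimitives.ReadUInt16LittleEndian(fixedPart.Slice(28));

        // The file name and extra field follow the fixed part; the extra
        // field is read only to move past it.
        byte[] variablePart = new byte[nameLength + extraLength];
        stream.ReadExactly(variablePart);

        // Assumption: the name is UTF-8 or plain ASCII.
        string name = Encoding.UTF8.GetString(variablePart, 0, nameLength);
        return new LocalHeaderInfo(name, flags, method, crc32, compressedSize, uncompressedSize);
    }
}
```

Because nothing here seeks, the same call works on a non-seekable stream. When `HasDataDescriptor` is true, the sizes and CRC in the header cannot be relied on and the real values follow the entry data, which is presumably why the PR adds data descriptor parsing.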
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/libraries/System.IO.Compression/tests/ZipArchive/zip_ForwardReadTests.cs | Adds new test coverage for ForwardRead iteration, sync/async behavior, and unsupported APIs. |
| src/libraries/System.IO.Compression/tests/System.IO.Compression.Tests.csproj | Includes the new ForwardRead test file in the test project. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs | Adds BoundedReadOnlyStream and ReadAheadStream used by ForwardRead mode. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveMode.cs | Introduces ZipArchiveMode.ForwardRead enum value with documentation. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs | Adds ForwardRead entry initialization and Open() support for ForwardRead mode. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs | Implements ForwardRead iteration, local header parsing, data-descriptor handling, and non-seekable wrapping. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.Async.cs | Enables ForwardRead setup for async factory and disposal paths. |
| src/libraries/System.IO.Compression/src/Resources/Strings.resx | Adds new SR strings for ForwardRead error messages. |
| src/libraries/System.IO.Compression/ref/System.IO.Compression.cs | Updates public surface area (new mode + new ZipArchive methods). |

// Known size, not encrypted
Stream bounded = new BoundedReadOnlyStream(_archiveStream, compressedSize);
Stream decompressor = CreateForwardReadDecompressor(bounded, compressionMethod, uncompressedSize, leaveOpen: false);
dataStream = new CrcValidatingReadStream(decompressor, crc32, uncompressedSize);
I wonder if we should postpone DataStream creation and do it lazily like we do for the other entries.
I think I can defer that for the non-data-descriptor entries.
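
For reference, the bounded wrapper and the lazy creation suggested above could look roughly like the following. This is a sketch only: `BoundedReadStreamSketch` and `LazyEntryDataSketch` are hypothetical stand-ins, not the PR's `BoundedReadOnlyStream` or its entry fields, and the real code also layers a decompressor and CRC validation on top.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a bounded read-only wrapper; the PR's class may differ.
public sealed class BoundedReadStreamSketch : Stream
{
    private readonly Stream _inner;
    private long _remaining;

    public BoundedReadStreamSketch(Stream inner, long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _remaining = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Report end of stream once the entry's byte budget is used up,
        // even if the underlying archive stream has more data.
        if (_remaining == 0) return 0;
        int toRead = (int)Math.Min(count, _remaining);
        int read = _inner.Read(buffer, offset, toRead);
        _remaining -= read;
        return read;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

// Hypothetical holder that builds the data stream on first use instead of
// when the entry header is parsed.
public sealed class LazyEntryDataSketch
{
    private readonly Func<Stream> _factory;
    private Stream? _stream;

    public LazyEntryDataSketch(Func<Stream> factory) => _factory = factory;

    public bool IsCreated => _stream is not null;

    public Stream GetOrCreate() => _stream ??= _factory();
}
```

With this shape, an entry that is skipped without being opened never allocates a decompressor. Deferral only works when the compressed size is known from the header, which matches the reply above limiting it to entries without a data descriptor.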
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 14 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs:475
- The XML documentation for `Open()` (and `Open(FileAccess)`) was not updated to describe `ZipArchiveMode.ForwardRead`. The summary still says "If the archive that the entry belongs to was opened in Read mode, the returned stream will be readable, and it may or may not be seekable. If Create mode, ... If Update mode, ..." with no mention of ForwardRead, even though this PR introduces a new code path in the switch. Similarly, the `<list>` in `Open(FileAccess)` doesn't include the ForwardRead bullet, and the exception list omits the new `NotSupportedException`/`InvalidOperationException` cases (encrypted, no data stream, stream already opened). Please extend the doc comments to cover the new mode.
/// <summary>
/// Opens the entry. If the archive that the entry belongs to was opened in Read mode, the returned stream will be readable, and it may or may not be seekable. If Create mode, the returned stream will be writable and not seekable. If Update mode, the returned stream will be readable, writable, seekable, and support SetLength.
/// </summary>
/// <returns>A Stream that represents the contents of the entry.</returns>
/// <exception cref="IOException">The entry is already currently open for writing. -or- The entry has been deleted from the archive. -or- The archive that this entry belongs to was opened in ZipArchiveMode.Create, and this entry has already been written to once.</exception>
/// <exception cref="InvalidDataException">The entry is missing from the archive or is corrupt and cannot be read. -or- The entry has been compressed using a compression method that is not supported.</exception>
/// <exception cref="ObjectDisposedException">The ZipArchive that this entry belongs to has been disposed.</exception>
public Stream Open()
{
    ThrowIfInvalidArchive();
    switch (_archive.Mode)
    {
        case ZipArchiveMode.Read:
            return OpenInReadMode(checkOpenable: true);
        case ZipArchiveMode.Create:
            return OpenInWriteMode();
        case ZipArchiveMode.ForwardRead:
            return OpenInForwardReadMode();
        case ZipArchiveMode.Update:
        default:
            Debug.Assert(_archive.Mode == ZipArchiveMode.Update);
            return OpenInUpdateMode();
    }
}
/// <summary>
/// Opens the entry with the specified access mode. This allows for more granular control over the returned stream's capabilities.
/// </summary>
/// <param name="access">The file access mode for the returned stream.</param>
/// <returns>A <see cref="Stream"/> that represents the contents of the entry with the specified access capabilities.</returns>
/// <remarks>
/// <para>The allowed <paramref name="access"/> values depend on the <see cref="ZipArchiveMode"/>:</para>
/// <list type="bullet">
/// <item><description><see cref="ZipArchiveMode.Read"/>: Only <see cref="FileAccess.Read"/> is allowed.</description></item>
/// <item><description><see cref="ZipArchiveMode.Create"/>: <see cref="FileAccess.Write"/> and <see cref="FileAccess.ReadWrite"/> are allowed (both write-only).</description></item>
/// <item><description><see cref="ZipArchiveMode.Update"/>: All values are allowed. <see cref="FileAccess.Read"/> reads directly from the archive. <see cref="FileAccess.Write"/> discards existing content and provides an empty writable stream. <see cref="FileAccess.ReadWrite"/> loads existing content into memory (equivalent to <see cref="Open()"/>).</description></item>
/// </list>
/// </remarks>
/// <exception cref="ArgumentOutOfRangeException"><paramref name="access"/> is not a valid <see cref="FileAccess"/> value.</exception>
/// <exception cref="InvalidOperationException">The requested access is not compatible with the archive's open mode.</exception>
/// <exception cref="IOException">The entry is already currently open for writing. -or- The entry has been deleted from the archive. -or- The archive that this entry belongs to was opened in ZipArchiveMode.Create, and this entry has already been written to once.</exception>
/// <exception cref="InvalidDataException">The entry is missing from the archive or is corrupt and cannot be read. -or- The entry has been compressed using a compression method that is not supported.</exception>
/// <exception cref="ObjectDisposedException">The ZipArchive that this entry belongs to has been disposed.</exception>

// ── Core reading scenarios ──────────────────────────────────────────

[Theory]
[MemberData(nameof(Get_Booleans_Data))]
public async Task NonSeekableStream_ConsumeSkipConsume_ReadsCorrectly(bool async)
{
    byte[] zipBytes = CreateZipWithEntries(CompressionLevel.Optimal, seekable: false);
    byte[][] expected = [s_smallContent, s_mediumContent, s_largeContent];

    using MemoryStream archiveStream = new(zipBytes);
    using WrappedStream nonSeekable = new(archiveStream, canRead: true, canWrite: false, canSeek: false, null);
    using ZipArchive archive = new(nonSeekable, ZipArchiveMode.ForwardRead);

    ZipArchiveEntry? first = await GetNextEntry(archive, async);
    Assert.NotNull(first);
    using (Stream ds = first.Open())
    {
        Assert.Equal(expected[0], await ReadStreamFully(ds, async));
    }

    // Skip second entry without opening
    ZipArchiveEntry? second = await GetNextEntry(archive, async);
    Assert.NotNull(second);
    Assert.Equal("medium.bin", second.FullName);

    ZipArchiveEntry? third = await GetNextEntry(archive, async);
    Assert.NotNull(third);
    using (Stream ds = third.Open())
    {
        Assert.Equal(expected[2], await ReadStreamFully(ds, async));
    }

    Assert.Null(await GetNextEntry(archive, async));
}

[Theory]
[InlineData(true, true)]
[InlineData(true, false)]
[InlineData(false, true)]
[InlineData(false, false)]
public async Task StoredEntries_SeekableAndNonSeekable_ReadCorrectly(bool async, bool readSeekable)
{
    // Always created on seekable stream → known sizes, no data descriptors
    byte[] zipBytes = CreateZipWithEntries(CompressionLevel.NoCompression, seekable: true);
    byte[][] expected = [s_smallContent, s_mediumContent, s_largeContent];

    using MemoryStream archiveStream = new(zipBytes);
    Stream readStream = readSeekable
        ? archiveStream
        : new WrappedStream(archiveStream, canRead: true, canWrite: false, canSeek: false, null);
    using ZipArchive archive = new(readStream, ZipArchiveMode.ForwardRead);

    for (int i = 0; i < expected.Length; i++)
    {
        ZipArchiveEntry? entry = await GetNextEntry(archive, async);
        Assert.NotNull(entry);
        Assert.Equal(ZipCompressionMethod.Stored, entry.CompressionMethod);

        using Stream ds = entry.Open();
        Assert.Equal(expected[i], await ReadStreamFully(ds, async));
    }

    Assert.Null(await GetNextEntry(archive, async));
}

[Theory]
[MemberData(nameof(Get_Booleans_Data))]
public async Task PartialRead_ThenAdvance_ReadsNextEntryCorrectly(bool async)
{
    byte[] zipBytes = CreateZipWithEntries(CompressionLevel.Optimal, seekable: false);

    using MemoryStream archiveStream = new(zipBytes);
    using ZipArchive archive = new(archiveStream, ZipArchiveMode.ForwardRead);

    ZipArchiveEntry? first = await GetNextEntry(archive, async);
    Assert.NotNull(first);

    using (Stream ds = first.Open())
    {
        byte[] partial = new byte[3];
        await ReadStream(ds, partial, async);
    }

    ZipArchiveEntry? second = await GetNextEntry(archive, async);
    Assert.NotNull(second);
    Assert.Equal("medium.bin", second.FullName);

    using Stream ds2 = second.Open();
    Assert.Equal(s_mediumContent, await ReadStreamFully(ds2, async));
}

Fixes #1550
Experiment with a new forward-only reading mode for ZipArchive that relies on the local headers of entries instead of the central directory, which avoids loading everything into memory at once.