Feature/zip archive forward read #126646

Draft
alinpahontu2912 wants to merge 10 commits into dotnet:main from alinpahontu2912:feature/zip-archive-forward-read

@alinpahontu2912
Member

Fixes #1550

Experiment with a new forward-only reading mode for ZipArchive that relies on the local headers of entries instead of the central directory, which avoids loading everything into memory at once.
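To illustrate what reading from local headers means, here is a minimal sketch that walks an archive front to back using only the local file headers. It is not the code from this PR: `LocalHeaderWalk`, `BuildSampleZip`, and `ListEntryNames` are hypothetical names, and the sketch assumes entries without data descriptors and without ZIP64 sizes, so the compressed size in each header can be used to skip the entry data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration only; not part of this PR.
static class LocalHeaderWalk
{
    private const uint LocalFileHeaderSignature = 0x04034b50;
    private const ushort DataDescriptorFlag = 0x0008; // general-purpose bit 3

    // Builds a small two-entry archive in memory so the walk has input.
    public static byte[] BuildSampleZip()
    {
        using var buffer = new MemoryStream();
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (string name in new[] { "a.txt", "b.txt" })
            {
                using var writer = new StreamWriter(archive.CreateEntry(name).Open());
                writer.Write("hello " + name);
            }
        }
        return buffer.ToArray();
    }

    // Reads local file headers sequentially and never seeks.
    public static List<string> ListEntryNames(Stream zip)
    {
        var names = new List<string>();
        using var reader = new BinaryReader(zip, Encoding.UTF8, leaveOpen: true);

        // The loop ends at the first record that is not a local file header,
        // which in a well-formed archive is the central directory.
        while (reader.ReadUInt32() == LocalFileHeaderSignature)
        {
            reader.ReadUInt16();                       // version needed to extract
            ushort flags = reader.ReadUInt16();        // general-purpose bit flags
            reader.ReadUInt16();                       // compression method
            reader.ReadUInt32();                       // last-modified time and date
            reader.ReadUInt32();                       // CRC-32
            uint compressedSize = reader.ReadUInt32();
            reader.ReadUInt32();                       // uncompressed size
            ushort nameLength = reader.ReadUInt16();
            ushort extraLength = reader.ReadUInt16();

            names.Add(Encoding.UTF8.GetString(reader.ReadBytes(nameLength)));
            reader.ReadBytes(extraLength);             // skip the extra field

            // With a data descriptor the size is only known after the data,
            // so this simplified walk stops instead of decompressing.
            if ((flags & DataDescriptorFlag) != 0)
            {
                break;
            }

            reader.ReadBytes(checked((int)compressedSize)); // skip entry data
        }

        return names;
    }
}
```

Calling `LocalHeaderWalk.ListEntryNames(new MemoryStream(LocalHeaderWalk.BuildSampleZip()))` lists the entries in the order they were written. Entries that carry a data descriptor need the extra parsing this PR adds, because their sizes are not available in the header.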

alinpahontu2912 and others added 3 commits April 7, 2026 16:01
Adds a new ForwardRead mode to ZipArchive that enables forward-only
sequential reading of ZIP entries from non-seekable streams, using
local file headers instead of the central directory.

Changes:
- ZipArchiveMode.cs: Add ForwardRead = 3 enum value
- ZipCustomStreams.cs: Add BoundedReadOnlyStream and ReadAheadStream
  helper stream classes
- ZipArchive.cs: Add GetNextEntry()/GetNextEntryAsync() methods,
  ForwardRead constructor case, ValidateMode/DecideArchiveStream
  support, data descriptor parsing, and property guards
- ZipArchive.Async.cs: Add ForwardRead cases to CreateAsync and
  DisposeAsyncCore
- ZipArchiveEntry.cs: Add forward-read constructor, ForwardReadDataStream
  property, UpdateFromDataDescriptor method, OpenInForwardReadMode,
  and property setter guards
- Strings.resx: Add ForwardRead error message strings
- ref/System.IO.Compression.cs: Add public API surface
- Tests: Add comprehensive zip_ForwardReadTests covering deflate,
  stored, data descriptors, non-seekable streams, empty archives,
  partial reads, error cases, and async operations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces an experimental forward-only streaming read mode for System.IO.Compression.ZipArchive to support sequential iteration over ZIP entries via local file headers (instead of loading the central directory up-front), including async iteration and new unit tests.

Changes:

  • Add ZipArchiveMode.ForwardRead plus public ZipArchive.GetNextEntry() / GetNextEntryAsync(...) APIs for forward-only entry iteration.
  • Implement forward-read parsing and stream wrappers (ReadAheadStream, BoundedReadOnlyStream) to enable non-seekable streaming scenarios.
  • Add new unit tests validating basic ForwardRead behavior and unsupported operations.
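The stream wrappers themselves are not shown in this conversation. As a rough idea of what a bounded read-only wrapper does, the following sketch caps the number of bytes that can be read from an underlying stream. `BoundedStreamSketch` is a hypothetical stand-in, not the PR's `BoundedReadOnlyStream`, and its behavior (no seeking, no ownership of the inner stream) is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for illustration; the PR's BoundedReadOnlyStream may differ.
sealed class BoundedStreamSketch : Stream
{
    private readonly Stream _inner;
    private long _remaining;

    public BoundedStreamSketch(Stream inner, long length)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(length));
        }
        _remaining = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_remaining == 0)
        {
            return 0; // bound reached: report end of stream
        }

        // Never ask the inner stream for more than the remaining budget.
        int toRead = (int)Math.Min(count, _remaining);
        int read = _inner.Read(buffer, offset, toRead);
        _remaining -= read;
        return read;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Wrapping a five-byte `MemoryStream` with a bound of 3 and calling `CopyTo` copies three bytes and leaves the inner stream positioned at the fourth, which is where the next local header would begin in a forward-only reader.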

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| src/libraries/System.IO.Compression/tests/ZipArchive/zip_ForwardReadTests.cs | Adds new test coverage for ForwardRead iteration, sync/async behavior, and unsupported APIs. |
| src/libraries/System.IO.Compression/tests/System.IO.Compression.Tests.csproj | Includes the new ForwardRead test file in the test project. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipCustomStreams.cs | Adds BoundedReadOnlyStream and ReadAheadStream used by ForwardRead mode. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveMode.cs | Introduces ZipArchiveMode.ForwardRead enum value with documentation. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs | Adds ForwardRead entry initialization and Open() support for ForwardRead mode. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs | Implements ForwardRead iteration, local header parsing, data-descriptor handling, and non-seekable wrapping. |
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.Async.cs | Enables ForwardRead setup for async factory and disposal paths. |
| src/libraries/System.IO.Compression/src/Resources/Strings.resx | Adds new SR strings for ForwardRead error messages. |
| src/libraries/System.IO.Compression/ref/System.IO.Compression.cs | Updates public surface area (new mode + new ZipArchive methods). |

Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment on lines +695 to +698
// Known size, not encrypted
Stream bounded = new BoundedReadOnlyStream(_archiveStream, compressedSize);
Stream decompressor = CreateForwardReadDecompressor(bounded, compressionMethod, uncompressedSize, leaveOpen: false);
dataStream = new CrcValidatingReadStream(decompressor, crc32, uncompressedSize);
Member

I wonder if we should postpone DataStream creation and do it lazily like we do for the other entries.

Member Author

I think I can defer that for the non-data-descriptor entries.
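One way to picture the lazy approach discussed here is to hand the entry a factory and build the stream chain only on the first `Open()` call. The sketch below uses a hypothetical `LazyEntrySketch` type and is not how `ZipArchiveEntry` is implemented in this PR.

```csharp
using System;
using System.IO;

// Hypothetical entry type for illustration; not the PR's ZipArchiveEntry.
sealed class LazyEntrySketch
{
    private readonly Lazy<Stream> _dataStream;

    // The factory would wrap the archive stream (bounding, decompression,
    // CRC validation); here it is supplied by the caller.
    public LazyEntrySketch(Func<Stream> createDataStream)
    {
        _dataStream = new Lazy<Stream>(createDataStream);
    }

    // True once Open() has forced the factory to run.
    public bool IsDataStreamCreated => _dataStream.IsValueCreated;

    // Builds the data stream on first use and returns the same instance afterwards.
    public Stream Open() => _dataStream.Value;
}
```

With this shape, an entry that is skipped without being opened never allocates its decompressor or validation stream.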

Copilot AI review requested due to automatic review settings April 21, 2026 10:23
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Copilot AI review requested due to automatic review settings April 22, 2026 09:42
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Copilot AI review requested due to automatic review settings May 18, 2026 09:03
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 14 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs:475

  • The XML documentation for Open() (and Open(FileAccess)) was not updated to describe ZipArchiveMode.ForwardRead. The summary still says "If the archive that the entry belongs to was opened in Read mode, the returned stream will be readable, and it may or may not be seekable. If Create mode, ... If Update mode, ..." with no mention of ForwardRead, even though this PR introduces a new code path in the switch. Similarly, the <list> in Open(FileAccess) doesn't include the ForwardRead bullet, and the exception list omits the new NotSupportedException/InvalidOperationException cases (encrypted, no data stream, stream already opened). Please extend the doc comments to cover the new mode.
        /// <summary>
        /// Opens the entry. If the archive that the entry belongs to was opened in Read mode, the returned stream will be readable, and it may or may not be seekable. If Create mode, the returned stream will be writable and not seekable. If Update mode, the returned stream will be readable, writable, seekable, and support SetLength.
        /// </summary>
        /// <returns>A Stream that represents the contents of the entry.</returns>
        /// <exception cref="IOException">The entry is already currently open for writing. -or- The entry has been deleted from the archive. -or- The archive that this entry belongs to was opened in ZipArchiveMode.Create, and this entry has already been written to once.</exception>
        /// <exception cref="InvalidDataException">The entry is missing from the archive or is corrupt and cannot be read. -or- The entry has been compressed using a compression method that is not supported.</exception>
        /// <exception cref="ObjectDisposedException">The ZipArchive that this entry belongs to has been disposed.</exception>
        public Stream Open()
        {
            ThrowIfInvalidArchive();

            switch (_archive.Mode)
            {
                case ZipArchiveMode.Read:
                    return OpenInReadMode(checkOpenable: true);
                case ZipArchiveMode.Create:
                    return OpenInWriteMode();
                case ZipArchiveMode.ForwardRead:
                    return OpenInForwardReadMode();
                case ZipArchiveMode.Update:
                default:
                    Debug.Assert(_archive.Mode == ZipArchiveMode.Update);
                    return OpenInUpdateMode();
            }
        }

        /// <summary>
        /// Opens the entry with the specified access mode. This allows for more granular control over the returned stream's capabilities.
        /// </summary>
        /// <param name="access">The file access mode for the returned stream.</param>
        /// <returns>A <see cref="Stream"/> that represents the contents of the entry with the specified access capabilities.</returns>
        /// <remarks>
        /// <para>The allowed <paramref name="access"/> values depend on the <see cref="ZipArchiveMode"/>:</para>
        /// <list type="bullet">
        /// <item><description><see cref="ZipArchiveMode.Read"/>: Only <see cref="FileAccess.Read"/> is allowed.</description></item>
        /// <item><description><see cref="ZipArchiveMode.Create"/>: <see cref="FileAccess.Write"/> and <see cref="FileAccess.ReadWrite"/> are allowed (both write-only).</description></item>
        /// <item><description><see cref="ZipArchiveMode.Update"/>: All values are allowed. <see cref="FileAccess.Read"/> reads directly from the archive. <see cref="FileAccess.Write"/> discards existing content and provides an empty writable stream. <see cref="FileAccess.ReadWrite"/> loads existing content into memory (equivalent to <see cref="Open()"/>).</description></item>
        /// </list>
        /// </remarks>
        /// <exception cref="ArgumentOutOfRangeException"><paramref name="access"/> is not a valid <see cref="FileAccess"/> value.</exception>
        /// <exception cref="InvalidOperationException">The requested access is not compatible with the archive's open mode.</exception>
        /// <exception cref="IOException">The entry is already currently open for writing. -or- The entry has been deleted from the archive. -or- The archive that this entry belongs to was opened in ZipArchiveMode.Create, and this entry has already been written to once.</exception>
        /// <exception cref="InvalidDataException">The entry is missing from the archive or is corrupt and cannot be read. -or- The entry has been compressed using a compression method that is not supported.</exception>
        /// <exception cref="ObjectDisposedException">The ZipArchive that this entry belongs to has been disposed.</exception>

Comment thread src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs Outdated
Comment on lines +22 to +112
// ── Core reading scenarios ──────────────────────────────────────────

[Theory]
[MemberData(nameof(Get_Booleans_Data))]
public async Task NonSeekableStream_ConsumeSkipConsume_ReadsCorrectly(bool async)
{
    byte[] zipBytes = CreateZipWithEntries(CompressionLevel.Optimal, seekable: false);
    byte[][] expected = [s_smallContent, s_mediumContent, s_largeContent];

    using MemoryStream archiveStream = new(zipBytes);
    using WrappedStream nonSeekable = new(archiveStream, canRead: true, canWrite: false, canSeek: false, null);
    using ZipArchive archive = new(nonSeekable, ZipArchiveMode.ForwardRead);

    ZipArchiveEntry? first = await GetNextEntry(archive, async);
    Assert.NotNull(first);
    using (Stream ds = first.Open())
    {
        Assert.Equal(expected[0], await ReadStreamFully(ds, async));
    }

    // Skip second entry without opening
    ZipArchiveEntry? second = await GetNextEntry(archive, async);
    Assert.NotNull(second);
    Assert.Equal("medium.bin", second.FullName);

    ZipArchiveEntry? third = await GetNextEntry(archive, async);
    Assert.NotNull(third);
    using (Stream ds = third.Open())
    {
        Assert.Equal(expected[2], await ReadStreamFully(ds, async));
    }

    Assert.Null(await GetNextEntry(archive, async));
}

[Theory]
[InlineData(true, true)]
[InlineData(true, false)]
[InlineData(false, true)]
[InlineData(false, false)]
public async Task StoredEntries_SeekableAndNonSeekable_ReadCorrectly(bool async, bool readSeekable)
{
    // Always created on seekable stream → known sizes, no data descriptors
    byte[] zipBytes = CreateZipWithEntries(CompressionLevel.NoCompression, seekable: true);
    byte[][] expected = [s_smallContent, s_mediumContent, s_largeContent];

    using MemoryStream archiveStream = new(zipBytes);
    Stream readStream = readSeekable
        ? archiveStream
        : new WrappedStream(archiveStream, canRead: true, canWrite: false, canSeek: false, null);
    using ZipArchive archive = new(readStream, ZipArchiveMode.ForwardRead);

    for (int i = 0; i < expected.Length; i++)
    {
        ZipArchiveEntry? entry = await GetNextEntry(archive, async);
        Assert.NotNull(entry);
        Assert.Equal(ZipCompressionMethod.Stored, entry.CompressionMethod);

        using Stream ds = entry.Open();
        Assert.Equal(expected[i], await ReadStreamFully(ds, async));
    }

    Assert.Null(await GetNextEntry(archive, async));
}

[Theory]
[MemberData(nameof(Get_Booleans_Data))]
public async Task PartialRead_ThenAdvance_ReadsNextEntryCorrectly(bool async)
{
    byte[] zipBytes = CreateZipWithEntries(CompressionLevel.Optimal, seekable: false);

    using MemoryStream archiveStream = new(zipBytes);
    using ZipArchive archive = new(archiveStream, ZipArchiveMode.ForwardRead);

    ZipArchiveEntry? first = await GetNextEntry(archive, async);
    Assert.NotNull(first);

    using (Stream ds = first.Open())
    {
        byte[] partial = new byte[3];
        await ReadStream(ds, partial, async);
    }

    ZipArchiveEntry? second = await GetNextEntry(archive, async);
    Assert.NotNull(second);
    Assert.Equal("medium.bin", second.FullName);

    using Stream ds2 = second.Open();
    Assert.Equal(s_mediumContent, await ReadStreamFully(ds2, async));
}

Development

Successfully merging this pull request may close these issues.

Add a Forward-only API for System.IO.Compression

3 participants