
extract, list: auto-detect and decompress gzip archives #168

Open
kevinburke wants to merge 1 commit into uutils:main from kevinburke:kb-gzip-support

Conversation

@kevinburke

Contributor

Previously, extract and list passed the raw file bytes directly to the tar parser without decompression. When given a .tar.gz file, the compressed gzip stream was interpreted as tar headers, producing errors like "numeric field did not have utf-8 text" on the checksum field.

Detect gzip compression by reading the two-byte magic number (0x1f 0x8b) at the start of the file, and wrap the reader in a GzDecoder when present. Plain .tar files continue to work as before.

Confirmed this patch allows extraction of Go source code from https://go.dev/dl/ (previously we would get an error).
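
The magic-number check described above could be sketched roughly as follows. This is only an illustration using the standard library, not the code in this PR: the name `sniff_gzip` and the approach of replaying the consumed bytes through a chained reader are assumptions. In the actual patch the resulting reader would be wrapped in a `GzDecoder` when the magic matches; that step is left as a comment here so the sketch has no external dependencies.

```rust
use std::io::{self, Cursor, Read};

/// The two-byte gzip magic number mentioned in the PR description.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical helper: consume up to two bytes from `reader`, report whether
/// they are the gzip magic, and return a reader that replays those bytes
/// before continuing with the rest of the stream.
fn sniff_gzip<R: Read>(mut reader: R) -> io::Result<(bool, io::Chain<Cursor<Vec<u8>>, R>)> {
    let mut prefix = [0u8; 2];
    let mut filled = 0;

    // A single `read` call may return fewer bytes than requested, so loop
    // until two bytes are available or the stream ends.
    while filled < prefix.len() {
        match reader.read(&mut prefix[filled..]) {
            Ok(0) => break, // end of input before two bytes arrived
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }

    let is_gzip = filled == prefix.len() && prefix == GZIP_MAGIC;

    // Put the consumed bytes back in front of the remaining stream so that
    // either the tar parser or a gzip decoder sees the input from byte zero.
    let replay = Cursor::new(prefix[..filled].to_vec()).chain(reader);

    // A caller would wrap `replay` in a gzip decoder when `is_gzip` is true
    // and pass it to the tar parser unchanged otherwise.
    Ok((is_gzip, replay))
}
```

Replaying the prefix instead of seeking keeps the sketch usable with non-seekable inputs such as stdin, which matters given the stdin work mentioned later in this thread.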

@codecov

codecov Bot commented Apr 10, 2026


Codecov Report

❌ Patch coverage is 99.43820% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.77%. Comparing base (791ae26) to head (f8884ba).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/uu/tar/src/compression.rs | 98.62% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #168      +/-   ##
==========================================
+ Coverage   96.84%   97.77%   +0.92%     
==========================================
  Files          11       15       +4     
  Lines        1492     1751     +259     
  Branches       29       34       +5     
==========================================
+ Hits         1445     1712     +267     
+ Misses         46       38       -8     
  Partials        1        1              

| Flag | Coverage Δ |
|---|---|
| macos_latest | 97.77% <99.43%> (+0.92%) ⬆️ |
| ubuntu_latest | 97.77% <99.43%> (+0.92%) ⬆️ |
| windows_latest | 0.00% <0.00%> (ø) |


@kaladron

Collaborator

Hi Kevin! I'm holding this for just a bit while I land support for reading from stdin, since that will change some function interfaces. I'll review this right after.

@kaladron kaladron self-requested a review April 10, 2026 06:38
@kaladron kaladron self-assigned this Apr 10, 2026
@kaladron

Collaborator

Please tag #158 in the commit description.

@kevinburke

Contributor Author

I see quite a lot of patches just came in; anything I can help review?

@kaladron

Collaborator

> I see quite a lot of patches just came in; anything I can help review?

I filed issues for each piece of missing functionality so that we can coordinate. If more people are showing up, I want them to be able to claim an issue rather than run into conflicts. But I'll tag you in two others that are blocking these at the moment.

@kevinburke kevinburke force-pushed the kb-gzip-support branch 3 times, most recently from b98b8d5 to 08cf2ca on April 13, 2026 16:15
@codspeed-hq

codspeed-hq Bot commented Jun 8, 2026

Contributor

Merging this PR will not alter performance

✅ 7 untouched benchmarks


Comparing kevinburke:kb-gzip-support (f8884ba) with main (9dd7012)


@kevinburke kevinburke force-pushed the kb-gzip-support branch 3 times, most recently from 33822fa to 77374c7 on June 15, 2026 17:31

Teach tar to auto-detect gzip-compressed archives for list and extract
operations while keeping archive creation explicit via -z/--gzip.

Route archive I/O through a shared compression helper that supports plain,
gzip, and zstd streams. This keeps the existing --zstd behavior while adding
gzip sniffing on reads and gzip encoding on writes when requested.

Add integration tests covering gzip create, list, extract, explicit -z on
extract/list, stdio round-tripping, invalid gzip input, and continued zstd
create/list/extract behavior.
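
As a rough illustration of the behavior this commit message describes (sniffing gzip only on reads, keeping creation explicit), the codec selection might look something like the sketch below. All names here (`Compression`, `Mode`, `choose_compression`) are hypothetical and not taken from the patch, and the precedence between the two flags is an assumption, since the commit message does not say how conflicting flags are handled.

```rust
/// Hypothetical set of stream formats handled by a shared compression helper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Plain,
    Gzip,
    Zstd,
}

/// Hypothetical operation kind: creating an archive, or reading one
/// (list and extract).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Create,
    Read,
}

/// Pick a codec from the explicit flags and, for reads only, from the
/// leading bytes of the archive.
fn choose_compression(mode: Mode, gzip_flag: bool, zstd_flag: bool, leading: &[u8]) -> Compression {
    // Explicit flags win. Checking zstd before gzip is an arbitrary choice
    // made for this sketch.
    if zstd_flag {
        return Compression::Zstd;
    }
    if gzip_flag {
        return Compression::Gzip;
    }
    match mode {
        // Auto-detection applies to list/extract only: look for the gzip
        // magic number 0x1f 0x8b at the start of the input.
        Mode::Read if leading.starts_with(&[0x1f, 0x8b]) => Compression::Gzip,
        // Creation stays explicit, and unrecognized input is treated as
        // plain tar.
        _ => Compression::Plain,
    }
}
```

Under these assumptions, `choose_compression(Mode::Create, false, false, ...)` yields `Compression::Plain` regardless of the leading bytes, which reflects the requirement that creation only compresses when asked to.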