add create_zip and open_zip#25
add create_zip and open_zip#25samoconnor wants to merge 4 commits intofhs:masterfrom samoconnor:master
Conversation
|
This implementation is quiet different from your original proposal. You basically have a wrapper around Reader/Writer constructor and add some new methods to it. I think now it makes more sense to add the new functionality directly to Reader/Writer. So, I propose renaming open_zip to Reader and create_zip to Writer, and bringing them back into top-level module. That said, the following two function are kind of inconsistent: I'd change them to something like this: The name The documentation needs to be regenerated -- run doc/jldoc.py. But really we should be using docstrings introduced in julia 0.4. |
src/create_zip.jl
Outdated
Many new test cases. open_zip now supports read and write. create_zip now only does 1-line zip creation from Pairs, Dict etc... open_zip now returns ZipArchive <: Associative. This wraps Reader and Writer. ZipArchive has as cache to support reading and writing an archive. e.g. if the user opens an existing archive, then adds a file, the existing files are read in to the cache, then the reader is closed and a writer is created to write everyting out to the output archive. If the user only reads or only writes then the cache is not used and the previous memory-efficient behaviour is retained where one file is handled at a time. If reads and writes are mixed, the cache is activated and the whole content is stored in ram. This could be made more efficient later by stroring compressed data in the cache to avoid decompressing/recompressing. ZipFile.jl type Reader and Writer: Add _close_io::Bool flag to tell close() if it is allowed to close the underlying stream (sometimes the caller wants it left open). Add rewind(f::ReadableFile) to allow a ReadableFile to be read more than once (not exported). Fix bug in write(f::WritableFile, p::Ptr, nb::Integer) where a zero length write would cause an error inside _zio.
|
@fhs, My intention with this API design was to follow the patterns used in Base. My first thought was that Base should already have a good abstraction for reading and writing to an archive. Associative (and by implication iterable) seems like a good fit for an archive that identifies data by strings. The next decision was type names. ZipFile has I agree that create_file() doing two different things isn't great (open file for writing vs one-shot archive creation). In https://github.com/samoconnor/ZipFile.jl/commit/f275ea73c929063c8d8dd7687360c53e9fa4e955 I've refactored to avoid this. While I was at it I addressed the Reader/Writer issue and added a Also new is this for creating a zip file from other files... function create_zip(io::IO, files::Array)
create_zip(io::IO, files, [open(readbytes, f) for f in files])
end
e.g.
create_zip("foo.zip", ["file1.csv", "file2.csv", "subdir/file3.csv"])(prompted by a question from @nalimilan in JuliaLang/julia#14546 (comment)) |
There was a problem hiding this comment.
@fhs, bug fix for zero-length file written to archive.
There was a problem hiding this comment.
I don't understand what's the bug. If nb is 0, write function wouldn't write anything anyway.
There was a problem hiding this comment.
it was crashing somewhere inside zlib when I wrote an empty file
There was a problem hiding this comment.
It seems like a zlib or Zlib.jl bug. Maybe switching to Libz.jl will fix this (see #22).
For now, you can send a separate PR with a comment saying this is a workaround zlib/Zlib.jl bug.
There was a problem hiding this comment.
Never mind. I just merged the fix. Thanks.
|
I don't see why we need caching and ZipArchive. It seems like the library is trying to do too much. Caching is not an easy problem. You don't need want to read large files from a zip archive and keep it in memory. I liked the previous implementation where it just added few more convenience functions to Reader/Writer. BTW, I called them Reader/Writer because this library was inspired by Go's zip package (https://golang.org/pkg/archive/zip/) and even in Go -er names are usually reserved for interfaces, not structs. So, I don't know if they're good names or not, but it's too late to change them. Please discuss the new API before you implement it, so we can come to an agreement and you're not wasting your time ;-) |
|
Caching is necessary if you want to update an existing archive. A single ZipArchive type to replace Reader and Writer makes things much easier for the user who wants to read and write to the same archive. My implementation is careful to avoid the cache for use patterns where it is not needed, e.g. pure writes to a new archive, or pure reads from an existing archive. The test cases cover a number of combinations of read-write-read, write-read-write, etc... You are right that for large files there would be a large memory overhead. I've done a quick implementation of the same interface using Info-ZIP as the backend. I'll leave it up to you to decide how much of this interface is appropriate for ZipFile. |
|
Hi @fhs, the Although your pure-Julia ZipFile implementation is not as full featured as Info-ZIP I imagine that there are use cases where avoiding the external dependancy on Info-ZIP is desirable. Do you have any interest in:
It seems to me that there is benefit to end users in having a consistent interface. |
|
The latter option seems good to me. I would like to keep ZipFile.jl as pure julia. I'm closing this PR for now. It seems to me that the use-case for this is dealing with lots of small files in a zip file. I think it'll introduce performance issues for large files in a zip file. Also, there are more important things to work on besides a higher-level alternate API, such as
|
|
Hi @fhs, sounds good to me. |
Per JuliaLang/METADATA.jl#4332 (comment)