Storage: google-resumable-media==0.4.0 Breaks Gzipped Downloads #9188

@william-silversmith

Environment details

  1. Specify the API at the beginning of the title (for example, "BigQuery: ...")
    General, Core, and Other are also allowed as types

Google Storage blob.py

  2. OS type and version

Ubuntu 14.04

  3. Python version and virtual environment information: python --version

Python 3.6.8

  4. google-cloud-<service> version: pip show google-<service> or pip freeze

google-cloud-storage==1.19.0

Steps to reproduce

  1. pip install google-resumable-media==0.4.0
  2. Call blob.download_as_string() on an object stored with Content-Encoding: gzip

As of the latest release of google-resumable-media (0.4.0), content served with Content-Encoding: gzip is no longer decompressed; the raw bytes are returned.

See https://github.com/googleapis/google-resumable-media-python/releases

blob.download_as_string() formerly returned decompressed bytes and now returns compressed bytes. We use bucket.blob() (which makes no metadata request) instead of bucket.get_blob() for an HPC application, so we have no way of knowing the content encoding: the Content-Encoding response header is discarded during the download.

We actually LIKE the new functionality as we can now decide when to decompress, but we need to know the content encoding to avoid various kinds of problems that would be introduced by speculative decompression.

Code example

Here is our desired functionality.

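    import google.cloud.exceptions

    def download_with_encoding(bucket, key, start=None, end=None):
      # Illustrative wrapper; `bucket` is a google.cloud.storage Bucket.
      blob = bucket.blob(key)
      try:
        # Desired: the download records the response's Content-Encoding on
        # the blob, so the caller can decide whether to decompress.
        resp = blob.download_as_string(start=start, end=end)
        return resp, blob.content_encoding
      except google.cloud.exceptions.NotFound:
        return None, None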
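
Given the (bytes, encoding) pair returned by the snippet above, the caller could then decompress only when the server actually reported gzip. The following is a minimal sketch of that decision, not part of google-cloud-storage; `decode_payload` is a hypothetical helper name, and it assumes the only encoding of interest is gzip.

```python
import gzip


def decode_payload(data, content_encoding):
    """Decompress only when the reported encoding is gzip; otherwise pass through."""
    if data is None:
        # Mirrors the (None, None) result used for a missing object.
        return None
    if content_encoding == "gzip":
        return gzip.decompress(data)
    # No encoding reported: return the bytes untouched, even if they
    # happen to look like gzip (e.g. a .gz archive stored as-is).
    return data


if __name__ == "__main__":
    compressed = gzip.compress(b"hello world")
    print(decode_payload(compressed, "gzip"))
    print(decode_payload(compressed, None) == compressed)
```

Keying the decision on the reported encoding rather than on the payload's magic bytes is what avoids speculative decompression: an object that is deliberately stored as a gzip archive, with no Content-Encoding set, is returned unchanged.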

Adding this patch to google/cloud/storage/blob.py would solve this problem for us:

    def _do_download(
        self, transport, file_obj, download_url, headers, start=None, end=None
    ):
        """Perform a download without any error handling.

        This is intended to be called by :meth:`download_to_file` so it can
        be wrapped with error handling / remapping.

        :type transport:
            :class:`~google.auth.transport.requests.AuthorizedSession`
        :param transport: The transport (with credentials) that will
                          make authenticated requests.

        :type file_obj: file
        :param file_obj: A file handle to which to write the blob's data.

        :type download_url: str
        :param download_url: The URL where the media can be accessed.

        :type headers: dict
        :param headers: Optional headers to be sent with the request(s).

        :type start: int
        :param start: Optional, the first byte in a range to be downloaded.

        :type end: int
        :param end: Optional, The last byte in a range to be downloaded.
        """
        if self.chunk_size is None:
            download = Download(
                download_url, stream=file_obj, headers=headers, start=start, end=end
            )
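            # Proposed change (next three lines): keep the response and
            # record its Content-Encoding header on the blob.
            response = download.consume(transport)
            if 'Content-Encoding' in response.headers:
                self.content_encoding = response.headers['Content-Encoding']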
        else:
            download = ChunkedDownload(
                download_url,
                self.chunk_size,
                file_obj,
                headers=headers,
                start=start if start else 0,
                end=end,
            )

            while not download.finished:
                download.consume_next_chunk(transport)
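
The added lines in the patch depend only on the response object exposing the server's headers. A stand-alone sketch of that step, using stubs in place of the real transport response and blob, might look like the following. `FakeResponse`, `FakeBlob`, and `record_encoding` are hypothetical names introduced for illustration and are not part of google-cloud-storage or google-resumable-media.

```python
class FakeResponse:
    """Stand-in for a requests-style response; only `headers` is modeled."""

    def __init__(self, headers):
        self.headers = headers


class FakeBlob:
    """Stand-in for a blob; only `content_encoding` is modeled."""

    content_encoding = None


def record_encoding(blob, response):
    # Same logic as the proposed patch: copy the header onto the blob
    # when the server sent one, and leave the attribute alone otherwise.
    if "Content-Encoding" in response.headers:
        blob.content_encoding = response.headers["Content-Encoding"]
    return blob.content_encoding


if __name__ == "__main__":
    blob = FakeBlob()
    print(record_encoding(blob, FakeResponse({"Content-Encoding": "gzip"})))
```

Note that the patch as written only touches the non-chunked branch; when `chunk_size` is set, the `ChunkedDownload` loop would presumably need similar handling for the encoding to be recorded.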

Labels

  - 🚨 (This issue needs some love.)
  - api: storage (Issues related to the Cloud Storage API.)
  - triage me (I really want to be triaged.)
  - type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
