Environment details
- API: Google Storage (blob.py)
- OS type and version: Ubuntu 14.04
- Python version (python --version): Python 3.6.8
- google-cloud-storage version (pip freeze): google-cloud-storage==1.19.0
Steps to reproduce
- pip install google-resumable-media==0.4.0
- Call blob.download_as_string() on an object stored with Content-Encoding: gzip
Per the latest release of google-resumable-media, no decompression of content-encoding gzip is performed and raw bytes are returned.
See https://github.com/googleapis/google-resumable-media-python/releases
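blob.download_as_string() formerly returned decompressed bytes and now returns the compressed bytes. For an HPC application we create blobs with bucket.blob(), which does not fetch the object's metadata, instead of bucket.get_blob(). As a result blob.content_encoding is never populated, and the Content-Encoding header of the download response is discarded, so we have no way of knowing how the returned bytes are encoded.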
We actually LIKE the new functionality as we can now decide when to decompress, but we need to know the content encoding to avoid various kinds of problems that would be introduced by speculative decompression.
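To illustrate the kind of problem speculative decompression causes, here is a minimal sketch using only the standard library. The helper names looks_like_gzip and decode_body are hypothetical and not part of google-cloud-storage. It assumes two kinds of objects: one uploaded with Content-Encoding: gzip, and one that is a .gz archive stored as-is with no content encoding.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Speculative check: does the payload start with the gzip magic number?"""
    return data[:2] == GZIP_MAGIC


def decode_body(data: bytes, content_encoding):
    """Decompress only when the reported content encoding says gzip."""
    if content_encoding == "gzip":
        return gzip.decompress(data)
    return data


# Case 1: text uploaded with Content-Encoding: gzip (transport compression).
encoded = gzip.compress(b"hello world")
# Case 2: a .gz archive stored as-is, with no content encoding set.
archive = gzip.compress(b"archive member")

# Sniffing the bytes cannot tell the two cases apart.
print(looks_like_gzip(encoded), looks_like_gzip(archive))

# Knowing the content encoding can: only case 1 is decompressed.
print(decode_body(encoded, "gzip"))
print(decode_body(archive, None) == archive)
```

With the content encoding available, the caller decompresses the first object and leaves the archive untouched; with byte sniffing alone, both would be decompressed.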
Code example
Here is our desired functionality.
blob = bucket.blob(key)
try:
    # blob handles the decompression so the encoding is None
    resp = blob.download_as_string(start=start, end=end)
    return resp, blob.content_encoding
except google.cloud.exceptions.NotFound:
    return None, None
Adding this patch to google/cloud/storage/blob.py would solve this problem for us (added lines are marked with >):
    def _do_download(
        self, transport, file_obj, download_url, headers, start=None, end=None
    ):
        """Perform a download without any error handling.

        This is intended to be called by :meth:`download_to_file` so it can
        be wrapped with error handling / remapping.

        :type transport:
            :class:`~google.auth.transport.requests.AuthorizedSession`
        :param transport: The transport (with credentials) that will
                          make authenticated requests.

        :type file_obj: file
        :param file_obj: A file handle to which to write the blob's data.

        :type download_url: str
        :param download_url: The URL where the media can be accessed.

        :type headers: dict
        :param headers: Optional headers to be sent with the request(s).

        :type start: int
        :param start: Optional, the first byte in a range to be downloaded.

        :type end: int
        :param end: Optional, The last byte in a range to be downloaded.
        """
        if self.chunk_size is None:
            download = Download(
                download_url, stream=file_obj, headers=headers, start=start, end=end
            )
>           response = download.consume(transport)
>           if 'Content-Encoding' in response.headers:
>               self.content_encoding = response.headers['Content-Encoding']
        else:
            download = ChunkedDownload(
                download_url,
                self.chunk_size,
                file_obj,
                headers=headers,
                start=start if start else 0,
                end=end,
            )

            while not download.finished:
                download.consume_next_chunk(transport)
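
The effect of the patched branch can be tried in isolation with stand-in classes. This is only a sketch: FakeResponse, FakeDownload, and MiniBlob are hypothetical test doubles, not the library's classes, and the sketch assumes that consume() returns a response object exposing a headers mapping, as the patch above relies on. It covers the non-chunked branch only, since that is the only branch the patch changes.

```python
import io


class FakeResponse:
    """Stand-in for the HTTP response; only the headers matter here."""

    def __init__(self, headers):
        # A plain dict is case-sensitive, unlike real HTTP header mappings.
        self.headers = headers


class FakeDownload:
    """Stand-in for a one-shot download that writes to a stream."""

    def __init__(self, stream, body, headers):
        self._stream = stream
        self._body = body
        self._headers = headers

    def consume(self, transport):
        # Write the raw (possibly compressed) bytes and return the response.
        self._stream.write(self._body)
        return FakeResponse(self._headers)


class MiniBlob:
    """Stand-in blob holding only the attribute the patch sets."""

    content_encoding = None

    def do_download(self, download, transport=None):
        # Same logic as the three patched lines above.
        response = download.consume(transport)
        if "Content-Encoding" in response.headers:
            self.content_encoding = response.headers["Content-Encoding"]


buf = io.BytesIO()
gz_blob = MiniBlob()
gz_blob.do_download(FakeDownload(buf, b"raw-bytes", {"Content-Encoding": "gzip"}))

plain_blob = MiniBlob()
plain_blob.do_download(FakeDownload(io.BytesIO(), b"text", {}))

print(gz_blob.content_encoding, plain_blob.content_encoding)
```

After the download, the caller can read content_encoding from the blob without a separate metadata request, which is the behavior requested in this issue.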