
[Bug] Potential Billion Laughs Attack Vector via Unrestricted XML Parsing in ZeepSchemaHelper #558

@ShangzhiXu

Hello Google Ads API Team,

Firstly, thanks so much for your great work!
While using and reviewing the googleads-python-lib, I came across a potential XML parsing issue in the ZeepSchemaHelper class that I'd like to raise for discussion.

I understand that the library is designed to work with trusted WSDL endpoints provided by Google, and this issue is unlikely to be exploitable under normal use. However, for defense-in-depth and potential future-proofing, I wanted to share the finding.

# Affected Source Code: `googleads/common.py`

class ZeepSchemaHelper(GoogleSchemaHelper):
  def __init__(self, endpoint, timeout, proxy_config, namespace_override, cache):
    ...
    transport = _ZeepProxyTransport(timeout, proxy_config, cache)
    
    try:
      data = transport.load(endpoint)  # Untrusted input source: XML fetched from the caller-supplied endpoint
    except requests.exceptions.HTTPError as e:
      raise googleads.errors.GoogleAdsSoapTransportError(str(e))

    self.schema = zeep.xsd.Schema(
        lxml.etree.fromstring(data)  # Vulnerability sink: XML parsed with the default parser settings
    )

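Because the response is parsed with lxml's default parser, a malicious or compromised endpoint could return XML whose DTD declares nested entities, each one expanding to several copies of the previous one. This is the Billion Laughs pattern: a document of under a kilobyte can expand exponentially in memory when the entities are resolved.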
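To make the growth concrete, here is a small sketch that builds the classic payload shape and computes how large it would become if fully expanded. The function names (`build_payload`, `expanded_size`) and the defaults (9 nesting levels, fan-out of 10) are my own choices for illustration and are not part of googleads-python-lib. The payload is only constructed as a string here; nothing is parsed.

```python
def build_payload(levels: int = 9, fanout: int = 10, base: str = "lol") -> str:
    """Build a Billion Laughs style document (illustrative, never parsed here)."""
    # lol0 holds the base string; each lolN references lol(N-1) `fanout` times.
    entities = [f'<!ENTITY lol0 "{base}">']
    for i in range(1, levels + 1):
        refs = f"&lol{i - 1};" * fanout
        entities.append(f'<!ENTITY lol{i} "{refs}">')
    dtd = "".join(entities)
    return (
        '<?xml version="1.0"?>'
        f"<!DOCTYPE root [{dtd}]>"
        f"<root>&lol{levels};</root>"
    )


def expanded_size(levels: int = 9, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced if the top-level entity were fully expanded."""
    return len(base) * fanout ** levels


payload = build_payload()
print(f"payload size on the wire: {len(payload)} characters")
print(f"size after full expansion: {expanded_size():,} characters")
```

With these defaults the document itself stays well under a kilobyte, while full expansion would yield 3 × 10^9 characters.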

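One possible mitigation is to build a hardened parser:

parser = lxml.etree.XMLParser(
    resolve_entities=False,
    load_dtd=False,
    no_network=True
)

and pass it to the parse call, i.e. `lxml.etree.fromstring(data, parser=parser)`, so that entity references in the fetched WSDL are not expanded.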
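As a rough, self-contained check of the parser settings above, the sketch below parses a tiny two-level entity document and confirms the entity reference is left unexpanded. It assumes `lxml` is installed. The helper name `parse_hardened` and the sample document are hypothetical and only for illustration; this is not the library's actual fix.

```python
from lxml import etree


def parse_hardened(data: bytes):
    """Parse XML without expanding entities, loading DTDs, or using the network."""
    parser = etree.XMLParser(
        resolve_entities=False,  # keep &name; references as entity nodes
        load_dtd=False,          # do not load an external DTD
        no_network=True,         # never fetch related files over the network
    )
    return etree.fromstring(data, parser=parser)


# A harmless two-level sample: lol2 would expand to "lollol" if resolved.
SAMPLE = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE root [<!ENTITY lol1 "lol"><!ENTITY lol2 "&lol1;&lol1;">]>'
    b"<root>&lol2;</root>"
)

root = parse_hardened(SAMPLE)
serialized = etree.tostring(root)
print(serialized)
```

In `ZeepSchemaHelper`, the equivalent change would presumably be to pass such a parser in the existing `lxml.etree.fromstring(data)` call. Whether the default parser is exploitable in practice may also depend on the underlying libxml2 version, which applies its own entity expansion limits, so this is best read as defense-in-depth.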
