Skip to content

Support fuzz_targets property in clusterfuzz_manifest.json#5305

Open
notvictorl wants to merge 1 commit into
google:masterfrom
notvictorl:liuvic/fuzz_targets
Open

Support fuzz_targets property in clusterfuzz_manifest.json#5305
notvictorl wants to merge 1 commit into
google:masterfrom
notvictorl:liuvic/fuzz_targets

Conversation

@notvictorl
Copy link
Copy Markdown
Collaborator

@notvictorl notvictorl commented Jun 1, 2026

Adds support for fuzz_targets property in clusterfuzz_manifest.json. This allows coverage-guided fuzzing jobs with ChromeBuildArchives to skip the unpacking and fuzz target discovery phases.

Problem:
Currently, ClusterFuzz wastes time unpacking complete archives to then traverse the directory to discover fuzz targets. Furthermore, on certain large build types, this unpacking process frequently triggers out-of-disk-space errors due to the massive size of the zip files.

Solution:
By introducing fuzz_targets — a list of fuzz target paths relative to clusterfuzz_manifest.json — we can bypass the discovery step entirely. ClusterFuzz will now map this list so that each fuzz target has a normalized name (key) and its corresponding path (value).

More Info: crbug.com/508214240

@notvictorl notvictorl requested a review from a team as a code owner June 1, 2026 17:19
@notvictorl notvictorl force-pushed the liuvic/fuzz_targets branch 4 times, most recently from 828343c to a4fc5d3 Compare June 1, 2026 18:47
@notvictorl notvictorl force-pushed the liuvic/fuzz_targets branch from a4fc5d3 to 2647ce6 Compare June 1, 2026 19:39
@notvictorl notvictorl requested a review from letitz June 1, 2026 20:14
self.list_fuzz_targets()

if not fuzz_target in self._fuzz_targets:
if not self._fuzz_targets or fuzz_target not in self._fuzz_targets:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should assume and document in DefaultBuildArchive that list_fuzz_targets() always sets self._fuzz_targets to a dict value. Then we can skip this second check for not self._fuzz_targets. This ties in to my comment below about making list_fuzz_targets() a concrete method.

Also, we should use return self._fuzz_targets.get(fuzz_target) to avoid the double lookup.

# case, default to schema version 0.
manifest_path = 'clusterfuzz_manifest.json'
if self.file_exists(manifest_path):
with self.open(manifest_path) as f:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch has grown enough, let's invert the branch with an early return for the schema v0 case to reduce nesting.

  if not self.file_exists(manifest_path):
    self._archive_schema_version = default_archive_schema_version
    return

  with self.open(manifest_path) as f:
    ...

self._archive_schema_version = default_archive_schema_version

fuzz_targets_data = manifest.get('fuzz_targets')
if isinstance(fuzz_targets_data, list):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again we can reduce nesting with an early return instead in the error case. Also a nit, let's name the variable fuzz_target_paths:

if not isinstance(fuzz_targets_paths, list):
  logs.error("not a list")
  return



class BuildArchive(archive.ArchiveReader):
"""Abstract class for representing a build archive. This is mostly an
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking aloud here. If you enjoy refactoring code and feel motivated to take this on, do feel free to, but it's straying off the path a fair bit so please also feel free to just ignore this.

As I review this, I'm coming to the conclusion that this abstraction is a bit wonky.

  1. It really does not make much sense for it to inherit from ArchiveReader, at most it could expose self._reader through a getter but from skimming the code that instantiates BuildArchive objects 1, it does not seem we need to reach down into the reader anyway.
  2. unpacked_size() and unpack() don't need to be abstract, they are only overridden once in the class hierarchy by DefaultBuildArchive. All that's left apart from that is list_fuzz_targets() and get_target_dependencies().

The design would make more sense to me if we had:

  • ArchiveReader: interface that abstracts over different file formats (tar vs zip), allows operations on files
  • FuzzTargetFinder: interface that abstracts over how to find fuzz targets and their dependencies, given an archive reader
  • BuildArchive: concrete class that abstracts over different directory layouts inside the archive, allows operations on the whole build and fuzz targets, using an injected FuzzTargetFinder.

Something like:

class FuzzTargetFinder(abc.ABC):
  @abc.abstractmethod
  def find_targets(self, reader: ArchiveReader) -> list[os.PathLike]:
    pass

  @abc.abstractmethod
  def get_target_dependencies(self, reader: ArchiveReader, target: os.PathLike) -> list[os.PathLike]:
    pass

class DefaultFuzzTargetFinder(FuzzTargetFinder):
  pass

class ChromeFuzzTargetFinder(FuzzTargetFinder):
  pass

def make_fuzz_target_finder(archive_reader: ArchiveReader) -> FuzzTargetFinder:
  # check for args.gn, return either Finder class

class BuildArchive:
  def __init__(self, fuzz_target_finder: FuzzTargetFinder):
    self._fuzz_target_finder = fuzz_target_finder

  def archive_reader(self) -> ArchiveReader:
    return self._fuzz_target_finder.archive_reader

@@ -185,8 +185,10 @@
return to_extract

def list_fuzz_targets(self) -> List[str]:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using inheritance here and below to override the initialization behavior of self._fuzz_targets, which works but is a bit messy and duplicates a bit of logic. See my topmost comment for a completely optional suggestion on a refactor.

A simpler way to make things a bit better here without reworking the class hierarchy entirely is maybe to define an abstract pure method that just handles finding fuzz targets, and share the implementation of list_fuzz_targets() in BuildArchive above.

class BuildArchive:
  def _init__(self, reader):
    self._reader = reader
    self._fuzz_targets = None

  def list_fuzz_targets():
    if self._fuzz_targets is None:
      self._fuzz_targets = {
        fuzzer_utils.normalize_target_name(path): path
        for path in self.find_fuzz_targets()
      }

    return list(self._fuzz_targets.keys())

  @abc.abstractmethod
  def find_fuzz_targets(self) -> list[os.PathLike]:
    pass

class DefaultBuildArchive(BuildArchive):
  def find_fuzz_targets(self):
    # use `is_fuzz_target()` to filter files in the archive

class ChromeBuildArchive(DefaultBuildArchive):
  def __init__(self, reader):
    self._fuzz_target_paths = # from the manifest

  def find_fuzz_targets(self):
    if self._fuzz_target_paths:
      return self._fuzz_target_paths

    return super().find_fuzz_targets()

test_archive = build_archive.ChromeBuildArchive(self.mock.open.return_value)
self.assertEqual(test_archive.archive_schema_version(), 1)

def _generate_manifest_with_fuzz_targets(self, archive_schema_version,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's collapse all these _generate_manifest() functions into a single one that expects a dict. The format is simple enough that we can repeat it in most tests:

def _generate_manifest(self, contents: any):
     json_contents = json.dumps(contents).encode()
     def _mock_open(_):
      buffer = io.BytesIO(b'')
      buffer.write(json_contents)
      buffer.seek(0)
      return buffer

    self.mock.open.return_value.open.side_effect = _mock_open

# and call it like
def test_foo(self):
  self._generate_manifest({ 'archive_schema_version': 1 })

test_archive.list_fuzz_targets()
self.mock.open.return_value.list_members.assert_called_once()

def test_manifest_fuzz_targets_missing(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a test for what happens when only some of the entries are invalid.

self.mock.file_exists.return_value = True
self._generate_manifest_with_fuzz_targets(
1, ['out/build/my_fuzzer', 'out/build/other_fuzzer'])
test_archive = build_archive.ChromeBuildArchive(self.mock.open.return_value)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Optional] self.mock.open.return_value is mentioned many times, and its name is quite opaque. It may be worth defining either a local variable mock_archive_reader or even a test fixture class member self.mock_archive_reader.

self.mock.unzip_over_http_compatible.return_value = True
self.mock.time.return_value = 1000.0
build = build_manager.setup_regular_build(2)
fuzz_target = build_manager.pick_random_fuzz_target(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test may have been exercising the code path for blackbox fuzzing, and no longer is. Is there another test that checks what happens when fuzz_target is None? If not, we should duplicate the old test to conserve coverage.

self.mock.unzip_over_http_compatible.return_value = True
self.mock.time.return_value = 1000.0
build = build_manager.setup_regular_build(2)
fuzz_target = build_manager.pick_random_fuzz_target(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants