Support fuzz_targets property in clusterfuzz_manifest.json#5305
Support fuzz_targets property in clusterfuzz_manifest.json#5305notvictorl wants to merge 1 commit into
Conversation
828343c to
a4fc5d3
Compare
a4fc5d3 to
2647ce6
Compare
| self.list_fuzz_targets() | ||
|
|
||
| if not fuzz_target in self._fuzz_targets: | ||
| if not self._fuzz_targets or fuzz_target not in self._fuzz_targets: |
There was a problem hiding this comment.
I think we should assume and document in DefaultBuildArchive that list_fuzz_targets() always sets self._fuzz_targets to a dict value. Then we can skip this second check for not self._fuzz_targets. This ties in to my comment below about making list_fuzz_targets() a concrete method.
Also, we should use return self._fuzz_targets.get(fuzz_target) to avoid the double lookup.
| # case, default to schema version 0. | ||
| manifest_path = 'clusterfuzz_manifest.json' | ||
| if self.file_exists(manifest_path): | ||
| with self.open(manifest_path) as f: |
There was a problem hiding this comment.
This branch has grown enough, let's invert the branch with an early return for the schema v0 case to reduce nesting.
if not self.file_exists(manifest_path):
self._archive_schema_version = default_archive_schema_version
return
with self.open(manifest_path) as f:
...| self._archive_schema_version = default_archive_schema_version | ||
|
|
||
| fuzz_targets_data = manifest.get('fuzz_targets') | ||
| if isinstance(fuzz_targets_data, list): |
There was a problem hiding this comment.
Again we can reduce nesting with an early return instead in the error case. Also a nit, let's name the variable fuzz_target_paths:
if not isinstance(fuzz_targets_paths, list):
logs.error("not a list")
return|
|
||
|
|
||
| class BuildArchive(archive.ArchiveReader): | ||
| """Abstract class for representing a build archive. This is mostly an |
There was a problem hiding this comment.
Thinking aloud here. If you enjoy refactoring code and feel motivated to take this on, do feel free to, but it's straying off the path a fair bit so please also feel free to just ignore this.
As I review this, I'm coming to the conclusion that this abstraction is a bit wonky.
- It really does not make much sense for it to inherit from
ArchiveReader, at most it could exposeself._readerthrough a getter but from skimming the code that instantiatesBuildArchiveobjects 1, it does not seem we need to reach down into the reader anyway. unpacked_size()andunpack()don't need to be abstract, they are only overridden once in the class hierarchy byDefaultBuildArchive. All that's left apart from that islist_fuzz_targets()andget_target_dependencies().
The design would make more sense to me if we had:
ArchiveReader: interface that abstracts over different file formats (tar vs zip), allows operations on filesFuzzTargetFinder: interface that abstracts over how to find fuzz targets and their dependencies, given an archive readerBuildArchive: concrete class that abstracts over different directory layouts inside the archive, allows operations on the whole build and fuzz targets, using an injectedFuzzTargetFinder.
Something like:
class FuzzTargetFinder(abc.ABC):
@abc.abstractmethod
def find_targets(self, reader: ArchiveReader) -> list[os.PathLike]:
pass
@abc.abstractmethod
def get_target_dependencies(self, reader: ArchiveReader, target: os.PathLike) -> list[os.PathLike]:
pass
class DefaultFuzzTargetFinder(FuzzTargetFinder):
pass
class ChromeFuzzTargetFinder(FuzzTargetFinder):
pass
def make_fuzz_target_finder(archive_reader: ArchiveReader) -> FuzzTargetFinder:
# check for args.gn, return either Finder class
class BuildArchive:
def __init__(self, fuzz_target_finder: FuzzTargetFinder):
self._fuzz_target_finder = fuzz_target_finder
def archive_reader(self) -> ArchiveReader:
return self._fuzz_target_finder.archive_reader| @@ -185,8 +185,10 @@ | |||
| return to_extract | |||
|
|
|||
| def list_fuzz_targets(self) -> List[str]: | |||
There was a problem hiding this comment.
We're using inheritance here and below to override the initialization behavior of self._fuzz_targets, which works but is a bit messy and duplicates a bit of logic. See my topmost comment for a completely optional suggestion on a refactor.
A simpler way to make things a bit better here without reworking the class hierarchy entirely is maybe to define an abstract pure method that just handles finding fuzz targets, and share the implementation of list_fuzz_targets() in BuildArchive above.
class BuildArchive:
def _init__(self, reader):
self._reader = reader
self._fuzz_targets = None
def list_fuzz_targets():
if self._fuzz_targets is None:
self._fuzz_targets = {
fuzzer_utils.normalize_target_name(path): path
for path in self.find_fuzz_targets()
}
return list(self._fuzz_targets.keys())
@abc.abstractmethod
def find_fuzz_targets(self) -> list[os.PathLike]:
pass
class DefaultBuildArchive(BuildArchive):
def find_fuzz_targets(self):
# use `is_fuzz_target()` to filter files in the archive
class ChromeBuildArchive(DefaultBuildArchive):
def __init__(self, reader):
self._fuzz_target_paths = # from the manifest
def find_fuzz_targets(self):
if self._fuzz_target_paths:
return self._fuzz_target_paths
return super().find_fuzz_targets()| test_archive = build_archive.ChromeBuildArchive(self.mock.open.return_value) | ||
| self.assertEqual(test_archive.archive_schema_version(), 1) | ||
|
|
||
| def _generate_manifest_with_fuzz_targets(self, archive_schema_version, |
There was a problem hiding this comment.
Let's collapse all these _generate_manifest() functions into a single one that expects a dict. The format is simple enough that we can repeat it in most tests:
def _generate_manifest(self, contents: any):
json_contents = json.dumps(contents).encode()
def _mock_open(_):
buffer = io.BytesIO(b'')
buffer.write(json_contents)
buffer.seek(0)
return buffer
self.mock.open.return_value.open.side_effect = _mock_open
# and call it like
def test_foo(self):
self._generate_manifest({ 'archive_schema_version': 1 })| test_archive.list_fuzz_targets() | ||
| self.mock.open.return_value.list_members.assert_called_once() | ||
|
|
||
| def test_manifest_fuzz_targets_missing(self): |
There was a problem hiding this comment.
We should have a test for what happens when only some of the entries are invalid.
| self.mock.file_exists.return_value = True | ||
| self._generate_manifest_with_fuzz_targets( | ||
| 1, ['out/build/my_fuzzer', 'out/build/other_fuzzer']) | ||
| test_archive = build_archive.ChromeBuildArchive(self.mock.open.return_value) |
There was a problem hiding this comment.
[Optional] self.mock.open.return_value is mentioned many times, and its name is quite opaque. It may be worth defining either a local variable mock_archive_reader or even a test fixture class member self.mock_archive_reader.
| self.mock.unzip_over_http_compatible.return_value = True | ||
| self.mock.time.return_value = 1000.0 | ||
| build = build_manager.setup_regular_build(2) | ||
| fuzz_target = build_manager.pick_random_fuzz_target( |
There was a problem hiding this comment.
This test may have been exercising the code path for blackbox fuzzing, and no longer is. Is there another test that checks what happens when fuzz_target is None? If not, we should duplicate the old test to conserve coverage.
| self.mock.unzip_over_http_compatible.return_value = True | ||
| self.mock.time.return_value = 1000.0 | ||
| build = build_manager.setup_regular_build(2) | ||
| fuzz_target = build_manager.pick_random_fuzz_target( |
Adds support for
fuzz_targetsproperty inclusterfuzz_manifest.json. This allows coverage-guided fuzzing jobs withChromeBuildArchives to skip the unpacking and fuzz target discovery phases.Problem:
Currently, ClusterFuzz wastes time unpacking complete archives to then traverse the directory to discover fuzz targets. Furthermore, on certain large build types, this unpacking process frequently triggers out-of-disk-space errors due to the massive size of the zip files.
Solution:
By introducing
fuzz_targets— a list of fuzz target paths relative toclusterfuzz_manifest.json— we can bypass the discovery step entirely. ClusterFuzz will now map this list so that each fuzz target has a normalized name (key) and its corresponding path (value).More Info: crbug.com/508214240