Commit 1f2f551

fix: naming for fuzz target discovery (#24)
Changed the file naming used when inserting print statements from the _print1 suffix to .bak.py, and modified both the fuzz target discovery method and the way function names are found for inserting print statements.

* Upload the python project Fuzz test script
  valid_projects.txt: Python project list
  script_fuzz_py_final.sh: Single project test script
  script_fuzz_py_batch_final.sh: Batch projects test script
* feat: Add OSS-Fuzz submodule tracking main branch
* chore: Switch oss-fuzz submodule to personal fork
* Switch oss-fuzz submodule to personal fork
* move the valid_project file
* move the .py file
* create build_oss_fuzz.py
* create run_fuzz_target.py
* split the pool.py into build_oss_fuzz and run_fuzz_target
* delete the .sh files
* translate to english
* fuzz_runner_pool.py:74
* edit stdout
* add null check
* modify stdout, delete pool.py
* indentation level check
* Remove build log write files
* Remove build log write files
* use logging module
* use precise logging
* use logging
* use precise exception log info
* correct type problems
* correct some mistakes
* correct some mistakes
* correct
* modify discover fuzz target
* modify the oss-fuzz dir
* Redirect the output to the null device without retaining any output
* add always yes
* split the build script
* split the build script
* build scripts test successfully
* build.py
* collect targets first and then run
* list, tuple, optional
* list,tuple,optional
* translate
* build_fuzz.py, run_fuzz_all_target.py
* correct
* original
* record input
* Fatal error in main program: cannot unpack non-iterable NoneType object
* name 'target_functions' is not defined fuzz_util_instrumented.py does not seem to exist
* prepare for a major rewrite
* create modify file script: add "print(data)" to each fuzz_.py
* build_fuzzer script
* modify tuple dict list
* remove stdout stderr in build fuzz
* test successfully
* rename run fuzz ds to run fuzz print1
* add print(data) to fuzz target and rename the file with "_print1"
* oss-fuzz change
* rename the print1.py
* modify the exegesis
* modify
* modify log name
* type error
* list dict tuple
* type error
* construct errors module
* run_command module
* combine the run_command instrument to one file
* remove the run_command
* modify
* mytype check
* mytype
* mytype
* mytype
* translate
* remove run command
* timeout - shell instrument
* correct in out error and return Popen directly
* ready to change from rust script
* modify build_image
* y/n
* correct repo_id and repo_name in main
* test build_image build log
* add build_fuzzer
* fuzz and testgen
* correct run_one_target
* fuzz ok
* transform
* testgen need to ^ help: add `;` here
* test successful
* example output project
* type error
* English ver
* delete previous scripts
* python template
* python template
* correct the template
* ver2 wrong template
* ok
* testgen file change into copy the original and then add input_data =b""
* only read b' ' inputs
* remove transform
* clean the inputs and testgen
* set max_file
* max input file
* input b""
* modify the method of writing files into PIPE
* use max total time; remove size monitor
* fix the parallelism error; the write method still writes directly to files; timing is controlled by max total time
* add more log output
* template generation succeeded
* testgen done
* remove redundancy, modify code
* switch back to the version without redundancy removed
* template inserts data=b""; function header changed to test_()
* translation
* A complete script for building the processes of build_image, build_fuzzer, fuzz, transform, and testgen, suitable for Python projects.
* delete some imports
* use AST for transform and testgen
* use AST
* Set up command line arguments
* use fire
* use Fire
* black formatter
* deal the data after closing the file
* when doing line-matching, check for # This is a test template in the line
* when doing line-matching, check for # This is a test template in the line
* delete UnicodeDecodeError
* apply transformations on the original unmodified fuzz targets.
* put all AST related class/module/function in another file and import from there.
* put all AST related class/module/function in another file and import from there.
* translation
* use relative address
* use relative address
* remove the class outside of the function
* add tuple's type
* Properly handle indentation and process data after the file is closed.
* correct the relative path
* add black to requirements.txt
* change the naming scheme used when adding print(); function matching now identifies the second argument of atheris.Setup(), which is not yet accurate
* This script reads the project list in data/valid_projects.txt. For each project, it looks up the corresponding project folder under fuzz/oss-fuzz/projects, deletes all files ending in _print1.py or .bak.py, and processes the remaining .py files: create a backup file (.bak.py); find the atheris.Setup() statement and extract the second argument (the function signature); add a print(data) statement to that function; keep the original file name unchanged.
* correct the source file name as .bak.py
* fuzz target discovery now excludes a list of common tools
* added the list of qualifying Python projects (has a Dockerfile, a .yaml with language = python, build.sh, and .py files)
* black format
1 parent dc8785c commit 1f2f551

4 files changed: +163 -63 lines changed

data/valid_projects.txt

Lines changed: 12 additions & 1 deletion
@@ -1,6 +1,7 @@
 abseil-py
 adal
 aiohttp
+airflow
 aniso8601
 ansible
 argcomplete
@@ -15,10 +16,13 @@ autopep8
 azure-sdk-for-python
 babel
 black
+bleach
 botocore
 bottleneck
+bs4
 bz2file
 cachetools
+cbor2
 cffi
 chardet
 charset_normalizer
@@ -70,6 +74,7 @@ g-cloud-logging-py
 gcp-python-cloud-storage
 genshi
 gitdb
+github_scarecrow
 glom
 gprof2dot
 g-py-bigquery
@@ -98,8 +103,8 @@ jinja2
 jmespathpy
 joblib
 jsmin
-jupyter-nbconvert
 jupyter_server
+jupyter-nbconvert
 kafka
 keras
 kiwisolver
@@ -123,6 +128,7 @@ nbclassic
 nbformat
 netaddr-py
 networkx
+nfstream
 ntlm2
 ntlm-auth
 numexpr
@@ -142,6 +148,7 @@ pandas
 paramiko
 parse
 parsimonious
+parso
 pasta
 pathlib2
 pdoc
@@ -200,6 +207,7 @@ retry
 rfc3967
 rich
 sacremoses
+scapy
 scikit-learn
 scipy
 setuptools
@@ -208,6 +216,7 @@ simplejson
 six
 smart_open
 soupsieve
+sqlalchemy
 sqlalchemy_jsonfield
 sqlalchemy-utils
 sqlparse
@@ -220,6 +229,7 @@ toolbelt
 toolz
 tqdm
 typing_extensions
+ujson
 underscore
 uritemplate
 urlextract
@@ -230,5 +240,6 @@ websocket-client
 wheel
 wtforms
 xlrd
+xmltodict
 yarl
 zipp

fuzz/ast_utils.py

Lines changed: 2 additions & 2 deletions
@@ -146,8 +146,8 @@ def generate_test_template(target_name: str, repo_path: str):
     """
     src_file = os.path.join(repo_path, target_name)
     logging.info(f"Generating test template for {src_file}")
-    if not src_file.endswith(".py"):
-        src_file += ".py"
+    if not src_file.endswith(".bak.py"):
+        src_file += ".bak.py"
     if not os.path.exists(src_file):
         logging.error(f"Source target file not found: {src_file}")
         return None
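
A note on the suffix check in this hunk: it appends ".bak.py" to any path that does not already end in ".bak.py", so a target name passed as "fuzz_x.py" would become "fuzz_x.py.bak.py". Whether callers ever pass the ".py" extension is not visible in this diff. The sketch below shows one way to normalize the suffix; `to_backup_path` is a hypothetical helper name, not part of the commit.

```python
import os


def to_backup_path(repo_path: str, target_name: str) -> str:
    """Hypothetical helper: map a target name to its `.bak.py` backup path.

    Accepts names with no extension, with `.py`, or already ending in `.bak.py`.
    """
    src_file = os.path.join(repo_path, target_name)
    if src_file.endswith(".bak.py"):
        return src_file
    if src_file.endswith(".py"):
        # Drop the plain `.py` so the result is not `name.py.bak.py`.
        src_file = src_file[: -len(".py")]
    return src_file + ".bak.py"
```

If every caller is known to pass the bare target name, the check in the diff is already sufficient and this normalization is unnecessary.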

fuzz/collect_fuzz_python.py

Lines changed: 27 additions & 3 deletions
@@ -102,14 +102,38 @@ def discover_targets(project_name: str, oss_fuzz_dir: Path) -> list[str]:
         logging.warning(f"Build output directory for {project_name} does not exist")
         return targets
 
+    # Common tools that are not fuzz targets
+    non_target_tools = {
+        "llvm-symbolizer",
+        "asan_symbolize",
+        "msan_symbolize",
+        "tsan_symbolize",
+        "ubsan_symbolize",
+        "clang",
+        "clang++",
+        "llvm-ar",
+        "llvm-nm",
+        "llvm-objcopy",
+        "llvm-objdump",
+        "llvm-ranlib",
+        "llvm-readelf",
+        "llvm-readobj",
+        "llvm-size",
+        "llvm-strings",
+        "llvm-strip",
+        "ld",
+        "ld.lld",
+        "lld",
+        "lld-link",
+    }
+
     try:
         for f in out_dir.iterdir():
             if (
                 f.is_file()
-                and f.name.startswith("fuzz_")
                 and "." not in f.name
-                and f.name.endswith("print1")
                 and os.access(f, os.X_OK)
+                and f.name not in non_target_tools # exclude known tools
             ):
                 targets.append(f.name)
                 logging.info(
@@ -222,7 +246,7 @@ def _transform_repo(repo: str):
 
 def substitute_one_repo(
     repo: str,
-    targets: list[tuple[str,str]], # Each element is (transformed_target, raw_target)
+    targets: list[tuple[str, str]], # Each element is (transformed_target, raw_target)
     n_fuzz: int,
     strategy: str,
     max_len: int,
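
With the `fuzz_` prefix and `print1` suffix checks removed, the discover_targets hunk accepts any extension-less executable that is not on the deny-list. The sketch below isolates just the name-based part of that condition to show what passes and what does not. `NON_TARGET_TOOLS` is an abbreviated copy of the set in the diff, and `filter_target_names` is a hypothetical name; the `is_file()` and `os.access(..., os.X_OK)` checks from the diff are left out here.

```python
# Abbreviated copy of the deny-list from the diff (the commit lists more tools).
NON_TARGET_TOOLS = frozenset({"llvm-symbolizer", "clang", "clang++", "ld", "lld"})


def filter_target_names(names: list[str]) -> list[str]:
    """Hypothetical helper: keep names that the updated condition would accept.

    A name is kept when it has no dot and is not a known non-target tool.
    """
    return [n for n in names if "." not in n and n not in NON_TARGET_TOOLS]


candidates = [
    "fuzz_parse",       # kept: the old prefix rule also matched this
    "decode_fuzzer",    # kept: would have been skipped by the old prefix rule
    "llvm-symbolizer",  # dropped: on the deny-list
    "fuzz_parse.pkg",   # dropped: contains a dot
    "helper_tool",      # kept: an unlisted helper binary still passes
]
print(filter_target_names(candidates))
```

The last case is the trade-off of a deny-list: any helper executable that is not listed is treated as a fuzz target, so the set may need extending as new projects are added.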

fuzz/modify_fuzz_files.py

Lines changed: 122 additions & 57 deletions
@@ -1,72 +1,137 @@
 #!/usr/bin/env python3
 import os
+import re
+import shutil
 import ast
-import fire
+import astunparse
 
 
-class InsertPrintTransformer(ast.NodeTransformer):
-    def visit_FunctionDef(self, node):
-        if node.name in ("TestOneInput", "TestInput") and node.args.args:
-            first_arg_name = node.args.args[0].arg
-            print_stmt = ast.Expr(
-                value=ast.Call(
-                    func=ast.Name(id='print', ctx=ast.Load()),
-                    args=[ast.Name(id=first_arg_name, ctx=ast.Load())],
-                    keywords=[]
-                )
-            )
-            # Check for an empty body
-            if not node.body:
-                node.body.append(print_stmt)
-            else:
-                # Stricter duplicate check
-                first_stmt = node.body[0]
-                if not (isinstance(first_stmt, ast.Expr)
-                        and isinstance(first_stmt.value, ast.Call)
-                        and hasattr(first_stmt.value.func, 'id')
-                        and first_stmt.value.func.id == 'print'):
-                    node.body.insert(0, print_stmt)
-        return node
-
-def add_print_to_testoneinput(file_path):
-    with open(file_path, 'r') as f:
-        content = f.read()
-
-    tree = ast.parse(content)
-    transformer = InsertPrintTransformer()
-    new_tree = transformer.visit(tree)
-    ast.fix_missing_locations(new_tree)
-
-    import astor
-    new_content = astor.to_source(new_tree)
-    return new_content
+def process_projects():
+    # Read valid projects from data/valid_projects.txt
+    with open("data/valid_projects.txt", "r") as f:
+        projects = [line.strip() for line in f.readlines() if line.strip()]
 
-def main(
-    projects_path="fuzz/oss-fuzz/projects",
-    valid_projects_file="data/valid_projects.txt"
-):
-    """Add print statements to fuzz targets"""
-    with open(valid_projects_file, 'r') as f:
-        projects = [line.strip() for line in f if line.strip()]
+    # Base directory containing projects
+    base_dir = "fuzz/oss-fuzz/projects"
 
     for project in projects:
-        project_dir = os.path.join(projects_path, project)
-        if not os.path.isdir(project_dir):
+        project_dir = os.path.join(base_dir, project)
+
+        if not os.path.exists(project_dir):
+            print(f"Project directory not found: {project_dir}")
             continue
 
-        for root, _, files in os.walk(project_dir):
+        print(f"Processing project: {project}")
+
+        # Remove _print1.py and .bak files
+        for root, dirs, files in os.walk(project_dir):
             for file in files:
-                if file.startswith('fuzz_') and file.endswith('.py'):
+                if file.endswith("_print1.py") or file.endswith(".bak.py"):
                     file_path = os.path.join(root, file)
-                    try:
-                        new_content = add_print_to_testoneinput(file_path)
-                        new_file_path = file_path.rsplit('.', 1)[0] + '_print1.py'
-                        with open(new_file_path, 'w') as f:
-                            f.write(new_content)
-                        print(f"Processed: {file_path} -> {new_file_path}")
+                    os.remove(file_path)
+                    print(f"Removed: {file_path}")
+
+        # Find all remaining .py files
+        py_files = []
+        for root, dirs, files in os.walk(project_dir):
+            for file in files:
+                if file.endswith(".py"):
+                    py_files.append(os.path.join(root, file))
+
+        # Process each .py file
+        for py_file in py_files:
+            process_py_file(py_file)
+
+
+class FunctionVisitor(ast.NodeVisitor):
+    def __init__(self, target_func):
+        self.target_func = target_func
+        self.found_node = None
+        self.first_param = None
+
+    def visit_FunctionDef(self, node):
+        if node.name == self.target_func:
+            self.found_node = node
+            if node.args.args:
+                self.first_param = node.args.args[0].arg
+        self.generic_visit(node)
+
+
+def process_py_file(file_path):
+    print(f"Processing file: {file_path}")
+
+    # Create backup with .bak.py suffix
+    base_name = os.path.splitext(file_path)[0] # Remove .py extension
+    backup_path = base_name + ".bak.py"
+    shutil.copy2(file_path, backup_path)
+    print(f"Created backup: {backup_path}")
+
+    # Read file content
+    with open(file_path, "r") as f:
+        content = f.read()
+
+    # Find atheris.Setup() call and extract function signature (only the second parameter)
+    # Improved regular expression that matches only the second argument
+    setup_pattern = r"atheris\.Setup\([^,]*,\s*([^,)]*)"
+    match = re.search(setup_pattern, content)
+
+    if not match:
+        print(f"No atheris.Setup() found in {file_path}")
+        return
+
+    function_signature = match.group(1).strip()
+    print(f"Found function signature: {function_signature}")
+
+    # Parse AST to find target function
+    try:
+        tree = ast.parse(content)
+    except SyntaxError as e:
+        print(f"Syntax error in {file_path}: {e}")
+        return
+
+    visitor = FunctionVisitor(function_signature)
+    visitor.visit(tree)
+
+    if not visitor.found_node:
+        print(f"Function {function_signature} not found in {file_path}")
+        return
+
+    if not visitor.first_param:
+        print(f"No parameters found in function {function_signature}")
+        return
+
+    # Create print statement node
+    print_stmt = ast.Expr(
+        value=ast.Call(
+            func=ast.Name(id="print", ctx=ast.Load()),
+            args=[ast.Name(id=visitor.first_param, ctx=ast.Load())],
+            keywords=[],
+        )
+    )
+
+    # Insert print statement at the beginning of function body
+    if visitor.found_node.body:
+        # Preserve docstring if present
+        first_item = visitor.found_node.body[0]
+        if isinstance(first_item, ast.Expr) and isinstance(first_item.value, ast.Str):
+            # Insert after docstring
+            visitor.found_node.body.insert(1, print_stmt)
+        else:
+            # Insert at the very beginning
+            visitor.found_node.body.insert(0, print_stmt)
+    else:
+        visitor.found_node.body = [print_stmt]
+
+    # Generate modified source code
+    modified_content = astunparse.unparse(tree)
+
+    # Write modified content back to file
+    with open(file_path, "w") as f:
+        f.write(modified_content)
+
+    print(f"Added print({visitor.first_param}) to function {function_signature}")
 
-                    except Exception as e:
-                        print(f"Error processing {file_path}: {str(e)}")
 
 if __name__ == "__main__":
-    fire.Fire(main)
+    process_projects()
+
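
The commit message notes that matching the second argument of atheris.Setup() is not yet accurate. One likely cause is that the regex stops the first argument at the first comma, so a call such as `atheris.Setup([sys.argv[0], "-runs=1"], TestOneInput)` would capture `"-runs=1"]` instead of the callback name. The sketch below is one possible alternative, not the commit's method: it resolves the callback from the AST. It assumes Python 3.9+ for `ast.unparse`, assumes the call is spelled `atheris.Setup(...)` with the callback passed positionally as a plain name, and uses `ast.get_docstring` because `ast.Str` is deprecated and removed in Python 3.12. `find_setup_callback` and `insert_print` are hypothetical names. It also skips insertion when the print call is already present, which the new script no longer checks.

```python
import ast
from typing import Optional


def find_setup_callback(source: str) -> Optional[str]:
    """Return the name passed as the second positional argument to atheris.Setup()."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_setup = (
            isinstance(func, ast.Attribute)
            and func.attr == "Setup"
            and isinstance(func.value, ast.Name)
            and func.value.id == "atheris"
        )
        if is_setup and len(node.args) >= 2 and isinstance(node.args[1], ast.Name):
            return node.args[1].id
    return None


def insert_print(source: str) -> Optional[str]:
    """Insert print(<first param>) into the Setup callback; None if not applicable."""
    callback = find_setup_callback(source)
    if callback is None:
        return None
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.FunctionDef)
            and node.name == callback
            and node.args.args
        ):
            param = node.args.args[0].arg
            stmt = ast.parse(f"print({param})").body[0]
            # Keep a leading docstring as the first statement.
            index = 1 if ast.get_docstring(node, clean=False) is not None else 0
            already_there = (
                len(node.body) > index
                and ast.dump(node.body[index]) == ast.dump(stmt)
            )
            if not already_there:
                node.body.insert(index, stmt)
            return ast.unparse(tree)
    return None


example = '''import atheris, sys

def TestOneInput(data):
    """Fuzz entry point."""
    return len(data)

atheris.Setup([sys.argv[0], "-runs=1"], TestOneInput)
'''
print(insert_print(example))
```

Like `astunparse.unparse`, `ast.unparse` discards comments and original formatting, so keeping the `.bak.py` backup is still worthwhile with this approach.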
