An AI-powered system that detects logic-flaw vulnerabilities in C/C++ code using CodeBERT, code slicing, and graph-based analysis (AUG-PDG).
This project analyzes C/C++ source code to identify potential vulnerabilities without executing it.
It focuses on risky patterns, builds dependency graphs, and uses a fine-tuned transformer model to classify code as:
- ⚠️ Vulnerable
- ✅ Safe
- 🔹 Code Slicing – Extracts only risk-prone parts of code (pointers, memory ops, loops)
- 🔹 Graph-Based Analysis – Builds a simplified Program Dependence Graph (PDG)
- 🔹 Transformer Model – Uses `microsoft/codebert-base` for classification
- 🔹 End-to-End Pipeline – Input raw C/C++ → Output prediction + graph
- 🔹 Visualization – Displays dependency graph using NetworkX
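To make the code slicing idea concrete, here is a minimal sketch of a regex-based slicer. The pattern list, `RISK_RE`, and `slice_risky_lines` are hypothetical names chosen for illustration; the project's actual slicing rules are not documented here and may differ.

```python
import re

# Hypothetical risk patterns (assumption): memory ops, unchecked copies,
# loops, and crude pointer/array access detection.
RISK_PATTERNS = [
    r"\b(malloc|calloc|realloc|free)\s*\(",
    r"\b(strcpy|strcat|sprintf|gets|memcpy)\s*\(",
    r"\b(for|while)\s*\(",
    r"\*\s*\w+|\w+\s*->|\w+\s*\[",  # crude: also matches multiplication
]
RISK_RE = re.compile("|".join(RISK_PATTERNS))


def slice_risky_lines(source):
    """Return (line_number, stripped_line) pairs for lines matching a risk pattern."""
    sliced = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped and RISK_RE.search(stripped):
            sliced.append((lineno, stripped))
    return sliced


sample = """int main() {
    char buffer[10];
    gets(buffer);
    return 0;
}"""

sliced = slice_risky_lines(sample)
for lineno, text in sliced:
    print(lineno, text)
```

A line-level regex is only an approximation: it cannot tell a pointer dereference from a multiplication, so a real slicer would likely work on a parsed representation of the code.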
- Input Code
- Slicing
  - Extract risky lines (e.g., `malloc`, `strcpy`, pointers, loops)
- Graph Construction
  - Nodes → code lines
  - Edges → control flow + data flow
- Model Inference
  - Tokenize sliced code
  - Pass through CodeBERT
  - Predict vulnerability
- Output
  - Vulnerability label
  - Graph visualization
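The graph construction step can be sketched roughly as follows. This is an illustrative approximation, not the code in `graph_extractor.py`: `build_simple_pdg`, `IDENT_RE`, and `C_KEYWORDS` are hypothetical names, control flow is approximated as "each line flows to the next", and data flow as "two lines share an identifier".

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
# Small, incomplete set of C keywords and types to ignore (assumption).
C_KEYWORDS = {
    "int", "char", "float", "double", "void", "long", "short", "unsigned",
    "return", "if", "else", "for", "while", "sizeof", "struct", "const",
}


def build_simple_pdg(lines):
    """Return (nodes, edges): nodes map index -> code line,
    edges are (src, dst, kind) with kind in {"control", "data"}."""
    nodes = {i: text for i, text in enumerate(lines)}
    edges = []

    # Control flow approximation: each line flows to the next one.
    for i in range(len(lines) - 1):
        edges.append((i, i + 1, "control"))

    # Data flow approximation: link a line to the most recent earlier
    # line that mentions the same identifier.
    last_seen = {}
    for i, text in enumerate(lines):
        idents = set(IDENT_RE.findall(text)) - C_KEYWORDS
        for name in sorted(idents):
            if name in last_seen:
                edges.append((last_seen[name], i, "data"))
        for name in idents:
            last_seen[name] = i
    return nodes, edges


lines = ["char buffer[10];", "gets(buffer);", "return 0;"]
nodes, edges = build_simple_pdg(lines)
for edge in edges:
    print(edge)
```

The edge list can be loaded into NetworkX for drawing, for example by calling `add_edge(src, dst, kind=kind)` on an `nx.MultiDiGraph` for each tuple.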
| Metric | Score |
|---|---|
| Accuracy | 0.74 |
| Precision | 0.696 |
| Recall | 0.772 |
| F1 Score | 0.732 |
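The model favors recall (0.772) over precision (0.696): taking vulnerable code as the positive class, it catches roughly 77% of vulnerable samples, while about 30% of its alerts are false positives.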
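As a quick consistency check, the F1 score in the table can be recomputed from the reported precision and recall using the standard harmonic-mean formula:

```python
precision = 0.696
recall = 0.772

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.732
```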
- Python 🐍
- PyTorch 🔥
- HuggingFace Transformers 🤗
- NetworkX 🕸️
- Matplotlib 📊
- Scikit-learn
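For orientation, here is a rough sketch of how PyTorch and Transformers might fit together in the model inference step. It is not the contents of `model.py` or `infer.py`: `classify` and `label_from_probs` are hypothetical helpers, the label order (index 1 = vulnerable) and the 0.5 threshold are assumptions, and the location of the fine-tuned weights is not specified in this README.

```python
def label_from_probs(probs, threshold=0.5):
    """Map [p_safe, p_vulnerable] to a label (index order is an assumption)."""
    return "Vulnerable" if probs[1] >= threshold else "Safe"


def classify(code_slice, checkpoint="microsoft/codebert-base"):
    """Tokenize a code slice and run a two-class sequence classifier on it."""
    # Imported lazily so label_from_probs works without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # With the base checkpoint the classification head is randomly
    # initialized; pass a path to fine-tuned weights for real predictions.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2
    )
    model.eval()

    inputs = tokenizer(
        code_slice, return_tensors="pt", truncation=True, max_length=512
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0].tolist()
    return label_from_probs(probs), probs
```

Lowering `threshold` would trade precision for recall, which is one way a detector can be tuned to miss fewer vulnerabilities.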
GraphCodeBERT-VulnDetector
┣ src/
┃ ┣ model.py
┃ ┣ graph_extractor.py
┃ ┣ infer.py
┣ notebooks/
┣ data/
┣ requirements.txt
┣ README.md
pip install -r requirements.txt
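# Assumes this runs from src/ (or that src/ is on PYTHONPATH)
from infer import predict_code

# Sample C program: gets() reads unbounded input into a 10-byte buffer
code = """
int main() {
    char buffer[10];
    gets(buffer);
    return 0;
}
"""

# predict_code returns the prediction and the dependency graph
result, graph = predict_code(code)
print(result)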
import matplotlib.pyplot as plt
import networkx as nx
nx.draw(graph, with_labels=True)
plt.show()