🔍 GraphCodeBERT Vulnerability Detector

An AI-powered system that detects logic-flaw vulnerabilities in C/C++ code using CodeBERT, code slicing, and graph-based analysis (AUG-PDG).

🚀 Overview

This project analyzes C/C++ source code to identify potential vulnerabilities without executing it.
It focuses on risky patterns, builds dependency graphs, and uses a fine-tuned transformer model to classify code as:

⚠️ Vulnerable
✅ Safe

⚙️ Key Features

🔹 Code Slicing – Extracts only risk-prone parts of code (pointers, memory ops, loops)
🔹 Graph-Based Analysis – Builds a simplified Program Dependence Graph (PDG)
🔹 Transformer Model – Uses microsoft/codebert-base for classification
🔹 End-to-End Pipeline – Input raw C/C++ → Output prediction + graph
🔹 Visualization – Displays dependency graph using NetworkX
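
As a rough illustration of the Code Slicing feature, here is a minimal sketch that keeps only lines matching a few risk patterns. The pattern list and the `slice_code` name are assumptions made for this example. The project's actual slicing rules are not shown in this README and may differ.

```python
import re

# Hypothetical risk patterns (assumption, not the project's real rule set).
RISKY_PATTERNS = [
    r"\b(malloc|calloc|realloc|free)\s*\(",         # heap memory operations
    r"\b(strcpy|strcat|sprintf|gets|memcpy)\s*\(",  # unbounded copy / input calls
    r"\b(for|while)\s*\(",                          # loops
    r"\*\s*\w+|\w+\s*\[[^\]]*\]|->",                # pointer or array syntax (crude: also matches multiplication)
]
RISKY_RE = re.compile("|".join(RISKY_PATTERNS))


def slice_code(source):
    """Return (line_number, stripped_line) pairs for lines matching a risk pattern."""
    sliced = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped and RISKY_RE.search(stripped):
            sliced.append((lineno, stripped))
    return sliced


sample = "int main() {\n    char buffer[10];\n    gets(buffer);\n    return 0;\n}\n"
for lineno, text in slice_code(sample):
    print(lineno, text)
```

A regex filter like this is purely lexical, so it will over-select in some cases. A real slicer would more likely work on a parsed representation of the code.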

🧠 Methodology

1. Input Code
2. Slicing
   - Extract risky lines (e.g., malloc, strcpy, pointers, loops)
3. Graph Construction
   - Nodes → code lines
   - Edges → control flow + data flow
4. Model Inference
   - Tokenize sliced code
   - Pass through CodeBERT
   - Predict vulnerability
5. Output
   - Vulnerability label
   - Graph visualization
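
The graph construction step can be sketched with plain Python data structures. This example makes two simplifying assumptions that are not taken from the project: control flow is approximated as sequential order, and a data-flow edge links a line to the most recent earlier line that mentions the same identifier. The `build_graph` name and the keyword list are hypothetical.

```python
import re

# Small, incomplete keyword list used only to filter identifiers (assumption).
C_KEYWORDS = {"int", "char", "void", "return", "for", "while", "if", "else", "sizeof"}
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def build_graph(lines):
    """Return (nodes, edges); each edge is a (src, dst, kind) tuple."""
    nodes = list(range(len(lines)))  # one node per code line
    edges = []

    # Control flow, approximated as "each line flows to the next".
    for i in range(len(lines) - 1):
        edges.append((i, i + 1, "control"))

    # Data flow, approximated by shared identifiers between lines.
    last_seen = {}   # identifier -> index of the latest line mentioning it
    emitted = set()  # avoid duplicate data edges between the same pair of lines
    for i, line in enumerate(lines):
        idents = set(IDENT_RE.findall(line)) - C_KEYWORDS
        for name in sorted(idents):
            src = last_seen.get(name)
            if src is not None and (src, i) not in emitted:
                edges.append((src, i, "data"))
                emitted.add((src, i))
        for name in idents:
            last_seen[name] = i
    return nodes, edges


nodes, edges = build_graph(["char buffer[10];", "gets(buffer);", "return 0;"])
print(edges)
```

To draw such a graph with NetworkX, the edge tuples could be loaded into an `nx.MultiDiGraph`, which keeps a control edge and a data edge between the same two lines as separate edges.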

📊 Results

Metric	Score
Accuracy	74%
Precision	0.696
Recall	0.772
F1 Score	0.732

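Recall (0.772) is higher than precision (0.696): the model catches more of the vulnerable samples at the cost of more false positives, so flagged code should still be reviewed manually.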
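
As a quick consistency check, the F1 score can be recomputed from the precision and recall reported in the table, since F1 is their harmonic mean. The helper name `f1_from` is only for this example.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values taken from the results table.
print(round(f1_from(0.696, 0.772), 3))  # 0.732
```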

🛠️ Tech Stack

Python 🐍
PyTorch 🔥
HuggingFace Transformers 🤗
NetworkX 🕸️
Matplotlib 📊
Scikit-learn

📂 Project Structure

GraphCodeBERT-VulnDetector
┣ src/
┃ ┣ model.py
┃ ┣ graph_extractor.py
┃ ┣ infer.py
┣ notebooks/
┣ data/
┣ requirements.txt
┣ README.md

▶️ How to Run

1. Install dependencies

pip install -r requirements.txt


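2. Run a prediction

# Assumes src/ is on the import path (for example, run from inside src/)
from infer import predict_code

# Sample input: gets() performs no bounds check on the 10-byte buffer
code = """
int main() {
    char buffer[10];
    gets(buffer);
    return 0;
}
"""

# predict_code returns the vulnerability label and the dependency graph
result, graph = predict_code(code)
print(result)

3. Visualize the dependency graph

import matplotlib.pyplot as plt
import networkx as nx

nx.draw(graph, with_labels=True)
plt.show()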
