This project studies the privacy-preserving properties of LLMs with a real-world application in mind. The idea is to test whether a model deployed with access to PII and asked to perform a task (summarization, drawing insights, etc.) without leaking that PII (by masking it) can still leak it through its internal activations and the top-k tokens it considers during generation.
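A minimal sketch of what that measurement could look like, assuming a HuggingFace causal LM (gpt2 here is just a stand-in, and the masked-summarization prompt is a hypothetical example, not the project's actual setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM that returns hidden states works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A prompt where the model is instructed to mask PII in its visible output.
prompt = "Summarize, masking all PII: John Doe (SSN 123-45-6789) missed his payment."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Internal activations: one tensor per layer, shape (batch, seq_len, hidden_dim).
# These are what a probe would inspect for residual traces of the masked PII.
hidden_states = out.hidden_states

# Top-k candidates for the next token: even if the visible/sampled token is a
# mask, PII tokens may still rank highly in the underlying distribution.
k = 10
next_token_probs = out.logits[0, -1].softmax(dim=-1)
topk = torch.topk(next_token_probs, k)
for prob, tok_id in zip(topk.values, topk.indices):
    print(f"{tokenizer.decode(tok_id.item()):>15}  p={prob.item():.4f}")
```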
About
Understanding Privacy-Preserving Knowledge in Models via Mechanistic Interpretability