Welcome to the Advanced PII Extractor! This Python script scans various file formats for Personally Identifiable Information (PII) such as email addresses, phone numbers, Social Security Numbers (SSNs), credit card numbers, and IP addresses.
Syed Mansoor ul Hassan Bukhari
GitHub Profile
LinkedIn
The advanced_pii_extractor.py script parses and scans the following file types for PII:
- .docx files
- .txt, .doc, .csv, .log, and .html files
- .xlsx files
- .pdf files
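
Of the formats listed, .docx is the least obvious to read: it is a ZIP archive whose main body is stored in word/document.xml. The following is a minimal sketch of pulling text out of such a file with the standard library only. It is not the script's actual code, and the function name `extract_docx_text` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path):
    """Return the text of a .docx file, one paragraph per line.

    Hypothetical helper for illustration; the real script may differ.
    """
    # A .docx file is a ZIP archive; the document body is a single XML part.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each <w:t> element holds one run of text within the paragraph.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The returned string can then be scanned for PII in the same way as the contents of a plain .txt file.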
- Identifies and extracts email addresses, phone numbers, SSNs, credit card numbers, and IP addresses.
- Saves matches to separate files (email_matches.txt, phone_matches.txt, etc.).
- Supports parsing from ZIP archives for .docx files and text extraction from PDFs.
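
The five PII categories above are the kind of thing usually matched with regular expressions. The sketch below shows one way that could look. The patterns, the `PII_PATTERNS` table, and the `find_pii` function are all assumptions made for illustration, not the patterns the script actually uses. They are deliberately simple: the phone pattern covers only US-style 3-3-4 digit groups, and the IP pattern does not check that each octet is at most 255.

```python
import re

# Illustrative patterns only; the real script's expressions may differ.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # US-style phone number: three groups of 3, 3, and 4 digits.
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    # SSN in the common 3-2-4 dashed form.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 16-digit card number, optionally grouped by spaces or dashes.
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    # Dotted-quad IPv4 shape; octet ranges are not validated.
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text):
    """Return a dict mapping each PII kind to the list of matches in text."""
    return {kind: pattern.findall(text) for kind, pattern in PII_PATTERNS.items()}


if __name__ == "__main__":
    sample = "Contact alice@example.com or 555-123-4567 from 192.168.0.1."
    for kind, hits in find_pii(sample).items():
        print(kind, hits)
```

Patterns this loose will produce false positives (for example, any 16-digit number looks like a card number), so a stricter check such as a Luhn checksum could be layered on top if precision matters.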
- Ensure all dependencies are installed:
pip install openpyxl PyPDF2
- Download and run:
git clone https://github.com/cyberfantics/pii_extractor.git
cd pii_extractor
python pii_extractor.py
Matches are saved in respective files:
1. email_matches.txt
2. phone_matches.txt
3. ssn_matches.txt
4. credit_card_matches.txt
5. ip_matches.txt

Each file contains the file path and matched PII entries.
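
As a rough illustration of how results could be written to these files, the sketch below appends one line per match, prefixed with the path of the scanned file. The exact line layout (`path: match`), the `OUTPUT_FILES` table, and the `save_matches` function are assumptions; the README only states that each file holds the file path and the matched entries.

```python
from pathlib import Path

# Output file names as listed above, keyed by a hypothetical PII kind label.
OUTPUT_FILES = {
    "email": "email_matches.txt",
    "phone": "phone_matches.txt",
    "ssn": "ssn_matches.txt",
    "credit_card": "credit_card_matches.txt",
    "ip": "ip_matches.txt",
}


def save_matches(source_path, matches, out_dir="."):
    """Append matches found in source_path to the per-kind output files.

    `matches` maps a PII kind (a key of OUTPUT_FILES) to a list of strings.
    The "path: match" line format is an assumption made for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for kind, values in matches.items():
        if not values:
            continue  # Do not create a file for a kind with no matches.
        target = out_dir / OUTPUT_FILES[kind]
        with target.open("a", encoding="utf-8") as handle:
            for value in values:
                handle.write(f"{source_path}: {value}\n")
```

Opening the files in append mode lets results from several scanned files accumulate in the same output file.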
This project is licensed under the MIT License.