Skip to content

toshiro92/EmailParser

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

emailparser

while simple for humans, identifying and removing signature blocks is actually quite a challenging task for machines. by automatically identifying and removing signature blocks, emailparser streamlines large-scale email analysis.

example

here is a sample email.

Wendy – thanks for the intro! Moving you to bcc.
 
Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.
 
Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?
 
Best,
Joe Smith
 
Joe Smith | Strategy & Business Development
111 Market St. Suite 111| San Francisco, CA 94103
M: 111.111.1111| joe@foobar.com

after parsing with emailparser:

Wendy – thanks for the intro! Moving you to bcc.
 
Hi Vincent – nice to meet you over email. Apologize for the late reply, I was on PTO for a couple weeks and this is my first week back in office. As Wendy mentioned, I am leading an AR/VR taskforce at Foobar Retail Solutions. The goal of the taskforce is to better understand how AR/VR can apply to retail/commerce and if/what is the role of a shopping center in AR/VR applications for retail.
 
Wendy mentioned that you would be a great person to speak to since you are close to what is going on in this space. Would love to set up some time to chat via phone next week. What does your availability look like on Monday or Wednesday?

getting started

>>> from Parser import read_email, corpus2sentences, generate_text
>>> from spacy.en import English 

>>> pos_tagger = English()  # part-of-speech tagger
>>> msg_raw = read_email('emails/test1.txt')
>>> sentences = corpus2sentences(msg_raw)  # convert corpus into sentences

# iterate through lines, write to file if not signature block
>>> generate_text(sentences, pos, 'emails/test1_clean.txt', threshold=.9)

About

remove signature blocks from emails

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%