Building a spam classifier
Supervised learning. Input: features of the email. Label: spam (1) or not spam (0).
Features: choose 100 words indicative of spam/not spam.
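As a rough illustration of the feature idea above, the sketch below turns an email into a 0/1 vector, one entry per chosen word. The vocabulary and the function name `email_to_features` are hypothetical; the slide only says to pick about 100 indicative words and does not list them.

```python
import re

# Hypothetical vocabulary. The slide suggests about 100 words;
# only a handful are shown here for brevity.
VOCABULARY = ["buy", "deal", "discount", "andrew", "now"]

def email_to_features(text, vocabulary=VOCABULARY):
    """Return a 0/1 list: entry j is 1 if vocabulary[j] occurs in the email."""
    # Lowercase and split into alphanumeric tokens.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [1 if word in words else 0 for word in vocabulary]

x = email_to_features("Deal of the week! Buy now!")
print(x)  # [1, 1, 0, 0, 1]
```

A vector like `x`, paired with its 0/1 label, is what a supervised learning algorithm would be trained on.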
How should you spend your time to make it have low error?
- Collect lots of data
  - E.g. “honeypot” project.
- Develop sophisticated features based on email routing
  information (from email header).
- Develop sophisticated features for message body, e.g. should
  “discount” and “discounts” be treated as the same word? How
  about “deal” and “Dealer”? Features about punctuation?
- Develop sophisticated algorithm to detect misspellings (e.g.
  m0rtgage, med1cine, w4tches.)
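Two of the ideas in the list can be sketched in a few lines: treating “discount” and “discounts” as the same word, and undoing digit-for-letter misspellings. This is only a minimal sketch under stated assumptions. The substitution table covers just the three digits in the slide's examples, the plural rule is deliberately crude, and both function names are hypothetical.

```python
import re

# Assumed digit-for-letter substitutions, taken from the slide's examples
# (m0rtgage, med1cine, w4tches). A real system would need a broader table.
DIGIT_MAP = str.maketrans({"0": "o", "1": "i", "4": "a"})

def undo_digit_substitutions(token):
    """Replace 0/1/4 with o/i/a in tokens that mix letters and digits."""
    token = token.lower()
    # Leave pure numbers such as "2014" untouched.
    if re.search(r"[a-z]", token) and re.search(r"[0-9]", token):
        token = token.translate(DIGIT_MAP)
    return token

def strip_plural(token):
    """Crude plural handling: drop one trailing 's' (but not from 'ss')."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token

print(undo_digit_substitutions("m0rtgage"))  # mortgage
print(undo_digit_substitutions("w4tches"))   # watches
print(strip_plural("Discounts"))             # discount
print(strip_plural("Dealer"))                # dealer
```

Note that this rule leaves “deal” and “dealer” as different words. Whether they should be merged is exactly the kind of question the slide raises, and a proper stemmer would make a different, more systematic choice than this sketch.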