Spam Junk Mail Filter
Post: #2
The filter implemented here blocks spam, also called unsolicited email, using a statistical approach called Bayesian filtering. The program must first be trained on a set of spam and non-spam mails, which are stored in a database. Performance improves with the amount of training it receives. When a new mail arrives, it is tokenised and the spam probability of each word is looked up in the database. These are combined into a total probability, and if it is greater than 0.9 the mail is marked as spam. With good training it can block 99% of spam mails with 0 false positives.

Presented By:
Binu Ashiq Y1066
National Institute of Technology, Calicut Department of Computer Engineering

1 Introduction
Spam is a growing problem for email users, and many solutions have been proposed, from a postage fee for email to Turing tests to simply not accepting email from people you don't know. Spam filtering is one way to reduce the impact of the problem on the individual user (though it does nothing to reduce the effect of the network traffic generated by spam). In its simplest form, a spam filter is a mechanism for classifying a message as either spam or not spam.
There are many techniques for classifying a message. It can be examined for "spam-markers" such as common spam subjects, known spammer addresses, known mail forwarding machines, or simply common spam phrases. The header, the body, or both can be examined for these markers. Another method is to classify all messages not from known addresses as spam. Another is to compare a message with those that others have received and find common spam messages. A further technique, probably the most popular at the moment, is to apply machine learning in an email classifier.
2 Design
The filter uses a method called Bayesian filtering. The project is implemented in C and runs on Linux. An SQLite database stores the training data.
2.1 Bayesian Filtering
In a nutshell, the approach is to tokenize a large corpus of spam and a large corpus of non-spam. Certain tokens will be common in spam messages and uncommon in non-spam messages, and certain other tokens will be common in non-spam messages and uncommon in spam messages. When a message is to be classified, we tokenize it and see whether the tokens are more like those of a spam message or those of a non-spam message. How we determine this similarity is what the math is all about. It isn't complicated, but it has a number of variations.
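The tokenize-and-count step described above can be sketched as follows. This is a minimal illustration in Python rather than the project's C code. The token pattern, the neutral value 0.5 for unseen tokens, and the clamping range 0.01 to 0.99 are assumptions made for the example, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed token definition: runs of letters, digits, "$", "'" and "-".
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")

def tokenize(text):
    """Split a message into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]

def message_counts(messages):
    """For each token, count the number of messages that contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token) from relative frequencies in the two corpora."""
    s = spam_counts[token] / n_spam if n_spam else 0.0
    h = ham_counts[token] / n_ham if n_ham else 0.0
    if s + h == 0:
        return 0.5  # unseen token: treated as no evidence (assumption)
    # Clamp so that no single token is ever treated as absolute proof (assumption).
    return min(0.99, max(0.01, s / (s + h)))
```

A token that appears only in the spam corpus scores near 1, one that appears only in the non-spam corpus scores near 0, and one that is equally common in both scores 0.5.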
2.2 Theory of Operation
Probabilities in this algorithm are calculated using a degenerate case of Bayes' Rule. There are two simplifying assumptions: that the probabilities of features (i.e. words) are independent, and that we know nothing about the prior probability of an email being spam.
The first assumption is widespread in text classification. Algorithms that use it are called "naive Bayesian."
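Under these two assumptions, the per-token probabilities p1..pn combine as P = (p1 * ... * pn) / (p1 * ... * pn + (1 - p1) * ... * (1 - pn)). The sketch below, in Python for illustration only, computes this in log space to avoid underflow on long messages. The threshold 0.9 is the one quoted in the abstract; the function names are hypothetical, and the inputs are assumed to lie strictly between 0 and 1.

```python
import math

SPAM_THRESHOLD = 0.9  # threshold quoted in the abstract

def combined_spam_probability(token_probs):
    """Combine per-token P(spam | token) values, assuming independent tokens
    and equal priors. Each value must lie strictly between 0 and 1."""
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_probs)
    d = log_spam - log_ham
    # P = 1 / (1 + exp(-d)), evaluated so that exp() never overflows.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

def is_spam(token_probs, threshold=SPAM_THRESHOLD):
    """Mark a message as spam when the combined probability exceeds the threshold."""
    return combined_spam_probability(token_probs) > threshold
```

For example, token probabilities of 0.99, 0.2 and 0.9 combine to 0.1782 / (0.1782 + 0.0008), which is well above the threshold, while a strong spam token and an equally strong non-spam token cancel out to about 0.5.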
If spammers get good enough at obscuring tokens for this to be a problem, we can respond by simply removing whitespace, periods, commas, etc. and using a dictionary to pick the words out of the resulting sequence. And of course finding words this way that weren't visible in the original text would in itself be evidence of spam.
Picking out the words won't be trivial. It will require more than just reconstructing word boundaries; spammers both add ("xHot nPorn cSite") and omit ("Prn") letters. Vision research may be useful here, since human vision is the limit that such tricks will approach.
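The first countermeasure mentioned above, stripping separators and then picking dictionary words out of the resulting sequence, can be sketched as below. This is a Python illustration under simplifying assumptions: the dictionary is a small set of lowercase words, only exact matches are found, and the added or omitted letters discussed above are not handled. All function names are hypothetical.

```python
import re

def strip_separators(text):
    """Lowercase the text and drop everything except the letters a-z."""
    return re.sub(r"[^a-z]", "", text.lower())

def segment(s, dictionary):
    """Return a list of dictionary words that exactly covers s, or None."""
    max_len = max(map(len, dictionary), default=0)
    best = [None] * (len(s) + 1)  # best[i]: a segmentation of s[:i], if any
    best[0] = []
    for end in range(1, len(s) + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is not None and s[start:end] in dictionary:
                best[end] = best[start] + [s[start:end]]
                break
    return best[len(s)]

def hidden_words(text, dictionary):
    """Words recovered after stripping separators that were not visible
    as ordinary tokens in the original text."""
    visible = set(re.findall(r"[a-z]+", text.lower()))
    words = segment(strip_separators(text), dictionary)
    if words is None:
        return []
    return [w for w in words if w not in visible]
```

A non-empty result from hidden_words could then be fed to the classifier as an extra feature, since words that only appear after de-obfuscation are themselves evidence of spam.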
3 Implementation
The user first trains the filter. The training data is stored in the database. Initially, the database is empty.
When spam is detected, the user can move it to the Spam table in the database with the -b option. For initial training, non-spam messages are moved to the Ham table in the database with the -g option. Eventually we reach a stage with one corpus of spam and one of non-spam mail.
To train the database, run:
  ./a.out dbase.db -g *good.msg -b *bad.msg
To classify a message, run:
  ./a.out dbase.db message.msg
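How the training data might be laid out in SQLite can be sketched as follows. The table names Spam and Ham come from the description above; the schema (one row per token with a hit counter) and the function names are assumptions for illustration, and the sketch uses Python's sqlite3 module rather than the project's C code.

```python
import sqlite3

TABLES = ("Spam", "Ham")  # table names taken from the description above

def open_db(path=":memory:"):
    """Open the database and create both token tables if they are missing."""
    conn = sqlite3.connect(path)
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(token TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
        )
    return conn

def train(conn, table, tokens):
    """Add one hit per token occurrence to the given table."""
    if table not in TABLES:
        raise ValueError("unknown table: " + table)
    for tok in tokens:
        conn.execute(
            f"INSERT OR IGNORE INTO {table} (token, hits) VALUES (?, 0)", (tok,)
        )
        conn.execute(f"UPDATE {table} SET hits = hits + 1 WHERE token = ?", (tok,))
    conn.commit()

def lookup(conn, table, token):
    """Return the stored hit count for a token, or 0 if it was never seen."""
    if table not in TABLES:
        raise ValueError("unknown table: " + table)
    row = conn.execute(
        f"SELECT hits FROM {table} WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else 0
```

With an empty database, lookup returns 0 for every token, which matches the initial state described above; each training run then increments the counters in the relevant table.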
4 Conclusion
Once you have enough spam messages and non-spam messages correctly classified, you can think about using a Bayesian filter. You really want a few hundred of each type, preferably more. You also want to make sure there isn't an unintended identifying feature of the spam messages or non-spam messages. For example, don't use non-spam messages from the past 6 months and only the last month of spam messages; the learning algorithm might decide that messages with old dates are non-spam messages and messages with new dates are spam messages. Don't try to pad the numbers with duplicates; it will overtrain the filter on the features in those messages.
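One way to guard against duplicates before training is to drop repeated messages from each corpus. The sketch below is a Python illustration with a hypothetical function name; it treats two messages as duplicates when they match after lowercasing and collapsing whitespace, which is an assumed definition and will not catch near-duplicates that differ in wording.

```python
import hashlib

def deduplicate(messages):
    """Keep the first copy of each message, comparing normalized text."""
    seen = set()
    unique = []
    for msg in messages:
        normalized = " ".join(msg.split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(msg)
    return unique
```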
5 References
[1] Paul Graham. "A Plan for Spam." August 2002.
[2] Steven Hauser. "Statistical Spam Filter Works for Me."
[3] Mehran Sahami, Susan Dumais, David Heckerman and Eric Horvitz. "A Bayesian Approach to Filtering Junk E-Mail." Proceedings of AAAI-98 Workshop on Learning for Text Categorization.
Post: #4
Please provide the UML diagrams for an email filter to cut out spam.