Partner Perspectives  Connecting marketers to our tech communities.
3/14/2017
11:00 AM
Pieter Arntz

7 Things You Need to Know about Bayesian Spam Filtering

Knowing how spam filters work can clarify how some messages get through, and how your own emails can avoid being caught.

Bayesian spam filtering is based on Bayes' rule, a statistical theorem that gives the probability of an event in light of related evidence. Bayesian filtering uses it to estimate the probability that a given email is spam.
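As a rough illustration of the rule itself, the sketch below computes the probability that a message is spam given that it contains one particular word. The function name, the 50/50 prior, and the example percentages are all assumptions made up for this illustration; they do not come from any real filter.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """Bayes' rule: P(spam | word) from two likelihoods and a prior.

    All inputs are probabilities between 0 and 1. The default prior of 0.5
    is an assumption chosen for illustration only.
    """
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any message.
    evidence = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / evidence


# Hypothetical numbers: the word appears in 40% of spam and 1% of ham.
p = posterior_spam(0.40, 0.01)
print(round(p, 3))  # 0.976
```

With these made-up numbers, seeing the word raises the estimate from the 50% prior to roughly 98%.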

1. The Name
It’s named after the statistician the Rev. Thomas Bayes, who provided an equation that allows new information to update the outcome of a probability calculation. The rule is also called the Bayes-Price rule after the mathematician Richard Price, who recognized the importance of the theorem, made some corrections to Bayes’ work, and put the rule to use.

2. Spam
When dealing with spam, the theorem is used to calculate the probability that a given message is spam. The probability is based on the words in the subject line and body, using word frequencies derived from messages previously identified as spam and messages identified as not being spam (sometimes called ham).
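A minimal sketch of this idea is a naive Bayes classifier that counts words in labeled spam and ham and combines the per-word evidence. The class name, the tokenizer, the add-one smoothing, and the four training messages are assumptions chosen for illustration; production filters differ in many details.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9!$']+", text.lower())


class TinyBayesFilter:
    """Illustrative naive Bayes filter; not a production design."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message that was identified as 'spam' or 'ham'."""
        self.message_counts[label] += 1
        # Count each distinct word once per message.
        self.word_counts[label].update(set(tokenize(text)))

    def word_likelihood(self, word, label):
        """P(word | label) with add-one smoothing to avoid zero probabilities."""
        return (self.word_counts[label][word] + 1) / (self.message_counts[label] + 2)

    def spam_probability(self, text):
        """Combine the prior and per-word likelihoods in log space."""
        total = sum(self.message_counts.values())
        log_spam = math.log((self.message_counts["spam"] + 1) / (total + 2))
        log_ham = math.log((self.message_counts["ham"] + 1) / (total + 2))
        for word in set(tokenize(text)):
            log_spam += math.log(self.word_likelihood(word, "spam"))
            log_ham += math.log(self.word_likelihood(word, "ham"))
        # Turn the log-odds back into a probability.
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Made-up training data for the example.
spam_filter = TinyBayesFilter()
spam_filter.train("cheap viagra offer", "spam")
spam_filter.train("win money now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("invoice for your order", "ham")

print(spam_filter.spam_probability("cheap viagra now"))
print(spam_filter.spam_probability("meeting on monday"))
```

On this tiny data set the first message scores well above 0.5 and the second well below it. Where to put the cut-off is a separate decision that trades missed spam against false positives.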

3. False positives
The objective of the filter's learning ability is to reduce the number of false positives. As annoying as it is to receive a spam message, it is worse to miss a message from a customer just because they used a word that triggered the filter.

4. Scoring
Other methods often use simple scoring filters. If a message contains specific words, a few points are added to that message's score, and when the score exceeds a certain threshold, the message is regarded as spam. Not only is this a very arbitrary method, it also inevitably leads spammers to change their wording. Take, for example, “Viagra,” a word that will surely earn a message a high score. As soon as spammers found that out, they switched to variations such as “V!agra.” This is a cat-and-mouse game that will keep you busy creating new rules.
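The weakness of fixed scoring rules can be sketched in a few lines. The rule table, point values, and threshold below are invented for this example and are not taken from any real product.

```python
# Hypothetical rule table: word -> points. These values are made up.
RULES = {"viagra": 5, "free": 2, "winner": 3}
THRESHOLD = 4  # A message scoring above this is treated as spam.


def rule_score(text):
    """Add up the points for every listed word found in the message."""
    words = text.lower().split()
    # Strip surrounding punctuation so "viagra!" still matches "viagra".
    return sum(RULES.get(word.strip(".,!?"), 0) for word in words)


def is_spam(text):
    return rule_score(text) > THRESHOLD


print(is_spam("Buy Viagra today"))  # True
print(is_spam("Buy V!agra today"))  # False: the variant matches no rule
```

Catching the variant requires someone to add a new rule by hand, and the next variant requires another.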

5. Learning
If the filter accepts input from individual users, its precision can be improved on a per-user basis. Different users may attract specific forms of spam based on their online activities. In other words, what is spam to one person is a “must-read” newsletter to the next. Every time the user confirms or denies that a message is spam, the filter can calculate a more refined probability for the next occasion.
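Per-user learning can be sketched as a separate set of word counts for each user, updated whenever that user confirms or denies that a message is spam. The class, the user names, and the smoothed per-word estimate are assumptions for illustration; a full filter would feed such estimates into a combined score.

```python
from collections import Counter, defaultdict


class UserWordStats:
    """Per-user word counts built from that user's own feedback."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def feedback(self, text, is_spam):
        """Record the user's verdict on one message."""
        target = self.spam if is_spam else self.ham
        target.update(set(text.lower().split()))

    def spamminess(self, word):
        """Smoothed share of this user's messages containing the word that were spam."""
        s, h = self.spam[word], self.ham[word]
        return (s + 1) / (s + h + 2)


# One independent model per user (user names are hypothetical).
stats_by_user = defaultdict(UserWordStats)

message = "weekly crypto newsletter"
stats_by_user["alice"].feedback(message, is_spam=False)  # Alice wants it
stats_by_user["bob"].feedback(message, is_spam=True)     # Bob does not

print(stats_by_user["alice"].spamminess("newsletter"))
print(stats_by_user["bob"].spamminess("newsletter"))
```

After a single piece of feedback each, the same word already leans toward ham for one user and toward spam for the other.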

6. Poisoning
A downside of Bayesian filtering, in cases of more-or-less targeted spam, is that spammers will start adding words or whole pieces of text that lower the score. With prolonged use, the filter may come to associate these words with spam, an effect called poisoning.
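The effect of padding a message with innocent-looking words can be sketched with a handful of per-word probabilities. The table below is hypothetical, as is the choice to combine the words naive-Bayes style with unknown words treated as neutral.

```python
import math

# Hypothetical per-word spam probabilities a trained filter might hold.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "cheap": 0.90,
    "meeting": 0.05,
    "agenda": 0.05,
    "minutes": 0.10,
}


def combined_spam_probability(words, default=0.5):
    """Combine per-word probabilities, treating the words as independent."""
    log_spam = sum(math.log(WORD_SPAM_PROB.get(w, default)) for w in words)
    log_ham = sum(math.log(1 - WORD_SPAM_PROB.get(w, default)) for w in words)
    return 1 / (1 + math.exp(log_ham - log_spam))


plain = ["cheap", "viagra"]
padded = plain + ["meeting", "agenda", "minutes"]  # Words that usually signal ham

print(combined_spam_probability(plain))   # Close to 1
print(combined_spam_probability(padded))  # Pulled well below 0.5
```

If users then mark the padded messages as spam, the counts for the padding words shift toward spam, and legitimate messages that use those words begin to score higher.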

7. Bypasses
Spammers use a few methods to bypass “bad word” filtering:

  • Using images to replace words that are known to raise the score.
  • Deliberate misspelling, as mentioned earlier.
  • Using homoglyphs, which are characters from other character sets that look similar to letters in the message's character set. For example, the Greek capital letter omicron looks exactly the same as an “O,” but has a different character encoding.
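The look-alike character trick is easy to demonstrate with Python's standard `unicodedata` module. The helper `non_ascii_letters` is a hypothetical name introduced here; flagging every non-ASCII letter is a deliberately crude check that would also flag legitimate non-English text.

```python
import unicodedata

latin_o = "O"              # U+004F
greek_omicron = "\u039f"   # U+039F, visually identical in most fonts

print(latin_o == greek_omicron)         # False
print(unicodedata.name(latin_o))        # LATIN CAPITAL LETTER O
print(unicodedata.name(greek_omicron))  # GREEK CAPITAL LETTER OMICRON


def non_ascii_letters(text):
    """List letters outside ASCII together with their Unicode names."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ch.isalpha() and ord(ch) > 127
    ]


# "OFFER" spelled with a Greek omicron in place of the Latin "O".
print(non_ascii_letters("FREE \u039fFFER"))
```

A word-based filter sees the disguised word as a new token it has never encountered, even though a human reader cannot tell the difference.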

Bayesian filtering is a spam-filtering method that has a learning ability, albeit a limited one. Knowing how spam filters work will clarify how some messages get through, and how you can make your own emails less likely to get caught in a spam filter.

Was a Microsoft MVP in consumer security for 12 years running. Can speak four languages. Smells of rich mahogany and leather-bound books.