Content filtering
- See also: Content-control software
Content filtering is the technique whereby access to content is blocked or allowed based on analysis of the content itself, rather than its source or other criteria. It is most widely used on the Internet to filter email and web access.
Content filtering of email
Content filtering is the most commonly used group of methods for filtering spam. Content filters act either on the content of the message body or on the mail headers (such as "Subject:") to classify, accept, or reject a message.
The most popular is the Bayesian filter, a statistical filtering method.
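As an illustration only (a minimal sketch, not any particular product's implementation), a Bayesian spam filter is trained on previously labelled messages and then scores new mail by how likely its words are to appear in spam versus legitimate mail. All class names and training data below are hypothetical.

```python
# Minimal naive Bayes spam filter sketch (illustrative, not a production design).
import math
from collections import Counter

class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each word of the message under the given label ("spam" or "ham").
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        # Score the message under each class using log probabilities with
        # add-one (Laplace) smoothing, then pick the higher-scoring class.
        total_messages = sum(self.message_counts.values())
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.message_counts[label] / total_messages)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

mail_filter = NaiveBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("meeting agenda for monday", "ham")
print(mail_filter.classify("buy cheap pills"))   # expected: "spam"
```

In practice such filters are trained on large corpora of user-classified mail, which is what makes the statistical approach adaptive.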
Anti-virus methods can usually be classified as content filters as well, since they scan the binary attachments of a message or its HTML contents. Content filtering can also refer to parental control software that analyzes data and either restricts or alters it, as with chat filtering. Depending on where in the OSI or Internet model packets are inspected, and on the application or user requirements, content filtering may target spam, viruses, computer worms, denial-of-service attacks, trojans, spyware, hate websites, profanity, or particular chat subject matter.
The Internet has no standard security model designed to limit the impact of incidents such as worms, which could potentially overload the network and cause a global denial of service. Developing more sophisticated content filtering technology, together with standards and cooperation among ISPs, may be one way to address this.
Content filtering of web content
Content filtering is commonly used by organisations such as businesses and schools to prevent computer users from viewing inappropriate web sites or content. Filtering rules are typically set by a central IT department and may be implemented via software on individual computers or at a central point on the network, such as a proxy server or Internet router. Depending on the sophistication of the system used, it may be possible for different computer users to have different levels of Internet access.
Content filtering software is sometimes also used on home computers to restrict children's access to inappropriate websites. Such software is typically described as parental control software.
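A minimal sketch of how URL-based web filtering at a proxy or gateway might work is shown below, assuming a hypothetical blocklist of domains and path prefixes; real products typically rely on much larger, categorised URL databases.

```python
# Hypothetical URL blocklist check, as a proxy or gateway filter might apply it.
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"example-gambling.test", "example-adult.test"}   # assumed sample entries
BLOCKED_PATH_PREFIXES = {"example.org": ["/forums/adult"]}          # block only part of a site

def is_blocked(url: str) -> bool:
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Block the whole site if the domain (or a parent domain) is listed.
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return True
    # Block only specific sections of otherwise allowed sites.
    for prefix in BLOCKED_PATH_PREFIXES.get(host, []):
        if parsed.path.startswith(prefix):
            return True
    return False

print(is_blocked("http://www.example-gambling.test/promo"))    # True
print(is_blocked("https://example.org/forums/adult/thread1"))  # True
print(is_blocked("https://example.org/news"))                  # False
```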
Filtering methods
Common content filtering methods include:
- Bayesian - Statistical filtering based on the probability that the words in a message appear in spam versus legitimate mail, learned from previously classified messages.
- Attachment - The blocking of certain types of file (e.g. executable programs)
- Mail header - Filtering based solely on the analysis of e-mail headers. Made less effective by the ease of message header forgery.
- Mailing List - Used to detect mailing list messages and file them in appropriate folders.
- HTML anomalies - Filtering based on malformed or deliberately obfuscated HTML of the kind often found in spam.
- Language - Filtering based on the language the message is written in.
- Heuristic - Filtering based on heuristic scoring of the content based on multiple criteria.
- Regular Expression - Filtering based on rules written as regular expressions.
- Phrases - Filtering based on detecting phrases in the content text.
- Proximity - Filtering based on detecting words or phrases when used in proximity.
- URL - Filtering based on the URL. Suitable for blocking websites or sections of websites.
- Content-encoding - Filtering based on the encoding used for the message content.
- Char-set - Filtering based on the character set of the message, for example blocking character sets the recipient cannot read.
Most content filtering systems use a combination of techniques.
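For illustration, the sketch below combines several of the methods listed above (regular expressions, phrase matching, attachment type, and header checks) into a single heuristic score. The rules, weights, and threshold are all hypothetical; real systems tune them per deployment.

```python
# Hypothetical heuristic filter combining several of the methods listed above.
import re

RULES = [
    # (description, weight, predicate) -- all rules and weights are illustrative.
    ("regex: lottery/prize wording", 2.0,
     lambda m: re.search(r"\b(win|won)\b.*\b(lottery|prize)\b", m["body"], re.I)),
    ("phrase: 'act now'", 1.5,
     lambda m: "act now" in m["body"].lower()),
    ("attachment: executable file", 3.0,
     lambda m: any(name.lower().endswith((".exe", ".scr")) for name in m["attachments"])),
    ("header: empty or missing From", 1.0,
     lambda m: not m["headers"].get("From")),
]

THRESHOLD = 3.0  # assumed cutoff for rejecting a message

def score_message(message):
    # Sum the weights of every rule the message triggers.
    score = 0.0
    matched = []
    for description, weight, predicate in RULES:
        if predicate(message):
            score += weight
            matched.append(description)
    return score, matched

msg = {
    "headers": {"From": ""},
    "body": "Act now! You have won the international lottery.",
    "attachments": ["claim_form.exe"],
}
score, reasons = score_message(msg)
print(score >= THRESHOLD, score, reasons)   # True, with the triggered rules listed
```

Weighted scoring of this kind is why a message that trips only one weak rule is usually delivered, while a message matching several rules at once is rejected or quarantined.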