Collaborative filtering

From Wikipedia, the free encyclopedia

Collaborative filtering (CF) is a method of making automatic predictions (filtering) about the interests of a user by collecting taste information from many users (collaborating). The underlying assumption of the CF approach is that those who agreed in the past tend to agree again in the future. For example, a collaborative filtering or recommendation system for music could predict which music a user would like, given a partial list of that user's tastes (likes or dislikes). These predictions are specific to the user, but use information gleaned from many users. This differs from the simpler approach of giving an average (non-specific) score for each item of interest, for example based on its number of votes.

Methodology

Collaborative filtering systems usually take two steps:

  1. Look for users who share rating patterns with the active user (the user for whom the prediction is made).
  2. Use the ratings of the like-minded users found in step 1 to calculate a prediction for the active user.
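The two user-based steps above can be illustrated with a minimal Python sketch. It assumes a small, hypothetical ratings dictionary and uses Pearson correlation over co-rated items as the measure of "shared rating patterns"; this similarity measure, the neighbourhood size `k`, and the cutoff that keeps only positively correlated users are illustrative choices, not the only ones in use.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def pearson(u, v):
    """Pearson correlation of two users' ratings on the items both have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den = (sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
           * sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(active, item, k=2):
    # Step 1: the k most similar users who rated the item (positive correlation only)
    candidates = sorted(
        ((pearson(active, v), v) for v in ratings
         if v != active and item in ratings[v]),
        reverse=True,
    )
    neighbours = [(s, v) for s, v in candidates if s > 0][:k]
    # Step 2: similarity-weighted average of those neighbours' ratings
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return None
    return sum(s * ratings[v][item] for s, v in neighbours) / total

print(predict("alice", "d"))  # 4.0
```

Here carol's ratings run opposite to alice's (correlation of about -1), so only bob counts as like-minded and the prediction reproduces his rating.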

Alternatively, item-based collaborative filtering, popularized by Amazon.com (users who bought x also bought y) and first proposed in the context of rating-based collaborative filtering by Vucetic and Obradovic in 2000, proceeds in an item-centric manner:

  1. Build an item-item matrix determining the relationships between pairs of items.
  2. Using the matrix and the data on the current user, infer the user's tastes.

See, for example, the Slope One item-based collaborative filtering family.
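A minimal sketch of the two item-centric steps, assuming hypothetical toy data: the item-item matrix holds cosine similarities between item rating vectors (unrated entries treated as zero), and a user's taste for an unseen item is inferred as a similarity-weighted average of that user's own ratings. This is a generic item-item scheme for illustration; it is not Amazon's or Vucetic and Obradovic's specific method, nor Slope One.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "u1": {"x": 5, "y": 4, "z": 1},
    "u2": {"x": 4, "y": 5},
    "u3": {"x": 1, "z": 5, "w": 4},
    "u4": {"y": 2, "z": 4, "w": 5},
}

def build_item_matrix(data):
    """Step 1: cosine similarity for every pair of items (unrated counts as 0)."""
    # Invert user -> {item: rating} into item -> {user: rating}
    by_item = {}
    for user, items in data.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[user] = r
    norm = {i: sqrt(sum(r * r for r in v.values())) for i, v in by_item.items()}
    sim = {}
    for a, b in combinations(sorted(by_item), 2):
        common = by_item[a].keys() & by_item[b].keys()
        if common:
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sim[(a, b)] = sim[(b, a)] = dot / (norm[a] * norm[b])
    return sim

def infer(user, item, sim, data):
    """Step 2: average the user's ratings, weighted by each item's similarity to `item`."""
    pairs = [(sim[(item, j)], r) for j, r in data[user].items() if (item, j) in sim]
    total = sum(s for s, _ in pairs)
    return sum(s * r for s, r in pairs) / total if total else None

matrix = build_item_matrix(ratings)
print(round(infer("u4", "x", matrix, ratings), 2))
```

Because the matrix depends only on items, it can be computed ahead of time and reused for every user, which is one reason item-based approaches are attractive for large catalogues.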

Another form of collaborative filtering can be based on implicit observations of normal user behavior (as opposed to the artificial behavior imposed by a rating task). These systems observe what a user has done together with what all users have done (what music they have listened to, what items they have bought) and use that data to predict the user's future behavior, or to predict how a user might like to behave if only they were given the chance. These predictions then have to be filtered through business logic to determine how they should affect what a business system does. For instance, it is not useful to offer to sell somebody music that they have already demonstrated they own.
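The implicit approach and the business-logic step can be sketched as follows, assuming hypothetical purchase histories that contain no ratings at all. Items are scored by how often they co-occur with the user's items in purchase histories (a simple "bought x also bought y" count), and a final filter drops anything the user already owns.

```python
from collections import Counter
from itertools import permutations

# Hypothetical implicit data: the albums each user has bought (no ratings)
purchases = {
    "ann":  {"album1", "album2", "album3"},
    "ben":  {"album1", "album2"},
    "cara": {"album2", "album3", "album4"},
    "dan":  {"album1", "album4"},
}

def recommend(user, histories, n=3):
    # Count how often each ordered pair of items appears in the same history
    pairs = Counter()
    for items in histories.values():
        pairs.update(permutations(items, 2))
    # Score every item by its co-occurrence with items the user already has
    owned = histories[user]
    scores = Counter()
    for (a, b), count in pairs.items():
        if a in owned:
            scores[b] += count
    # Business logic: never offer music the user has shown they already own
    return [item for item, _ in scores.most_common() if item not in owned][:n]

print(recommend("ben", purchases))  # ['album3', 'album4']
```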

In the age of information explosion such techniques can prove very useful, as the number of items in even a single category (such as music, movies, books, news, or web pages) has become so large that one person cannot possibly view them all in order to select relevant ones. Relying on a scoring or rating system that is averaged across all users ignores the specific demands of a user, and is particularly poor in tasks where there is large variation in interest, such as music recommendation. Other methods to combat information explosion also exist, such as web search and data clustering.

History

Collaborative filtering stems from the earlier technique of information filtering, in which relevant information is brought to the user's attention by observing patterns in previous behaviour and building a user profile. Such systems were essentially unable to help with exploration of the web and suffered from the cold-start problem: new users had to build up a history of behaviour before the filtering became effective.

The first system to use collaborative filtering was the Information Tapestry project at Xerox PARC. This system allowed users to find documents based on previous comments by other users. It had significant limitations: it worked only for small groups of people, and it had to be accessed through word-specific queries, which largely defeated the purpose of collaborative filtering.

USENET news extended collaborative filtering to a mass scale of users while offering a simpler method for accessing articles. The system allowed users to rate material based on popularity, which in turn allowed other users to search for articles based on these ratings.

Types

Active filtering

Active filtering is a method that has become increasingly popular in recent years. This is due to the ever-growing base of information available to users of the World Wide Web. With an exponentially growing amount of information being added to the Internet, finding relevant and valuable information is becoming more difficult. A basic web search now returns thousands of results, a high percentage of which are unhelpful and, more often than not, irrelevant. Although many databases and search engines are available for searching, most people are not familiar with all the options, and this is where active filtering comes into play.

Active filtering differs from other methods of collaborative filtering in that it uses a peer-to-peer approach: peers, coworkers, and people with similar interests rate products, reports, and other material, and share these ratings over the web for other people to see. The system relies on people's willingness to share consumer information with their peers. Users of active filtering send the information over the web through lists of commonly used links, where others can view it and use the product ratings to make their own decisions.

Active collaborative filtering is useful in many situations. It is particularly effective when an unguided web search produces thousands of results that are of little use to the person looking for information, and when people are not comfortable with, or knowledgeable about, the array of databases available to them.

Advantages

Active collaborative filtering has several advantages. One is that each rating is given by a person who has actually viewed the topic or product of interest. This produces a reasonable explanation and ranking from a reliable source: someone who has come into contact with the product. Another advantage is that the people involved want to provide information about the matter at hand, and ultimately do so.

Disadvantages

Active filtering also has some disadvantages. One is that the opinions may be biased. Also, because providing feedback requires action by the user, less data may be available than with a passive approach.

Passive filtering

A method of collaborative filtering that is thought to have great potential is passive filtering, which collects information implicitly. A web browser is used to record a user's preferences by following and measuring their actions. This implicitly gathered data is then used to determine what else the user will like and to recommend potential items of interest. Implicit filtering relies on the actions of users to determine a value rating for specific content, such as:

  • Purchasing an item
  • Repeatedly using, saving, or printing an item
  • Referring or linking to a site
  • The number of times an item is queried
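One simple way to turn such actions into a value rating is a weighted sum of observed events, sketched below. The action names and weights are purely hypothetical placeholders; in practice they would have to be chosen or learned for the application.

```python
# Hypothetical weights for each kind of implicit signal; not taken from any real system.
SIGNAL_WEIGHTS = {
    "purchase": 5.0,
    "use": 1.0,     # repeated use, saving, and printing each add a little
    "save": 1.0,
    "print": 1.0,
    "link": 2.0,    # the user referred or linked to the item
    "query": 0.5,   # counted once per query
}

def implicit_score(events):
    """Sum weighted (item, action) events into an inferred value rating per item.
    Unknown actions contribute nothing."""
    scores = {}
    for item, action in events:
        scores[item] = scores.get(item, 0.0) + SIGNAL_WEIGHTS.get(action, 0.0)
    return scores

events = [("song1", "purchase"), ("song1", "use"), ("song1", "use"),
          ("page7", "link"), ("page7", "query")]
print(implicit_score(events))  # {'song1': 7.0, 'page7': 2.5}
```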

An important feature of passive collaborative filtering is its use of time: how long a user spends on a document indicates whether they are scanning it or fully reading it. The greatest strength of the system is that it removes certain variables from the analysis that would normally be present in active filtering. For example, only certain types of people take the time to rate a site, whereas in passive collaborative filtering anyone accessing the site automatically contributes data.
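The time aspect can be approximated by comparing dwell time with the time needed to read the whole document. This sketch assumes a hypothetical average reading speed (`WORDS_PER_MINUTE`) and an arbitrary 50% threshold (`read_fraction`); neither value comes from the text above.

```python
# Assumed average reading speed; an illustrative figure, not a measured one.
WORDS_PER_MINUTE = 230

def classify_visit(word_count, seconds_on_page, read_fraction=0.5):
    """Return 'read' if the dwell time covers at least `read_fraction` of the
    time needed to read the whole document at WORDS_PER_MINUTE, else 'scanned'."""
    expected_seconds = word_count / WORDS_PER_MINUTE * 60
    return "read" if seconds_on_page >= read_fraction * expected_seconds else "scanned"

# A 2,300-word article takes about 600 s to read at the assumed speed.
print(classify_visit(2300, 45))    # scanned
print(classify_visit(2300, 400))   # read
```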

Item-based filtering

Item-based filtering is another method of collaborative filtering, in which items, rather than users, are rated and used as the parameters. This type of filtering uses the ratings to group items so that consumers can compare them, and it also gives manufacturers a consumer-based rating scale on which they can see where their product stands in the market.

In this method of filtering, users or user groups use and test a product and give it a rating that is relevant to the product and to the product class in which it falls. These users test many products, and the products are then classified according to the information the ratings hold. The products are used and tested by the same user or group in order to obtain an accurate rating and to eliminate some of the error that is possible in tests carried out under this type of filtering.
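The grouping and market-position idea can be sketched by averaging panel ratings per product and ranking products within their class. The product names, classes, and ratings are hypothetical, and the mean rating is only one possible way to summarise them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical panel data: (product, product class, rating on a 1-5 scale)
panel_ratings = [
    ("phoneA", "phone", 4), ("phoneA", "phone", 5),
    ("phoneB", "phone", 3), ("phoneB", "phone", 2),
    ("phoneC", "phone", 4), ("phoneC", "phone", 4),
    ("tabletA", "tablet", 5), ("tabletB", "tablet", 3),
]

def rank_within_class(rows):
    """Group products by class, best mean rating first, so consumers can compare them."""
    scores, product_class = defaultdict(list), {}
    for product, cls, rating in rows:
        scores[product].append(rating)
        product_class[product] = cls
    groups = defaultdict(list)
    for product, rs in scores.items():
        groups[product_class[product]].append((mean(rs), product))
    return {cls: [p for _, p in sorted(items, reverse=True)]
            for cls, items in groups.items()}

def position(product, ranking):
    """(rank, class size) of a product, e.g. for a manufacturer checking where it stands."""
    for products in ranking.values():
        if product in products:
            return products.index(product) + 1, len(products)
    return None

ranking = rank_within_class(panel_ratings)
print(ranking["phone"])              # ['phoneA', 'phoneC', 'phoneB']
print(position("phoneC", ranking))   # (2, 3)
```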

Explicit versus implicit filtering

Within active and passive filtering there are explicit and implicit methods for determining user preferences. Explicit collection of user preferences requires the evaluator to indicate a value for the content on a rating scale. This adds a cognitive step to collaborative filtering, but can make the feedback received more accurate. Implicit collection does not involve the direct input of opinion by the user; instead, the user's opinion is assumed to be implied by their actions. This reduces variability among users and reduces the demand on the user, which can mean that much more data is available. However, such behaviour data does not necessarily represent the user's true opinion of an item accurately.
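The contrast between the two collection methods can be sketched by mapping both onto a common preference value between 0 and 1. The 1-5 star scale and the play-count rule for implicit feedback (`saturation`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    user: str
    item: str
    preference: float  # liking on a 0-1 scale
    explicit: bool     # True if the user stated it, False if inferred

def from_explicit(user, item, stars, scale=(1, 5)):
    """Explicit: the user indicates a value directly on a rating scale."""
    low, high = scale
    return Feedback(user, item, (stars - low) / (high - low), explicit=True)

def from_implicit(user, item, times_played, saturation=3):
    """Implicit: no opinion is asked for; liking is assumed from behaviour.
    Assumption: preference grows with play count and saturates at 1.0."""
    return Feedback(user, item, min(times_played / saturation, 1.0), explicit=False)
```

A user who rates a song 4 stars yields 0.75; one who merely plays it twice yields about 0.67, even though that behaviour may not reflect their true opinion of the song.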

Applications

In commercial systems

Commercial sites that implement collaborative filtering systems include:

In non-commercial systems

Non-commercial sites that implement collaborative filtering systems include:

Software libraries

There are also software libraries which allow a developer to add collaborative filtering to an application or web site:
