Web bug

From Wikipedia, the free encyclopedia

This article includes a list of references or external links, but its sources remain unclear because it lacks in-text citations.
You can improve this article by introducing more precise citations.

A Web bug is an object that is embedded in a web page or e-mail and is usually invisible to the user but allows checking that a user has viewed the page or e-mail. One common use is in e-mail tracking. Alternative names are Web beacon, tracking bug, tracking pixel, pixel tag, 1×1 gif, and clear gif.

1 Overview
2 Implementation
3 E-mail Web bugs
4 External links

[edit] Overview

A web bug is any one of a number of techniques used to track who is reading a web page or e-mail, when, and from what computer. They can also be used to see if an e-mail was read, forwarded, to someone else or if a web page was copied to another website. The first web bugs were small images.

Some e-mails and web pages are not wholly self-contained. They may refer to content on another server, rather than including the content directly. When an e-mail client or web browser prepares such an e-mail or web page for display, it ordinarily sends a request to the server to send the additional content.

These requests typically include the IP address of the requesting computer; the time the content was requested; the type of web browser that made the request; and the existence of cookies previously set by that server. The server can store all of this information, and associate it with a unique tracking token attached to the content request.

Web bugs are typically used by third parties to monitor the activity of customers at a site. Turning off the browser's cookies can prevent some web bugs from tracking a customer's specific activity. The web site logs will still record a page request from the customer's IP address, but unique information associated with a cookie cannot be recorded. However, web site server techniques that do not use cookies can be employed to help track a site's cookie-blocking users. For example, a web site can identify a request from a new visitor and send that visitor links that pass a unique ID as a GET parameter.

As an example of the way web bugs can make user logging easier, consider a company that owns a network of sites. This company may have a network that requires all images to be stored on one host computer while the pages themselves are stored elsewhere. They could use web bugs in order to count and recognize users travelling around the different servers on the network. Rather than gathering statistics and managing cookies on all their servers separately, they can use web bugs to keep them all together.

For e-mail, many web bugs can be avoided by turning off HTML display and displaying only the text. Turning off the display of images while still using HTML is usually not enough since other techniques can still be used.

Web bugs are frequently used in spamming (sending unsolicited commercial e-mail) as a way of "pinging" to find which spam recipients open (and presumably read) spam before deleting it.

[edit] Implementation

Originally, a Web bug was a small (usually 1×1 pixel) transparent GIF or PNG image (or an image of the same colour of the background) that is embedded in an HTML page, usually a page on the Web or the content of an e-mail. Modern web bugs will also use the HTML iframe, style, script, input link, embed, object, and other tags to track usage.[1] Whenever the user opens the page with a graphical browser or e-mail reader, the image or other information is downloaded. This download requires the browser to request the image from the server storing it, allowing the server to take notice of the download. As a result, the organization running the server is informed of when the HTML page has been viewed.

While Web bugs are used in the same way in Web pages or e-mails, they have different purposes:

If the bug is embedded in an e-mail, the image is requested when the user reads the e-mail for the first time, and can also be requested every time the user loads the e-mail again;
Whenever a Web page, with or without bugs, is downloaded, the server holding the page knows and can store the IP address of the computer requesting the page; this information can therefore be retrieved from the server log files without the need of using bugs; bugs are used when monitoring has to be done by a server that is different from the one holding the Web pages; this is necessary for example when the Web pages are served by different servers, or when the monitoring has to be done by a third party.

As for all files transferred using the Hypertext Transfer Protocol, Web bugs are requested by sending the server their URL, and possibly the URL of the page containing them. Both URLs contain information that can be useful for the server:

The URL of the page containing the bug allows the server to determine which particular Web page the user has accessed;
The URL of the bug can be appended with an arbitrary string in various ways while still identifying the same object; this extra information can be used to better identify the conditions under which the bug has been loaded; this extra information can be added while sending the page or by JavaScript scripts after the download.

For example, an e-mail sent to the address somebody@example.org can contain the embedded image of URL http://example.com/bug.gif?somebody@example.org. Whenever the user reads the e-mail, the image at this URL is requested. The part of the URL after the question mark is ignored by the server for the purpose of determining which file to send, in this case, but the complete URL is stored in the server's log file. As a result, the file bug.gif is sent and shown in the e-mail reader; at the same time, the fact that the particular e-mail sent to somebody@example.org has been read is also stored in the server. Using this system, a spammer or e-mail marketer can send similar e-mails to a large number of addresses to check which ones are valid and read by the users.

Web bugs can be used in combination with HTTP cookies like any other object transferred using the HTTP protocol.

[edit] E-mail Web bugs

Web bugs embedded in e-mails have greater privacy implications than bugs embedded in Web pages. Typically, the URL of web bugs contained in e-mail messages carry a unique identifier. This identifier is chosen when the e-mail is sent, and is recorded together with the recipient e-mail address. The later download of the URL signals that the e-mail has been read. The sender of the e-mail is therefore also able to record the exact time that a message was read and the IP address of the computer used to read the mail or the proxy server that the user went through. In this way, the sender can gather detailed information about when, and from where, each particular recipient reads e-mail. Additionally, every time the e-mail message is displayed, another request goes to the sender's web site.

Web bugs are used by legitimate e-mail marketers, spammers, and phishers, to verify that e-mail addresses are valid, that the content of e-mails has made it past the spam filters, and that the e-mail is actually viewed by users. When the user reads the e-mail, the e-mail client requests the image, letting the sender know that the e-mail address is valid and that e-mail was viewed. The e-mail need not contain an advertisement or anything else related to the commercial activity of the spammer. This makes detection of such e-mails harder for mail filters and users.

Tracking via web bugs can be prevented by using e-mail clients that do not download images whose URLs are embedded in HTML e-mails. Many graphical e-mail clients can be configured to avoid accessing remote images. Examples include the Gmail, Yahoo!, and SpamCop/Horde webmail clients, Mozilla Thunderbird, Opera, relatively recent versions of Microsoft Outlook, and KMail mail readers. But other HTML techniques like iframes can still be used to track e-mail viewing, so some of these clients still are not adequately protecting their users.

Text-based mail readers such as Pine or Mutt, or graphical e-mail clients with purely text-based HTML capabilities such as Mulberry, do not interpret HTML or show images, so are not subject to tracking by e-mail Web bugs. Plain-text e-mail messages cannot contain Web bugs since they cannot have images, and so are safe with any mail client.

Many modern e-mail readers and Web-based e-mail services will not load images when opening an HTML e-mail from an unknown sender or that is suspected to be spam mail. The user must explicitly choose to load images. Web bugs can also be filtered out at the server level so that they never reach the end user. MailScanner is just one example of gateway software that can disarm iframes as well as Web bugs. As a result of these measures, Web bugs are slowly losing their effectiveness and cannot be relied on to accurately count read rates for e-mail campaigns.

Disposition-Notification-To email headers may be seen as another form of Web bug. See RFC 4021.