Data URI scheme
The data URI scheme is a uniform resource identifier (URI) scheme that provides a way to include data in-line in web pages as if they were external resources. It is a form of file literal or here document. This technique allows normally separate elements such as images and style sheets to be fetched in a single Hypertext Transfer Protocol (HTTP) request, which may be more efficient than multiple HTTP requests.[1] As of 2015, data URIs are fully supported by most major browsers, and partially supported in Internet Explorer and Microsoft Edge.[2]
Syntax
The syntax of data URIs was defined in Request for Comments (RFC) 2397, published in August 1998,[3] and follows the URI scheme syntax. A data URI consists of:
data:[<media type>][;base64],<data>
- The scheme, data. It is followed by a colon (:).
- An optional media type. The media type part may include one or more parameters, in the format attribute=value, separated by semicolons (;). A common media type parameter is charset, specifying the character set of the media type, where the value is from the IANA list of character set names.[4] If one is not specified, the media type of the data URI is assumed to be text/plain;charset=US-ASCII.
- An optional base64 extension, base64, separated from the preceding part by a semicolon. When present, this indicates that the data content of the URI is binary data, encoded in ASCII format using the Base64 scheme for binary-to-text encoding. The base64 extension is distinguished from any media type parameters by virtue of not having a =value component and by coming after any media type parameters.
- The data, separated from the preceding part by a comma (,). The data is a sequence of zero or more octets represented as characters. The comma is required in a data URI, even when the data part has zero length. The characters permitted within the data part include ASCII upper and lowercase letters, digits, and many ASCII punctuation and special characters. Note that this may include characters, such as colon, semicolon, and comma, which are delimiters in the URI components preceding the data part. Other octets must be percent-encoded. If the data is Base64-encoded, then the data part may contain only valid Base64 characters.[5] Note that Base64-encoded data: URIs use the standard Base64 character set (with '+' and '/' as characters 62 and 63) rather than the so-called "URL-safe Base64" character set.
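As a rough illustration of the components listed above, the following JavaScript sketch assembles data URIs in both forms. The helper names toBase64DataUri and toPercentEncodedDataUri are hypothetical, and the sketch assumes an environment that provides the standard btoa and encodeURIComponent globals (browsers and recent Node.js versions).

```javascript
// Hypothetical helper: build a Base64 data URI from raw octets.
function toBase64DataUri(mediaType, bytes) {
  // btoa expects a "binary string" holding one character per octet.
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  // btoa emits the standard alphabet ('+' and '/'), not URL-safe Base64.
  return "data:" + mediaType + ";base64," + btoa(binary);
}

// Hypothetical helper: build a percent-encoded data URI from text.
function toPercentEncodedDataUri(mediaType, text) {
  return "data:" + mediaType + "," + encodeURIComponent(text);
}

// The six octets of the ASCII string "GIF87a".
const gifHeader = new Uint8Array([0x47, 0x49, 0x46, 0x38, 0x37, 0x61]);

console.log(toBase64DataUri("image/gif", gifHeader));
// prints "data:image/gif;base64,R0lGODdh"
console.log(toPercentEncodedDataUri("text/plain;charset=UTF-8", "the data"));
// prints "data:text/plain;charset=UTF-8,the%20data"
```

The Base64 form suits binary content, while percent-encoding keeps mostly textual content readable.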
Examples of data URIs showing most of the features are:
data:text/vnd-example+xyz;foo=bar;base64,R0lGODdh
data:text/plain;charset=UTF-8;page=21,the%20data:1234,5678
The minimal data URI is "data:,", consisting of the scheme, no media type, and zero-length data.
Thus, within the overall URI syntax, a data URI consists of a scheme and a path, with no authority part, query string, or fragment. The optional media type, the optional base64 indicator, and the data are all parts of the URI path.
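The layout described in this section can be sketched as a small parser. This is a minimal illustration under stated assumptions, not a validating implementation: parseDataUri is a hypothetical name, malformed parameters are skipped silently, and the data part is returned undecoded.

```javascript
// Hypothetical helper: split a data URI into its parts.
function parseDataUri(uri) {
  if (!uri.startsWith("data:")) {
    throw new Error("not a data URI");
  }
  // The first comma ends the header; any later commas belong to the data.
  const comma = uri.indexOf(",");
  if (comma === -1) {
    throw new Error("a data URI requires a comma");
  }
  const parts = uri.slice("data:".length, comma).split(";");
  const data = uri.slice(comma + 1);

  // The base64 marker has no "=value" and follows any parameters.
  const isBase64 = parts.length > 1 && parts[parts.length - 1] === "base64";
  if (isBase64) {
    parts.pop();
  }

  const mediaType = parts[0] || "text/plain";
  const parameters = {};
  for (const part of parts.slice(1)) {
    const eq = part.indexOf("=");
    // Assumption: entries not in attribute=value form are skipped.
    if (eq > 0) {
      parameters[part.slice(0, eq)] = part.slice(eq + 1);
    }
  }
  // Default when no media type is given: text/plain;charset=US-ASCII.
  if (parts[0] === "" && !("charset" in parameters)) {
    parameters.charset = "US-ASCII";
  }
  return { mediaType, parameters, isBase64, data };
}

const parsed = parseDataUri(
  "data:text/plain;charset=UTF-8;page=21,the%20data:1234,5678"
);
console.log(parsed.mediaType); // prints "text/plain"
console.log(parsed.data);      // prints "the%20data:1234,5678"
```

Splitting on the first comma only is what lets the data part contain further colons, semicolons, and commas, as in the second example URI.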
Examples of usage
HTML
An HTML fragment embedding a picture of a small red dot:
<img src="data:image/png;base64,iVBORw0KGgoAAA
ANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU
5ErkJggg==" alt="Red dot" />
In this example, the lines are broken for formatting purposes. In actual URIs, including data URIs, control characters (ASCII 0 to 31, and 127) and spaces (ASCII 32) are "excluded characters". This means that whitespace characters are not permitted in data URIs. However, in the context of HTML 4 and HTML 5, linefeeds within an element attribute value (such as the "src" above) are ignored. So the data URI above would be processed ignoring the linefeeds, giving the correct result. But note that this is an HTML feature, not a data URI feature, and in other contexts, it is not possible to rely on whitespace within the URI being ignored.
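Outside HTML, code that receives a line-wrapped data URI may have to remove the whitespace itself. The following sketch shows one way to do so; stripExcludedCharacters is a hypothetical helper, and removing every excluded character (rather than rejecting the input) is an assumption made for illustration.

```javascript
// Hypothetical helper: remove the "excluded characters" (ASCII 0 to 32
// and 127) that a line-wrapped data URI may contain.
function stripExcludedCharacters(text) {
  return text.replace(/[\x00-\x20\x7f]/g, "");
}

// A data URI that was wrapped across two lines for display.
const wrapped = "data:image/png;base64,iVBORw0KGgoAAA\nANSUhEUgAAAAUAAAAF";
console.log(stripExcludedCharacters(wrapped));
// prints "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAF"
```

Because a literal space must appear as %20 in a valid data URI, stripping unencoded whitespace does not remove legitimately encoded content.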
CSS
A Cascading Style Sheets (CSS) rule that includes a background image:
ul.checklist li.complete {
padding-left: 20px;
background:white url('data:image/png;base64,iVB\
ORw0KGgoAAAANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAAABlBMVEU\
AAAD///+l2Z/dAAAAM0lEQVR4nGP4/5/h/1+G/58ZDrAz3D/McH8\
yw83NDDeNGe4Ug9C9zwz3gVLMDA/A6P9/AFGGFyjOXZtQAAAAAEl\
FTkSuQmCC') no-repeat scroll left top;
}
In this example, each line ends with a backslash followed by a linefeed (\ + <linefeed>), a CSS feature that continues a string onto the next line. The CSS processor removes these sequences and reconstitutes the data URI without whitespace, which is necessary because whitespace is not allowed within the data component of a data: URI.
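The effect of this CSS line continuation can be imitated with a short sketch. The helper name joinCssContinuations is hypothetical, and the code only mimics this one behaviour of a CSS parser rather than parsing CSS in general.

```javascript
// Hypothetical helper: drop backslash-newline pairs, as a CSS parser
// does inside a quoted string.
function joinCssContinuations(cssText) {
  // CSS treats \r\n, \n, \r, and form feed as newlines.
  return cssText.replace(/\\(?:\r\n|\n|\r|\f)/g, "");
}

// "\\\n" in this JavaScript literal is a backslash followed by a linefeed.
const wrappedCssUrl = "url('data:image/png;base64,iVB\\\nORw0KGgo')";
console.log(joinCssContinuations(wrappedCssUrl));
// prints "url('data:image/png;base64,iVBORw0KGgo')"
```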
JavaScript
A JavaScript statement that opens an embedded subwindow, as for a footnote link:
window.open('data:text/html;charset=utf-8,' +
    encodeURIComponent( // Escape for URL formatting
        '<!DOCTYPE html>'+
        '<html lang="en">'+
        '<head><title>Embedded Window</title></head>'+
        '<body><h1>42</h1></body>'+
        '</html>'
    )
);
SVG
A Scalable Vector Graphics (SVG) image containing an embedded JPEG image encoded in Base64:
<image width="64" height="24" xlink:href="data:image/jpeg;base64,
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDADIiJSwlHzIsKSw4NTI7S31RS0VFS5ltc1p9tZ++u7Kf
r6zI4f/zyNT/16yv+v/9////////wfD/////////////2wBDATU4OEtCS5NRUZP/zq/O////////
////////////////////////////////////////////////////////////wAARCAAYAEADAREA
AhEBAxEB/8QAGQAAAgMBAAAAAAAAAAAAAAAAAQMAAgQF/8QAJRABAAIBBAEEAgMAAAAAAAAAAQIR
AAMSITEEEyJBgTORUWFx/8QAFAEBAAAAAAAAAAAAAAAAAAAAAP/EABQRAQAAAAAAAAAAAAAAAAAA
AAD/2gAMAwEAAhEDEQA/AOgM52xQDrjvAV5Xv0vfKUALlTQfeBm0HThMNHXkL0Lw/swN5qgA8yT4
MCS1OEOJV8mBz9Z05yfW8iSx7p4j+jA1aD6Wj7ZMzstsfvAas4UyRHvjrAkC9KhpLMClQntlqFc2
X1gUj4viwVObKrddH9YDoHvuujAEuNV+bLwFS8XxdSr+Cq3Vf+4F5RgQl6ZR2p1eAzU/HX80YBYy
JLCuexwJCO2O1bwCRidAfWBSctswbI12GAJT3yiwFR7+MBjGK2g/WAJR3FdF84E2rK5VR0YH/9k="/>
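A Base64 payload can be checked against the declared media type by decoding its first few octets and comparing them with well-known file signatures. The sketch below is illustrative only: sniffBase64Payload is a hypothetical helper, it recognises just three formats, and it assumes the standard atob global is available.

```javascript
// Hypothetical helper: guess an image type from the start of a Base64 payload.
function sniffBase64Payload(payload) {
  // Drop whitespace, then keep at most 8 characters (6 octets),
  // trimmed to a whole number of 4-character Base64 groups.
  let head = payload.replace(/\s+/g, "").slice(0, 8);
  head = head.slice(0, head.length - (head.length % 4));

  const binary = atob(head);
  let hex = "";
  for (let i = 0; i < binary.length; i++) {
    hex += binary.charCodeAt(i).toString(16).padStart(2, "0");
  }

  if (hex.startsWith("ffd8ff")) return "image/jpeg";       // JPEG start marker
  if (hex.startsWith("89504e470d0a")) return "image/png";  // PNG signature
  if (hex === "474946383761" || hex === "474946383961") {
    return "image/gif";                                    // "GIF87a" or "GIF89a"
  }
  return null; // unknown or too short to tell
}

console.log(sniffBase64Payload("/9j/4AAQSkZJRgABAQEAYABg")); // prints "image/jpeg"
console.log(sniffBase64Payload("iVBORw0KGgoAAAANSUhEUg"));   // prints "image/png"
```

The payload in the SVG example starts with /9j/, which decodes to the octets FF D8 FF, consistent with its declared image/jpeg media type.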
Malware and phishing
Criminals can use data URIs to construct attack pages that attempt to obtain usernames and passwords from unsuspecting web users. Data URIs can also be used to get around cross-site scripting (XSS) restrictions: the attack payload is embedded entirely in the URI shown in the address bar and can be distributed through URL shortening services, so the attacker does not need to own a full website.[6]
References
- ↑ "Using Data URIs to Speed Up Your Website". Treehouse Blog. 27 March 2014.
- ↑ Deveria, Alexis (July 2015). "Can I use...". Retrieved 31 August 2015.
- ↑ Masinter, L (August 1998). "RFC 2397 - The "data" URL scheme". Internet Engineering Task Force. Retrieved 2008-08-12.
- ↑ Freed, Ned; Dürst, Martin, eds. (20 December 2013). "Character Sets". Internet Assigned Numbers Authority. Retrieved 31 August 2015.
- ↑ Berners-Lee, Tim; Fielding, Roy; Masinter, Larry (January 2005). "Uniform Resource Identifiers (URI): Generic Syntax". Internet Engineering Task Force. Retrieved 31 August 2015.
- ↑ "Phishing without a webpage – researcher reveals how a link itself can be malicious". Naked Security by Sophos. 31 August 2012. https://nakedsecurity.sophos.com/2012/08/31/phishing-without-a-webpage-researcher-reveals-how-a-link-itself-can-be-malicious/