Talk:Cross-site scripting

From Wikipedia, the free encyclopedia

This article is within the scope of WikiProject Computer science, which aims to create a comprehensive computer science reference for Wikipedia. Visit the project page for more information and to join in on related discussions. It is rated as B-Class and as High-importance on the assessment scale.

This article is within the scope of WikiProject Internet, an attempt to better organise information in articles related to the Internet. For more information, visit the project page. This article has been rated as B-class on the class scale and as High-importance on the importance scale.

This article is part of WikiProject Malware, an attempt to better organise information in articles related to Malware. If you would like to participate, you can edit the article attached to this page, or visit the project page, where you can join the project and/or contribute to the discussion.

1 Exploit scenarios
2 Terminology
3 Avoiding XSS Scripting: blacklisting vs. whitelisting
4 Examples
5 Restructuring
6 Link to HTML Purifier
7 Vulnerability example/demonstration
8 XSS v CSS
9 This article is very well written
10 There's also a singing group
11 The Reason For Wiki Formatting?
12 Maybe the wrong place, but...
13 vulnerability or attack?
14 Passive Aggressive Page Tagging
15 Avoiding/Prevention Rewrite
16 Lists
17 iframe

Exploit scenarios

CLARIFICATION NEEDED ON Type-0 attack: In this section, under the subsection Type-0 attack, bullet #3 says, "The malicious web page's JavaScript opens a vulnerable HTML page installed locally on Alice's computer." There is no explanation of how the vulnerable HTML page got installed locally on Alice's computer or how Mallory knew about it. This is the crux of the attack, so without this part of the explanation the scenario is not useful. I haven't found an answer to this or I would have corrected the article. I'm hoping someone with more knowledge of this attack will read this and add the clarifying information.

I added a short blurb that reads "(Local HTML pages are commonly installed with standard software packages, including Internet Explorer.)" Does that clear it up a bit? TDM 13:53, 15 November 2007 (UTC)

Terminology

The first paragraph in section: "Other forms of mitigation" is garbage. Just quoting text will not stop it from being interpreted as HTML. I can always put "> into the text to close the tag. That whole section should be either removed or heavily modified. It is naive and inaccurate. --129.97.84.62 15:06, 4 April 2006 (UTC)

The use of the word "quoting" in the entire article is very ambiguous, which is why Mr 129.97.84.62 misunderstood the article. We should take the time to clarify: "quoting" should be replaced with "encoding". --Blaufish 19:46, 3 May 2006 (UTC)

Indeed, many people are very confused by "quoting" in HTML. I believe this is the official terminology for encoding HTML special characters, and this was mentioned at the beginning of the "Avoiding XSS vulnerabilities" section. However, for casual readers who don't read that section and aren't familiar with the term in this context, it would probably be best to use "encode" or "encoding" more often. -- TDM 05:07, 28 May 2006 (UTC)

I agree... "HTML quoting" made absolutely no sense to me, and information about what that means is not easily available. Google searches for "HTML quoting means," "what is HTML quoting," "what does HTML quoting," and the ever-popular define:"HTML quoting" all yield no results. And there is no Wikipedia entry for "HTML quoting." If it is the official terminology, there should probably be a wiki article about it, as I'm still uncertain what other people think it is (or what its common usage is). "Encoding" seems better to me.
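
For illustration, a minimal sketch of what "encoding" means in this thread, assuming Python's standard html.escape as a stand-in for whatever routine an application really uses; the input string and the surrounding input tag are made-up examples. It shows why the "> trick from the first comment stops working once the special characters themselves are encoded, rather than the text merely being wrapped in quotes:

```python
import html

# Hypothetical user input that tries to close an attribute and tag with ">
user_input = '"><script>alert(1)</script>'

# html.escape replaces &, < and >; with quote=True (the default) it also
# replaces the double and single quote characters
encoded = html.escape(user_input, quote=True)

# Placed inside a quoted attribute, the encoded value stays data, not markup
page = '<input type="text" value="{}">'.format(encoded)
print(page)
```

The browser displays the original characters to the user, but none of them can terminate the attribute or open a new tag.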

Avoiding XSS Scripting: blacklisting vs. whitelisting

There should be some mention of the two different approaches -- blacklisting (i.e., removing anything that can be recognised as a potential script injection) and whitelisting (i.e., only allowing stuff that can be determined not to be a potential script injection). If I had references for this kind of stuff, I'd add it myself, but I came here looking for them. :( JulesH 17:10, 27 July 2006 (UTC)
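
To make the distinction concrete, here is a rough sketch of the two approaches as described above. Both filters are deliberately simplistic and hypothetical (the function names, the blacklist pattern and the allowed character set are assumptions for illustration, not taken from the article or any library):

```python
import re

def blacklist_filter(text):
    # Blacklist: remove what is recognised as dangerous (here, only <script>)
    return re.sub(r"(?is)<script.*?>.*?</script>", "", text)

# Whitelist: accept only input made of characters known to be harmless
ALLOWED = re.compile(r"[A-Za-z0-9 .,!?-]*")

def whitelist_accepts(text):
    return ALLOWED.fullmatch(text) is not None

payload = "<img src=x onerror=alert(1)>"
print(blacklist_filter(payload))           # unchanged: not on the blacklist
print(whitelist_accepts(payload))          # False: characters outside the list
print(whitelist_accepts("Hello, world!"))  # True
```

The blacklist only stops what its author thought of in advance, while the whitelist fails closed, at the cost of rejecting some legitimate input.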

This "avoiding" section is written from a programmer's point of view. How do users avoid these problems? Justforasecond 14:55, 31 July 2006 (UTC)

True, it is written from a programmer's perspective, which is most relevant. Users can do very little to avoid such attacks, but perhaps a few things should be mentioned. All I can think of from the users' side is disabling scripting in browsers (usually unworkable) and avoiding trusting links sent to them via email. --TDM 13:28, 8 August 2006 (UTC)

Or NoScript for Firefox ^.^ 58.160.188.225 09:34, 28 March 2007 (UTC)

I haven't personally used NoScript, but it may be a good mitigation. If you add something about it, be sure to link off to something that describes how it works. TDM 13:49, 15 November 2007 (UTC)

Examples

I like the recent example of PayPal's XSS hole. However, it isn't mentioned what type it is. Is it a type 1 XSS? If so, we can probably remove the ATutor example, since it isn't very well known, and replace it with the PayPal one. We should also keep the number of examples down to 4-5 if possible. It could easily grow to 1000 if everyone put their favorite in, but we don't need that. --TDM 13:32, 8 August 2006 (UTC)

No one has responded to this, so I went ahead and ripped out several half-complete examples. It seems this section is becoming a bulletin board for script kiddies to advertise. Honestly people... XSS holes are a dime a dozen. Posting them to popular security mailing lists is more than enough to get your name out there. I did remove the ATutor example and improved some others, but some still don't list what type of XSS they are. If those who posted them could describe them a bit more, that would make this section more consistent and complete. TDM 22:51, 26 October 2006 (UTC)

Restructuring

Someone added the notice recently that this page may not meet Wikipedia's standards due to the need for restructuring. Could whoever added that elaborate? I don't see any new comments here specific to that evaluation. If there's some other organization that would work better, I'd be willing to improve the document. TDM 23:57, 19 October 2006 (UTC)

Guilty as charged. I should have added a note with some suggestions. Firstly, I found the article confusing and difficult to read, despite having a 15-year background as a systems software engineer. Specifically, the article does not _begin_ with a clear definition of cross-site scripting. Secondly, the sections characterising the types of cross-site scripting are hard to read. I would suggest that in a sense the article could be considered as "written backwards", in that the _examples_ given just after the "types" section show the clearest writing and are the nearest thing the article has to a clear _definition_. Consider moving these to the front of the article, since a "definition by example" would be an improvement. So "restructuring" the article could mean moving things around into a more logical order so that things are clearly defined _before_ they are referenced. And if clear definitions are not easily obtainable, perhaps because of lack of consensus, then definition-by-exemplification is definitely the way to go. CecilWard 10:20, 20 October 2006 (UTC)

I have expanded the first paragraph's description and re-ordered the first two sections, which will hopefully help a bit. I don't have time to rewrite the types at the moment, but I did try to clean up the real world examples a bit. I agree that an example early in the article will help those who don't have a clear grasp of all of the background material, but I think it should be relatively short and as simple as possible. When I originally put most of the text together, I wanted to be sure to put the vulnerability in the context of the same-origin policy, otherwise the technical reason why XSS is even a vulnerability at all may be difficult to understand. Because of the order in which things are referenced (e.g. "XSS" abbreviation), a major reordering would require a lot of rewriting as well. However, I agree that the long background section, coupled with the terminology section makes for a long read before casual readers get to any solidifying examples. Thanks for the feedback. TDM 22:48, 26 October 2006 (UTC)

Link to HTML Purifier

I'd like to add a link to HTML Purifier in the Prevention section, as it implements the most reliable method: parsing and stripping all tags/attributes not in the whitelist (as well as other protection). Unfortunately, I wrote the library, so if I add it myself, it's vanity. So could someone take a look and, if it looks useful, add the link for me? Thanks! -- Edward Z. Yang (Talk) 23:36, 29 November 2006 (UTC)

A bit quiet around here hmm... I'll wait another week. -- Edward Z. Yang (Talk) 02:37, 2 December 2006 (UTC)

In my not-so-humble opinion, stripping tags is never the most reliable method. HTML entity encoding is likely the only safe method. Sure, you can develop a complex stripping system that is designed only to allow good things through, and this might work most of the time, but browsers are just too inconsistent for this to let many sleep well at night. I don't care if you link to it, but don't change the text saying it is the best way to go or anything like that. TDM 17:32, 23 January 2007 (UTC)

The article already has text in "Avoiding XSS vulnerabilities" that states: "The most reliable method is for web applications to parse the HTML, strip tags and attributes that do not appear in a whitelist, and output valid HTML." (Which I did not add to the article). It probably is POV, but I think it's correct (we'll need to find a citation for it, then). Making a complex stripping system is not impossible: as HTML Purifier demonstrates, it has been done.

Browser inconsistency is a trickier issue, but I believe that it too poses no problem as long as you enforce standards-compliant code. Browsers begin to have wildly differing interpretations of HTML when it's ambiguous, when you have things like <IMG src="http://ha.ckers.org/" style"="style="a /onerror=alert(String.fromCharCode(88,83,83))//" >`> . If you get rid of this craziness and enforce well-formed XHTML, you're gold. (Just don't allow comments). -- Edward Z. Yang (Talk) 04:34, 25 January 2007 (UTC)

I still don't agree with you Edward, sorry. I did not put that text there that you quoted, and I think it's definitely a PoV. The problem is not with HTML itself, in all of its incarnations. You can certainly write a reliable stripper that guarantees properly-formed HTML is not injected. But what you can't guarantee is that a browser won't interpret broken tags as proper tags. An attack which used to work against hotmail was: <ifra<iframe >me src="http://evil.example.org/...">. Of course hotmail would strip the inner, properly formatted iframe, but it wouldn't make a second pass and remove the outer one, which became properly formatted after the first was stripped. Another blunder by Microsoft was with their magical .NET 2.0 tag checking. Here, they look for "evil" things and reject them when they find them. However, their algorithms didn't even match what IE interprets as valid tags. The attack that worked was <\x01script ...> (where "\x01" should be interpreted as the literal 01 byte). This fooled the "evil" searching algorithm while IE happily ignored the weird byte and used the tag. You have to completely understand the parsing algorithms of 3-4 browsers for 3-4 versions back before you can reliably create a tag stripper that I'll trust. Others feel the same way about this. TDM 13:48, 15 November 2007 (UTC)
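
The single-pass problem described above can be reproduced with a toy stripper. This is only a sketch of the general failure mode, not a reconstruction of hotmail's actual filter; the regular expression and the function names are assumptions:

```python
import re

IFRAME_TAG = re.compile(r"<iframe[^>]*>", re.IGNORECASE)

def strip_once(text):
    # Single pass: remove every well-formed <iframe ...> opening tag found
    return IFRAME_TAG.sub("", text)

def strip_until_stable(text):
    # Repeat the pass until it no longer changes the text
    previous = None
    while previous != text:
        previous = text
        text = strip_once(text)
    return text

payload = '<ifra<iframe >me src="http://evil.example.org/...">'
print(strip_once(payload))          # the outer tag becomes well-formed
print(strip_until_stable(payload))  # empty string
```

Looping until nothing changes closes this particular hole, but it does nothing about the second class of problem mentioned above, where the filter and the browser disagree about what counts as a tag.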

If you find <iframe> in the first pass you could simply discard the whole input; if you find <iframe> again in a second pass, you *know* that someone tried to play smart and you can file a case... ;) The problem is that computers can be safe, but programmers aren't :>

I just don't agree that any kind of HTML parsing algorithm will do it right in all cases given the current state of HTML. It is fine to mention that this approach is used and is one option, but it is *FAR* from the safest approach. Have you taken into account things like UTF-7 encodings while you're doing this parsing? Outright rejecting any input that looks like it contains tags is much better than trying to sanitize input, but it's still going to be far from perfect. In my work as a penetration tester, I've exploited hundreds of pages that try to use similar techniques. It's just too hard to get it right for all browsers under all HTML and encoding variants. TDM (talk) 19:45, 17 April 2008 (UTC)
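
A small demonstration of the UTF-7 point, using only Python's built-in "utf-7" codec and html.escape. The payload bytes are a made-up example, and the last step merely simulates a browser that decides to treat the page as UTF-7; it is an assumption about the scenario, not a claim about any specific browser version:

```python
import html

# A script tag written in UTF-7: "+ADw-" decodes to "<" and "+AD4-" to ">"
utf7_payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# A filter looking at the ASCII text sees no HTML special characters,
# so entity encoding leaves it untouched
as_ascii = utf7_payload.decode("ascii")
filtered = html.escape(as_ascii)

# Simulated view of a browser that interprets the page as UTF-7
as_browser_sees_it = filtered.encode("ascii").decode("utf-7")
print(filtered == as_ascii)
print(as_browser_sees_it)
```

Any filter that inspects the bytes under one encoding while the browser renders them under another can be bypassed in this way.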

Vulnerability example/demonstration

I'm not familiar enough with this article to know exactly where this should go, but I think this presentation of a Google Desktop vulnerability is extremely educational: they show how such small vulnerabilities in this case end up cascading into complete control over the victim's computer. The vulnerabilities they use are all patched (I think including one glitch that's server-side), so they no longer work, so it should be safe to show. This sounds like Type 2 in the article. --AySz88 ^-^ 20:39, 22 February 2007 (UTC)

That looks like a good resource. I wouldn't have a problem with its addition. Is there a plain-text version, though? -- Edward Z. Yang (Talk) 00:38, 25 February 2007 (UTC)

XSS v CSS

I was almost certain we'd previously had a discussion on this, but obviously this is not the case. So, I'll bring it to the floor now.

I am strongly opposed to including the acronym "CSS" in the introduction paragraph of the article. It is a misleading term that no one uses anymore, as the Terminology section already states, and thus, while deserving mention in that segment, should not be in the intro. -- Edward Z. Yang (Talk) 22:42, 28 February 2007 (UTC)

It is true that most people (especially in the security community) today no longer use "CSS" to refer to cross-site scripting, since this acronym can refer to another technology. Nevertheless, AFAIK, a few existing articles (including some more recent ones) on the Internet still use this acronym, or use both acronyms simultaneously to refer to cross-site scripting (examples for using both: [1] and [2]). While we should certainly discuss the more appropriate or preferred term in the main article (e.g. in the Terminology section), it seems better that other terms are also mentioned in the intro as long as they are still used by some people or can be commonly found. Or, we can rephrase the intro statement a bit to make it clearer.--64.231.71.28 08:16, 1 March 2007 (UTC)

I can see where you're coming from. Maybe we could bump it to the end of the intro paragraph. -- Edward Z. Yang (Talk) 22:27, 1 March 2007 (UTC)

It seems good.--64.231.71.28 01:38, 2 March 2007 (UTC)

This article is very well written

It's well-structured, concise, disambiguating, sufficiently detailed, and very clear. 64.221.248.17 22:24, 6 April 2007 (UTC)

following information is so good.

There's also a singing group

called XSS. I don't know how to do disambiguation pages, and I'm not an expert on XSS (that's why I was looking them up), but maybe someone can help clarify this? All I know about XSS is that they sing sort of hip-hop style R&B in English, and that they're at least popular in the Middle East.

I believe that there has been an XSS disambiguation page created now that you could add to. TDM (talk) 19:46, 17 April 2008 (UTC)

The Reason For Wiki Formatting?

Are XSS and the difficulty with interpreting and reformatting HTML some of the reasons why wikis don't use HTML for formatting? I know that one reason for not using HTML is that it might be difficult for some wiki users to learn. But it seems that the wiki formatting also helps prevent XSS while giving the users some control. --Lance E Sloan 16:57, 8 August 2007 (UTC)

Yes, wikis use alternate formatting languages largely due to XSS. If they allowed raw HTML, it would be trivial to hijack anyone else's account and post on their behalf in most cases. Obviously alternate languages can be easier for non-programmers to learn, but I think this is the main reason. Keep in mind, the use of an alternate language alone does not prevent XSS. It must be very carefully implemented. I've seen bulletin board posting languages which allowed injection through attribute parameters. In the language I was testing, one would specify something like: [link url="http://..."]text[endlink] to produce <a href="http://...">text</a>. However, if you supplied "http://evil.example.org/%22%3e" as the URL, the page would render as <a href="http://evil.example.org/">">text</a>, indicating an obvious injection. TDM 13:40, 15 November 2007 (UTC)
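
A sketch of how that kind of attribute injection can arise and how encoding closes it. The renderer below is hypothetical: the function names are invented, and it is only an assumption that the board URL-decoded the parameter before building the tag, which is what the rendered output above suggests:

```python
import html
from urllib.parse import unquote

def render_link_unsafe(url, text):
    # URL-decodes the parameter and pastes it into the attribute unencoded
    return '<a href="{}">{}</a>'.format(unquote(url), text)

def render_link_encoded(url, text):
    # Same, but HTML-encodes both values before building the tag
    return '<a href="{}">{}</a>'.format(
        html.escape(unquote(url), quote=True), html.escape(text))

url = "http://evil.example.org/%22%3e"
print(render_link_unsafe(url, "text"))
print(render_link_encoded(url, "text"))
```

Encoding keeps the value inside the attribute, but it does not by itself decide whether the URL is acceptable; a scheme such as javascript: would need a separate check.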

Maybe the wrong place, but...

I have been getting strange XSS warnings in FF 2.0.0.6 from Wikipedia articles with images lately. Does anyone know if there has been a change in the template formatting of images or if it's a FF bug?

vulnerability or attack?

Isn't cross-site scripting really an attack and not a vulnerability? The vulnerability is most clearly input validation. The attack is script injection, of which cross-site scripting is a specific type. Do we agree? --Preceding unsigned comment added by 198.169.188.227 (talk) 19:32, 5 September 2007 (UTC)

Well, I would agree that cross-site scripting could be used to refer to an attack. However, there is a vulnerability at the core of it which allows the attack to succeed. I strongly disagree with the assertion that it's an "input validation" flaw, because the real problem is output encoding. These are very different issues, even though people tend to lump them together. What if you want your application to handle nearly any kind of input (free-form text field with multiple languages/character sets) but don't want it to be vulnerable? You can't validate the input carefully (and prevent HTML special characters from getting in there), but you *can* encode the output. It's an injection flaw, whose correct fix is to treat special characters as literals. Yes, you can use validation up-front in 95% of the cases to mitigate the problem, and you *should* do this, since input validation can mitigate other types of vulnerabilities as well, but it is just a mitigation. TDM 13:31, 15 November 2007 (UTC)
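
The difference between the two can be shown with a free-form comment that is perfectly legitimate but full of HTML special characters. Both functions are hypothetical illustrations (the names, the rejected character set and the paragraph wrapper are assumptions), with html.escape standing in for the output encoder:

```python
import html
import re

def validate_strict(text):
    # Input validation: reject anything containing HTML special characters
    return re.search(r"[<>&\"']", text) is None

def render_comment(text):
    # Output encoding: accept any text, encode it when writing the HTML
    return "<p>{}</p>".format(html.escape(text))

comment = 'if a < b && b > c, say "ok"'
print(validate_strict(comment))  # False: harmless input is turned away
print(render_comment(comment))   # the same input, safely displayable
```

Validation has to refuse this comment to be safe, whereas encoding at output time accepts it and still leaves nothing for the browser to interpret as markup.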

Passive Aggressive Page Tagging

When you tag the page as "needs cleanup", "needs citations", "needs an expert", or whatever, please include a description here of the specific criticisms. I consider myself an expert on this topic and have written most of the content for the page. However, I'm a busy guy and only have time to look at the page once every few months. I certainly don't have time to read up on every Wikipedia policy regarding format, so please describe your gripes rather than just doing a hit-and-run tag like that. I can guess the citations issue could be resolved by adding inline external links or footnote tags. Certainly there are plenty (too many) of external resources listed at the bottom that could be better referenced internally to back up the page's assertions. However, the "needs an expert" tag confuses me. TDM (talk) 19:55, 17 April 2008 (UTC)

Hello, TDM. I will remove the expert tag. Just happened to see your note, I am not an expert on the topic but would be happy to help do the citations. —SusanLesch (talk) 15:19, 11 May 2008 (UTC)

Hi SusanLesch. Thanks for responding. I know the article lacks direct citations of several assertions. If you see statements that could use backing up, feel free to insert the little "needs citation" marks on the specific sentences. Sometimes it's hard when you're knee deep in this stuff to realize that certain things, which seem obvious, need backing up with references. I can then try to track down some references for those specific items. TDM (talk) 17:56, 23 May 2008 (UTC)

One para is now cited, using your text as a framework. Barring unforeseen circumstances, which are possible, I figure if a person can learn JavaScript in 24 hours I might have what I can do done in five to seven days. If you are available then to make corrections for all the errors I introduce, that would be great. If anyone else is available and interested, we could be done in half that time with luck. --SusanLesch (talk) 17:25, 27 May 2008 (UTC)

Citing is done except one which is marked. I removed my cartoons, the python section and the lists mentioned below. OK from my point of view to throw out all the computers and start over. Only half kidding. Thank you TDM. —SusanLesch (talk) 12:15, 8 June 2008 (UTC)

Avoiding/Prevention Rewrite

I think this section is currently pretty weak. For one, the Python example can probably go away, since it isn't an ideal filter. Perhaps it would be better to start with a more abstract description of how to do white-list based character encoding (i.e., all characters except those in a white list get encoded), then move on to some algorithms or examples of libraries that already do this. Finally, I think it's important to include a note on defining a page's character set to prevent UTF-7 based attacks. There are very few good references online for this... mostly just specific vulnerabilities and associated exploits. I might get around to this rewrite at some point, but feel free to give it a go if anyone is interested. TDM (talk) 18:01, 23 May 2008 (UTC)
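
As a starting point for the abstract description, a minimal sketch of white-list based character encoding: every character outside a small safe set is emitted as a numeric character reference. The safe set and the function name are assumptions chosen for illustration; a real application would pick its own list, or use an existing library:

```python
import string

# Characters emitted as-is; everything else becomes a numeric reference
SAFE_CHARS = set(string.ascii_letters + string.digits + " .,")

def encode_all_but_whitelist(text):
    return "".join(
        ch if ch in SAFE_CHARS else "&#{};".format(ord(ch))
        for ch in text)

print(encode_all_but_whitelist('<b onclick="x">Hi, there</b>'))
```

Because the encoder lists what is allowed rather than what is dangerous, characters nobody anticipated are encoded by default. Declaring the page's character set is a separate step, for example by sending a Content-Type header such as text/html; charset=utf-8.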

Lists

Hi. The "External links" section was tagged since last November so I removed it except for a couple. In case anyone needs them, here they all are. —SusanLesch (talk) 17:46, 28 May 2008 (UTC)

The other list was real-world examples, which are all here and now summarized in one paragraph under "Background". If anyone thinks that a list makes sense, it is OK from my point of view to restore it. --SusanLesch (talk) 02:31, 29 May 2008 (UTC)

iframe

From the Mitigation section, this was cut only because OpenAjax recommends iframe and I don't know how to reconcile the two thoughts. Maybe someone else would be able to restore this sentence if it needs to be there. Thanks. —SusanLesch (talk) 06:10, 9 June 2008 (UTC)

"Unfortunately external content can still be loaded into the page with elements like iframe or object to trick users.^[1]"