Talk:CDATA

This is the talk page for discussing improvements to the CDATA article.

General questions

…A couple of questions still on my mind after reading this page:

The article says you can't have ]]> in a CDATA section -- is there no escaping at all in these sections? Does this basically mean that CDATA sections are only useful for hand-coding XML, when you know what's in them?

Well, W3Schools.com says "A CDATA section cannot contain the string "]]>", therefore, nested CDATA sections are not allowed." So I guess something like "<![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>" would be a no-no. But this would be ok: "<![CDATA[ The following statement is CDATA: <![CDATA[hello world]]&gt; ]]>" Notice, you can use the '&' 'g' 't' ';' characters as a right angle bracket. If you look at the two examples below, you'll notice that they look exactly the same. But click "edit this page" and you'll see that the bottom one uses '&' 'g' 't' ';'.
  <![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>
  <![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>
Also, W3Schools.com says "Everything inside a CDATA section is ignored by the parser." so I'm guessing it's for humans only. (Source) -Hyad 16:38, 13 December 2005 (UTC)
If W3Schools.com says that CDATA is ignored, then it is completely wrong and should not be trusted as a source. Stick to the specs. — mjb 01:02, 31 January 2006 (UTC)
this would be ok: "<![CDATA[ The following statement is CDATA: <![CDATA[hello world]]&gt; ]]>" Notice, you can use the '&' 'g' 't' ';' characters as a right angle bracket.
Not within a CDATA section you can't, surely? I must say, this lack of escaping sounds very bizarre and somewhat dangerous. PeteVerdon 17:27, 10 March 2006 (UTC)
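For anyone wanting to check this thread's claims, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. It assumes the commonly used workaround of ending the CDATA section partway through "]]>" and opening a new one, so the forbidden sequence never appears whole inside one section. The helper name wrap_in_cdata is hypothetical, made up for this illustration. The last lines also suggest that "&gt;" inside a CDATA section is not decoded: it comes back as five literal characters.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text: str) -> str:
    """Hypothetical helper: wrap text in CDATA, splitting every ']]>'
    so that ']]' ends one section and '>' starts the next."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# The payload itself looks like a CDATA section, so it contains "]]>".
payload = "<![CDATA[hello world]]>"
doc = "<note>" + wrap_in_cdata(payload) + "</note>"

# The parser joins the adjacent sections back into a single string.
parsed_text = ET.fromstring(doc).text
print(parsed_text == payload)

# Inside a CDATA section, "&gt;" is not an entity reference;
# the parser reports it as the literal characters & g t ;
literal = ET.fromstring("<note><![CDATA[]]&gt;]]></note>").text
print(literal)
```

So CDATA sections can still be generated by a program, as long as the generator splits "]]>" in this way (or simply uses ordinary escaped character data instead).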

Is whitespace significant in CDATA sections, but not elsewhere? Thus, are "<![CDATA[foo bar]]>" and "foo bar" parsed differently?

No, the only difference between a CDATA section and character data out in the open is that within a CDATA section, "&" means "&" (not the start of an entity or character reference) and "<" means "<" (not the start of a tag). Nothing more! — mjb 01:02, 31 January 2006 (UTC)
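A small sketch of that equivalence, again assuming Python's standard xml.etree.ElementTree as the parser (other parsers should agree, but that is not checked here). The same text is written once with escapes and once in a CDATA section, and the parser reports identical character data, whitespace included.

```python
import xml.etree.ElementTree as ET

# Same content, two spellings: escaped character data vs. a CDATA section.
plain = ET.fromstring("<a>foo bar &amp; &lt;b&gt;</a>").text
cdata = ET.fromstring("<a><![CDATA[foo bar & <b>]]></a>").text

print(repr(plain))
print(repr(cdata))
print(plain == cdata)
```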

CDATA section in attribute?

A further point to clarify: can CDATA appear as an attribute? I've only ever seen it used as a child node, but nothing I've read here says it has to be used that way. Could one write

<Formula Name="saltLimit" Text="<![CDATA[salt < 3]]>" Units="tons" />

? 195.212.29.83 17:01, 27 September 2006 (UTC)

Short answer: No, a CDATA section can't be used in attribute values. And the fact that this isn't mentioned is a good point, and should be addressed!
Longer answer: The XML spec says a CDATA section can appear "anywhere that character data can appear", but also says that attribute values are not considered character data; they're just part of the markup for an element start-tag. So that pretty much sums it up. As far as I know, SGML is the same way.
You should keep in mind that these constructs are defined and operate at the lexical level, which is beneath any logical abstractions such as "nodes". That is, the W3C DOM Core, for example, represents a CDATA section with what they call a CDATASection node, but that has no bearing on what a CDATA section is or where it can appear. Other node-based object models like XPath's data model and the XML Infoset do not represent CDATA sections as discrete types of objects, which is good since there's no guarantee that a parser will preserve such distinctions (i.e., a parser is free to report sequential character data in chunks any way it likes; it won't necessarily be based on markup boundaries and definitely won't indicate whether the data was in a CDATA section).
FWIW, in SGML, a "CDATA section" is, I believe, an informal term by which you would refer to just one of several kinds of marked sections which take the form <![foo[]]> where foo is a status keyword like CDATA, RCDATA, IGNORE, INCLUDE, or TEMP. In XML, a CDATA section is a formal construct and is the only kind of marked section that is inherited from SGML. So in XML we don't even have or need a concept of 'marked sections'. —mjb 21:26, 27 September 2006 (UTC)
Thanks. 195.212.29.92 09:45, 28 September 2006 (UTC)
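As a rough check of the short answer above, this sketch feeds the example from the question to Python's standard xml.etree.ElementTree parser. The attribute value contains a raw "<", so the parser rejects the document as not well-formed; writing the value with "&lt;" instead is accepted. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The markup from the question: a CDATA section inside an attribute value.
bad = '<Formula Name="saltLimit" Text="<![CDATA[salt < 3]]>" Units="tons" />'
# The usual alternative: escape the "<" in the attribute value.
good = '<Formula Name="saltLimit" Text="salt &lt; 3" Units="tons" />'

try:
    ET.fromstring(bad)
    bad_parsed = True
except ET.ParseError:
    # A literal "<" is not allowed in an attribute value.
    bad_parsed = False

text_attr = ET.fromstring(good).get("Text")
print(bad_parsed)
print(text_attr)
```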

CDATA = PCDATA?

What's the difference between CDATA and PCDATA? If you search for PCDATA you get redirected here but PCDATA isn't mentioned in this article at all. --Stefán Örvarr Sigmundsson 04:27, 28 October 2007 (UTC)

I think it wouldn't hurt to mention it. I did a quick Google search and had trouble finding anything very accurate or definitive, though, since it's an SGML concept (ISO 8879 has yet to be published online). Something I posted to a mailing list in 2001 comes up pretty high in the results, but is actually wrong; the XML spec introduces "#PCDATA" at the same time as "mixed content" and I mistakenly thought the two were synonymous. So feel free to find some better references and mention PCDATA in the article.
What I can tell you now is that in an SGML or XML DTD, "#PCDATA" signifies that an element's content is "parsed" character data. It's the same as a CDATA-type attribute value in that some of its characters may comprise markup like entity references or numeric character references, and "<" is also considered to be markup: in this case, markup that will be recognized as not being part of the element content. —mjb 03:39, 29 October 2007 (UTC)

Uses of CDATA

This article states that CDATA is used in XML, but does that refer to all XML-based languages or only some? I know that it can be used in XHTML documents as well. --Stefán Örvarr Sigmundsson 00:15, 31 October 2007 (UTC)