Talk:CESU-8

From Wikipedia, the free encyclopedia

rewrite

I have started a rewrite at CESU-8/temp, as per the instructions in the copyvio template, and I am reporting it here. Plugwash 13:51, 14 July 2005 (UTC)

  • Temp page has replaced the main article. RedWolf 03:34, July 22, 2005 (UTC)

examples

Can you please give an example of a string encoded in CESU-8, and of a case in which it is treated in a special way? -- Nichtich 00:23, 27 October 2005 (UTC)

Advantages

What are its advantages over UTF-8? --Apoc2400 09:15, 6 December 2006 (UTC)

I can think of three:
1: If used for serialisation of strings in languages where strings are natively UTF-16, it won't break if someone decides to use string types for something other than valid UTF-16 data.
2: If sorted using a byte-oriented sort, the result will be the same as using a 16-bit-word-oriented sort on UTF-16.
3: Conversion between CESU-8 and UTF-16 is simpler than conversion between UTF-8 and UTF-16.
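A minimal Python sketch of points 1 to 3, which also gives the kind of example asked for in the "examples" section. It assumes CESU-8 works as follows: split the text into UTF-16 code units, then encode each unit on its own with the ordinary one-to-three-byte UTF-8 forms. The names `utf16_code_units` and `encode_cesu8` are hypothetical, written only for illustration.

```python
def utf16_code_units(s: str) -> list[int]:
    """Split a str into UTF-16 code units (hypothetical helper)."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp >= 0x10000:
            # Supplementary character: split into a high/low surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            # BMP character (or a lone surrogate): one code unit, passed through unchecked.
            units.append(cp)
    return units


def encode_cesu8(s: str) -> bytes:
    """Encode each UTF-16 code unit separately with the 1-3 byte UTF-8 forms."""
    out = bytearray()
    for u in utf16_code_units(s):
        if u < 0x80:
            out.append(u)
        elif u < 0x800:
            out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
        else:
            # Surrogates (0xD800-0xDFFF) land here and become 3 bytes each,
            # so a supplementary character takes 6 bytes instead of UTF-8's 4.
            out += bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F), 0x80 | (u & 0x3F)])
    return bytes(out)


if __name__ == "__main__":
    sup = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP
    print(encode_cesu8(sup).hex(" "))    # ed a0 81 ed b0 80
    print(sup.encode("utf-8").hex(" "))  # f0 90 90 80

    # Point 2: CESU-8 byte order matches UTF-16 code-unit order; UTF-8 byte order does not.
    words = ["\uFF21", sup]  # FULLWIDTH LATIN CAPITAL LETTER A vs U+10400
    print(sorted(words, key=encode_cesu8) == sorted(words, key=utf16_code_units))  # True
    print(sorted(words, key=lambda w: w.encode("utf-8")) == sorted(words, key=utf16_code_units))  # False
```

For text made only of BMP characters the output is byte-for-byte identical to UTF-8; the difference shows up only for supplementary characters (and for lone surrogates, which this sketch passes through rather than rejecting, as in point 1).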


But, as the name suggests, the main reason it is used is compatibility with old software. Software that was built to use UCS-2 internally and UTF-8 externally, and that does not explicitly reject surrogate code points, can be made to store surrogates (for supplementary characters) by feeding its external interfaces with CESU-8 data (which will be converted to UTF-16 for internal storage). Plugwash 23:07, 7 December 2006 (UTC)
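To illustrate the compatibility point, here is a Python sketch of a hypothetical legacy decoder. It assumes the old software only understands the one-to-three-byte UTF-8 forms and stores each decoded value directly as a 16-bit unit without checking for surrogates. `decode_to_code_units` and `units_to_str` are illustrative names, not the API of any real product.

```python
import struct


def decode_to_code_units(data: bytes) -> list[int]:
    """Decode 1-3 byte UTF-8-style sequences into 16-bit code units.

    Sketch of a hypothetical UCS-2-era decoder: it only knows the 1-3 byte
    forms, so it never yields a value above 0xFFFF, and it does not reject
    the surrogate range 0xD800-0xDFFF. Continuation bytes are not validated.
    """
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            units.append(b)
            i += 1
        elif 0xC0 <= b < 0xE0 and i + 1 < len(data):
            units.append(((b & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        elif 0xE0 <= b < 0xF0 and i + 2 < len(data):
            units.append(((b & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F))
            i += 3
        else:
            # 4-byte UTF-8 lead bytes (0xF0+), stray continuation bytes, or truncated input.
            raise ValueError(f"cannot decode byte 0x{b:02X} at offset {i}")
    return units


def units_to_str(units: list[int]) -> str:
    """Interpret the stored code units as UTF-16, pairing surrogates back up."""
    return struct.pack(f"<{len(units)}H", *units).decode("utf-16-le")


if __name__ == "__main__":
    cesu = bytes.fromhex("ed a0 81 ed b0 80")  # U+10400 in CESU-8
    units = decode_to_code_units(cesu)
    print([hex(u) for u in units])               # ['0xd801', '0xdc00']
    print(units_to_str(units) == "\U00010400")   # True
    try:
        decode_to_code_units("\U00010400".encode("utf-8"))  # 4-byte form F0 90 90 80
    except ValueError as e:
        print(e)
```

Under these assumptions the 4-byte UTF-8 form is rejected, while the CESU-8 form passes through as two 3-byte sequences and ends up stored as a surrogate pair, which any UTF-16-aware code reading the storage later sees as the supplementary character.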