Talk:Endianness/Archive

Endianness in the German language

376 is pronounced as "Dreihundertsechsundsiebzig", i.e. "three hundred six-and-seventy". So, what's the difference from three-hundred-and-seven-teen?

Yes, they are similar, and I've changed the relevant article text, but there are differences:
  • There are only 7 exceptions in English, versus 90 (I suppose) in German.
  • The "und" emphasizes the separateness of the tens and units digit-names in German, versus the case in English, where the "teen" is not really recognized as a separate word meaning 10; rather the names "thirteen" (not "three-ten"), ..., "nineteen" are single units and are mainly treated as arbitrary sounds (words) for the numbers, just as "ten", "eleven" and "twelve" are.

We have this endianness scheme in old-style Norwegian counting as well. The "new counting style" (similar to the English way) became an official policy in Norway right after WWII, but we still hang on to the old way. Serves to show how difficult it is to change people's daily language by government decree. :-) --Wernher 05:52, 18 Apr 2005 (UTC)

It's not true that English is big-endian. In English, a number followed by a smaller number implies addition (eg. fifty five = 50+5) while a number followed by a larger number implies multiplication (eg. three hundred = 3×100). Thus, "two thousand five hundred eighty four" is 2×1000+5×100+80+4 = 2584. --P3d0 03:21, 29 December 2005 (UTC)

No. Word order never implies multiplication in English. The reason "three hundred" means 300 is that "hundred" is a place marker (like the "-ty" suffix), not a number. To represent 100, except in special contexts, you have to say "one hundred" or "a hundred." Meanwhile, you can't say "three fifty" to mean 3*50=150--in fact, "three fifty" means 350 (or 3.50, 3 hours 50 minutes, etc.--but notice that even these examples are big-endian).

Violating big-endian order in English is possible, but it always requires extra verbiage, and usually sounds archaic or strange. The best example I can think of is "four and twenty" for 24. Falcotron 04:15, 18 June 2006 (UTC)

Do you have a reference for that assertion? "Four score" is 80. "Three dozen" is 36. Just because every possible combination is not allowed (like "three fifty") doesn't mean the general rule is invalid. (Though, you raise a good point that "three fifty" means "three hundred fifty". I'd hazard a guess that this is a more recent evolution of the language, and is nothing but an abbreviation - dropping the word "hundred" for brevity - rather than a new numbering paradigm, but that's a pure guess.) --P3d0 12:14, 19 June 2006 (UTC)
And "score" and "dozen" are, like "hundred," not numbers on their own. Twenty plus thirty makes "fifty" (not "a fifty"), but four plus eight makes "one dozen" (not "dozen"). You can have "a dozen eggs," or "two dozen eggs," or "dozens of eggs." You can't have "a fifty eggs" or "fifties of eggs"--and "two fifty eggs" clearly means 250 (but sounds a bit odd--without context, I'd be most inclined to interpret it as eggs that cost $2.50). And consider "one dozen" (12), "one score" (20), "one hundred" (100) vs. "one twelve" (112), "one fifty" (150), etc.
The only exception I can find to this is "ten," which is usually a number word, but can also be used in the expression "tens of eggs" (although it's a bit odd, and you still can't say "a ten eggs" or "two tens [of] eggs," and "one ten" is 110).
As for how old the "three fifty" construction is, I'd bet it goes back farther than you think (Americans at least have been referring to "seventeen seventy-six" and "eighteen twelve" for most of American history). But this is irrelevant. It's completely productive, and internalized in the minds of all English speakers. Clearly "four and twenty" is an older construction, but that doesn't mean that it's more important to determining the rules of English numbers. Falcotron 13:16, 19 June 2006 (UTC)

Who was Gulliver?

Lemuel Gulliver's the name of the character, isn't it (as opposed to Swift's pen name)?

That's right; it's Gulliver's Travels by Jonathan Swift. Swift may have written the tale in first person form, though (IIRC, it's been like twenty years since I read it). --Wernher 23:48, 20 Mar 2004 (UTC)

Hex string example

I changed the example 0xDEADBEEF since it might give readers the mistaken impression that hex digits are the same as letters and that texts are stored on computers as hex digits. AxelBoldt 16:42, 17 Mar 2004 (UTC)

I've changed it from 0xA0B70708 to 0x4A3B2C1D. I think the pattern will help the reader to more quickly understand the essence of endianness. The letter hex digits march in one direction while the numeric hex digits march the other way. The un-arbitrariness is a drawback, but smaller than the perception gain, I think. -R. S. Shaw 21:43, 26 Mar 2005 (UTC)
I think 0x4A3B2C1D is brilliant. --P3d0 03:22, 29 December 2005 (UTC)
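
For anyone who wants to see the pattern rather than take it on faith, here is a minimal Python sketch that lays 0x4A3B2C1D out byte by byte in both orders. It uses only the built-in `int.to_bytes`; the variable names are just for illustration.

```python
value = 0x4A3B2C1D

# Byte sequence in increasing address order for each convention.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex(" ").upper())     # 4A 3B 2C 1D
print(little.hex(" ").upper())  # 1D 2C 3B 4A
```

The letter digits A, B, C, D run forward in the big-endian layout and backward in the little-endian one, which is the visual cue the example was chosen for.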

Endianness in text

Does endianness also affect the storing of texts? I.e., is the text 'ABCD' stored as ASCII 0x44434241 on little endian machines? AxelBoldt 16:49, 17 Mar 2004 (UTC)

Good points. It's very weird the article doesn't mention UTF-16 as if anything besides definitions were superficial. -- Taku 17:07, Mar 17, 2004 (UTC)

Ok, so if endianness does not affect the storing of ASCII text (1 byte=1 char), what exactly is the NUXI problem? I don't understand. AxelBoldt 13:19, 20 Mar 2004 (UTC)

I think it's just the term to describe the general problem. If you're storing two characters per 16-bit word, then the NUXI/UNIX distinction becomes a bit more vivid. Dysprosia 13:39, 20 Mar 2004 (UTC)
Yes, but my question is: are the characters stored that way? I.e., will the memory of a big-endian machine storing the string 'UNIX' look different from the memory of a little-endian machine storing the same string? AxelBoldt 15:37, 20 Mar 2004 (UTC)
From the Endian FAQ referenced in the External links section: "The same order also applies to long (multi-word) character strings and to multiple precision numbers." The NUXI problem appears when transferring data over a network where the computers on each end are "different-endian" from each other, and no precaution for this is made in the protocol. However, to acknowledge your doubt, I do not at the moment quite remember whether NUXI was to be understood as an example text string or as a fictional hex digit quadruple. I guess I have to study the endianness canon intensively all over again... --Wernher 23:48, 20 Mar 2004 (UTC)
Hmmm, I just checked the article again, and I think I got the stuff to compute correctly in my head now, like it sometimes used to do before: 1) ASCII text, i.e. strings of char bytes, and multibyte numbers, are the same to the computer as regards transfers over a link to another machine — just arrays of bytes, and 2) the specific example known as NUXI is just what happens when the string UNIX on a little-endian computer is sent to a middle-endian computer (substitute the individual letters for the bytes in the example, and you'll see). --Wernher 00:12, 21 Mar 2004 (UTC)
With 2, I don't think so. I'm pretty sure endianness matters when you're considering storing a certain number of bytes per word (consider arbitrary length strings). Taku's site provides a good explanation. Dysprosia 01:08, 21 Mar 2004 (UTC)
Er - aren't we actually saying the same thing here? Yes, the strings would be stored differently in memory because of the difference in endianness, and, also, this typically is most often ignored until one tries to send data from an x-endian to a y-endian computer (x and y being different endiannesses). --Wernher 01:37, 21 Mar 2004 (UTC)
I'm saying that if you store an array, with each element consisting of one byte, endianness does not matter. However, if the characters are stored in an array of 16-bit or 32-bit elements, two characters per 16-bit element (so [UN] [IX]), then it does matter, since endianness refers to how each byte (each character) is ordered within each 16- or 32-bit element. Strings are usually stored in arrays of single bytes, so a plain ASCII string sent to a machine with different endianness wouldn't be mangled. Dysprosia 03:01, 21 Mar 2004 (UTC)
It all depends on how the programmer stores it. Usually, the char datatype in C, for example, is one byte long. In memory, I don't think it's different either, and that endianness only affects the storage of bytes per word... Dysprosia 23:53, 20 Mar 2004 (UTC)
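
The distinction drawn in this thread can be sketched in a few lines of Python. This is only an illustration under the assumption that "UNIX" is packed two characters per 16-bit word, as described above; it uses the standard `struct` module, and the variable names are made up for the example.

```python
import struct

text = b"UNIX"

# One byte per character: the bytes are already in address order,
# so there is nothing for endianness to reorder.
one_byte_per_char = bytes(text)

# Two characters per 16-bit word: interpret the same four bytes as
# little-endian words, then write those word values out big-endian.
words = struct.unpack("<2H", text)
swapped = struct.pack(">2H", *words)

print(one_byte_per_char)  # b'UNIX'
print(swapped)            # b'NUXI'
```

So a byte array survives the transfer untouched, while the word-oriented view is where the swap shows up.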

This site gives a good answer. I think this kind of thing happens when you are dealing with a magic number at the beginning of binary files. For example, PNG files start with a four-byte string containing PNG or something (I believe). Then it matters whether you are using big-endian or little-endian. True, the article should be clearer about this. -- Taku 01:04, Mar 21, 2004 (UTC)

The site you mention is a must for our article, as it's quite concise. I'll include under External links. --Wernher 02:03, 21 Mar 2004 (UTC)

Spelling

Neither of the two variants appear in current dictionaries, so it is hard to tell which variant is correct.

This is backwards, since a dictionary is not a list of correct words. Correct words go into dictionaries, I must agree, but that doesn't mean that wrong words do not enter dictionaries or that no correct words are omitted from dictionaries. Kjoonlee 10:02, 2005 Feb 1 (UTC)

Bit numbering

Little-endian numbers are easier to access bit-wise for a computer: When we have bit number n (0...31) and want to determine byte number y (0...3) and bit number i (0...7) of the storage representation, we just take groups of bits: y=n>>3; i=n&7.

I've removed the above text because it applies just as well to big-endian machines. I've coded just such a manipulation many times for both types of endianness. In many cases the code need not differ at all. In other cases the code differs but as duals of each other; neither is more complicated than the other. -R. S. Shaw 04:27, 11 Mar 2005 (UTC)

I believe the removed text above follows the theory presented here [1]. I'm not exactly qualified to claim either way, and for the sake of keeping the peace, I'd say leave it out. --ORBIT 01:57, 26 Mar 2005 (UTC)
The referenced webpage actually says very little on the subject of bit-wise access. It starts its endianness argument with bits, for unclear reasons, but immediately switches entirely to talking about bytes (which of course is where most endianness issues lie).
(The byte endianness argument there is another partisan shot in the endianness holy wars. It is based around an idea of "expanding" a binary number container from one byte to two bytes. This container expansion is not an operation that is typical of computer execution or programming. The discussion ends up trying to display memory contents in the order opposite to all common usage, big or little endian, and comes off looking a little silly, IMO.) --R. S. Shaw 00:47, 27 Mar 2005 (UTC)
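
The "duals of each other" point can be illustrated with a small Python sketch. The function names `bit_little` and `bit_big` are hypothetical, and the sketch assumes bit n is counted from the least significant bit of the whole value, as in the removed text (y=n>>3; i=n&7).

```python
def bit_little(buf: bytes, n: int) -> int:
    """Bit n (0 = least significant) of a little-endian integer in buf."""
    y, i = n >> 3, n & 7
    return (buf[y] >> i) & 1


def bit_big(buf: bytes, n: int) -> int:
    """Same bit of a big-endian integer: only the byte index is mirrored."""
    y, i = len(buf) - 1 - (n >> 3), n & 7
    return (buf[y] >> i) & 1


value = 0x4A3B2C1D
le = value.to_bytes(4, "little")
be = value.to_bytes(4, "big")

# Both lookups agree with plain shifting on the integer itself.
results = [(bit_little(le, n), bit_big(be, n), (value >> n) & 1)
           for n in range(32)]
print(all(a == b == c for a, b, c in results))
```

The only difference between the two functions is the byte index, which supports the view that neither order is inherently simpler for bit access.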

Date formats

I added a little section about endians in the various date formats, because it's a nice easy example for us dummies. If it's wrong, feel free to kick my ass and remove it. Or just remove it. Proto 13:17, 6 Apr 2005 (UTC)


Hell, I think it's a stroke of genius. It's a great analogy. Maybe a tad long, but great nonetheless. Scotto 04:07, 9 November 2005 (UTC)

FTR, it isn't just an analogy, as endianness is the terminology used almost universally to compare date formats.
BTW, I've improved the section and split it into subsections. If anyone disagrees feel free...
--Joe Llywelyn Griffith Blakesley talk contrib 01:37, 27 February 2006 (UTC)

"Byte Sex" and "Bytesexual"

Do we really need these amateurish nicknames in there? As far as I know, they're not used in a professional context and they make Wikipedia sound silly. Other articles wouldn't use industry-specific slang.

If those terms are in use somewhere, then we need to mention them. It is simply that Wikipedia covers slang. If other articles omit slang that is in use, we should mention the slang in those articles. -- Taku 16:54, Apr 17, 2005 (UTC)
"The word bytesexual or bi-endian, said of hardware, denotes willingness to compute or pass data in either big-endian or little-endian format" --- This tells me that bytesexual is an official (or a very common) term, meaning the same as bi-endian. The use of the term byte-sex later in the article reinforces this idea. However, the silliness of the term (not to mention its etymological nonsense) should be enough to keep it hygienically separated from the normal flow of the text. Couldn't this - at least - be rephrased into something like: "Some people use the term bytesexual to refer to..."?
I do not think that confusing the reader, or emphasizing a slang term, adds any value. A legitimate term exists and is widely used in the community, in the literature, and so on, so I'd prefer to stick to a single term. However, leaving out synonyms and slang is not the solution either, since these terms exist as well. I therefore propose a compromise: add a section covering synonyms and slang, each with a short description, and use a single term throughout the rest of the document.

Endianness in computers

I'd change "Architectures that follow this rule are called big-endian and include Motorola 68000, SPARC and System/370." like this: "Architectures that follow this rule are called big-endian (for "big end first") and include Motorola 68000, SPARC and System/370." i.e. add the parentheses with an explanation.

I think this would make understanding the point even easier, although the example is already pretty clear.

However I don't want to create confusion with the real origin of the term which is explained in the Background section below.

Perhaps it should be placed somewhere else, or in a different wording, but I'd still like to stick the "big end first" somewhere.

Comments?

Krille 11:53, 29 Apr 2005 (UTC)

Something like this sounds ok. If it's good there, probably the corresponding phrase for little endian would also be good. Be bold. -R. S. Shaw 18:39, 29 Apr 2005 (UTC)
Done. -Krille 19:19, 1 May 2005 (UTC)

Little-endian alphabetical index for Wikipedia

I would like to suggest the addition of a little-endian alphabetical index for Wikipedia. At the present time, the only alphabetical index for Wikipedia is big-endian. A little-endian index can help a researcher who knows the last letters of a word but not its first letters. Also, it can make it easier to research a set of words with a common ending, for example, "-shire", "-itis", "-osis", "-mania", "-phobia", "-ton", "-ville", "-ide", "-ic acid", and "-ics". If, after this suggestion is considered, it is not implemented, please state the reason(s) on this page. Here is a sample listing:

              a
             aa
            baa
           baba
          samba
          rumba
           tuba
          abaca
           Inca
         agenda
          aloha
       Himalaya
          Hunza

         baobab
            cab
        taxicab
            dab
            tab

           buzz
          abuzz
           fuzz

Wavelength 22:33, 24 May 2005 (UTC)

This may be a good feature. Regardless, it is up to developers to implement it; see m:MediaWiki feature request and bug report discussion. -- Taku 22:54, May 24, 2005 (UTC)
Thank you for your feedback. I have followed the link and I have considered both routes for communicating with a developer. At the present time, I have decided not to follow either route. Someone else may wish to do so. It is not urgent for me. Wavelength 01:07, 28 May 2005 (UTC)
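
For what it's worth, the ordering in the sample listing is easy to reproduce: sort the titles on their reversed spelling. A minimal Python sketch, assuming case should be ignored (the sample places "Inca" and "Himalaya" among lower-case words); the word list is a subset of the sample above.

```python
words = ["tab", "Inca", "buzz", "aa", "samba", "cab", "a", "fuzz",
         "taxicab", "abuzz", "tuba", "baobab"]

# Sort on the reversed spelling, ignoring case, so words that share
# an ending end up next to each other.
index = sorted(words, key=lambda w: w[::-1].casefold())

# Right-align the output so the common endings line up in a column.
for w in index:
    print(w.rjust(10))
```

This says nothing about how MediaWiki would implement such an index; it only shows that the sort key itself is simple.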

IBM machines

An anonymous user made some changes showing System/370, System/390 and z/OS as bi-endian. (z/OS is an OS; probably meant zSeries.) I'm not that knowledgeable about these, but I'm sure the 370 was pure big endian, so I've reverted the changes for now. Googling this seems a bit hard, but did find this on an IBM site: http://www-128.ibm.com/developerworks/eserver/articles/linux_s390/ which is "Linux for S/390 and zSeries porting hints" and says "zSeries is a big-endian system. Any code that processes byte-oriented data that originated on a little-endian system might need some byte swapping...." so it really does look like Anon's edits were off base. -R. S. Shaw 05:36, 28 May 2005 (UTC)

Idiot question...

With the exception of middle-endianness, are big-endianism and little-endianism the same thing as most significant bit (MSB) and least significant bit (LSB) respectively? misternuvistor 21:40, Jun 5, 2005 (UTC)

No. The least significant bit has to do with what the bits of a number mean. It has no dependence on the order of the bits. The equivalent in normal decimal numbers is what's called "the units position". In the number 2005, the "5" is in the units position (and is the least significant digit).
Endianness on the other hand has to do with the convention of the order of the digits, without changing the meaning of the number. In decimal, this year in big-endian is 2005; the little-endian form, which you never see in normal usage, is 5002. If that looks backwards, it is; that's what the little-endian requires. (Note that in the little-endian 5002, the "5" is in the units position, since the units position comes first in little-endian.) -R. S. Shaw 23:11, 5 Jun 2005 (UTC)
I thought 2005 in the opposite endianness would be 0520.
Let S be the size in digits of the element to handle; let ZS be the size in digits of the atomic unit (ZS <= S); and let E be the endian conversion function taking three parameters, S, ZS and x, which converts x into the opposite endianness.
E(4,4,2005)=2005
E(4,2,2005)=0520
E(4,1,2005)=5002
E(2,2,2005)=2005
E(2,1,2005)=0250
Let us introduce a Byte-Swap feature BS (read digit-swap for this example): the digits within each atomic unit are swapped. Let E1 be the endian conversion function as defined above, with Byte-Swap.
E1(4,4,2005)=5002
E1(4,2,2005)=5002
E1(4,1,2005)=5002
E1(2,2,2005)=0250
E1(2,1,2005)=0250
Let us introduce a Bit-Swap feature bS (for this example we reverse each digit, so that 0 becomes 9 once bit-swapped, 1 becomes 8, ...). Let E2 be the endian conversion function as defined for E, with Bit-Swap.
E2(4,4,2005)=7994
E2(4,2,2005)=9479
E2(4,1,2005)=4997
E2(2,2,2005)=7994
E2(2,1,2005)=9749
To conclude, the encoding of a simple number depends on S, ZS, BS and bS, which leads to further issues such as communication between systems of differing endianness.
Bblanc 08:47, 29 August 2006 (UTC)
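
The tables above can be reproduced with a short Python sketch. This is one reading of the definitions of E, E1 and E2, working on decimal digit strings; the function name `convert` and its keyword arguments are invented for the illustration and are not from the original comment.

```python
def convert(s: int, zs: int, x: str, byte_swap: bool = False,
            bit_swap: bool = False) -> str:
    """Digit-string model of E (default), E1 (byte_swap) and E2 (bit_swap)."""
    out = []
    for k in range(0, len(x), s):              # each element of S digits
        element = x[k:k + s]
        units = [element[j:j + zs] for j in range(0, len(element), zs)]
        units.reverse()                        # reverse the atomic units
        if byte_swap:                          # E1: reverse digits inside each unit
            units = [u[::-1] for u in units]
        out.append("".join(units))
    result = "".join(out)
    if bit_swap:                               # E2: replace each digit d by 9 - d
        result = "".join(str(9 - int(d)) for d in result)
    return result


print(convert(4, 2, "2005"))                   # 0520
print(convert(4, 2, "2005", byte_swap=True))   # 5002
print(convert(2, 1, "2005", bit_swap=True))    # 9749
```

Under this reading, 5002 and 0520 are both valid "opposite endianness" forms of 2005; they differ only in the assumed atomic unit size.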


Make sure to point out the difference between most significant bits and most significant bytes. - Omegatron 18:52, Jun 8, 2005 (UTC)
I was speaking of 4 decimal digits. If one were talking about 2 hexadecimal bytes, then the little-endian expression of the hex value 2005 would be 05 20, that is, the low addressed byte would have the 05 value. -R. S. Shaw 01:43, 9 Jun 2005 (UTC)

hmm - bi-endianness

"IA-64 running Linux"

Does it matter what OS it's running? Wouldn't the default not care? - Omegatron 18:52, Jun 8, 2005 (UTC)
Yes, it matters. At boot, the OS will set up a bi-endian processor to be either big-endian or little-endian, then all (or at least most) programs will execute with that kind of endianness. -R. S. Shaw 01:43, 9 Jun 2005 (UTC)
Then all of them should mention the OS, right? - Omegatron 15:08, Jun 9, 2005 (UTC)
Not really; some bi-endian processors are used in manners that fix the endianness. For instance, as the article says "on some architectures the default endianness is selected by some hardware on the motherboard" and that system can only run in one endianness. Some potentially bi-endian processors are in fact used with only one endianness, there being no OS which will support the other endianness. -R. S. Shaw 23:53, 9 Jun 2005 (UTC)
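
From a program's point of view, whichever endianness was fixed at boot or by the board is simply the one it observes. A minimal Python sketch of checking it, using only `sys.byteorder` and the standard `struct` module; it says nothing about how any particular OS or processor selects the mode.

```python
import struct
import sys

# Byte order of the running interpreter: "little" or "big".
print(sys.byteorder)

# Cross-check: pack the value 1 in native order and see at which
# end of the two-byte result the 1 lands.
native = struct.pack("=H", 1)
detected = "little" if native[0] == 1 else "big"
print(detected == sys.byteorder)
```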


Bi-endianness is nonsense, since a layout is based on either the little-endian format or the big-endian format: bi-endian data is ambiguous. I'd prefer to keep the little-endian and big-endian keywords and introduce Byte-Swap in atomic elements. That removes any confusion. Bertrand Blanc 14:22, 1 September 2006 (UTC)

Claim: numbers in arabic numerals are left-to-right and big-endian even in (some) r-to-l languages

The article now says, "The written system of arabic numerals is used world-wide and is such that the most significant digits are always written to the left of the less significant ones. In languages that write text left-to-right, this system is therefore big-endian, in languages that write right-to-left, this numeral system is little-endian."

Numbers written in arabic numerals are big-endian, because the language of mathematics itself is left-to-right. In Hebrew, if a multi-digit number appears in the middle of a sentence, it is written and read left-to-right, just as in English. In Hebrew, numbers are not little-endian. Numbers, and all of mathematics, are left-to-right.

Biblical Hebrew sometimes used little-endian word order for numbers, but in modern Hebrew, big-endian word order is used exclusively, except for poetic use analogous to "Four and twenty blackbirds baked in a pie."

The article reflects that numerals are l-to-r in the r-to-l written languages like Arabic and Hebrew. That is exactly why it calls them little endian. Consider the equivalent of the English "ABC 123" in Hebrew:
123 אבג
The alef character is first (at right), then bet, then gimel, and then "3". But the number value is 123, not (English) 321, so "3" is the least significant digit (units digit). So the littlest component comes first, and thus Hebrew writing is little-endian. At least that's my understanding of the situation. -R. S. Shaw 00:01, 14 Jun 2005 (UTC)

Thank you, R. S. Shaw, but I think you have missed the point. One must read each string in the order it was intended to be read. English words are intended to be read left to right. Hebrew words are intended to be read right to left. Numbers, arithmetic, and algebraic expressions are intended to be read left to right. These are all true even when embedded in each other's domains. If a Hebrew text discusses the French expression "joie de vivre," it might embed these words in ordinary left-to-right spelling in the middle of a Hebrew sentence. One would begin the Hebrew sentence at the right, but when one comes to the French words one would jump leftward and read "joie de vivre" left to right, and then continue reading Hebrew right to left, just to the left of "joie."

Now suppose you wanted to translate this sentence into Hebrew: I have 31 – 2 = 29 oranges. Transliterated first, that's "Yesh li 31 – 2 = 29 tapuzim." In Hebrew:

יש לי 31 – 2 = 29 תפוזים.

The entire mathematical expression 31 – 2 = 29 is read left to right.

Or consider decimal numbers, for example, 3.14159. Do you really think anyone, even someone whose only written language is Hebrew, could possibly read that number right to left? I don't think so. I think everyone in the world, whether a reader of a left-to-right language or a right-to-left language, reads that as "three point one four one five nine" in the words of their language.

In conclusion, in Hebrew, numbers are read left to right, and therefore are big-endian.

Thank you, Anonymous, but I believe you have missed the point. "Big-endian" does not mean "read left to right," and "read left to right" does not mean "big-endian". Nor does "big-endian" mean that the most significant digits are to the left of the less significant digits.
Endianness is a computer concept. In computer memory there is no left and right, but there is an order defined by the memory addresses. That address order is also the order which is used for text on all types of machines. That order can be thought of as left-to-right or right-to-left, depending on the natural order of the language of the text. Endianness is whether the little or big part of a binary number comes first in that address (text) order.
A big-endian binary number is one in which the most significant byte comes "first", where first is defined by the address (text) order. A little-endian number is one in which the least significant part comes "first" in the address (text) order.
Now an attempt to transfer this computer concept to written natural languages needs to carry across these concepts, and there isn't necessarily a unique way of making such a mapping. The following seems to be the most straightforward to me; if someone would like to offer a more straightforward mapping, feel free to contribute.
Since written text is linear, the natural order of the text is clear: l-to-r in English, etc. and r-to-l in Hebrew, Arabic, etc. For a written number in Hindu-Arabic numerals, the "most significant" part is consistent in all the systems we're talking about: the leftmost part. Now when numerals are part of a written expression in a natural language, does the most significant or least significant digit appear first in natural order of the text? In the Hebrew example "יש לי 29 תפוזים", which digit appears first in the natural order of Hebrew? The sequence of characters is "י", then "ש", then "ל", then "י", then "9", then "2", so the 9 comes before the 2. Since 9 is the least significant part in the value 29, the least significant digit came first. This is little-endian.
It may indeed be true that one "thinks of" or "sees" the 2 coming before the 9. When writing by hand, one might well write the 2 before the 9 (estimating the space needed for the less significant digits as they will approach the already written words). In that case, one could say that the writing order is big endian, but the text order is little endian. In a computer, the most significant byte of a binary number might be set before the least significant, but that does not mean that the stored number is big-endian; that depends on the final result, the sequence of bytes as they reside in memory. -R. S. Shaw 19:29, 15 Jun 2005 (UTC)
Consider the number 23-1/32. If in a spoken language the number is spoken something like "twenty three [begin fraction] one [end numerator] thirty second" I would regard that as big endian. If the number is spoken something like "two thirty [end denominator] one [end numerator] three twenty" -- or perhaps "one [end numerator] two thirty [end denominator] three twenty" I would regard that as little endian.
Modern Hebrew uses the same word order as English, except that 11 to 19 have the units before the word for ten (English has single words for 11 to 19, and for 13 to 19 the units are first). So spoken modern Hebrew is big endian in that the most significant digit is spoken first, except 11 through 19.
Written Hebrew should be considered independently of how a particular computer system might store mixed Hebrew-numeric text. Since endianness is first a computer concept, let's consider first how a computer might store mixed Hebrew-numeric text. But before that, let's consider how a computer might store mixed Hebrew-roman text. If one were using a Hebrew word processor to write an article that discusses "joie de vivre" then one would always type the letters of "joie de vivre" in the normal French order starting with the j. And, if the line break happened so there was room for only one word at the (left) end of the line, that word would be "joie" and the next line would start at the right with "de vivre" and continue in Hebrew. Now if this were stored in the order (some Hebrew text) eioj [new line] erviv ed (more Hebrew text), and editing elsewhere caused the line break to move so that the whole French expression fit on one line, then the computer would have to move the backwards "joie" to the other side of the backwards "de vivre". I suppose it could be implemented this way, but I think it would be much more typical to store "joie de vivre" in the natural left-to-right order and handle the "reversing" at display or print time.
Now consider storage of Hebrew-numeric text. In my earlier example, where I had "31 – 2 = 29" embedded in a Hebrew sentence, if the line break happened to fall just after the equals sign, then "31 – 2 =" would appear at the (left) end of the first line and "29" would appear at the (right) beginning of the next line. For the same reasons as discussed in the mixed Hebrew-French example, I believe the computer would store the entire mathematical expression in the natural mathematical order. HTML does this. In other words, in HTML, numbers in mixed Hebrew-numeric text are big endian, most significant digit first.
Endianness of the written text: You've already agreed that in the physical act of writing the text, the writer writes the most significant digit first. So the act of writing numbers in Hebrew is big endian.
Finally, the question of actual appearance on the page. Endianness is about "firstness." Well, suppose the 30-digit number 123,456,789,012,345,678,901,234,567,890 appeared in a Hebrew text, and the text column is not wide enough for 30 digits and nine commas, so it must be divided. Suppose the first 12 digits and four commas fit on the (left) end of the first line, and the rest of the number appeared at the (right) beginning of the second line. I argue that the number is big endian because the most significant digit, 1, appears "first." The first digit "displayed" in the Hebrew right-to-left order is 2, the twelfth digit of the number, and the last digit "displayed" in the Hebrew right-to-left order is 3, the thirteenth digit of the number. But it makes no sense to determine endianness based on this. Firstness isn't determined by location on the page in Hebrew's direction. Firstness is determined by location on the page in the number's own direction, which is left to right. The first digit is 1 and the last digit is 0, regardless of where they appear on the page. So numbers in written Hebrew are big endian.
In conclusion, numbers in Hebrew are big endian whether we consider spoken Hebrew, HTML implementations, the physical act of writing, or appearance on the page. :Anomalocaris 06:55, 16 Jun 2005 (UTC)
Continuing my previous thought, if numbers in written Hebrew were little endian, then in the above example, the printer would print the twelve least significant digits on the first line, ",901,234,567,890" with the comma on the left to emphasize that the number is continued on the next line, and the remaining eighteen most significant digits would appear on the second line. But that's not how it's done in Hebrew, and I don't think it's done that way in any written language. : Anomalocaris 18:23, 18 Jun 2005 (UTC)
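
On the storage question only, here is a small Python sketch. It assumes the sentence is held as Unicode text in logical (typing) order, which is how Unicode specifies bidirectional text should be stored; the Hebrew letters are written as escapes so the source itself has no direction ambiguity. `unicodedata.bidirectional` is the standard-library lookup for a character's bidi class.

```python
import unicodedata

# "Yesh li 29 tapuzim", Hebrew letters as escapes, in logical order.
sentence = ("\u05d9\u05e9 \u05dc\u05d9 29 "
            "\u05ea\u05e4\u05d5\u05d6\u05d9\u05dd")

# In storage order the most significant digit comes first.
print(sentence.index("2") < sentence.index("9"))  # True

# Bidi classes: Hebrew letters are 'R' and European digits are 'EN',
# which is what lets a renderer lay the digits out left to right
# inside right-to-left text.
print(unicodedata.bidirectional("\u05d9"))  # R
print(unicodedata.bidirectional("2"))       # EN
```

This settles nothing about how the printed page should be classified; it only shows what the byte stream looks like before display reordering.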

I think I'll settle this argument.

Let's say pi = 3.14

Now, is that big-endian or little-endian?

If you read it left-to-right it's big-endian; if you read it right-to-left it's little-endian. There is no practical difference between the two.

Likewise, pi = 41.3 is left-to-right little-endian or right-to-left big-endian.

Get my point? I think I get both of yours.

It all depends on which way you read it. I see it as big-endian, as most of the people out there probably do, but there are probably still a small few who see it as little-endian.

Tada. --Ihope127 19:28, 23 July 2005 (UTC)


I looked at both the article and the long-winded "explanation" above for the current text (Jan 10, 2006) which states that the Indic-Arabic system of digits is always bigendian. That is total and utter hogwash, as Tada above eminently illustrates. Embedded in a right-to-left stream, Indic-Arabic digits are clearly littleendian. If one chooses to interrupt the order and read the digits left-to-right, then of course it's bigendian -- but that's not a property of the digits! The fact that Hebrew (the example used) may deal with numbers in a bigendian fashion has nothing to do with it. Contrapositively, Danish is an example of a language written with a left-to-right script (Latin) which deals with numbers in a littleendian fashion -- the number 82 is read as to og firs (two-and-eighty) -- a littleendian treatment, but according to the article's argument, that would imply Indic-Arabic letters are always littleendian!

I do intend to change this, obviously. The argument is ridiculous.

--User:HpaScalar 10 Jan 2006

I agree. Endianness only applies to memory, data comms, etc. Parallels with writing numbers are great, but we should not pretend "endianness" is the correct term, or that written numbers can be little or big-endian. Unfortunately, it seems as though a large part of the problem people have with endianness is that they visualise memory in a particular way and expect it to match the way they write numbers in decimal, which is crazy. I really want to add something like this:
or alternatively:
103 102 101 100
... 1D 2C 3B 4A ...
with the addresses in 'reverse order', but I daren't. --StuartBrady 23:23, 11 January 2006 (UTC)
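
The reversed-address picture can be checked with a few lines of C. This is only a sketch, assuming the 32-bit value 0x1D2C3B4A is stored little-endian; the helper name store_le32 is made up for illustration, and the array index stands in for the address offset (index 0 plays the role of address 100):

```c
#include <stdint.h>

/* Hypothetical helper: write v into out[] in little-endian order,
   so out[0] (the lowest address) receives the least significant byte. */
void store_le32(uint32_t v, unsigned char out[4])
{
    int k;

    for (k = 0; k < 4; k++)
        out[k] = (unsigned char)((v >> (8 * k)) & 0xFFu);
}
```

Listing out[3] down to out[0] gives 1D 2C 3B 4A, which is the row shown with the addresses running from 103 down to 100.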
I've come to believe that parts of this article have wandered off into speculations by individual editors about ill-defined endian-like subjects. Endianness is a well-known and well-defined concept, but only in the computing arena. In other realms, such as natural language, I don't think endianness has ever been established in works 'published by a reputable source'. I think what has happened here is that editors have been speculating and trying to reason by analogy to the well-defined computer concept rather than report the current state of knowledge in the area. Thus "little-endian languages" and so forth constitute Original Research and should be eliminated from the article.
The material to be removed I think should include the stuff about Hindu-Arabic numerals and language, and also the reverse-dictionary blurb. I might also go along with removing the 'Endianness in date formats' section since it too seems to be on shaky ground. -R. S. Shaw 06:10, 12 January 2006 (UTC)

An idiot spelling question

A spelling question: is "big-endian" spelled with or without the hyphen? The article mostly uses the hyphen, but a few instances lack it. For example, in the "Portability Issues" section, we have:

 "... if the data are stored using big endian integers ..."

and in the "Discussion, background, etymology" section:

 " ... The spoken numeral system in English is big endian (with minor exceptions ..."

These are the only two such examples I find; the rest are more like:

 "... In a consistently big-endian architecture ..."

I suspect the hyphen should be required and the first two examples should be corrected. Yes?

It should be "big-endian". See hyphen. - Omegatron 13:54, July 19, 2005 (UTC)
I just corrected those errors. Dan Granahan 14:00, 19 July 2005 (UTC)

"C" example

Currently the example code is not legal C, or even C++. (Neither has a "boolean" type, for one thing.)

So be bold and correct it. --Dan Granahan 03:37, 2 August 2005 (UTC)
Done. (leaving entry here to avoid any confusion on the part of the OP - good/bad idea? I'm new at wiki...) --sqweek 09:08, 4 August 2005 (UTC)
Generally the talk page is left alone as written. (Someone will archive it if the page gets too large.) If OP is 'operator', it's very unlikely one will be coming by; it's just us reader-editors here. -R. S. Shaw 20:39, 4 August 2005 (UTC)
Like R. S. Shaw said, typically nothing on the talk page is removed. I just want to say thanks for your contributions. It's always nice to see new blood. --Dan Granahan 22:59, 4 August 2005 (UTC)
What, are you some kind of vampire? (thanks for the welcome! I love the whole wiki idea, but I suspect I won't be doing much here other than fixing the very occasional obvious error. OP was 'Original Poster') --sqweek 08:20, 5 August 2005 (UTC)

The lead-in to the example was recently changed to assert that the code assumes that char is 1 byte. The code does not have to assume this; char is defined to be 1 byte. (A byte is not necessarily defined to be an octet, but the code doesn't make any assumptions about that.) You could read the ANSI and ISO C standards (89/90 and 99), the C++ standards, and the manuals for every common pre-standard compiler if you wanted. Or just see http://www.parashift.com/c++-faq-lite/intrinsic-types.html (which is C++, granted, but the same is true in C).

Note that int > char is still an assumption. For example, an embedded system or DSP that couldn't access anything smaller than a word efficiently (or at all) might define char, short, and int to all be one machine word, 16 bits. The code would not work on such a compiler. Falcotron 05:41, 27 June 2006 (UTC)
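
To make that remaining assumption visible, here is a minimal sketch. The function name endian_test_is_meaningful is hypothetical and not from the article; it only reports whether long spans more than one char on the compiler at hand:

```c
/* sizeof(char) is 1 by definition, so this asks only whether long is wider.
   On a platform where it returns 0, every long occupies a single char and
   there is no byte order to detect. */
int endian_test_is_meaningful(void)
{
    return sizeof(long) > 1;
}
```

On common desktop compilers this returns 1; the word-addressed DSP case described above is where it would return 0.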

The most recent change to the example improves things in two ways:

  • Using long instead of int is probably good. There are platforms where char==short==int==16 bits, but long==32. But it still doesn't solve the problem; in fact, the comment points out that "some DSPs have long==char."
  • Making p const is a good idea, just on principle.

But:

  • It's not valid C89 because it uses // comments.
  • It doesn't need a comment anyway, as the exact same point is made in the immediately preceding text.
  • It uses some slightly awkward syntax to deal with long being "no larger than" char instead of just saying "the same size," which is silly, because sizeof(long) >= sizeof(char) by definition.
  • The whole attempt to detect UNKNOWN_ENDIAN by comparing sizeof(long) to sizeof(char) is unnecessary. If a long is only 1 byte, you can use either big-endian or little-endian code without worry. (I'm not sure whether it's more accurate to call such a system both BE and LE, or neither, but it doesn't matter.)

True, the code could still be wrong if, e.g., sizeof(intmax_t) > sizeof(long) == sizeof(char). But this could not have any effect on portable C89 code, and it's not detectable with portable C89 code. If you're using C99 (or a compiler with similar extensions), just use intmax_t (or the largest type you intend to use) in place of long in the code, and it'll work fine.

Also, I thought it might be a good idea to point out that middle-endian systems will be incorrectly detected as little-endian, rather than just saying that they won't be detected.

So, I reverted the code and fixed the preceding discussion. I also added a short discussion, further up, about systems with all types the same size. And I fixed some minor typos and grammatical errors.

I'm not sure mention of any of this extra complexity with 32-bit-byte systems and so on needs to be in the article at all--but I'm guessing that if it's not there, someone will add something overly complicated again, so I tried to come up with something correct that fits in a single parenthesized sentence. --Falcotron 04:22, 25 July 2006 (UTC)

I just finished thanking someone for changing the int to long, when User:Jesuswaffle comes along and changes it back, with the edit summary, "long int is not necessary; int is at least 16 bits." But char can also be 16 bits. --Falcotron 07:37, 1 August 2006 (UTC)
By the way, is this significantly less clear?
int bigendian()
{
   long i = 1;
   const char *p = (const char *) &i;
   return p[0] != 1;  /* Lowest address doesn't contain least significant byte */
}
--Falcotron 07:47, 1 August 2006 (UTC)
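
For comparison, a sketch of the same test that copies the first byte out with memcpy instead of casting the pointer. This is just an alternative spelling, not the article's code; the name bigendian_memcpy is made up here, and like the version above it assumes long is wider than char:

```c
#include <string.h>

int bigendian_memcpy(void)
{
    long i = 1;
    unsigned char first;

    memcpy(&first, &i, 1);   /* byte stored at the lowest address of i */
    return first != 1;       /* nonzero if that is not the least significant byte */
}
```

Both versions read the same byte, so they should agree on any given machine.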

Morse code, Roman numerals

How does Morse have an endianness? Dysprosia 03:32, 29 December 2005 (UTC)

Conversion functions

I added a one-line mention of the Berkeley sockets conversion functions (htonl and friends) since Wikipedia had no hits for them. I'm adding a redirect from those functions to this article: is that appropriate, or should I write a new "endianness conversion function" stub or something? --Victor Lighthill 20:10, 17 November 2005 (UTC)
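
For readers who have not met these functions, a short sketch of how htonl is typically used. It assumes a POSIX system that provides <arpa/inet.h>; the helper name put_u32_network is invented for this example:

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v in out[] in network (big-endian) byte order,
   whatever the byte order of the host. */
void put_u32_network(uint32_t v, unsigned char out[4])
{
    uint32_t n = htonl(v);   /* host order -> network order */

    memcpy(out, &n, 4);
}
```

After put_u32_network(0x0A0B0C0D, buf), buf[0] holds 0x0A on both little-endian and big-endian hosts, and ntohl reverses the conversion on the receiving side.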

As a reader, I'd prefer to have all topics around endianness located in one place. Conversion functions are furthermore highly relevant for allowing endian-heterogeneous systems to communicate. I ended up with the following conversion functions (Z/Z' is element size, ZS/ZS' is atomic element unit, l is little-endian, b is big-endian). Is it aligned with your study? Bblanc 12:21, 29 August 2006 (UTC)

Little-Endian towards Big-Endian: Z/ZS/l to Z’/ZS’/b
  [1] Incoming little-endian data are resized: either packed (ZS < ZS’), split (ZS > ZS’) or unchanged (ZS = ZS’)
  [2] Endian conversion (l to b) is applied on data gathered in [1]

Big-Endian towards Little-Endian: Z/ZS/b to Z’/ZS’/l
  [1] Endian conversion (b to l) is applied on incoming big-endian data, swapping ZS sub-width packets
  [2] Data computed in [1] are resized: either packed (ZS < ZS’), split (ZS > ZS’) or unchanged (ZS = ZS’)

Little-Endian towards Little-Endian: Z/ZS/l to Z’/ZS’/l
  [1] Incoming little-endian data are resized: either packed (ZS < ZS’), split (ZS > ZS’) or unchanged (ZS = ZS’)

Big-Endian towards Big-Endian: Z/ZS/b to Z’/ZS’/b
  • ZS = ZS’: incoming big-endian data are unchanged
  • ZS < ZS’: incoming big-endian data are packed and modified
      [1] ZS-width packets are gathered into ZS’-width bigger packets
      [2] ZS-width packets within ZS’-width packets are swapped
  • ZS > ZS’: incoming big-endian data are modified and unpacked
      [1] Each ZS’-width packet is split into ZS-width packets
      [2] ZS-width packets are swapped within ZS’-packets
      [3] All ZS’-width packets are unpacked

Bblanc 12:21, 29 August 2006 (UTC)
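
The "endian conversion" step in the scheme above can be sketched in C under one reading of it: Z and ZS are taken to be sizes in bytes, and the conversion is taken to mean reversing the order of the ZS-wide units inside each Z-wide element. Both points are assumptions made for this example, and the name swap_units is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: reverse the order of the zs-byte units inside a
   z-byte element, in place. Assumes zs > 0 and z is a multiple of zs. */
void swap_units(unsigned char *buf, size_t z, size_t zs)
{
    size_t n = z / zs;       /* number of units in the element */
    size_t i, k;

    for (i = 0; i < n / 2; i++) {
        unsigned char *a = buf + i * zs;
        unsigned char *b = buf + (n - 1 - i) * zs;

        for (k = 0; k < zs; k++) {   /* exchange unit i with its mirror unit */
            unsigned char t = a[k];
            a[k] = b[k];
            b[k] = t;
        }
    }
}
```

With the bytes 01 02 03 04, swap_units(buf, 4, 1) yields 04 03 02 01 and swap_units(buf, 4, 2) yields 03 04 01 02. The resizing steps (packing and splitting) are not covered by this sketch.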