Talk:Endianness
From Wikipedia, the free encyclopedia
Signed numbers
I think it might make sense to mention what happens for signed integers. As far as I can tell, these are normally converted to two's complement before the bytes are re-ordered. This may seem rather subtle (depending on how one understands endianness), so it could be worth mentioning. At least it isn't immediately obvious what the representation should look like. Or is this too technical and implementation-specific to discuss? Alexwright 00:07, 30 November 2006 (UTC)
- I deliberately avoided them :-) Mainly for two reasons: first, at least ones' complement, two's complement and sign/magnitude should be covered, the most common and the three explicitly allowed by C99. Secondly, I was afraid of a whole burst of edits dealing with the "position" of the sign bit. What I've noticed is that the issue of "bit endianness" (or even bit "position") is not well understood. However, if we are willing to accept a quite long settling-down period for the article we can begin the effort :-) —Gennaro Prota•Talk 12:48, 30 November 2006 (UTC)
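To make the question about signed integers concrete, here is a minimal sketch in C. It assumes the value is first converted to its 32-bit two's complement bit pattern and that the bytes are then emitted most significant first. The helper name `store_be32_signed` is made up for this illustration; ones' complement or sign/magnitude would need a different conversion step.

```c
#include <stdint.h>

/* Hypothetical helper: write a signed 32-bit value as four big-endian bytes.
   Converting to uint32_t is defined as reduction modulo 2^32, which yields
   the two's complement bit pattern regardless of the host representation. */
static void store_be32_signed(int32_t value, unsigned char out[4]) {
    uint32_t bits = (uint32_t)value;             /* two's complement pattern */
    out[0] = (unsigned char)((bits >> 24) & 0xFF); /* most significant byte first */
    out[1] = (unsigned char)((bits >> 16) & 0xFF);
    out[2] = (unsigned char)((bits >> 8) & 0xFF);
    out[3] = (unsigned char)(bits & 0xFF);
}
```

With this sketch, -2 is written as the bytes FF FF FF FE; a little-endian writer would emit the same four bytes in the opposite order.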
Regression
The article was cleaned up, so this section is now obsolete. Bertrand Blanc 10:38, 5 December 2006 (UTC)
The following section in the article is a regression compared with what we had converged on. Please either remove it or make it as accurate as the rest of the article as rewritten by Gennaro Prota. Thanks.
For example, consider the number 1025 (2 to the tenth power plus one) stored in a 4-byte integer:
00000000 00000000 00000100 00000001
Address | Big-Endian representation of 1025 | Little-Endian representation of 1025 |
---|---|---|
00 | 00000000 | 00000001 |
01 | 00000000 | 00000100 |
02 | 00000100 | 00000000 |
03 | 00000001 | 00000000 |
I could argue that the following section is OK as well:
For example, consider the number 1025 (2 to the tenth power plus one) stored in a 4-byte integer:
00000000 00000000 00000100 00000001
Address | Big-Endian representation of 1025 | Little-Endian representation of 1025 |
---|---|---|
00 | 00000000 | 00000100 |
01 | 00000000 | 00000001 |
02 | 00000100 | 00000000 |
03 | 00000001 | 00000000 |
Bertrand Blanc 14:09, 29 November 2006 (UTC)
I have a very simple explanation of endianness. A word occupies 32 bits, and this word may be referred to by either the address of its least significant byte (little-endian) or the address of its most significant byte (big-endian). The following C program can identify whether a machine is big-endian or little-endian:
#include <stdio.h>

int main(void) {
    int a = 1;
    /* Look at the byte stored at the lowest address of a. */
    if (*(char *)&a == 1)
        printf("The machine is Little Endian\n");
    else
        printf("The machine is Big Endian\n");
    return 0;
}
Some programs that deal with string handling may depend on endianness and will produce different results on different machines. However, it is possible to write endian-safe programs.
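As an illustration of what an endian-safe program can look like, here is a minimal sketch that writes a 32-bit value in a chosen byte order using only shifts and masks, so the result does not depend on the host. The names `store_be32` and `store_le32` are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helpers: serialize a 32-bit value in a fixed byte order.
   Shifts and masks operate on the value, not on memory, so the host's
   own endianness never enters the picture. */
static void store_be32(uint32_t v, unsigned char out[4]) {
    out[0] = (unsigned char)((v >> 24) & 0xFF); /* most significant byte first */
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

static void store_le32(uint32_t v, unsigned char out[4]) {
    out[0] = (unsigned char)(v & 0xFF);         /* least significant byte first */
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}
```

For the value 1025 used in the tables above, `store_be32` produces 00 00 04 01 and `store_le32` produces 01 04 00 00, matching the first table.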
I've just noticed another potential regression in the section "LOLA: LowOrderLo#Address". A new concept called LOLA seems to be introduced. When I read the section, I understand it as middle-endian (a byte-swap feature within an atomic element). Since middle-endian was already defined, what is the purpose of having a section about LOLA? I have probably missed something; could you please clarify a bit further? Thanks. Bertrand Blanc 10:47, 1 December 2006 (UTC)
Endianness in Communication
In the Endianness in Communication section of the Endianness article we can read: The Internet Protocol defines a standard "big-endian" network byte order.
This information is not enough: it only tells us the order of atomic units within a word. What is the size of an atomic unit/element? What is the word size? What is the byte order within an atomic unit? What is the bit order within bytes?
Without this information, the data UNIXUNIX can be received as UNIXUNIX or NUXINUXI or XINUXINU or IXUNIXUN (other possibilities exist) without any clue about how to re-establish the right UNIXUNIX order.
Since computers can talk to each other through the Internet without any issue, I assume that this information is embedded somewhere, maybe in datagram headers within a low-level OSI layer.
To go into more detail, assuming the issue is known as "NUXI": the only configuration that yields this result (in big-endian) has a 2-byte atomic element size, without byte-swap (i.e. not middle-endian). The word size seems to be greater than 16 bits. The most significant bit within a byte seems to be received first. This interpretation is based on an informal assumption which may easily be undermined by another oral or informal assumption, for example "everybody knows that the least significant bit within a byte is received first" (even though this assertion is not obvious and may be wrong on some systems).
Furthermore, if the configuration really is based on 2-byte atomic units as "NUXI" suggests, this contradicts the sentence from the article quoted above, which explicitly states byte order and not 2-byte order.
To conclude, this is confusing. It should be investigated further so that the article gives a clear and accurate definition, explanation and wording.
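The reorderings of UNIXUNIX listed in this section can be reproduced with three small transformations. The sketch below is only an illustration under the assumption of 8-bit bytes and 16- or 32-bit units; the function names are made up here and do not come from any protocol specification.

```c
#include <stddef.h>

/* Swap the two bytes inside every 16-bit unit: "UNIX" -> "NUXI". */
static void swap_bytes_in_16bit_units(char *buf, size_t len) {
    for (size_t i = 0; i + 1 < len; i += 2) {
        char t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}

/* Reverse all four bytes of every 32-bit word: "UNIX" -> "XINU". */
static void reverse_bytes_in_32bit_words(char *buf, size_t len) {
    for (size_t i = 0; i + 3 < len; i += 4) {
        char t = buf[i];
        buf[i] = buf[i + 3];
        buf[i + 3] = t;
        t = buf[i + 1];
        buf[i + 1] = buf[i + 2];
        buf[i + 2] = t;
    }
}

/* Swap the two 16-bit halves of every 32-bit word: "UNIX" -> "IXUN". */
static void swap_16bit_halves_in_32bit_words(char *buf, size_t len) {
    for (size_t i = 0; i + 3 < len; i += 4) {
        char t = buf[i];
        buf[i] = buf[i + 2];
        buf[i + 2] = t;
        t = buf[i + 1];
        buf[i + 1] = buf[i + 3];
        buf[i + 3] = t;
    }
}
```

Each transformation is its own inverse, so a receiver that knows which one the sender applied can undo it; the byte stream alone does not say which one that was.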
OPENSTEP reference removed
Just removed this from the article:
- The OPENSTEP operating system has software that swaps the bytes of integers and other C datatypes in order to preserve the correct endianness, since software running on OPENSTEP for PA-RISC is intended to be portable to OPENSTEP running on Mach/i386. <!-- is this automatic or manual and if its manual how does it differentiate it from conversions on other platforms?-->
This is irrelevant. AFAIK all major operating systems have byte-swapping functions in the C library. Unless there is something that sets OPENSTEP apart, this does not belong here (as someone noted in that comment). I suspect this is just from someone reading another article about OpenStep, coming to the wrong conclusion, and polluting the Endianness article with this. – Andyluciano 17:33, 9 February 2006 (UTC)
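For readers unfamiliar with the byte-swapping functions mentioned here, this is a minimal sketch using the POSIX functions `htonl` and `ntohl` from `<arpa/inet.h>`. It assumes a POSIX system (other platforms provide the same functions in different headers); the wrapper names `put_network_u32` and `get_network_u32` are made up for this example.

```c
#include <arpa/inet.h> /* POSIX: htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: copy a 32-bit value into buf in network (big-endian) order. */
static void put_network_u32(uint32_t host_value, unsigned char buf[4]) {
    uint32_t net = htonl(host_value); /* swaps on little-endian hosts, no-op on big-endian ones */
    memcpy(buf, &net, sizeof net);
}

/* Hypothetical wrapper: read a 32-bit value stored in network order. */
static uint32_t get_network_u32(const unsigned char buf[4]) {
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

The caller never needs to know the host's byte order; the library decides whether a swap is needed.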
Bit endianness in serial communication protocols
So there's been a "battle" over endianness, but what has been the bit endianness? What's the bit endianness for some of the common data link protocols: ethernet, FDDI, ATM, etc.? Cburnett 16:22, 7 March 2006 (UTC)
Just a note: If my opening argument in the section below is correct, BIT-endianness only matters if bits are passed in serial fashion; if they're passed in parallel, bit numbering order is not the same as endianness. phonetagger 19:28, 7 March 2006 (UTC)
Bit endianness in parallel connections (system memory, etc)
Although I see a paragraph discussing bit endianness in the endianness main article, I have been under the impression for years that the "odd" bit numbering of PowerPC processors (and I'm sure some other processors as well, although I know of none) has nothing to do with endianness. From time to time I've heard programmers or hardware engineers say something about how PowerPC is big endian due to the reversed bit numbering (bit 0 being the MSbit instead of the LSbit), and immediately some other more experienced programmer squashes the first and says that bit numbering has nothing to do with endianness, "you silly inexperienced fool." (OK, maybe they don't say that last part, but I'm sure the first guy sulks away thinking they did.)
Certainly bit numbering has nothing to do with BYTE-endianness, but can we say that bit numbering has anything to do with BIT-endianness? My own take on that is, "no." Byte endianness is something that we have to deal with on an operational level. It affects the WAY bytes are stored and expressed within larger units (words, longwords). If we pass a data structure containing anything other than bytes (chars) from a little-endian machine to a big-endian machine, someone, somewhere along the way, has to perform byte swapping of anything in the structure larger than a byte. On the other hand, if we pass a data structure from one big-endian machine with bit 0 as the LSbit (Motorola 68K-based) to another big-endian machine with bit 0 as the MSbit (PowerPC-based running in native big-endian mode), we encounter no such conversion problem. On the 68K system, the hardware designer knew to hook up bit 0 (LSbit) of the memory array to bit 0 (LSbit) of the 68K CPU, and all was well. On the PowerPC system, the hardware designer knew to hook up bit 0 (LSbit) of the memory array to bit 31 (LSbit) of the PowerPC CPU, and all was well. Data structures passed from one to the other need not be translated.
I work in a company that produces custom/proprietary embedded systems, and occasionally the HW designer gets confused and connects a device (RAM, flash memory, some ASIC, etc) to a PowerPC backwards, tying bit 0 of the device to bit 0 of the PowerPC. Of course if it's RAM, that's no big deal. The processor can store the MSbit in the LSbit location with no problems, since when it fetches it again it comes from the same place it was stored. From time to time my SW group has had to write special drivers for non-RAM devices that were connected backwards, to deal with the fact that the bits are reversed. That's always fun. But that problem isn't (in my opinion) related to endianness (an ordering convention), it's related to a numbering convention.
Comments appreciated! phonetagger 19:28, 7 March 2006 (UTC)
- Also fun: apparently some dual-endian chips (e.g. ARM) really need either horrible driver code or different wiring to run in different endiannesses in the same system. It's certainly related to endianness, but I'd say it's not really endianness, because there isn't really a first or last bit in a parallel bus, just an arbitrary name assigned to some pins. Plugwash 21:49, 7 April 2006 (UTC)
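The difference between renumbering bits and actually reordering them can be sketched in C. The first function below only translates a bit index between the LSB-0 and MSB-0 naming conventions for a 32-bit word; the second physically reverses the bits, which is the sort of thing a driver might do for a device wired backwards as described above. Both names are made up for this illustration.

```c
#include <stdint.h>

/* Translate a bit index between LSB-0 and MSB-0 numbering of a 32-bit word.
   This changes only the name of the bit, not the stored data, and the
   mapping is its own inverse. */
static unsigned renumber_bit32(unsigned index) {
    return 31u - index;
}

/* Reverse the bit order of a 32-bit value (bit 0 <-> bit 31, and so on). */
static uint32_t reverse_bits32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u); /* append the current lowest bit of v */
        v >>= 1;
    }
    return r;
}
```

Passing data between two machines that merely number bits differently needs neither function; only the reversed wiring case needs `reverse_bits32`.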
Endianness poetry
Many long years ago, I [kinda] made up a song when we were learning about endianness:
- One little, two little, three little-endians
- Four little, five little, six little-endians
- Seven little, eight little, nine little-endians
- Ten little-endian bytes.
Native English speakers are more likely than non-native speakers to "get it" and know the tune, but it was fun regardless. Enjoy. Tomertalk 03:05, 9 March 2006 (UTC)
Endianness in Danish dates
The article said that the ISO 8601 format is the most common in Danish; I highly doubt that. I'm removing Denmark from the list of countries that use little-endian date formats. Exelban 19:34, 10 May 2006 (UTC)
The World is Mostly Little Endian
Previously, this article barely even mentioned that modern PCs use little endian. Since modern PC's grew out of the Intel x86 based processors, most of the world's computers evolved from little endian architectures and continue to use little endian (even if they can technically run in other modes). That seems like rather relevant information to me, so I've added it. Sorry to the big endian guys, but the fact is, the vast majority of the machines that view this web page will be using little endian. 209.128.67.234 05:15, 24 April 2006 (UTC)
- By number of desktop PCs, yes, little-endian wins for the moment, because a little-endian architecture currently dominates that marketplace. The total number of processors running in each mode is harder to guess at (a lot of embedded stuff is big-endian). Plugwash
The original decision to use little-endian addressing for the Intel line of processors was made by Stanley Mazor. The first 8-bit microprocessor, the 8008, was designed by Intel for Datapoint about 1970 and he chose little-endian for compatibility with the bit-serial Datapoint hardware. Ironically, Datapoint never used the 8008. (Ref. "Anecdotes", IEEE Annals of the History of Computing, April-June 2006; http://www.computer.org/portal/cms_docs_annals/annals/content/promo3.pdf) In an earlier interview Mazor is quoted as considering the choice a mistake. (Ref. http://silicongenesis.stanford.edu/transcripts/mazor.htm, note transcript has "endian" as "Indian".) -Wfaxon 11:32, 26 June 2006 (UTC)
Essentially all modern game consoles use PowerPC processors, which are big-endian. mrholybrain's talk 18:32, 11 March 2007 (UTC)
Big/little-endian is a misnomer!
First, little-endian is a lame Intel format. Second, the whole scheme is upside down. I mean, the end is last. Big-endian describes a number that has the biggest part first. It should be the exact opposite, but this isn't a problem with the wiki, but with the horrible common misuse of the terms. 84.249.211.121 18:51, 22 July 2006 (UTC)
- First, calling little-endian a "lame Intel format" is silly and irrelevant. Second, the scheme is not upside-down. "Big-endian" describes numbers that have the big end first. Just as the big-endians in Gulliver's Travels ate the big end of the egg first. --Falcotron 19:30, 22 July 2006 (UTC)
Example programming caveat
Even without endianness problems, this example would also fail for systems with different alignments or byte sizes. Try writing "66 6f 6f 00 00 00 00 00 00 00 00 00 01 23 45 67 62 61 72 00 00 00 00 00" (64-bit alignment) and reading it back on a system with 32-bit alignment, or writing "0066 006f 006f 0000 0123 4567 0062 0061 0072 0000" (16-bit char; 18-bit is left as an exercise for the reader) and reading it back on a system with 8-bit bytes. (Exactly what you get for the latter depends on the filesystem, but it's probably going to be either "\0f\0o", 0x006f0000, "\001#Eg" with both strings unterminated, or "foo", 0x23676261, "o" and/or EOF.) Endianness is just one reason never to dump raw structs to a FILE * if you expect the result to be portable. --Falcotron 04:55, 25 July 2006 (UTC)
- Indeed. Not to talk of data placed in structs :-) I'm in favour of removing that section altogether, as it seems almost to "justify" what's simply bad programming practice. FWIW, one can't expect to be able to read the file back even by just changing the compiler switches (same compiler version, same program, same platform). Binary dumps are almost exclusively useful for temporary files, which are read back before the program terminates. And there's certainly no need to know anything about endianness to write a number and read it back. —Gennaro Prota•Talk 18:02, 10 November 2006 (UTC)
- Just a little addendum: someone may object to my removal of the section by saying, for instance, that if you want to read a binary file where, say, the first four bytes represent an image width and are specified to be little-endian you have to take that into account. That's true, but you only have to consider the *external* format. Here's the "standard" C idiom to deal with this:
typedef ... uint32;
uint32 width = ( ( (uint32)source[ 0 ] )       )
             | ( ( (uint32)source[ 1 ] ) <<  8 )
             | ( ( (uint32)source[ 2 ] ) << 16 )
             | ( ( (uint32)source[ 3 ] ) << 24 ) ;
- And that works regardless of the "internal" endianness. —Gennaro Prota•Talk 18:26, 10 November 2006 (UTC)
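For completeness, here is a self-contained version of that idiom as a sketch. It assumes `uint32_t` from `<stdint.h>` in place of the elided typedef and that `source` is an array of unsigned char; the function names `load_le32` and `load_be32` are made up for this example.

```c
#include <stdint.h>

/* Read four bytes specified to be little-endian in the external format. */
static uint32_t load_le32(const unsigned char source[4]) {
    return ((uint32_t)source[0])
         | ((uint32_t)source[1] << 8)
         | ((uint32_t)source[2] << 16)
         | ((uint32_t)source[3] << 24);
}

/* The same idiom for an external format specified to be big-endian. */
static uint32_t load_be32(const unsigned char source[4]) {
    return ((uint32_t)source[0] << 24)
         | ((uint32_t)source[1] << 16)
         | ((uint32_t)source[2] << 8)
         | ((uint32_t)source[3]);
}
```

An image width of 640 stored little-endian as 80 02 00 00 is read back as 640 by `load_le32` on any host, because the code describes the file layout rather than the machine.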
Never-endian
Please explain more...
- Probably a joke. Came with these edits, and went with these. It's possible that it's true, but there are no references. --Shreevatsa 16:10, 5 August 2006 (UTC)
- The joke came soon after edits starting with mine that introduced information about 32-bit digital signal processors with a word-addressed memory. They have CHAR_BIT == 32 and sizeof(int) == 1. Thus, when dealing with anything from char to int, there is never any endian issue. However, such processors may still have a preferred order to store long long (64-bit) values and long (32-bit, 64-bit, or something in between) values. But the phrase has only 27 Google hits and is thus probably non-notable. --Damian Yerrick (☎) 03:26, 10 September 2006 (UTC)
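Whether a given platform is in this situation can be checked from `<limits.h>` and `sizeof`. The sketch below is only an illustration; the function names are made up for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Storage size of an int in bits (CHAR_BIT is the number of bits per C byte). */
static size_t int_storage_bits(void) {
    return sizeof(int) * CHAR_BIT;
}

/* Byte order is only meaningful for an object that spans more than one
   addressable unit; on the DSPs described above this returns 0 for int. */
static int int_has_byte_order(void) {
    return sizeof(int) > 1;
}
```

On a typical desktop system with 8-bit bytes and a 32-bit int, `int_has_byte_order` returns 1.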
The Answer
Big endian is write, little endian is wrong. (I've been debugging bitfields on an intel mac. I miss PPC) —The preceding unsigned comment was added by 150.253.42.129 (talk • contribs).
- Do you have any verifiable sources to back up this claim? --Damian Yerrick (☎) 23:29, 1 November 2006 (UTC)
Origin of little endian
Little-endian cannot come from Gulliver's Travels because the term does not appear in its text. (Follow the reference to the original text to partially verify it.) The original only mentions Big-Endian and breaking eggs on the smaller end or larger end.
Gabor Braun
- Fixed. --P3d0 16:53, 7 November 2006 (UTC)
Note about Byte Layout vs. Hex format
I know that endianness may appear to be very confusing, so I apologize for this note.
A 32-bit register that is 1-byte addressable is laid out starting at address A, over the range [A; A+3], for example with A = 100:
100 | 101 | 102 | 103 |
---|---|---|---|
4A | 3B | 2C | 1D |
The hex representation of this byte layout is the value 0x1D2C3B4A, since a number written in hex has the value from the lowest address on the right.
- No need to apologize; I'm just not sure why you felt a need for the note. Could you please elaborate?
- [snip]
- I feel the confusion is here: in your mind, "atomic unit width" and "address increment unit" are synonyms. I'm saying that there is no relationship between them: a system can have a 2-byte atomic unit width together with a 1-byte address increment unit. I mean:
- With 16-bit atomic element size:
100+0 | 100+1 | 100+0 | 100+1 | 100+2 | 100+3 |
---|---|---|---|---|---|
4A3B | 2C1D | ... | 3B | 1D |
- Perhaps we can avoid confusion by eliminating any numeric address from the figures and only show the "increasing direction", as in:
increasing addresses --> |
---|
3B | 1D |
- The separation of the units is already illustrated by the borders in the second row, so the numbers above are pretty pointless (as is the choice of a particular initial address such as 100). In effect even the choice of "4A", "3B", "2C", "1D" looks pretty odd to me: what's wrong with "4", "3", "2", "1" or "A", "B", "C", "D"? --Gennaro Prota•Talk 17:24, 10 November 2006 (UTC)
- Very good tradeoff! Removing all addresses may be a good thing since, from here on, the address is not a discriminating feature for endianness. Let's refocus on endianness considering only data. However, the reader must be aware that the text has to be read very thoroughly: it cannot be treated as a simple "story" without carefully reading each word. I tested this kind of high-level definition with plenty of engineers: all of them missed the point...
- I'm reluctant to settle on 1-digit hex numbers, since it is bytes that are manipulated, i.e. 2-digit hex numbers.
Bertrand Blanc 23:05, 10 November 2006 (UTC)
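The relationship between a byte layout and the hex value one writes for it can be checked with a short sketch. It assumes the four bytes 4A 3B 2C 1D sit at increasing addresses, as in this thread, and reads them with either convention and either unit width (n of at most 4 bytes). The function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read n bytes (n <= 4) as one number, lowest address least significant. */
static uint32_t read_little_endian(const unsigned char *p, size_t n) {
    uint32_t v = 0;
    for (size_t i = n; i > 0; i--) {
        v = (v << 8) | p[i - 1]; /* start from the highest address */
    }
    return v;
}

/* Read n bytes (n <= 4) as one number, lowest address most significant. */
static uint32_t read_big_endian(const unsigned char *p, size_t n) {
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v = (v << 8) | p[i]; /* start from the lowest address */
    }
    return v;
}
```

Read as one little-endian 32-bit unit, the bytes give 0x1D2C3B4A; read as one big-endian unit they give 0x4A3B2C1D; read as two big-endian 16-bit units they give 0x4A3B and 0x2C1D. The same memory yields different hex values depending on the convention and the unit size, which is the point under discussion.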
Linguistic Universals Database
Hi guys,
is it just my connection or the links to the Linguistic Universals Database in the footnotes aren't working? —Gennaro Prota•Talk 17:28, 10 November 2006 (UTC)
- As far as I can tell, it's just down right now. The archive's front page still links to the server on port 591, which isn't working. If it were taken down for good, then the archive's front page would likely have been changed. --Damian Yerrick (☎) 17:51, 10 November 2006 (UTC)
- Thanks for checking. Yesterday I couldn't access it either. Let's see in the next days. If you have the pages in your browser cache I'd appreciate receiving a little copy-&-paste by mail :-) —Gennaro Prota•Talk 17:55, 10 November 2006 (UTC)
- PS: Today I asked them by mail; will keep you informed. —Gennaro Prota•Talk 03:08, 12 November 2006 (UTC)
- I was informed that the service would be back in some days. Indeed, I'm happy to see that it is working now. —Gennaro Prota•Talk 10:43, 5 December 2006 (UTC)
Links to clarify endianness in date formats and mail addresses
The discussion referred to in edit summary is at User talk:EdC#Clarification about your edits to the endianness entry. If to be continued, it may as well be here. –EdC 00:55, 2 January 2007 (UTC)
- Eh, if only you listened to what I wrote. Do you realize what disasters you do? Links to redirects, with links to a section which could disappear... exceptions that except for date formats are exceptions... It's because of users like you that we are all stressed here. —Gennaro Prota•Talk 01:02, 2 January 2007 (UTC)
- Is it wrong to link to redirects? I understood that it's fine. Ditto for links to sections; I'll add in linked-from comments if you think there's a danger of those sections disappearing. I'm sorry if my style is so bad that you get stressed; perhaps you should consider a break? –EdC 15:05, 2 January 2007 (UTC)
Lead section
In this edit, I added two paragraphs to the lead section, which others improved later. Gennaro Prota deleted the paragraphs in this edit with no more comment than "cleanup again... sigh :-(". This is the text in question:
- Big-endian and little-endian are the two main kinds of endianness. Big-endian is generally used on computer networks, little-endian in most computers. (Some computers used endiannesses other than those, referred to as "mixed-endian" or "middle-endian".)
- As a non-technical explanation, most spoken languages use big-endian representations for numbers: 24, for example, is pronounced twenty-four in English, meaning the big end (twenty) comes first. Some languages use little-endian, however. As an example, 24 in Danish would be fireogtyve (literally "four-and-twenty"), putting the less significant digit "four" first.
The reasons for my edit were:
- Article too abstract for too long: It's talking about "Endianness as a general concept" and other things which are only academically relevant (if at all). Realistically, if you discuss big-endian and little-endian, you have 95% of the endianness issue covered, the "general concept" being in the other 5%.
- Discussion upside-down: It's misleading that big- and little-endian are buried far down in section 4 under "Examples" (and then with little-endian as a sub-item of big-endian). That hurts clarity and is annoying to readers who expected to see the "big endian" article (which redirects).
- Lead section is not, as it should be, a summary of the article (see below).
- Lead section is pale, jargon-ridden ("integer", "addressing scheme", "transmission order"), and unspecific ("endianness is the ordering used to represent some kind of data" ...). It's also pretty short on concrete facts.
Because of these issues, I mentioned big- and little-endian in the lead section. The points that most computers are little-endian, while network order is big-endian, provide context and some facts. I also added the linguistic example to provide a clear and concrete explanation. This is in accordance with the relevant Wikipedia Guide:
- Normally, the opening paragraph summarizes the most important points of the article. It should clearly explain the subject so that the reader is prepared for the greater level of detail and the qualifications and nuances that follow. (from [1], emphasis mine)
Asking for guidance: I'd like to know what, specifically, is wrong with the reasoning above and these two paragraphs. I thought they provide value to an otherwise lifeless lead section and find it impolite that they get deleted with no more comment than "cleanup again... sigh". --193.99.145.162 14:20, 2 January 2007 (UTC)
- I'm with you on this. User:Gennaro Prota seems determined that this article should be solely about the technical byte-order meaning – which is, admittedly, the original meaning but is not the only meaning used today – and that thus all other meanings, even if they help to explain, should be stripped from the article. –EdC 15:12, 2 January 2007 (UTC)
- I think I have two main problems with the edit you refer to: first, it is the classical "local improvement" which enters like an elephant in a crystalware shop in the context of the article. That's a ubiquitous Wikipedia problem: editors rarely seem to have read the whole article; they just kick in. Secondly, it is totally rickety, both linguistically and technically; examples: a) "little-endian [is notable] because it is used internally in most computers": not only is that wrong, but what does "internally" mean? In practice you are *apparently* simplifying, by replacing things such as "storage of integers" or similar with "internally", which appears simple just because it says almost nothing; b) "Some computers which today are obscure used endiannesses other than those, referred to as "mixed-endian"... you will agree that this could be in a Dilbert strip, not in an encyclopedia; and its addition clearly shows that you haven't read the whole article, or haven't bothered integrating everything with the "Middle endian" section.
- That said, you can change the article as much as you want. I have lost any hope that Wikipedia can ever aim at anything other than fluctuating quality (and I don't think I'll ever spend again so much time on an article as I did on this one). At least, permanent links exist: without them each time you open your browser it is a new surprise. —Gennaro Prota•Talk 01:02, 4 January 2007 (UTC)
PDP-11 Endianness: Integer vs. Floating Point
Unfortunately, the section on PDP-11 endianness is misleading. The PDP-11 stored 32-bit integers in little endian format. It was only the PDP-11 floating point data types that suffered from mixed endianness. I don't want to lose this section, but I can't see any way of clarifying it without introducing integer vs. floating point, which is not presently mentioned in this article. 68.89.149.2 20:02, 16 January 2007 (UTC)
The purpose of introducing the PDP-11 was to illustrate one feature of endianness: middle-Endian. The point to keep in mind is middle-Endian, nothing else. So if you need to clarify the example, feel free to do it; accuracy is much better. Readers may skip the example if they find it too obscure, but they cannot skip the basics, which the middle-Endian definition reinforces.
Additionally, what may be misleading is the introduction of the term "mixed-Endian": is it a synonym for middle-Endian? Is it jargon? Is it another kind of endianness (and hence to be explained in another section)? Bertrand Blanc 16:42, 18 January 2007 (UTC)
- Specifically, I have a problem with "stored 32-bit words" in the middle-Endian sub section. To me, it implies that a 32-bit integer would be stored as described, which is untrue. The meaning of "32-bit word" is vague. It seems to me that an explanation of data types needs to be pointed to so that something true can be said, such as "32- and 64-bit floating point representations" were saved as middle-Endian. As for your other question, yes, "mixed-Endian" is a synonym of "middle-Endian." After reading the description of how one can think of the 16-bit chunks as being saved big-Endian while each pair of bytes is saved little-Endian, "mixed-Endian" seems like a very apt term. Netuser500 20:05, 18 January 2007 (UTC)
- Added "some" in the description of Middle-Endian and an explanation to the history, thereby ending my angst about the inaccuracy. Netuser500 19:26, 19 January 2007 (UTC)
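The layout described in this thread (16-bit chunks in big-endian order, with each pair of bytes stored little-endian) can be written down as a short sketch. It illustrates the byte order only and takes no position on which PDP-11 data types actually used it; the function name `store_middle_endian32` is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: lay out a 32-bit pattern with the more significant
   16-bit half first, and each half stored low byte first. */
static void store_middle_endian32(uint32_t v, unsigned char out[4]) {
    uint16_t high = (uint16_t)(v >> 16);
    uint16_t low = (uint16_t)(v & 0xFFFFu);
    out[0] = (unsigned char)(high & 0xFF); /* low byte of the high half */
    out[1] = (unsigned char)(high >> 8);   /* high byte of the high half */
    out[2] = (unsigned char)(low & 0xFF);  /* low byte of the low half */
    out[3] = (unsigned char)(low >> 8);    /* high byte of the low half */
}
```

The pattern 0x0A0B0C0D comes out as the bytes 0B 0A 0D 0C, which is neither the big-endian order (0A 0B 0C 0D) nor the little-endian order (0D 0C 0B 0A).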