Talk:Buffer overflow

Former FA This article is a former featured article candidate. Please view its sub-page to see why the nomination failed. For older candidates, please check the archive.
Peer review Buffer overflow has had a peer review by Wikipedia editors which is now archived. It may contain ideas you can use to improve this article.

Programming language issues

Removed from an earlier version of this article, because it 'sounds like Cyclone advocacy':

Various techniques have been used to make buffer overflows in C programs less likely. Cyclone is a modified version of the C programming language which uses type information and run time checks to reduce the likelihood of buffer overflows and other memory corruption issues. Systems such as stackguard provide protection against the most common techniques for exploiting buffer overflows by checking that the stack has not been altered when a function returns.

Why remove the Cyclone reference, and not the stackguard reference? Why remove either? It is only mild advocacy at worst (my interest is only as a potential Cyclone user, for the reasons stated above) and is directly relevant to the topic. -- The Anome

I suspect that both Cyclone and stackguard are only single instances of ideas that have been tried several times. If anyone can put a name to, or give other examples of, these approaches, then I think they both have a place in the article. Mentioning Cyclone alone does seem a bit out of place in so short an article, given that it's apparently a new system with no extensive real world use. -- Matthew Woodcraft

Okay, how about this as a set of categories:

Attempts to solve the buffer overflow problem:

  • Languages with in-built run-time bounds checking (Pascal, Python, Java)
  • Retrofit run-time bounds checking for unsafe languages
  • Libraries that catch certain cases of buffer overflow at run-time
  • Static analysis to catch buffer overflows at compile time
  • Code analysis tools designed to catch common buffer overflow vulnerabilities
  • Operating system features designed to limit the damage of buffer overflow and other exploits
  • Software testing regimes designed to find buffer overflow bugs, whether systematically or at random

See also the comments in http://catless.ncl.ac.uk/Risks/21.84.html#subj10

-- The Anome

Nice idea. Implement it as you want. I wonder whether it would be better to make a new page, "Attempts to solve the buffer overflow problem", and let the "buffer overflow" page describe the problem. What do you think about this?

-- Tuxisuau

Feel free to split the page up if it feels "too large". (I don't think it's gotten too large yet, or I would have done it myself.) --DavidCary 01:00, 6 Nov 2004 (UTC)

The page now does mention Cyclone in the "Choice of Programming Language" section. -- David Hopwood

I feel like the section on "choice of programming language" is too hard on C and C++. Buffer overflows are, obviously, more an issue of bad programming than of a bad language. To someone who doesn't understand the topic so well, this section could suggest that everyone should switch to Java or some other language in order to eliminate security risks the world over when, in reality, these other languages have their own issues and opportunities for security exploits. I feel like this section should be modified or removed because of this. There are just a lot of complicated issues about choice of language and this section is too terse about the whole thing. I feel I am too much of a newbie to edit it though. -- Peterius 21:07, 23 Mar 2005 (UTC)

right-to-left or left-to-right ?

Is there some reason that buffer overflow shows the stack growing in completely the *opposite* direction as stack (computing) ?

For a moment, I thought Aha ! I know how to solve the buffer overflow problem ! Simply grow the stack in the opposite direction, so the "extra data" simply flows off the garbage end of the stack.

Alas, now that I've thought about it some more, I see the buffer overflows can happen no matter which way the stack grows. (If the stack grows towards larger addresses, problems no longer occur when buffers allocated *after* the function call overflow. However, buffers allocated *before* the function call, and passed to the current function for filling, can overflow and overwrite the return address one way or the other, no matter which way the stack grows.)

I think I'm going to change the example to show the stack growing in the "standard" direction (left-to-right), unless someone objects. (Of course, then I'm going to have to have the buffer allocated *before* the function call.)

--DavidCary 01:00, 6 Nov 2004 (UTC)

Recursion unimportant?

Infinite recursion redirects to infinite loop. A combined discussion might be useful, but there isn't anything about recursion there at all. Similarly, stack overflow (the result of infinite recursion!) redirects to buffer overflow, which is unrelated (although buffers that overflow onto the stack could be called a stack overflow, that's not the only kind of stack overflow). Is there some reason that there's no discussion of the subject at any of the expected places? Is infinite recursion considered insufficiently notable when divorced from other problems? --Tardis 01:09, 4 May 2005 (UTC)

History

Perhaps someone can think of a better title than myself, as this isn't the complete history of buffer overflows, only some notable ones in the past decade or so. Also edited out the somewhat amateurish rip on M$. Although their infamous product lines are vulnerable, our Unix variants haven't been altogether immune either. In general, advocacy seems pretty unnecessary when discussing this technology-neutral but very relevant computational exploit technique. cs at the university of michigan (Unsigned comment by 68.40.167.39)

Well, my understanding of the history of buffer overflows is that the original Unix tools that were implemented by Ken Thompson et al. were rife with buffer overflows. Part of the Free Software Foundation's claim to fame is that when they tested their reimplementations with a "random inputs" test harness, their tools had far fewer buffer problems than the commercial Unix implementations. Then of course there was the Morris worm. Over time Unix and Linux, of course, have gotten a lot better, to the point where buffer overflows in the core tools are extremely rare. As Windows and the Internet became popular starting at a later date, however, Microsoft had not yet learned the lessons that Unix already had. So it's not surprising that they have been failing over and over again throughout their entire software stack. You can argue that MS has indeed been getting better lately; however, I think that story is far from over. Qed 12:50, 24 December 2005 (PST)
Actually, no, I've recently had occasion to revisit a lot of 1970s Unix code, and it is remarkable how careful the old-timers were to prevent buffer overruns, in assembly language as well as C. Most of the bad problems surfaced in code originally written by students at UCB Berkeley, which was adopted en masse by vendors when Internetworking started to become popular. — DAGwyn 23:30, 2 October 2006 (UTC)
Well if any of that code from the 70s survived to the modern Unix kernels then what you say is simply untrue. I found the papers in question, you can read through them for yourself: http://www.cs.wisc.edu/~bart/fuzz/ . The fact that you can't find the bugs doesn't negate the fact that an objective randomized test can. The papers make it clear that all Unix vendors were roughly the same and pretty bad (with only the GNU utilities showing any distinction in stability). Look, one of Thompson, Ritchie or Kernighan put "gets()" into the C library. So we know for sure that at least one of those guys was a poor amateur programmer at the time, who just put buffer overflows into his code. Qed 11:15, 3 October 2006 (UTC)

Diagram

How about a diagram that shows data and instructions in memory?

Choice of programming language

The choice of programming language can have a profound effect on the existence of buffer overflows. As of 2005, the most popular languages generally are C and its derivative, C++. Languages such as these do not check that data, when written to an array (the implementation of a buffer), is within the assumed boundaries of the array. Other programming languages differ in this regard, sending a warning or raising an exception when such a data assignment is made. Examples of such languages are Java and Pascal and its descendants such as Delphi programming language, OCaml, Modula-2, Oberon, and Ada, and also dialects of C such as Cyclone and CCured. There are many other such languages.

This section doesn't make sense. In the end, it is not the programming language that is responsible for buffer overflows. It is the programmer's code, combined with the runtime libraries in place, built on top of an architecture. The fault with C/C++ is not in the language itself, but rather in the specific code, runtime libraries, compiler and architecture. -- Steven Fisher 23:50, 8 December 2005 (UTC)

Feel free to add your comments to the article. The section as it stands now provides pertinent information. 129.62.162.85 00:04, 9 December 2005 (UTC)
The problem is I'm not sure how to fix it without a slash and burn of the paragraph as it stands now. Which, as you said, does provide information. Just not wholly accurate/complete information. If you have any ideas, let me know (or just do it). -- Steven Fisher 19:06, 9 December 2005 (UTC)
The real problem was with the paragraph that follows it. Statements like "C's position is growing increasingly untenable" or "test show that bounds checking doesn't matter" are nonsensical and clearly biased. So I just dug deeper into the issue while trying to provide some "balance", and have removed declarative statements clearly aimed at demonizing C and C++. Qed 6:41, 24 December 2005 (PST)
Actually I'd just like to point out that Steven Fisher's original comments are a very C-centric point of view, and ignore the important functionality that bounds checking provides in other languages. While the actual state of correctness in a program may not be related to the existence of bounds checking, how a programmer achieves that correctness is. C and C++ environments can commonly "fail silently" on bounds overflows, and so bounds overflow errors may go undetected even while testing specifically for them. Other programming languages will raise an exception so that you know that the error has occurred. Qed 14:51, 28 December 2005 (UTC)
Not really. There are any number of languages that do not display range errors. There are also ways to get range check errors from C++. I feel attributing it to language is oversimplification to the point where the statement isn't very useful. The current article looks much improved in this area, though. --Steven Fisher 04:42, 30 December 2005 (UTC)

Article Style and Generality

This article doesn't read like an encyclopedia and the introduction needs rewording to improve clarity and succinctness. I'll attempt to improve it at some point. This article needs to be cleaned up, since the layout and explanation is inferior to many other sources on the internet. Our aim should be to produce something which is technically accurate and yet easy to understand. There is also a wealth of extraneous information not directly related to Buffer Overflows per se.

Tompsci 12:38, 17 December 2005 (UTC)

Further reading reveals some misconceptions and inaccuracies. These will have to be fixed; the most obvious one is about nop-sleds. Also, the material is not treated in a language- and platform-independent way.

Any disagreements?

Tompsci 21:59, 21 December 2005 (UTC)

I think that there must be enough material for it to be understood without language and platform dependency, but I believe that examples that are dependent on language and/or platform are still useful. 217.147.80.73 13:35, 22 December 2005 (UTC)

I wasn't referring to the examples, but to facts like "0x90h is a NOP", this only applies to the x86 architecture, yet it resides in the main text. Also there are many different "NOP"s on each architecture. You can change a single bit in many Motorola instructions and they do nothing.

It might be useful if we can go over the following points:

1) "dynamic buffer or automatic array is allocated in a function, it is allocated at function call time on the stack." - What is meant by a dynamic buffer? Variable length? In which case it wouldn't reside on the stack.

2) "a RET operation is called" - can you 'call' a RET function? Is RET a function?

3) "Technical rundown" - Rundown? I don't think you would find that in an encyclopedia.

4) "Various techniques have been used to make buffer overflows less likely." - Some of these techniques don't prevent buffer overflows, they just make them less exploitable. Some mention of "canaries" would be useful.

5) Nop sleds are not needed for most windows exploits.

6) What relevance is pointer swizzling?

I don't really have the time to elaborate on this, but I want to hear what other people think. My general impression is that this material is much better covered elsewhere and the style is not very encyclopedia-like.

The movement of C/C++ related stuff is a big improvement. Tompsci 15:02, 22 December 2005 (UTC)

User input and bounds checking

It was written that the problem of buffer overflows "can be avoided by sufficient bounds checking on user inputs" - I disagree with this, as shown by the following example:

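#include <string.h>

int main(void)
{
  const char *buf1 = "12345";  /* 6 bytes including the terminating NUL */
  char buf2[4];                /* room for only 4 bytes */
  strcpy(buf2, buf1);          /* copies 6 bytes into buf2: overflow */
  return 0;
}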

No user input involved, but it's still a buffer overflow.

Because of this, I've taken it out. (Note: I'm 217.147.80.73) Parasitical 13:39, 22 December 2005 (UTC)

Use of safe libraries

The article says "<snip>automatically perform buffer management and include overflow testing is one engineering approach to reduce the occurrence of buffer overflows. The two main building block data types in these languages in which buffer overflows are commonly manifest are strings and arrays, thus use of string and list-like data structure libraries which have been architected to avoid and/or detect buffer overflows provide the vast majority of the necessary coverage."

What I disagree with is "reduce the occurrence" and "provide the vast majority". If the safe library is written correctly and the programmer using it doesn't do something really stupid (like overwrite the metadata) - that is, I might add, easily avoidable - it should be IMPOSSIBLE to cause a buffer overflow; not reduction, none.

But the situation might be held hostage to the weakness of the language! C and C++, by their very nature, cannot supply the necessary infrastructure to guarantee anything about buffer overflow safety. (This is why the discussion of the Cyclone language here is actually important, because their modification of the language to provide guarantees is relevant in that guarantees can be provided.)
True, but I did make my statement on the condition that "the programmer doesn't do something really stupid". Perhaps we see what "really stupid" is differently and how hard it is to prevent, but IMHO it's VERY obvious to see direct accessing when that is the exception (which should all be clearly marked) rather than the norm. Because of this, I still think the sentence needs to be changed; although maybe not as much as I was thinking.
At best a library can make guarantees about its own behavior subject to proper use, but unless it is trivial, it will necessarily be contingent upon proper use of the rest of the programming language, so that the machine is in a predictable and usable state. Concretely, if you corrupt your heap via a double-free, for example, then a library that uses the heap, no matter how well written, cannot make any guarantee of any sort after that point. So if the only thing a library can do is guarantee its own correctness subject to proper use of it as well as every other mechanism in the language, then we are in a no better situation than what the standard C library already guarantees.
A library can never prevent all security problems, but I think it can prevent generic security problems (such as buffer overflows). Provided that the library handles all memory management (AND the library itself isn't broken) then a double free shouldn't be a problem unless the OS's memory management is broken (which is probably too many). Either way, I would think if data can be corrupted by a double free then simple corruption of data could be a problem - overflow or not, although admittedly it's more likely to be exploitable and easily.
Windows 98 + WATCOM C/C++ => some double frees followed by subsequent heap operations will cause the system to reboot. Qed 17:16, 26 December 2005 (UTC)
Windows 98's memory management is broken. Parasitical 15:39, 28 December 2005 (UTC)
Well it isn't broken -- it's just extremely weak. That Win 98 reboots just means that correct usage in a library after incorrect heap usage causes the program to perform actions clearly outside of well defined parameters (and the OS can't handle it so it just goes south along with the application). So it would still fail on NT kernel based Windowses, but probably just not cause the system to reboot. Qed 01:35, 30 December 2005 (UTC)
Hmm... I still consider that broken.
The real value of a "safe" library is that it changes the programmer's behavior and/or makes it easier for the programmer to avoid unsafe scenarios. One way in which this can happen for strings or arrays is for a library to perform automatic memory management, as well as intrinsic index range checking. (And of course some care in making sure that the library itself is well written.) The library making what limited guarantees about itself that it can is important, but just as important is the fact that it addresses key problems (memory management and index range checking) which are otherwise at the heart of the buffer overflow problem.
So "reduce occurrence" and "provide the vast majority" are important and relevant disclaimers, because they describe what is best possible in what is otherwise just too hostile of an environment (namely the C language) to provide for real guarantees, as well as quantifying the real value that such a library might provide. Qed 5:35, 24 December 2005 (PST)
Ok, I accept that point. A library can only guarantee that it stops buffer overflows if nothing else touches the memory and it itself isn't broken. Despite that, I think the sentence should be changed as it is, IMHO, misleading. I'll try and think of a sentence that is acceptable. Parasitical 18:47, 25 December 2005 (UTC)

Of course, there's always the chance of a bug in the safe library or in the program, but I still think the paragraph should be edited. I'd do it myself, but I question my impartiality on this subject. Parasitical 13:47, 22 December 2005 (UTC)

How about this? "Low level languages such as C and C++ allow for buffer overflows as they perform no buffer management themselves. It is possible to abstract handling of buffers using a library. Provided the library itself handles the safe buffer management and it is free from bugs, it can prevent buffer overflows."

But it can't. It can only be responsible for its own structures after assuming nothing has gone wrong with the rest of your programming environment. This is a fundamental problem with the C and C++ languages (but which is different in Cyclone). Also, a library cannot be in charge of how all buffer management is done in the C and C++ languages (which is at the heart of the problem), so you can at best say that a library can provide an alternative buffer management mechanism that will lead to safer general usage under some assumptions (that are possibly easier to meet than just using the language in the raw). Qed 5:35, 24 December 2005 (PST)
Yes, but I don't quite accept "assuming nothing has gone wrong with the rest of your programming environment". There are lots of things that can "go wrong" that do not cause the library to malfunction; it should only malfunction if its data (internal or external) is corrupted. So really it's only a problem if memory is corrupted - and if all memory management is done through the library then it's only a problem if the library's broken! (which is of course possible.)
Wait a second. A single library cannot be in charge of how all memory in the system is used (unless, possibly, the library implements malloc/free/realloc/calloc itself). In particular, if a library is using the heap via malloc and free then it will typically be sharing the heap with the programmer's other code (which may also use malloc and free). And typically, when the heap corrupts, it means that you will end up losing consistency in all subsequent uses of the heap. Once corrupted (via double free, or a bounds overrun of an allocation, etc) you can snowball into a situation where a subsequent heap operation can cause an unintended overwrite of other allocations in use. That is to say, because the heap is shared, erroneous actions into it can cause errors from one usage of it to leak into others. So to provide a guarantee, a library that uses the heap must include a proviso that the heap remain consistent (meaning that other code it's not in charge of cannot corrupt it) for it to function correctly. Qed 17:16, 26 December 2005 (UTC)
Yes, however I contend that provided all memory allocations in the program are handled through the library (which is admittedly an assumption) then the heap *should* not be corrupted. Of course, if the OS's memory management allows for another program to corrupt memory then it is still possible - but that shows, IMO, that the OS's memory management is broken and it's a serious problem library or not.
But yes, if another piece of code has access to the heap (or other memory - having a library's (meta)data overwritten on the stack, say, is also damaging (although I can't think of any library that allocates data on the stack, it seems entirely possible though)) that is not checked by the library then a buffer overflow *could* occur even from a perfectly performing library. So, yes, it should be made clear that that is the case; but I see little need for the program's code (not including libc, etc) to directly access the heap and/or other memory - although there are some rare cases that may need it for performance reasons. 217.147.80.73 15:36, 28 December 2005 (UTC)
Other pieces of code gain access to the heap via calling malloc, free, etc. A library by itself cannot dictate how other pieces of code access the heap, unless it redefines malloc, free, etc (which cannot be done in a C standard neutral way.) The heap is always shared in C, and is nearly always used in any arbitrarily growable data structure (such as dynamically resizable strings, variable length arrays, etc.) Qed 01:35, 30 December 2005 (UTC)
Ok, accepted. Parasitical 15:26, 14 January 2006 (UTC)
You mention that "the library cannot be in charge of how all buffer management is done" - true, it obviously can't control libc for instance. I question how much difference that makes, though, as if libc can corrupt memory then something is already broken and it is a fault that - save reimplementing libc! - cannot be stopped by the author of the program. I think this, perhaps, highlights an ambiguity in my suggestion: it doesn't distinguish between the program itself having a flaw that causes a buffer overflow and a piece of uncontrolled external code having a flaw that allows for a buffer overflow. This is, obviously, very misleading and I'll attempt to fix it. Parasitical 18:47, 25 December 2005 (UTC)
An errant program can cause misuse of libc (call free twice on an allocation, memcpy to an undersized buffer, etc) which then in turn can manifest itself in garbage being passed into another otherwise well written library. This is the problem of sharing the heap -- one screw up from any code in the program, and any other heap sensitive code can be affected by it. Qed 01:35, 30 December 2005 (UTC)
Yes, but wait - memcpy to an undersized buffer? That's raw writing - as I believe I've said before, a safe library of course cannot stop a buffer overflow from occurring if data is written directly! But I'll accept the double free problem.

Possibly also having a note at the end that says " Although they can prevent buffer overflows, if they are designed and/or used incorrectly other security problems can occur as a result of their usage; for instance truncation of a buffer during copying." Parasitical 13:53, 22 December 2005 (UTC)

This is a library design issue, and doesn't necessarily apply to any given library. For example, it does not apply to "the Better String Library" which was cited. Qed 5:35, 24 December 2005 (PST)
Yes, that's why it says "if they are designed and/or used incorrectly". Parasitical 18:47, 25 December 2005 (UTC)
Ok, you are right. Qed 17:16, 26 December 2005 (UTC)

A number of edits

From Technical Description:

A more technical view of this would be best explained by using C or C++ to demonstrate a stack based buffer overflow. Heap based buffer overflows are another danger to consider. This example is based around x86 architectures, but works for many others.

Took it out, it's unencyclopedic.

From Executable space protection for Windows systems:

The NX bit is used to mark pieces of data (particularly the stack and the heap) as readable but not executable, thus preventing data on an overflowed stack from being executed. The NX flag can be enabled and disabled per application, and is only currently available on some 64 bit CPUs (regardless of the OS being 32 bits or 64 bits).

Took it out, it's redundant.

Microsoft software DEP provides additional protection takes another approach, preventing the so-called SEH exploit (using a buffer overflow to overwrite an exception handler's address pointer in order to take control).

Took it out, it makes little sense. Exception handlers are just on the stack, like any return address. Any write protection scheme will protect both return addresses and exception handlers.

From Use of safe libraries:

Although in ideal cases they can prevent buffer overflows, if such libraries are designed and/or used incorrectly other security problems can occur as a result of their usage (such as undesirable truncations, data or memory leaking, making a stored reference stale upon resize, etc.).

Replaced it with a shorter sentence.

From Programming language choice:

Sometimes it is claimed that C is a faster language, in part because it does not perform automatic array bounds checking. This is a debatable point, since it depends on usage (it is easy to construct benchmarks or cite tests which show either that there is performance impact or that there is none). Programs that tend to deal in blocks of data at a time will tend to be more performance limited by block manipulations rather than bounds checking, and thus not be significantly impacted by bounds checking. However programs that deal in item-by-item manipulations and where the compiler is unable to hoist the bounds checking can measure a performance impact. It should be noted that block based algorithms are usually faster than comparable item-by-item based algorithms since they provide greater natural amortization of memory bandwidth (and even ALU bandwidth if SIMD instructions are available).
Some higher level languages may intrinsically assist the compiler in hoisting of bounds checking by the very design of the language (most high level languages make a sharp delineation between arrays and pointers whereas C does not, furthermore many languages abstract the concept of a range whereas C does not, and finally many languages provide type and instance separation of data that makes aliasing impossible whereas C does not). In many cases this can allow a higher level language to remove a runtime check using static analysis.
What this means is that in some cases a C program can be faster as compared with a similar C program that has bounds checking, but might not be faster than a similar Ada program which relies on the language's built-in bounds checking. However, on the other side, the automatic bounds checking typically cannot be removed from other higher level languages (though a given compiler might provide a mechanism for removing them), so for those cases where bounds checks are non-removable and have a performance impact, the C language can provide faster implementations. (As with all things having to do with performance, things often don't come down to one single programming mechanism, such as bounds checking.)

Dropped the sermon, replacing it with the much simpler:

Static analysis can remove many dynamic bound and type checks, but poor implementations and awkward cases can significantly decrease performance.

Additionally, I added a mention above that "technical, business, and cultural" reasons can make using C/C++ necessary. The rest of this can go somewhere like bounds checking or dynamic typing, but this isn't the place for it.

Dropped the pointer swizzling ref, I think the original editor meant something else. (I don't know why, as "swizzling" is such a precise and clear term.) I dropped the cleanup tag, as I think it looks pretty good now. More formal, concise. I think it could really, really use some stack illustrations. How old is the slashdotted tag? What's the deal with the factual accuracy tag — any specific complaints? I'd love to see a few pictures and a "featured article" tag replace them both. :) --Mgreenbe 23:03, 3 January 2006 (UTC)


Featured Article

We should go for featured article status! This is one of the longest standing problems in computer security and one yet to be completely solved. I removed the accuracy tag because since I tagged it, a lot has improved. I have also removed the slashdotted tag since links from there disappear from the main page after 1-2 days. The main problems now lie in clarity, especially in the technical description, which I think to be truly encyclopedic must be platform independent. I'm sure there must be some existing graphics we can use to replace the text representation. Tompsci 16:55, 4 January 2006 (UTC)

Definitely true; I'd love to get this to that standard. It seems fairly close, too!
I couldn't find anything on the stack pages I looked at. Stacks are pretty easy to draw, so it won't be a problem if we can't find anything. Function stack, stack frame, and stack unwinding need a lot of work anyway, so it'd be good if we can use some common pictures.
As for architecture independence, I'm not quite sure what to do. I was just going to use the Linux x86 stack, eliding architecture specificity — saved flags and registers, register names, and certain sizes. As for the direction of stack growth, it's easier to show the overflow if the stack grows from upper memory to lower memory (towards the heap), as in Linux x86. We can then say, as we do now, that changing the stack direction is insufficient — buffer overflows can overwrite the return address with locals from other frames or even the heap.
Once all of that's done, I say we get a peer review and go for featured article! --Mgreenbe 18:45, 4 January 2006 (UTC)
For this to be a quality *scientific* article we have to display the general case of a buffer overflow. In this sense it doesn't matter which way the stack grows, since this doesn't change the nature of what a buffer overflow is. We can only discuss the process of a buffer overflowing, not how this might be exploited, since that will be platform and OS dependent. One thing to do is keep terminology within terms which are universally applicable and acceptable, e.g. stack frame. Tompsci 19:17, 4 January 2006 (UTC)
Good changes, though I think this article is a fair way off FA status. We can do a lot to cite more research papers, such as the StackGuard paper. There is also a lot of material we can add about exploitation, and we need a clearer technical section with less emphasis on stack-based overflows. As regards non-exec memory, there are cases (e.g. function trampolines) where execution is needed on the stack. If we can chip away at this, we can get it up to scratch. Tompsci 00:13, 6 January 2006 (UTC)
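
As a possible illustration of the platform-independent "general case" discussed above, here is a minimal Python sketch that models memory as a flat byte array. The layout (an 8-byte buffer directly followed by a 2-byte little-endian integer) and all names (`fresh_memory`, `unchecked_copy`, `read_neighbour`) are assumptions made up for this example, not taken from the article.

```python
import struct

MEMORY_SIZE = 16
BUF_START, BUF_LEN = 0, 8   # hypothetical 8-byte buffer
NEIGHBOUR_AT = 8            # hypothetical 2-byte integer stored right after it


def fresh_memory(neighbour_value: int) -> bytearray:
    """Return zeroed memory with the neighbouring integer initialised."""
    mem = bytearray(MEMORY_SIZE)
    struct.pack_into("<H", mem, NEIGHBOUR_AT, neighbour_value)
    return mem


def unchecked_copy(mem: bytearray, data: bytes) -> None:
    """Copy data into the buffer without comparing its length to BUF_LEN (the bug)."""
    mem[BUF_START:BUF_START + len(data)] = data


def read_neighbour(mem: bytearray) -> int:
    """Read back the integer that sits directly after the buffer."""
    return struct.unpack_from("<H", mem, NEIGHBOUR_AT)[0]
```

Copying the 10-byte string `b"excessive\x00"` into the 8-byte buffer changes the neighbouring integer from its initial value (say 1979) to 101, because its two bytes are overwritten with `e` and the terminating NUL. Nothing in the sketch depends on a stack, a return address, or a particular processor.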

I've changed the order of subsections under Protection from buffer overflows to follow the concepts: avoidance (+prevention), detection, exploit prevention and exploit detection. I distinguish between a buffer overflow and a buffer overflow exploit. The old order often confused the two and was:

3 Protection from buffer overflows
   * 3.1 Packet scanning
   * 3.2 Stack-smashing protection
         o 3.2.1 Implementations
   * 3.3 Executable space protection
         o 3.3.1 Protection in Linux, UNIX, and BSD
         o 3.3.2 Protection for Windows systems
   * 3.4 Use of safe libraries
         o 3.4.1 Implementations
   * 3.5 Choice of programming language

The order is now:

3 Protection from buffer overflows
4 Avoidance
   * 4.1 Choice of programming language
   * 4.2 Use of safe libraries
         o 4.2.1 Implementations
5 Detection
   * 5.1 Implementations
6 Exploit prevention
   * 6.1 Executable space protection in Linux, UNIX, and BSD
   * 6.2 Executable space protection for Windows systems
7 Exploit detection

Explanations and questions.

  • Avoidance
    means avoiding the overflow ever arising, for example by design, or by hard proof.
    However this section discusses techniques that are part avoidance and part prevention.
  • Prevention
    Does the article need a subsection for prevention?
    The concept both overlaps with and is separate from avoidance.
    Examples are: enabling runtime bounds checking, enabling memory access bits at the hardware level.
  • Detection
    This means detection of a buffer overflow after it has occurred.
    Stack-smashing protection seems to fit here.
  • Exploit prevention is so placed because it prevents exploits but does not detect overflows per se.
  • Too many section levels? I wanted to avoid subsections getting too deep and hence their titles too small. -Wikibob 23:32, 7 January 2006 (UTC)
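
The distinction drawn above between prevention (for example run-time bounds checking) and detection (for example stack-smashing protection) might be sketched as follows. This is only a toy model under stated assumptions: the guard value `CANARY`, the frame layout, and every function name are hypothetical and do not describe any particular implementation.

```python
class BufferOverflowError(Exception):
    """Raised when a write would exceed the buffer."""


CANARY = b"\xde\xad\xbe\xef"   # hypothetical guard value placed after the buffer


def new_frame(capacity: int) -> bytearray:
    """Buffer of `capacity` bytes immediately followed by the canary."""
    return bytearray(capacity) + bytearray(CANARY)


def checked_write(frame: bytearray, capacity: int, data: bytes) -> None:
    """Prevention: refuse the write before any byte is stored."""
    if len(data) > capacity:
        raise BufferOverflowError("write rejected: data exceeds capacity")
    frame[:len(data)] = data


def unchecked_write(frame: bytearray, data: bytes) -> None:
    """The unsafe operation: copies everything, possibly over the canary."""
    frame[:len(data)] = data


def canary_intact(frame: bytearray, capacity: int) -> bool:
    """Detection: after the fact, see whether the guard value survived."""
    return bytes(frame[capacity:capacity + len(CANARY)]) == CANARY
```

With `checked_write` the overflow never happens; with `unchecked_write` it does happen, and `canary_intact` only reports it afterwards. That is the difference between the Prevention and Detection headings proposed above.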

Technical description section

I just somewhat recklessly rewrote Technical description. I hope I haven't stepped on any of the ongoing work mentioned above. My wording may not be the best, but I think it needed a more logical progression of ideas and a few basic illustrations with a minimum of system-specific jargon (e.g. "RET instruction"). ←Hob 08:44, 9 January 2006 (UTC)

Nice edits! I touched them up a bit, but I think they're great. The tables also do fine instead of pictures, but I still might draw something up that can be used on other stack pages. --Mgreenbe 10:48, 9 January 2006 (UTC)
I agree that this is a lot better; I hated the old Technical Section. However, it seems that very few people have actually written an exploit for a buffer overflow, so the technical description is slightly misleading/inaccurate. My writing style isn't very good, though I have written a zero-day exploit for the Windows platform. But together we can make this a great article. -- Tompsci 12:12, 9 January 2006 (UTC)
I noticed you removed an addition of mine:
Thus the attacker has gained control of the program for at least the next 100 bytes worth of instructions.
The IP points to the beginning of the inserted string; the attacker gets 100 bytes of instructions (plus another 4 if the return address can be interpreted as an instruction). Did you think this was irrelevant or untrue? It is a simplification, but I don't think we need to go through in detail the same way, say, Aleph One does. --Mgreenbe 12:31, 9 January 2006 (UTC)
Sorry, but I do think this is misleading and irrelevant, because it is a simplification. I think it's better to leave it out than to allow for confusion. There are many situations where it might be wrong:
  • where code can be inserted into other buffers apart from the one that is being overflowed.
  • return-to-libc attacks
  • where large sections of the stack can be overwritten and corrupted exception handlers can point to the attacker's code.
To keep the article general, I think the sentence should be kept out of the text. Sorry if that seems a little harsh. -- Tompsci 13:09, 9 January 2006 (UTC)
That makes sense and isn't too harsh in general, but we're talking about a concrete example: the new return address is calculated to be the top of the stack. In this case the simplification is merely that buffer can be leveraged for more space; does it still bother you? What we should do in any case is give/reference more examples, such as overflowing into other buffers, return-to-libc attacks, and exception handler attacks. --Mgreenbe 13:55, 9 January 2006 (UTC)
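
For readers following this thread, the concrete example under discussion (a 100-byte buffer whose overflow replaces a 4-byte return address with the address of the buffer itself) could be laid out as in the sketch below. It assumes, as the simplification does, that the return address is stored immediately after the buffer; `BUF_ADDR`, the filler byte, and `build_input` are hypothetical values and names chosen only for illustration.

```python
import struct

BUF_LEN = 100            # size of the overflowed buffer in the example
BUF_ADDR = 0xBFFFF000    # hypothetical address of the buffer's first byte


def build_input(code: bytes) -> bytes:
    """Pad the supplied bytes to BUF_LEN, then append the new return address."""
    if len(code) > BUF_LEN:
        raise ValueError("code does not fit in the buffer")
    filler = b"\x90" * (BUF_LEN - len(code))    # assumption: x86 NOP used as filler
    return code + filler + struct.pack("<I", BUF_ADDR)
```

The result is always 104 bytes long: the first 100 bytes fill the buffer and are what the instruction pointer lands on, and the last 4 are the overwritten return address. The cases Tompsci lists (other buffers, return-to-libc, exception handlers) do not fit this layout, which is the point of the objection.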

Thanks, I'm glad the rewrite was a helpful start, and the first paragraph of that section is better now - I agree that the definitions of arrays, etc. were too much, I got a little carried away there. ←Hob 18:44, 9 January 2006 (UTC)

Deletion of 'High Level Description'

Hey guys; I don't know what you think, but reading that section (what does the title mean by "High Level"?), IMO it didn't add much to the rest of the article. As far as I'm concerned, the article is of a much higher quality than it was before, but I think two sections both need work: History and Executable space protection for Windows.

  • History
    • POV? - "Even after this incident, buffer overflows were virtually ignored as a security issue" etc.
    • Citations/References needed.
    • Paragraphing
  • Executable space protection for Windows
    • POV? - "Development of similar protection methods for Windows has taken longer, but Windows customers"
      • use of 'customers' is loaded in relation to Linux
      • "taken longer", or just adopted later? Citation?
    • Style/Wording

Please correct me if you think I'm wrong, you guys have been great so far (Wikibob, Hob, Mgreenbe in no order). How "close" are we to FA quality? Tompsci 20:35, 9 January 2006 (UTC)

I think some sort of simpler introduction might be appropriate (a la homomorphism); some readers may not be able to easily understand memory diagrams. The old paragraph didn't do a good job of this; I'll try my hand at it on the morrow.
As for the history, citations will be difficult without violating NOR; has anyone written a history of, say, Bugtraq mailing lists? It's a particularly tight situation when we want to prove that something was ignored. I'll look through ACM's library for security survey papers from the late 80s and early 90s.
Windows stuff...hmm; I don't know much. I'm pretty sure they got support later than Linux: grsecurity and PaX have been around for a while, and I think the original SELinux used it; when did OpenBSD add W^X? This stuff can be looked up easily.
OpenBSD first had W^X in a release at 3.3 (May 1, 2003); it wasn't added for i386 until 3.4 (Nov 1, 2003), though. Parasitical 15:18, 14 January 2006 (UTC)
We're definitely getting there, I think! --Mgreenbe 23:09, 9 January 2006 (UTC)

I agree that the "high-level" section was redundant and mistitled. Unfortunately I can't be of any help with the references and history, just don't have the expertise. On your other points:

  • "...buffer overflows were virtually ignored..." - Shouldn't say this without a source, though I guess you could use the success of subsequent attacks as evidence... but there are plenty of security risks that are understood without being adequately acted on.
  • Paragraphing - I've done some, see what you think.
  • Executable space protection - couldn't this be a single subsection? There's not all that much in the Linux section, and about half of what's in the Windows section is either redundant or is actually about exploit detection rather than protection. First paragraph of Windows section is vague & unnecessary (what's "longer"? "recently"? "wide variety"?). And has anything changed since 2003?

Also:

  • I think the intro to "Exploit protection" is a little too condensed for a non-expert; it doesn't explain what protecting the executable space is, or why randomizing the address space is helpful.
  • There are two Detection sections!

Sorry I don't have time to help much with the reorg right now. This is really interesting, thanks. ←Hob 01:47, 10 January 2006 (UTC)

I just had a different thought about the "Executable space protection" section(s). Instead of talking about a mechanism of protection, and then saying "the Linux implementation is X" and "the Windows implementation is Y", and then talking about detection mechanisms, and then saying the Linux and Windows implementations of that are Z and Q... why not first give a non-platform-specific explanation of what each anti-exploit mechanism is, and then have another section on what you can do on each specific platform. In other words, have a Windows section that discusses all the relevant security gadgets built into or available for Windows. I'd think that would be of the most practical benefit to anyone wondering about the vulnerabilities of their system. ←Hob 02:30, 10 January 2006 (UTC)

Hey I've bitten the bullet and changed the structure, inserted links to main articles, so just needs intros etc. I'll try and do it soon if possible. -- Tompsci 09:31, 11 January 2006 (UTC)

Segmentation Fault

The buffer overflow example says "may not always cause a segmentation fault". But the error message is specific to Linux. Shouldn't it be replaced with a platform-independent error message, or shouldn't we give the error messages of other platforms, like "Access Violation Error" on Windows? --Soumyasch 07:44, 2 March 2006 (UTC)

It isn't a Linux error message, it is a generic term, see segmentation fault. NicM 17:52, 2 March 2006 (UTC).
Even then, a faulty memory access may not be the reason the program crashes (what if 0x41414141 is a debug trap instruction?). It would probably be best to say just 'fault', as naming a memory-specific fault assumes too much about the user's address space layout (especially for users with ASLR implementations). --Vargc0 15:48, 3 March 2006 (UTC)
'Fault' may be too general; there are lots of faults that have nothing to do with what you normally think of as a fault. In general these errors are either segmentation faults or, more commonly, page faults. That said, I think this may be getting a little too technical for this section; it might be better to just say something like "an illegal or invalid memory access". Michael Lynn
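
For context on the value 0x41414141 mentioned above: it is what four bytes of the ASCII letter "A" (0x41) look like when read back as a 32-bit address, which is why it shows up when a buffer is overfilled with "A" characters. A small sketch, in which the helper name `overwritten_address` and the 4-byte little-endian width are assumptions for illustration:

```python
def overwritten_address(filler: bytes, width: int = 4) -> int:
    """Interpret the first `width` filler bytes as a little-endian address."""
    return int.from_bytes(filler[:width], "little")
```

`hex(overwritten_address(b"A" * 64))` gives `0x41414141`. Whether jumping to or reading from that address produces a segmentation fault, a page fault, or something else depends on what is mapped there, which is the point both comments above are making.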

Stack direction and platform dependent information

A negative buffer overflow can be a vulnerability on platforms where the stack grows from low to high memory addresses. [1] My question is: is there a platform with such a stack direction? If such a platform does exist, this article could list this platform-dependent information for the various platforms. --Ans 07:34, 13 July 2006 (UTC)

Buffer overflows are a problem regardless of the direction in which the stack grows, and yes, there are such platforms. I don't think it's necessary to list platform-specific information, as we need only discuss the general case. -- Tompsci 07:01, 14 July 2006 (UTC)
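
One way to see why the direction of growth does not remove the problem is the toy model below of a stack that grows towards higher addresses. It assumes a caller that passes its local buffer to a copy routine, so that the routine's own saved return address is pushed above the buffer. The layout, `ORIGINAL_RET`, and `call_copy` are hypothetical and describe no real platform.

```python
import struct

BUF_LEN = 8
CALLER_BUF_AT = 0            # caller's local buffer (lower addresses)
CALLEE_RET_AT = BUF_LEN      # callee's saved return address, pushed later, so higher up
ORIGINAL_RET = 0x08048000    # hypothetical legitimate return address


def call_copy(data: bytes) -> int:
    """Simulate a callee copying `data` into the caller's buffer.

    Returns the address the callee would return to afterwards.
    """
    stack = bytearray(BUF_LEN + 4)
    struct.pack_into("<I", stack, CALLEE_RET_AT, ORIGINAL_RET)   # the call saves a return address
    stack[CALLER_BUF_AT:CALLER_BUF_AT + len(data)] = data        # unchecked copy runs upward
    return struct.unpack_from("<I", stack, CALLEE_RET_AT)[0]
```

With 8 bytes or fewer the routine returns to `ORIGINAL_RET`; with 12 bytes the last four replace the saved return address. The copy still runs towards higher addresses, and on this kind of stack that is where the callee's frame lives.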

Prominence of MS

Hello again, sorry if I overreacted a little bit. But please explain why you increased the prominence of MS, especially in the section "Executable Protection" and the separately linked article for it. Why did you decrease the prominence of PaX (it was moved to the end)? You even removed the reference to the PaX site; all you left was a link to the ASLR subsection on PaX. I have reordered it; if I see that happening again, I'll take appropriate steps.

If you have a problem with me, take it up with me on my talk page, otherwise, feel free to provide quality edits. -- Tompsci 10:55, 12 August 2006 (UTC)

DEP

Microsoft DEP's software-based approach does, contrary to the widespread belief that it protects from buffer overflows, explicitly protect from one specific exploit that occurred one time only and is based on overwriting the pointer to the SEH exception handler.

This doesn't seem to make sense: does only one specific exploit overwrite the SEH to gain control of program flow? How does it not protect from *any* buffer overflows? How does this behaviour change between software-enforced and hardware-enforced DEP? Detailed information on DEP itself may be better placed in a separate article. I think it might be best to discuss non-executable memory regions generally on this page, and the operational semantics of the individual solutions and their weaknesses on separate pages. This page is intended to give a broad overview of buffer overflows and leave the interested reader to read other related articles. Let me know what you think. It might be useful if you registered a username so that people can tell which edits are yours and so you can be contacted via your talk page. Cheers, Tompsci 01:23, 16 August 2006 (UTC)

Software protection (removed)

I added a section on software protection through use of instruction simulation, but it was completely removed; the reason given was "not to do with buffer overflow". I think it should go back, as it covers overflow of any kind of memory segment of finite length. See above, "Software testing regimes designed to find buffer overflow bugs, whether systematically or at random", for a suitable heading for this category, although the technique does not just apply to testing if you are prepared to accept an overhead indefinitely, for an extended time, or for critical sections of code. Below is the text for it, which I pulled from the history. It appears as though the formatting is destroyed, but this is the wording. ken 16:00, 11 September 2006 (UTC)

Software protection

It is possible to prevent buffer overflows and all other types of storage violation by executing the target program in emulation mode, under the control of another software program, the "monitoring program". The monitoring program checks that the target program "owns" the memory location it is attempting to modify before the modification is executed. Where used, this is typically performed during the testing and debugging stages of program development, although it has also been used very successfully in later bug detection, while attempting to recreate the original bug.

If the monitoring code is well designed and reliable, this method will detect all attempts to overflow buffers or alter protected memory locations, and will also detect any other "illegal" operations, such as attempts to execute "privileged" instructions like restricted kernel calls (or OS/360 Supervisor Calls, "SVCs") with potentially damaging consequences.

The interplay of many different applications or threads competing within the same address space can create unique and infrequently occurring contentions for memory or system resources, sometimes with conflicting priorities.

Without access to a monitor program during the testing phase of program development, buffer overflows or storage violations can be the most difficult of all bugs to locate. The bugs themselves may lie dormant for weeks, months or even years and only show themselves after particularly heavy usage of a program or an unusual combination of data values.

The monitoring program adds some additional overhead to the execution of a target program, but during testing or bug finding this is usually quite acceptable. If a bug is to be located after a "production" failure (that is - not in a test situation), it is often acceptable to re-run the program with the same data but under control of the monitor in order to try to detect the error. In these circumstances, a small overhead would normally be acceptable for a time.

As a "spin-off" benefit of storage monitoring, the monitoring program can not only halt execution before any damage is done but, by tracing each and every instruction along the path, can also provide invaluable information to help the programmer locate the cause of the bug. The monitoring program normally operates at machine code level, but commercial products were frequently able to display the error at disassembled pseudo-source level or even at original source level. See "Products" below for examples.

Even with this level of error detection it is still possible to miss a bug that only occurs under very special circumstances, because only the path actually executed by the target program is monitored for bugs. This is, however, akin to saying that it is impossible to explore every conceivable set of circumstances through testing - a well-established fact in software engineering when there are an exponential number of possible paths through a program.
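
The ownership check described in the removed text could be sketched roughly as below. This is a toy model only, assuming a monitor that intercepts every store the target makes; the class and method names (`Monitor`, `allocate`, `store`, `copy_in`) are invented for illustration and do not correspond to any of the products the text alludes to.

```python
class StorageViolation(Exception):
    """Raised when the target tries to write to memory it does not own."""


class Monitor:
    """Toy monitoring program: every store by the target goes through `store`."""

    def __init__(self, size: int):
        self.memory = bytearray(size)
        self.owned = []   # list of (start, end) ranges the target owns

    def allocate(self, start: int, length: int) -> None:
        """Record that the target owns `length` bytes beginning at `start`."""
        self.owned.append((start, start + length))

    def owns(self, address: int) -> bool:
        return any(start <= address < end for start, end in self.owned)

    def store(self, address: int, value: int) -> None:
        """Check ownership before the store is carried out."""
        if not self.owns(address):
            raise StorageViolation(f"store to unowned address {address}")
        self.memory[address] = value

    def copy_in(self, start: int, data: bytes) -> None:
        """Target routine with no bounds check of its own, run under the monitor."""
        for offset, byte in enumerate(data):
            self.store(start + offset, byte)
```

If the target owns bytes 0 to 7 and copies nine bytes starting at address 0, the ninth store is stopped before it happens and byte 8 keeps its old value. As the text notes, only stores on the path actually executed are checked, and every store pays for the lookup.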

  • I have removed this and your other changes for now since most of them were unnecessary and made some style changes which were incorrect. I don't see any reason not to include the above as a section in some form, but at present the tone seems a bit off and the whole section is pretty unclear. I don't have time right now, but a bit later I'll have a longer look and suggest a reformatted version. It would help if you could provide some references on this, or links to a few monitoring programs. NicM 15:27, 12 September 2006 (UTC).