Secure input and output handling

Secure input and output handling are secure programming techniques designed to prevent security bugs and the exploitation thereof.

Input handling

Input handling is how an application, server or other computing system handles the input supplied from users, clients, or a computer network.

Secure input handling is often required to prevent vulnerabilities such as code injection and directory traversal.

Input validation

Validating (or sanitizing) user input means checking that the input is safe before it is used.

The most secure way to do this is to terminate on suspicious input, using a whitelist strategy to decide whether execution should be terminated. However, this behavior is not always preferred from a usability point of view.

Whitelist or Blacklist?

In computer security, there are often known good inputs: input the developer is completely certain is safe. There are also known bad inputs: input the developer is certain is unsafe (for example, because it can cause code injection). Based on this, two different approaches to managing input exist:

Whitelist (known goods). A whitelist is a list of "known good inputs". It says, in effect, "A, B and C are good (and everything else is bad)".
Blacklist (known bads). A blacklist is a list of "known bad inputs". It says, in effect, "A, B and C are bad (and everything else is good)".

Security professionals tend to prefer whitelists, because blacklists may accidentally treat bad input as safe. However, in some cases a whitelist solution may not be easy to implement.
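
The difference between the two approaches can be sketched in a few lines of Python. The list contents and the function names whitelist_accepts and blacklist_accepts are hypothetical and chosen only for illustration; a real application would define its own lists.

```python
# Hypothetical example values; a real application defines its own lists.
KNOWN_GOOD = frozenset({"red", "green", "blue"})
KNOWN_BAD = frozenset({"<script>", "../"})


def whitelist_accepts(value: str) -> bool:
    """Accept only values that are explicitly listed as good."""
    return value in KNOWN_GOOD


def blacklist_accepts(value: str) -> bool:
    """Accept every value that is not explicitly listed as bad."""
    return value not in KNOWN_BAD
```

In this sketch the input "<SCRIPT>" (in upper case) is rejected by the whitelist check but accepted by the blacklist check, because the blacklist only anticipates the lower-case spelling.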

Terminate/stop/abort on input problems

This is a very safe strategy: if unexpected characters occur in the input, execution is aborted. If implemented poorly, however, it can enable a denial-of-service attack in which the attacker floods the system with unexpected input, forcing the system to expend scarce processing and communication resources on rejecting it.
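
A minimal sketch of this strategy in Python is shown below. It assumes a hypothetical "username" field whose whitelist is A-Z, a-z, 0-9 and the underscore, with a length limit of 32 characters; the exception name SuspiciousInputError and the function name require_safe_username are likewise illustrative. The check is deliberately cheap, so that rejecting bad input costs little.

```python
import re

# Assumed whitelist for a hypothetical "username" field.
_ALLOWED = re.compile(r"[A-Za-z0-9_]{1,32}")


class SuspiciousInputError(ValueError):
    """Raised when input contains anything outside the whitelist."""


def require_safe_username(raw: str) -> str:
    """Return raw unchanged if it matches the whitelist, otherwise abort."""
    if _ALLOWED.fullmatch(raw) is None:
        # Abort processing of this request; nothing is repaired or guessed.
        raise SuspiciousInputError("input rejected")
    return raw
```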

Filter input

Filtering input is a less strict alternative to terminating on input problems: instead of rejecting the input, the application removes the offending parts and continues.

The benefit of the filter approach is that to end-users, the security mechanism often behaves in a less intrusive manner. For example, if "*" is illegal, then "I ***LOVE*** you" will just become "I LOVE you", which is experienced as a minor but acceptable oddity.
The downside is that the filter approach is difficult to get right. In practice, many applications apply the filter at one place in the code, but the programmer accidentally uses the unfiltered input at another place.
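
One possible way to reduce the risk of using unfiltered input by accident is to filter once, at the point where the input enters the program, and to keep only the filtered form. The following Python sketch assumes that "*" is the only illegal character, as in the example above; the class name FilteredText is hypothetical.

```python
class FilteredText:
    """Holds only the filtered form of a user-supplied string."""

    # Assumption for this sketch: "*" is the only illegal character.
    ILLEGAL = frozenset("*")

    def __init__(self, raw: str) -> None:
        # The raw string is not stored, so later code cannot reach it
        # through this object.
        self.value = "".join(ch for ch in raw if ch not in self.ILLEGAL)

    def __str__(self) -> str:
        return self.value
```

Code that accepts only FilteredText objects, rather than plain strings, cannot be handed the original input by mistake.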

Filter input: Automatic taint checking

Some programming languages have built-in support for taint checking. These languages raise compile-time errors or run-time exceptions whenever a variable derived from user input is used in a risky way, e.g. to execute a shell command.
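
Python has no built-in taint mode, so the following is only a rough emulation of the idea, not a description of how any language implements it. All names (Tainted, TaintError, read_user_input, untaint, build_command) are hypothetical. Unlike real taint tracking, this sketch does not propagate the mark through string operations such as concatenation.

```python
import re


class Tainted(str):
    """Marks a string as derived from user input."""


class TaintError(Exception):
    """Raised when tainted data reaches a risky operation."""


def read_user_input(raw: str) -> Tainted:
    """Everything that comes from the user starts out tainted."""
    return Tainted(raw)


def untaint(value: str, pattern: str) -> str:
    """Return a plain str only if the whole value matches the whitelist pattern."""
    if re.fullmatch(pattern, value) is None:
        raise TaintError("value does not match the whitelist pattern")
    return str(value)  # a plain str, no longer an instance of Tainted


def build_command(program: str, argument: str) -> list:
    """Build an argument list for a command, refusing tainted arguments."""
    if isinstance(argument, Tainted):
        raise TaintError("tainted data used in a command")
    return [program, argument]
```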

Filter input: Whitelist filters (Filter in known goods)

Example:

An input filter that accepts only characters in the set A-Za-z is used to protect a UNIX application from shell injection.
The attacker supplies the input ; ls -l / to attempt shell injection.
The filter is applied to the input.
The characters ; - / and the spaces are thrown away by the filter because they are not in the whitelist.
The characters lsl are kept by the filter because they are in the whitelist.
The exploit attempt fails because only safe input remains.
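
The steps above can be sketched in Python as follows. The function name whitelist_filter is hypothetical; the whitelist is the set A-Za-z from the example.

```python
import string

# Whitelist from the example: the characters A-Z and a-z.
KNOWN_GOOD_CHARS = frozenset(string.ascii_letters)


def whitelist_filter(raw: str) -> str:
    """Keep only whitelisted characters and drop everything else."""
    return "".join(ch for ch in raw if ch in KNOWN_GOOD_CHARS)
```

Applied to the attacker's input ; ls -l / this function returns lsl.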

Filter input: Blacklist filters (Filter out known bads)

Filtering out known bads is a strategy that is usually insufficient. If the characters in the set [:;.-/] are known to be bad and the input ; ls -l / is received, the input is replaced with ls l (the characters ; - / are thrown away). This strategy has several problems:

It does not protect against unknown threats. There may be other "bad" inputs that the developer did not consider.
It does not protect against future threats. Inputs that are safe at present may acquire a dangerous interpretation if the underlying language changes. For example, a UNIX command line security filter designed to stop attacks against C shell may no longer be secure if the software is moved to an environment using bash.
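
A short Python sketch of such a blacklist filter illustrates the first problem. The function name blacklist_filter is hypothetical; the blacklist is the set [:;.-/] from the text.

```python
# Blacklist from the text: the characters : ; . - /
KNOWN_BAD_CHARS = frozenset(":;.-/")


def blacklist_filter(raw: str) -> str:
    """Drop blacklisted characters and keep everything else."""
    return "".join(ch for ch in raw if ch not in KNOWN_BAD_CHARS)
```

The input ; ls -l / is reduced to ls l (with the surrounding spaces kept), but an input such as $(reboot), which many UNIX shells treat as a command substitution, passes through unchanged because none of its characters were anticipated by the blacklist.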

Encode (escape) input

To keep malicious inputs contained, any inputs written to the database need to be encoded.

SQL encoding: the input ' OR 1=1 -- is encoded to \'\ OR\ 1\=1\ \-\- (in this illustration every non-alphanumeric character is prefixed with a backslash).

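In PHP this can be done with the function mysql_real_escape_string()^[1] or with PDO::quote()^[2]. The function mysql_real_escape_string() was deprecated in PHP 5.5.0 and removed in PHP 7.0.0; mysqli_real_escape_string() is its counterpart in the MySQLi extension.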
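
The effect of encoding can be sketched in Python with the standard sqlite3 module. This is an illustration under stated assumptions, not a replacement for a database driver's own quoting function: the table users and all function names are hypothetical, and the quoting rule shown (doubling single quotes) is the one SQLite uses for string literals, which differs from the backslash style used by some other databases. The last function uses a bound parameter instead of encoding, which is a different technique that avoids building the query from strings at all.

```python
import sqlite3


def quote_sql_literal(value: str) -> str:
    """Encode value as an SQLite string literal by doubling single quotes."""
    if "\x00" in value:
        raise ValueError("NUL characters are not accepted")
    return "'" + value.replace("'", "''") + "'"


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """Vulnerable: the input is pasted into the query without encoding."""
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_quoted(conn: sqlite3.Connection, name: str) -> list:
    """The input is encoded before it becomes part of the query text."""
    query = "SELECT name FROM users WHERE name = " + quote_sql_literal(name)
    return conn.execute(query).fetchall()


def find_user_bound(conn: sqlite3.Connection, name: str) -> list:
    """Alternative technique: the input is passed as a bound parameter."""
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()
```

With the input ' OR 1=1 -- the unsafe function returns every row in the table, while the quoted and bound variants search for a user with that literal name and find none.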

Output handling

Output handling is how an application, server or system handles its output (e.g. generating HTML, printing, logging). It is important to keep in mind that output often contains input supplied by users, clients, the network, databases and so on.

Secure output handling is primarily associated with preventing cross-site scripting (XSS) vulnerabilities, but can also be important in similar areas (e.g. when generating Microsoft Office documents through an API, output handling may be required to prevent macro injection).

Encode (escape) output

"Encoding" processes content that is about to be output so that any potentially dangerous characters are made safe. Characters from a typical known safe charset for the particular destination medium are often left as they are. A simple encoding might leave alone the alphanumerics a–z, A–Z and 0–9. Any other character could possibly be interpreted in an unexpected manner, and is therefore replaced with the appropriate "encoded" representation.

HTML encoding: <script> is encoded to &lt;script&gt;

In PHP this can be done with the function htmlspecialchars()^[3]
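
In Python, a comparable encoding is available through html.escape() in the standard library. The sketch below wraps it and also shows the stricter scheme described above, in which every character outside a–z, A–Z and 0–9 is replaced. The function names encode_for_html and encode_all_but_alnum are hypothetical.

```python
import html
import string

# Characters that the stricter encoder leaves untouched.
_SAFE = frozenset(string.ascii_letters + string.digits)


def encode_for_html(text: str) -> str:
    """Encode the characters & < > and both quote characters for HTML output."""
    return html.escape(text, quote=True)


def encode_all_but_alnum(text: str) -> str:
    """Replace every character outside a-z, A-Z, 0-9 with a numeric character reference."""
    return "".join(ch if ch in _SAFE else "&#{};".format(ord(ch)) for ch in text)
```

For the input <script>, encode_for_html returns &lt;script&gt; and encode_all_but_alnum returns &#60;script&#62;; a browser displays both as the literal text <script> rather than interpreting it as a tag.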

References

1. "mysql_real_escape_string - Manual". PHP. 2012-11-02. Retrieved 2012-11-08.
2. "PDO::quote - Manual". PHP. 2012-11-02. Retrieved 2012-11-08.
3. "htmlspecialchars - Manual". PHP. 2012-11-02. Retrieved 2012-11-08.