scanf format string
A scanf format string (scanf stands for "scan formatted") is a control parameter used by a class of functions in the string-processing libraries of various programming languages. The format string specifies how to parse an input string into an arbitrary number of parameters of varying data types. By default the input is read from standard input, but variants exist that read it from other sources.
The term "scanf" is due to the C language, which popularized this type of function, but these functions predate C, and other names are used, such as "readf" in ALGOL 68. Scanf format strings, which provide formatted input (parsing), are complementary to printf format strings, which provide formatted output (templating). In both cases these provide simple functionality and fixed format compared to more sophisticated and flexible parsers or template engines, but are sufficient for many purposes.
History
Mike Lesk's portable input/output library, including scanf, officially became part of Unix in Version 7.[1]
Usage
The scanf function, which is found in C, reads numbers and other data types from standard input (often a command-line interface or a similar text user interface).
The following C code reads a variable number of whitespace-separated decimal integers from the console and prints each of them on a separate line:
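    #include <stdio.h>

    int main(void)
    {
        int n;
        /* scanf returns the number of items assigned: 1 while integers
           are being read, 0 or EOF on non-numeric input or end of input. */
        while (scanf("%d", &n) == 1)
            printf("%d\n", n);
        return 0;
    }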
After being processed by the program above, an irregularly spaced list of integers such as

    456 123 789     456 12
      456 1
          2378

will appear consistently formatted, one integer per line:

    456
    123
    789
    456
    12
    456
    1
    2378
The following program reads a single word (a run of non-whitespace characters) and prints it:
    #include <stdio.h>

    int main(void)
    {
        char word[20];
        /* %19s reads at most 19 characters, leaving room for the '\0' */
        if (scanf("%19s", word) == 1)
            puts(word);
        return 0;
    }
Whatever the data type the program is to read, the arguments after the format string (such as &n above) must be pointers to the memory where the converted values are to be stored. Passing a variable's value instead of its address causes undefined behavior: scanf would treat that value as an address and attempt to write to the wrong section of memory, rather than to the variable for which input is being read.
In the last example the address-of operator (&) is not used for the argument: since word is the name of an array of char, in this context it is converted to a pointer to the first element of the array. While the expression &word would evaluate to the same address, it has a different meaning and type: it stands for the address of the whole array (type char (*)[20]) rather than of its first element, so it does not match the char * that %s expects. This needs to be kept in mind when using scanf to read strings.
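The same rules apply when the destinations are structure members. The sketch below uses sscanf, which reads from a string instead of standard input, so it can be tried without typing anything; the struct item type and the parse_item function are hypothetical names chosen for this illustration.

```c
#include <stdio.h>

/* Hypothetical record type, used only for this illustration. */
struct item {
    int quantity;
    double price;
    char name[20];
};

/* Parses "<quantity> <price> <name>" from `line` into *it.
   Returns 1 if all three fields were converted, 0 otherwise. */
int parse_item(const char *line, struct item *it)
{
    /* &it->quantity and &it->price: scalar members need the address-of
       operator so that sscanf receives an int * and a double *.
       it->name: an array, so it is converted to a pointer to its first
       element, the char * that %s expects; no & is used.
       %19s leaves room for the terminating '\0'. */
    return sscanf(line, "%d %lf %19s", &it->quantity, &it->price, it->name) == 3;
}
```

Called as parse_item("3 2.50 widget", &it), this fills all three members and returns 1; with "three 2.50 widget" the %d conversion fails and it returns 0.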
Because scanf reads only from standard input, several programming languages that provide scanf-like interfaces, such as PHP, offer derivatives such as sscanf and fscanf but not scanf itself.
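In C the variants differ only in where the input comes from: fscanf takes a FILE * as its first argument and sscanf takes a string. A minimal sketch follows; sum_stream and sum_string are hypothetical names. Because sscanf does not remember its position between calls, the string version uses the %n conversion, which stores the number of characters consumed so far, to step through the text.

```c
#include <stdio.h>

/* Sums whitespace-separated decimal integers from any stream
   (stdin, a file opened with fopen, ...) using fscanf. */
long sum_stream(FILE *in)
{
    long total = 0;
    int n;
    while (fscanf(in, "%d", &n) == 1)
        total += n;
    return total;
}

/* Same loop over a string with sscanf. %n stores how many characters
   this call consumed, so the pointer can be advanced past them. */
long sum_string(const char *s)
{
    long total = 0;
    int n, used;
    while (sscanf(s, "%d%n", &n, &used) == 1) {
        total += n;
        s += used;
    }
    return total;
}
```

For example, sum_string("456 123  789\n1") adds all four numbers; the loop ends when sscanf reaches the end of the string and returns EOF.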
Format string specifications
The formatting placeholders in scanf are more or less the same as those in printf, its reverse function.
Ordinary characters (i.e. characters that are not part of a formatting placeholder) are rare in a format string, mainly because a program is usually not designed to read known data; any that do appear must match the input exactly. The exception is whitespace: one or more whitespace characters in the format string discard all whitespace characters, if any, at that point in the input.
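A sketch of both behaviors, assuming dates written as "YYYY-MM-DD" and comma-separated pairs as example inputs; parse_date and parse_pair are hypothetical helpers. Their output parameters are already pointers, so they are passed to sscanf without &.

```c
#include <stdio.h>

/* Parses a date such as "2024-03-15". Each '-' is an ordinary
   character and must match the input exactly; otherwise sscanf
   stops early. Returns 1 if all three numbers were read. */
int parse_date(const char *s, int *year, int *month, int *day)
{
    return sscanf(s, "%d-%d-%d", year, month, day) == 3;
}

/* Parses two integers separated by a comma. The space before ','
   matches any amount of whitespace, including none, so "1,2",
   "1 , 2" and "1\n,2" are all accepted. */
int parse_pair(const char *s, int *a, int *b)
{
    return sscanf(s, "%d ,%d", a, b) == 2;
}
```

parse_date rejects "2024/03/15" because '/' does not match the literal '-'.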
Some of the most commonly used placeholders follow:
- %d: Scan an integer as a signed decimal number.
- %i: Scan an integer as a signed number. Similar to %d, but interprets the number as hexadecimal when preceded by 0x and as octal when preceded by 0. For example, the string 031 would be read as 31 using %d and as 25 using %i. The length modifier h in %hi indicates conversion to a short, and hh conversion to a signed char.
- %u: Scan for a decimal unsigned int. Note that, as in strtoul(), a leading minus sign is accepted in the input: if one is read, no error arises, and the negated value is converted to the unsigned type, which for a negative number usually yields a very large value (its two's-complement bit pattern). Correspondingly, %hu scans for an unsigned short and %hhu for an unsigned char.
- %f: Scan a floating-point number into a float. Both fixed-point and exponential notation are accepted.
- %e, %g, %E, %G: Equivalent to %f in scanf. Unlike in printf, the letter and its case make no difference to the input accepted.
- %x, %X: Scan an integer as an unsigned hexadecimal number (an optional 0x or 0X prefix is accepted).
- %o: Scan an integer as an octal number.
- %s: Scan a character string. Leading whitespace is skipped, and the scan terminates at the next whitespace character. A null character is stored at the end of the string, so the buffer supplied must be at least one character longer than the specified input length.
- %c: Scan a single character (char). No null character is added and, unlike most conversions, %c does not skip leading whitespace.
- whitespace: Any whitespace characters in the format string trigger a scan for zero or more whitespace characters in the input. The number and type of whitespace characters do not need to match in either direction.
- %lf: Scan a floating-point number into a double.
- %Lf: Scan a floating-point number into a long double.
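Several entries in the list above can be checked directly. The sketch below (all function names are hypothetical) reads the same text with different conversions: %d against %i, %x and %o, %f against %lf, and %c with and without a preceding space.

```c
#include <stdio.h>

/* %d is always base 10; %i takes the base from the prefix
   (0x: hexadecimal, 0: octal, otherwise decimal).
   Returns the number of successful conversions (0 to 2). */
int read_d_and_i(const char *text, int *as_d, int *as_i)
{
    return (sscanf(text, "%d", as_d) == 1) + (sscanf(text, "%i", as_i) == 1);
}

/* %x and %o store into unsigned int; %x also accepts a 0x prefix. */
int read_hex_and_octal(const char *hex, const char *oct, unsigned *h, unsigned *o)
{
    return (sscanf(hex, "%x", h) == 1) + (sscanf(oct, "%o", o) == 1);
}

/* %f accepts exponential notation too; %f stores a float, %lf a double. */
int read_float_and_double(const char *text, float *f, double *d)
{
    return (sscanf(text, "%f", f) == 1) + (sscanf(text, "%lf", d) == 1);
}

/* %c takes the very next character, even whitespace; " %c" skips
   whitespace first and takes the next non-whitespace character. */
int read_char_both_ways(const char *text, char *raw, char *skipped)
{
    return (sscanf(text, "%c", raw) == 1) + (sscanf(text, " %c", skipped) == 1);
}
```

With "031", %d gives 31 and %i gives 25; with "0x1f", %d stops after the leading 0 while %i reads 31. With "  z", %c stores a space and " %c" stores 'z'.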
The above can be combined with the length modifiers l and L (standing for "long"), placed between the percent sign and the conversion letter. A decimal number may also appear between the percent sign and the letter, before any length modifier; it specifies the maximum number of characters to be scanned for that conversion (the field width). An optional asterisk (*) right after the percent sign suppresses assignment: the datum is read but not stored in a variable, and no argument should be supplied after the format string for this dropped value.
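A short sketch of field widths, the l modifier and assignment suppression. The input layouts ("<label> <value>" and a fixed-width code such as "AB1234") and the names read_value and split_code are assumptions made for illustration.

```c
#include <stdio.h>

/* Input "<label> <value>", for example "temp 42".
   %*s reads the label but stores nothing, so no argument is passed
   for it; %3ld reads at most three characters into a long.
   Returns the number of items assigned (the suppressed one is not counted). */
int read_value(const char *line, long *value)
{
    return sscanf(line, "%*s %3ld", value);
}

/* Splits a fixed-width code such as "AB1234": %2s stores at most two
   characters (prefix needs a third byte for '\0'), then %4d reads at
   most four digits. */
int split_code(const char *code, char prefix[3], int *number)
{
    return sscanf(code, "%2s%4d", prefix, number);
}
```

Because of the width, read_value("temp 12345", &v) stores 123 and leaves "45" unread.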
The ff modifier in printf is not present in scanf, causing differences between modes of input and output. The ll and hh modifiers are not present in the C90 standard, but are present in the C99 standard.[2]
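As a sketch of the C99 modifiers, the hypothetical function below reads a long long with %lld and a char-sized unsigned integer with %hhu; a library conforming only to C90 is not required to accept either.

```c
#include <stdio.h>

/* ll stores into a long long, hh into a char-sized integer
   (here an unsigned char via %hhu). Both modifiers are C99 additions.
   Returns the number of items assigned. */
int read_wide_and_narrow(const char *text, long long *big, unsigned char *level)
{
    return sscanf(text, "%lld %hhu", big, level);
}
```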
An example of a format string is
"%7d%s %c%lf"
The above format string reads a decimal integer of at most seven characters, then reads a string up to the next space, newline, tab or other whitespace character, then skips any whitespace and reads the next non-whitespace character, and finally reads a double-precision floating-point number.
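The sketch below applies this format to a sample line. It differs from the format above in one respect: %s is given a width (%31s) so that the buffer cannot overflow. The struct fields type and parse_fields are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical record for the four fields of "%7d%s %c%lf". */
struct fields {
    int number;
    char word[32];
    char ch;
    double x;
};

/* Same directives as the example format, except for the %31s width.
   Returns the number of fields assigned (4 on full success). */
int parse_fields(const char *line, struct fields *f)
{
    return sscanf(line, "%7d%31s %c%lf", &f->number, f->word, &f->ch, &f->x);
}
```

With the input "1234567890abc x 3.5", number receives 1234567 (seven characters only), word receives the rest of that run, "890abc", ch receives 'x' and x receives 3.5.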
Error handling
scanf is usually used in situations when the program cannot guarantee that the input is in the expected format. Therefore a robust program must check whether the scanf call succeeded and take appropriate action. The function returns the number of input items successfully assigned, which may be fewer than requested (or zero) after a matching failure, or EOF if an input failure occurs before the first conversion. If the input was not in the correct format, the erroneous data will still be on the input stream and must be read and discarded before new input can be read. An alternative method of reading input, which avoids this, is to use fgets and then examine the string read in; that last step can be done with sscanf, for example.
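Both approaches can be sketched as follows. The names read_int_line and discard_line, and the 128-byte line buffer, are assumptions for this illustration rather than a standard interface.

```c
#include <stdio.h>

/* fgets + sscanf: reads one line and parses a single int from it.
   Returns 1 on success, 0 if the line is not exactly one integer
   (surrounding whitespace aside), and EOF at end of input.
   Limitation: a line longer than the buffer is split across calls. */
int read_int_line(FILE *in, int *out)
{
    char line[128];
    int value;
    char extra;

    if (fgets(line, sizeof line, in) == NULL)
        return EOF;
    /* " %c" after the number catches trailing junk such as "12abc";
       on a clean line it finds nothing, so sscanf returns 1. */
    if (sscanf(line, "%d %c", &value, &extra) != 1)
        return 0;
    *out = value;
    return 1;
}

/* Reading directly with scanf/fscanf instead: after a failed
   conversion, throw away the rest of the offending line. */
void discard_line(FILE *in)
{
    int c;
    while ((c = getc(in)) != '\n' && c != EOF)
        ;
}
```

Since fgets always consumes the whole line, a malformed line such as "abc" never remains on the stream to block the next read.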
Vulnerabilities
Like printf, scanf is vulnerable to format string attacks. Great care should be taken to ensure that the format string includes limits for string and array sizes. In most cases the size of the input from a user is arbitrary and cannot be determined before the scanf function is executed. This means that %s placeholders without a field width are inherently insecure and exploitable for buffer overflows. Another potential problem is allowing dynamic format strings, for example format strings stored in configuration files or other user-controlled files. In this case the allowed input length for strings cannot be specified unless the format string is checked beforehand and limits are enforced. Related to this are extra or mismatched placeholders that do not correspond to the actual variadic argument list. Depending on the particular implementation of varargs, scanf may then fetch values from the stack and use them as destination pointers, which can be undesirable or even insecure.
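One way to keep the %s width in step with the buffer size, while keeping the format string a constant literal rather than data read from a file, is to paste the width in with the preprocessor. WORD_MAX, STR_, STR and read_word are hypothetical names for this sketch.

```c
#include <stdio.h>

#define WORD_MAX 15              /* hypothetical limit, excluding '\0' */
#define STR_(x) #x
#define STR(x) STR_(x)           /* expands WORD_MAX before stringizing */

/* Reads one word of at most WORD_MAX characters into buf, which must
   hold WORD_MAX + 1 bytes. The format becomes the literal "%15s" at
   compile time. Returns 1 on success. */
int read_word(const char *line, char buf[WORD_MAX + 1])
{
    return sscanf(line, "%" STR(WORD_MAX) "s", buf) == 1;
}
```

A longer word is truncated to WORD_MAX characters instead of overflowing buf; the unread remainder stays in the input.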
References
- [1] McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (Technical report). CSTR. Bell Labs. 139.
- [2] C99 standard, §7.19.6.2 "The fscanf function", paragraph 11.
External links
- – System Interfaces Reference, The Single UNIX Specification, Issue 7 from The Open Group
- C++ reference for std::scanf