C syntax
From Wikipedia, the free encyclopedia
The syntax of the C programming language is a set of rules that specifies whether the sequence of characters in a file is conforming C source code. The rules specify how the character sequences are to be chunked into tokens (the lexical grammar), the permissible sequences of these tokens and some of the meaning to be attributed to these permissible token sequences (additional meaning is assigned by the semantics of the language).
C syntax makes use of the maximal munch principle.
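As a small illustration of maximal munch, the lexer always takes the longest run of characters that can form a token, so a+++b is read as a ++ + b rather than a + ++b. The helper name munch_sum is invented for this sketch.

```c
/* Evaluates a+++b and reports the operands afterwards, so the caller can
   see which of the two was incremented. */
int munch_sum(int a, int b, int *a_after, int *b_after)
{
    int sum = a+++b;    /* tokenized as a ++ + b, that is (a++) + b */
    *a_after = a;       /* a was incremented */
    *b_after = b;       /* b is unchanged */
    return sum;
}
```

Calling munch_sum(1, 2, ...) adds the old value of a to b and leaves a incremented, which is what the (a++) + b reading predicts.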
Data structures
Primitive data types
Many programming languages, including C, represent numbers in two forms: integral and real. This distinction reflects similar distinctions in the instruction set architecture of most central processing units. Integral data types store integers, while real data types store approximations of real numbers in floating-point form.
All C integer types have signed and unsigned variants. If signed or unsigned is not specified explicitly, in most circumstances signed is assumed. However, for historic reasons plain char is a type distinct from both signed char and unsigned char. It may be a signed type or an unsigned type, depending on the compiler and the character set (C guarantees that members of the C basic character set have nonnegative values). Also, bit field types specified as plain int may be signed or unsigned, depending on the compiler.
Integral types
The integral types come in different sizes, with varying amounts of memory usage and range of representable numbers. Modifiers are used to designate the size: short, long and long long. The character type, whose specifier is char, represents the smallest addressable storage unit, which is most often an 8-bit byte (but may be larger). The standard header file limits.h defines the minimum and maximum values of the integral primitive data types, amongst other limits. The long long modifier was introduced in the C99 standard, although some compilers already supported it.
The following table lists the integral types with their typical storage sizes and ranges of values, which may vary from one compiler and platform to another. For integral types of guaranteed sizes, ranging from 8 to 64 bits, ISO C provides the stdint.h header.
Implicit Specifier(s) | Explicit Specifier | Bits | Bytes | Minimum Value | Maximum Value |
---|---|---|---|---|---|
signed char | same | 8 | 1 | −127 | 127 |
unsigned char | same | 8 | 1 | 0 | 255 |
char | one of the above | 8 | 1 | −127 or 0 | 127 or 255 |
short | signed short int | 16 | 2 | −32,767 | 32,767 |
unsigned short | unsigned short int | 16 | 2 | 0 | 65,535 |
long | signed long int | 32 | 4 | −2,147,483,647 | 2,147,483,647 |
unsigned long | unsigned long int | 32 | 4 | 0 | 4,294,967,295 |
long long [1] | signed long long int | 64 | 8 | −9,223,372,036,854,775,807 | 9,223,372,036,854,775,807 |
unsigned long long [1] | unsigned long long int | 64 | 8 | 0 | 18,446,744,073,709,551,615 |
1. Support for the long long type was introduced with C99.
The size and limits of the plain int type (without the short, long, or long long modifiers) vary much more than the other integral types among C implementations. The Single UNIX Specification specifies that the int type must be at least 32 bits, but the ISO C standard only requires 16 bits. Refer to limits.h for guaranteed constraints on these data types.
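The limits described above can be queried at compile time. The following is a minimal sketch using limits.h and stdint.h; the function names are invented for illustration, and int32_t is assumed to be available (it is optional in ISO C but provided by mainstream implementations).

```c
#include <limits.h>
#include <stdint.h>

/* Number of bits in the smallest addressable unit (char); at least 8. */
int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Storage size of plain int, in bits, on this implementation. */
int int_storage_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Largest value of the exact-width 32-bit signed type from stdint.h. */
int32_t largest_int32(void)
{
    return INT32_MAX;
}
```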
Integer constants may be written in one of two ways: as an integer number, or as a single character surrounded by single quotes, for example 48, -293, or 'F'. A character in single quotes, called a "character constant," represents the value of that character in the execution character set (often ASCII). In C, character constants have type int (in C++, they have type char).
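The type of a character constant can be observed with sizeof. This short sketch (function names are hypothetical) is only meaningful when compiled as C, since C++ gives a different answer.

```c
/* Returns 1 when a character constant occupies as much storage as an int. */
int char_constant_has_int_size(void)
{
    return sizeof 'F' == sizeof(int);
}

/* Returns 1 when an object of type char occupies exactly one byte. */
int char_object_is_one_byte(void)
{
    char c = 'F';
    return sizeof c == 1;
}
```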
Enumerated type
The enumerated type in C, specified with the enum keyword, and often just called an "enum," is a type designed to represent values across a series of named constants. Each of the enumerated constants has type int. Each enum type itself is compatible with char or a signed or unsigned integer type, but each implementation defines its own rules for choosing a type.
Some compilers warn if an object with enumerated type is assigned a value that is not one of its constants. However, such an object can be assigned any value in the range of its compatible type, and enum constants can be used anywhere an integer is expected. For this reason, enum values are often used in place of the preprocessor #define directives to create a series of named constants.
An enumerated type is declared with the enum specifier, an optional name for the enum, a list of one or more constants contained within curly braces and separated by commas, and an optional list of variable names. Subsequent references to a specific enumerated type use the enum keyword and the name of the enum. By default, the first constant in an enumeration is assigned value zero, and each subsequent value is incremented by one over the previous constant. Specific values may also be assigned to constants in the declaration, and any subsequent constants without specific values will be given incremented values from that point onward.
For example, consider the following declaration:
enum colors { RED, GREEN, BLUE = 5, YELLOW } paint_color;
This declares the enum colors type; the int constants RED (whose value is zero), GREEN (whose value is one greater than RED, one), BLUE (whose value is the given value, five), and YELLOW (whose value is one greater than BLUE, six); and the enum colors variable paint_color. The constants may be used outside of the context of the enum, and values other than the constants may be assigned to paint_color, or any other variable of type enum colors.
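The numbering rules can be checked directly. The sketch below repeats the enum colors declaration (without the variable) and adds two helper functions whose names are invented for illustration.

```c
enum colors { RED, GREEN, BLUE = 5, YELLOW };

/* Enumeration constants and enum objects convert freely to int. */
int color_to_int(enum colors c)
{
    return c;
}

/* A value that is not one of the listed constants can still be stored,
   as long as it lies in the range of the compatible integer type. */
int store_unlisted_value(void)
{
    enum colors paint_color = (enum colors)3;
    return paint_color;
}
```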
Floating point types
The floating-point form is used to represent numbers with a fractional component. Floating-point types do not, however, represent most rational numbers exactly; they store a close approximation instead. There are three floating-point types, denoted by their specifiers: single-precision (specifier float), double-precision (double) and double-extended-precision (long double). Each of these may represent values in a different form, often one of the IEEE floating point formats.
Floating-point constants may be written in decimal notation, e.g. 1.23. Scientific notation may be used by adding e or E followed by a decimal exponent, e.g. 1.23e2 (which has the value 123). Either a decimal point or an exponent is required (otherwise, the number is an integer constant). C99 introduced hexadecimal floating-point constants, which follow similar rules except that they must be prefixed by 0x and use p to introduce a binary (power-of-two) exponent, itself written in decimal. Both decimal and hexadecimal floating-point constants may be suffixed by f or F to indicate a constant of type float, by l or L to indicate type long double, or left unsuffixed for a double constant.
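A few constants in each notation, wrapped in functions so they can be inspected; the function names are hypothetical and a C99 compiler is assumed for the hexadecimal form.

```c
/* 1.23e2 is the decimal constant 1.23 scaled by 10 to the power 2. */
double decimal_with_exponent(void)
{
    return 1.23e2;
}

/* 0x1.8p1 is hexadecimal 1.8 (that is, 1.5) scaled by 2 to the power 1. */
double hex_with_exponent(void)
{
    return 0x1.8p1;
}

/* The suffix selects the type: f gives float, none gives double,
   L gives long double. */
int suffixes_select_type(void)
{
    return sizeof 1.0f == sizeof(float)
        && sizeof 1.0 == sizeof(double)
        && sizeof 1.0L == sizeof(long double);
}
```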
The standard header file float.h defines the minimum and maximum values of the floating-point types float, double, and long double. It also defines other limits that are relevant to the processing of floating-point numbers.
Storage duration specifiers
Every object has a storage duration, which may be automatic, static, or allocated. Variables declared within a block by default have automatic storage, as do those explicitly declared with the auto or register storage class specifiers. The auto and register specifiers may only be used within functions (register may also appear in function parameter declarations); as such, the auto specifier is always redundant. Objects declared outside of all blocks and those explicitly declared with the static storage class specifier have static storage duration.
Objects with automatic storage are local to the block in which they were declared and are discarded when the block is exited. Additionally, objects declared with the register storage class may be given higher priority by the compiler for access to registers; although they may not actually be stored in registers, objects with this storage class may not be used with the address-of (&) unary operator. Objects with static storage persist upon exit from the block in which they were declared. In this way, the same object can be accessed by a function across multiple calls. Objects with allocated storage duration are created and destroyed explicitly with malloc, free, and related functions.
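The difference between static and automatic storage shows up when a function is called repeatedly. A minimal sketch, with invented function names:

```c
/* counter has static storage duration: it is initialized once and keeps
   its value between calls. */
int next_id(void)
{
    static int counter = 0;
    return ++counter;
}

/* local has automatic storage duration: every call gets a fresh object. */
int always_one(void)
{
    int local = 0;
    return ++local;
}
```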
The extern storage class specifier indicates that the storage for an object has been defined elsewhere. When used inside a block, it indicates that the storage has been defined by a declaration outside of that block. When used outside of all blocks, it indicates that the storage has been defined outside of the file. The extern storage class specifier is redundant when used on a function declaration outside any function.
Type qualifiers
Objects can be qualified to indicate special properties of the data they contain. The const type qualifier indicates that the value of an object should not change once it has been initialized. Attempting to modify an object qualified with const yields undefined behavior, so some C implementations store them in read-only segments of memory. The volatile type qualifier indicates that the value of an object may be changed externally without any action by the program (see volatile variable); the compiler must therefore not optimize away accesses to such an object.
Pointers
In declarations the asterisk modifier (*) specifies a pointer type. For example, where the specifier int would refer to the integral type, the specifier int * refers to the type "pointer to integer". Pointer values associate two pieces of information: a memory address and a data type. The following line of code declares a pointer-to-integer variable called ptr:
int *ptr;
Referencing
When a non-static pointer is declared without an initializer, its value is indeterminate. Such a pointer must be assigned a valid address before it is used. In the following example, ptr is set so that it points to the data associated with the variable a:
int *ptr;
int a;
ptr = &a;
In order to accomplish this, the "address-of" operator (unary &) was used. It produces the memory location of the data object that follows.
Dereferencing
The pointed-to data can be accessed through a pointer value. In the following example, the integral variable b is set to the value 10:
int *ptr;
int a, b;
a = 10;
ptr = &a;
b = *ptr;
In order to accomplish that task, the dereference operator (unary *) was used. It returns the data to which its operand, which must be of pointer type, points. Thus, the expression *ptr denotes the same value as a.
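Dereferencing works in both directions: *ptr can be read, and it can also appear on the left of an assignment to change the pointed-to object. The two helpers below are invented for this sketch.

```c
/* Stores value into the object that target points to. */
void store_through(int *target, int value)
{
    *target = value;
}

/* Reads the int that source points to. */
int load_through(const int *source)
{
    return *source;
}
```

After ptr = &a, a call such as store_through(ptr, 20) changes a itself, because ptr and &a refer to the same object.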
Arrays
Array declaration
Arrays are used in C to represent structures of consecutive elements of the same type. The declaration of a (fixed-size) array has the following syntax:
int array[100];
which defines an array named array to hold 100 values of the primitive type int. When using a C99-conforming compiler, the array dimension may also be a non-constant expression (if declared within a function), in which case memory for the specified number of elements will be allocated. In most contexts in later use, a mention of the variable array is converted to a pointer to the first item in the array. The sizeof operator is an exception: sizeof array yields the size of the entire array (that is, 100 times the size of an int).
Accessing elements
The primary facility for accessing the values of the elements of an array is the array subscript operator. To access the i-indexed element of array, the syntax would be array[i], which refers to the value stored in that array element.
Array subscript numbering begins at 0. The largest allowed array subscript is therefore equal to the number of elements in the array minus 1. To illustrate this, consider an array a declared as having 10 elements; the first element would be a[0] and the last element would be a[9]. C provides no facility for automatic bounds checking for array usage. Though logically the last subscript in an array of 10 elements would be 9, subscripts 10, 11, and so forth could accidentally be specified, with undefined results.
Due to array↔pointer interchangeability, the addresses of each of the array elements can be expressed in equivalent pointer arithmetic. The following table illustrates both methods for the existing array:
Element index | 0 | 1 | 2 | n |
---|---|---|---|---|
Array subscript | array[0] | array[1] | array[2] | array[n] |
Dereferenced pointer | *array | *(array + 1) | *(array + 2) | *(array + n) |
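The two rows of the table can be exercised side by side. This sketch sums an array once with subscripts and once with pointer arithmetic; the function names are hypothetical.

```c
#include <stddef.h>

/* Sums n elements using the array subscript operator. */
int sum_by_subscript(const int *array, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += array[i];
    return total;
}

/* Sums n elements using the equivalent dereferenced pointer expression. */
int sum_by_pointer(const int *array, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += *(array + i);
    return total;
}
```

Both functions receive a pointer to the first element, which is what an array name is converted to when it is passed as an argument.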
Dynamic arrays
Recall that a constant is required for the dimension in a declaration of a static array. Often we would prefer to determine the array length as a run-time variable:
int a[n];
a[3] = 10;
This behavior can be imitated with the help of the C standard library. The malloc function provides a simple method for allocating memory. It takes one parameter: the amount of memory to allocate in bytes. Upon successful allocation, malloc returns a generic (void *) pointer value, pointing to the beginning of the allocated space. The pointer value returned is converted to an appropriate type implicitly by assignment. If the allocation could not be completed, malloc returns a null pointer. The following segment is therefore similar in function to the above desired declaration:
#include <stdlib.h> /* declares malloc */
…
int *a;
a = malloc(n * sizeof(int));
a[3] = 10;
The result is a "pointer to int" variable (a) that points to the first of n contiguous int objects; due to array↔pointer equivalence this can be used in place of an actual array name, as shown in the last line. The advantage in using this dynamic allocation is that the amount of memory that is allocated to it can be limited to what is actually needed at run time, and this can be changed as needed (using the standard library function realloc).
When the dynamically-allocated memory is no longer needed, it should be released back to the run-time system. This is done with a call to the free function. It takes a single parameter: a pointer to previously allocated memory. This is the value that was returned by the call to malloc. It is sometimes useful to then set the pointer variable to NULL so that further attempts to access the memory to which it points will fail.
free(a);
a = NULL;
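Because malloc and realloc return a null pointer on failure, robust code checks the result before using it. The following is a sketch under that assumption; make_squares and resize_ints are illustrative names, not library functions.

```c
#include <stdlib.h>

/* Allocates n ints holding 0, 1, 4, 9, ... and returns the pointer, or a
   null pointer if allocation fails. The caller releases it with free. */
int *make_squares(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
    return a;
}

/* Changes the number of elements. On failure realloc returns a null
   pointer and leaves the original block untouched. */
int *resize_ints(int *a, size_t new_count)
{
    return realloc(a, new_count * sizeof *a);
}
```

Writing sizeof *a rather than sizeof(int) keeps the allocation correct if the element type is later changed.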
Multidimensional arrays
In addition, C supports arrays of multiple dimensions, which are stored in row-major order. Technically, C multidimensional arrays are just one-dimensional arrays whose elements are arrays. The syntax for declaring multidimensional arrays is as follows:
int array2d[ROWS][COLUMNS];
(where ROWS and COLUMNS are constants); this defines a two-dimensional array. Reading the subscripts from left to right, array2d is an array of length ROWS, each element of which is an array of COLUMNS ints.
To access an integer element in this multidimensional array, one would use
array2d[4][3]
Again, reading from left to right, this accesses the 5th row, 4th element in that row. (Note that array2d[4] itself denotes an array, which we are then subscripting with the [3] to access the integer.)
Higher-dimensional arrays can be declared in a similar manner.
A multidimensional array should not be confused with an array of references to arrays (also known as Iliffe vectors or sometimes array of arrays). The former is always rectangular (all subarrays must be the same size), and occupies a contiguous region of memory. The latter is a one-dimensional array of pointers, each of which may point to the first element of a subarray in a different place in memory, and the sub-arrays do not have to be the same size. The latter can be created by multiple use of malloc.
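Row-major, contiguous storage means that the position of array2d[r][c] within the array is determined by r * COLUMNS + c. The sketch below measures that offset in bytes; the dimensions 5 and 4 and the function name are arbitrary choices for illustration.

```c
#include <stddef.h>

enum { ROWS = 5, COLUMNS = 4 };

static int array2d[ROWS][COLUMNS];

/* Byte distance from the start of array2d to element [row][column]. */
size_t element_offset(size_t row, size_t column)
{
    return (size_t)((char *)&array2d[row][column] - (char *)array2d);
}
```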
Strings
In C, string literals (constants) are surrounded by double quotes ("), e.g. "Hello world!" and are compiled to an array of the specified char values with an additional null terminating character (0-valued) code to mark the end of the string. A double quote can be included inside the string by escaping it with a backslash (\), for example, "This string contains \"double quotes\".". To insert a literal backslash, one must double it, e.g. "A backslash looks like this: \\".
Backslashes may be used to enter control characters, etc., into a string:
Escape | Meaning |
---|---|
\\ | Literal backslash |
\" | Double quote |
\' | Single quote |
\n | Newline (line feed) |
\r | Carriage return |
\b | Backspace |
\t | Horizontal tab |
\f | Form feed |
\a | Alert (bell) |
\v | Vertical tab |
\? | Question mark (used to escape trigraphs) |
\nnn | Character with octal value nnn |
\xhh | Character with hexadecimal value hh |
The use of other backslash escapes is not defined by the C standard.
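The terminating null and the escape sequences can be observed with sizeof and strlen. A short sketch with invented function names:

```c
#include <string.h>

/* Storage used by the literal, including the terminating null character. */
size_t literal_size(void)
{
    return sizeof "Hello world!";
}

/* Number of characters before the terminating null character. */
size_t literal_length(void)
{
    return strlen("Hello world!");
}

/* Each escape sequence stands for one character: here a double quote,
   a backslash and a newline. */
size_t escaped_length(void)
{
    return strlen("\"\\\n");
}

/* Octal 101 and hexadecimal 41 both denote the character code 65. */
int octal_equals_hex(void)
{
    return "\101"[0] == "\x41"[0];
}
```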
Individual character constants are represented by single-quotes, e.g. 'A', and have type int (not char). The difference is that "A" represents a null-terminated array of characters (converted in most expressions to a pointer to its first element), whereas 'A' directly represents the code value (65 if ASCII is used). The same backslash-escapes are supported as for strings, except that " can validly be used as a character without being escaped, whereas ' must now be escaped. A character constant cannot be empty (i.e. '' is invalid syntax), although a string may be (it still has the null-byte terminator). Multi-character constants (e.g. 'xy') are valid, although rarely useful; they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified, portable use of multi-character constants is difficult.
There are several standard library functions for operating with string data (not necessarily constant) organized as array of char using this null-terminated format; see below.
C's string-literal syntax has been very influential, and has made its way into many other languages, such as C++, Perl, Python, PHP, Java, JavaScript, C#, Ruby. Nowadays, almost all new languages adopt or build upon C-style string syntax; languages which lack this syntax tend to precede C.
Wide character strings
For historical reasons, type char typically can represent at most 256 distinct character codes, not nearly enough for all the characters in use worldwide. To provide better support for international characters, the first C standard (C89) introduced wide characters (encoded in type wchar_t) and wide character strings, which are written as L"Hello world!". Wide characters are most commonly either 2 bytes (UTF-16) or 4 bytes (UTF-32), but Standard C does not specify the width for wchar_t, leaving the choice to the implementor. Microsoft Windows generally uses UTF-16, thus the above string would be 26 bytes long for a Microsoft compiler; the Unix world prefers UTF-32, thus compilers such as GCC would generate a 52-byte string.
The original C standard specified only minimal functions for operating with wide character strings; in 1995 the standard was modified to include much more extensive support, comparable to that for the legacy char arrays.
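The byte counts quoted above follow from 13 wide characters (12 plus the terminator) multiplied by the size of wchar_t. A portable way to express this, with hypothetical function names:

```c
#include <wchar.h>

/* Storage used by the wide literal, including its terminating null. */
size_t wide_literal_size(void)
{
    return sizeof L"Hello world!";
}

/* Number of wide characters before the terminating null. */
size_t wide_literal_length(void)
{
    return wcslen(L"Hello world!");
}
```

The size comes out as 13 * sizeof(wchar_t), whatever width the implementation has chosen.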
Library functions
Strings, both constant and variable, may be manipulated without using the standard library. However, the library contains many useful functions for working with null-terminated strings. It is the programmer's responsibility to ensure that enough storage has been allocated to hold the resulting strings.
The most commonly used string functions are:
strcat(dest, source) - appends the string source to the end of string dest
strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it or a null pointer if c is not found
strcmp(a, b) - compares strings a and b (lexicographical ordering); returns negative if a is less than b, 0 if equal, positive if greater
strcpy(dest, source) - copies the string source onto the string dest
strlen(st) - returns the length of string st
strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of string dest and always null terminates the result
strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexicographical ordering); returns negative if a is less than b, 0 if equal, positive if greater
strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it or a null pointer if c is not found
Other standard string functions include:
strcoll(s1, s2) - compares two strings according to a locale-specific collating sequence
strcspn(s1, s2) - returns the index of the first character in s1 that matches any character in s2
strerror(errno) - returns a string with an error message corresponding to the code in errno
strncpy(dest, source, n) - copies n characters from the string source onto the string dest, substituting null bytes once past the end of source; does not null terminate if max length is reached
strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2 or a null pointer if not found
strspn(s1, s2) - returns the index of the first character in s1 that matches no character in s2
strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a null pointer if no such substring exists
strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
strxfrm(s1, s2, n) - transforms s2 onto s1, such that s1 used with strcmp gives the same results as s2 used with strcoll
There is a similar set of functions for handling wide character strings.
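Since the library functions do not check destination sizes, a common pattern is to verify the required length before copying. The following sketch shows one way to do this; build_greeting and copy_truncated are hypothetical helpers, not part of the standard library.

```c
#include <string.h>

/* Builds "Hello, <name>!" in dest. Returns 0 on success, or -1 (leaving
   dest untouched) if dest cannot hold the result. */
int build_greeting(char *dest, size_t size, const char *name)
{
    const char *prefix = "Hello, ";
    /* room for the prefix, the name, '!' and the terminating null */
    size_t needed = strlen(prefix) + strlen(name) + 2;
    if (needed > size)
        return -1;
    strcpy(dest, prefix);
    strcat(dest, name);
    strcat(dest, "!");
    return 0;
}

/* Copies at most size - 1 characters and always terminates the result,
   working around the fact that strncpy may leave dest unterminated. */
void copy_truncated(char *dest, size_t size, const char *source)
{
    if (size == 0)
        return;
    strncpy(dest, source, size - 1);
    dest[size - 1] = '\0';
}
```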
Structures and unions
Structures
Structures in C are defined as objects consisting of a sequence of named members of various types. They are analogous to records in other programming languages. The members of a structure are stored in consecutive locations in memory, although the compiler is allowed to insert padding between or after members (but not before the first member) for efficiency. The size of a structure is equal to the sum of the sizes of its members, plus the size of the padding.
Unions
Unions in C are related to structures and are defined as objects that may hold (at different times) objects of different types and sizes. They are analogous to variant records in other programming languages. Unlike structures, the components of a union all refer to the same location in memory. In this way, a union can be used at various times to hold different types of objects, without the need to create a separate object for each new type. The size of a union is at least the size of its largest component type (the compiler may add trailing padding for alignment).
Declaration
Structures are declared with the struct keyword and unions are declared with the union keyword. The specifier keyword is followed by an optional identifier name, which is used to identify the form of the structure or union. The identifier is followed by the declaration of the structure or union's body: a list of member declarations, contained within curly braces, with each declaration terminated by a semicolon. Finally, the declaration concludes with an optional list of identifier names, which are declared as instances of the structure or union.
For example, the following statement declares a structure named s that contains three members; it will also declare an instance of the structure known as t:
struct s { int x; float y; char *z; } t;
And the following statement will declare a similar union named u and an instance of it named n:
union u { int x; float y; char *z; } n;
Once a structure or union body has been declared and given a name, it can be considered a new data type using the specifier struct or union, as appropriate, and the name. For example, the following statement, given the above structure declaration, declares a new instance of the structure s named r:
struct s r;
It is also common to use the typedef specifier to eliminate the need for the struct or union keyword in later references to the structure. The first identifier after the body of the structure is taken as the new name for the structure type. For example, the following statement will declare a new type known as s_type that will contain some structure:
typedef struct {…} s_type;
Future statements can then use the specifier s_type (instead of the expanded struct … specifier) to refer to the structure.
Accessing members
Members are accessed using the name of the instance of a structure or union, a period (.), and the name of the member. For example, given the declaration of t from above, the member known as y (of type float) can be accessed using the following syntax:
t.y
Structures are commonly accessed through pointers. Consider the following example that defines a pointer to t, known as ptr_to_t:
struct s *ptr_to_t = &t;
Member y of t can then be accessed as (*ptr_to_t).y. Because this common operation has somewhat awkward syntax, C also provides the -> operator as an abbreviation. Using ->, y can also be accessed as
ptr_to_t->y
Members of unions are accessed in the same way.
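The two access forms, and the shared storage of union members, can be sketched as follows. The declarations repeat struct s and union u from above; the function names are invented for illustration.

```c
#include <stddef.h>

struct s { int x; float y; char *z; };
union u { int x; float y; char *z; };

/* Reads member y with the -> abbreviation. */
float read_y_arrow(const struct s *p)
{
    return p->y;
}

/* Reads member y with the explicit dereference-then-dot form. */
float read_y_explicit(const struct s *p)
{
    return (*p).y;
}

/* Returns 1 when all members of a union start at the same address. */
int union_members_share_address(void)
{
    union u n;
    return (void *)&n.x == (void *)&n.y && (void *)&n.y == (void *)&n.z;
}
```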
Initialization
Structures can be initialized in their declarations using initializer lists, similar to arrays. If a structure with automatic storage is not initialized, the values of its members are indeterminate until assigned. Each initializer must be compatible in type with the corresponding member, and there may not be more initializers than members; members without an initializer are set to zero.
The following statement will initialize a new instance of the structure s from above known as pi:
struct s pi = { 3, 3.1415, "Pi" };
C99 introduces a more flexible initialization syntax for structures, which allows members to be initialized by name. The following initialization using this syntax is equivalent to the previous one. Note that initialization using this syntax may initialize members in any order:
struct s pi = { .x = 3, .y = 3.1415, .z = "Pi" };
In C89, unions can only be initialized with a value of the type of their first members. That is, the union u from above can only be initialized with a value of type int. In C99, any one member of a union may be initialized using the new syntax described above.
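A sketch comparing the two initializer forms, assuming a C99 compiler. The declarations repeat struct s and union u from above; the function names are hypothetical.

```c
#include <string.h>

struct s { int x; float y; char *z; };
union u { int x; float y; char *z; };

/* Returns 1 when positional and designated initializers give the same
   member values, even though the designators are in a different order. */
int initializers_agree(void)
{
    struct s a = { 3, 3.1415f, "Pi" };
    struct s b = { .z = "Pi", .y = 3.1415f, .x = 3 };
    return a.x == b.x && a.y == b.y && strcmp(a.z, b.z) == 0;
}

/* C99: a designator can initialize a union member other than the first. */
float union_initialized_by_name(void)
{
    union u n = { .y = 1.5f };
    return n.y;
}
```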
Assignment
Assigning values to individual members of structures and unions is syntactically identical to assigning values to any other object. The only difference is that the lvalue of the assignment is the name of the member, as accessed by the syntax mentioned above.
A structure can also be assigned as a unit to another structure of the same type. Structures (and pointers to structures) may also be used as function parameter and return types.
For example, the following statement assigns the value of 74 (the ASCII code point for the letter 'J') to the member named x in the structure t, from above:
t.x = 74;
And the same assignment, using ptr_to_t in place of t, would look like:
ptr_to_t->x = 74;
Assignment with members of unions is identical, except that each new assignment changes the current type of the union, and the previous type and value are lost.
Other operations
According to the C standard, the only legal operations that can be performed on a structure are copying it, assigning to it as a unit (or initializing it), taking its address with the address-of (&) unary operator, and accessing its members. Unions have the same restrictions. It is important to note that one of the operations implicitly forbidden is comparison: structures and unions cannot be compared using C's standard comparison facilities (==, >, <, etc.).
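Since == cannot be applied to whole structures, equality is usually written as a member-by-member comparison. The sketch below uses a hypothetical struct point (not from the text above) and also shows a structure passed and returned by value.

```c
/* Hypothetical two-member structure used only for this illustration. */
struct point { int x; int y; };

/* Member-wise equality, standing in for the missing == on structures. */
int point_equal(const struct point *a, const struct point *b)
{
    return a->x == b->x && a->y == b->y;
}

/* Takes a structure by value and returns a modified copy; the caller's
   original is not affected. */
struct point point_moved(struct point p, int dx, int dy)
{
    p.x += dx;
    p.y += dy;
    return p;
}
```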
Bit fields
C also provides a special type of structure member known as a bit field, which is an integer with an explicitly specified number of bits. A bit field is declared as a structure member of type int, signed int, unsigned int, or (in C99 only) _Bool, following the member name by a colon (:) and the number of bits it should occupy. The total number of bits in a single bit field must not exceed the total number of bits in its declared type.
As a special exception to the usual C syntax rules, it is implementation-defined whether a bit field declared as type int, without specifying signed or unsigned, is signed or unsigned. Thus, it is recommended to explicitly specify signed or unsigned on all bit-field members for portability.
Unnamed entries, which omit the member name and consist of just the type, a colon, and a number of bits, are also allowed; these indicate padding. Bit fields do not have addresses, and as such cannot be used with the address-of (&) unary operator. The sizeof operator may not be applied to bit fields.
The following declaration declares a new structure type known as f and an instance of it known as g. Comments provide a description of each of the members:
struct f {
    unsigned int flag : 1;  /* a bit flag: can either be on (1) or off (0) */
    signed int   num  : 4;  /* a signed 4-bit field; range -7...7 or -8...7 */
    signed int        : 3;  /* 3 bits of padding to round out 8 bits */
} g;
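Values within the stated ranges survive a round trip through the bit fields. A minimal sketch, repeating the struct f layout with two invented helper functions:

```c
struct f {
    unsigned int flag : 1;  /* holds 0 or 1 */
    signed int   num  : 4;  /* holds at least -7...7 */
    signed int        : 3;  /* unnamed padding */
};

/* Stores v (0 or 1) in the 1-bit field and reads it back. */
unsigned int flag_roundtrip(unsigned int v)
{
    struct f g = { 0 };
    g.flag = v;
    return g.flag;
}

/* Stores v (within -7...7) in the signed 4-bit field and reads it back. */
int num_roundtrip(int v)
{
    struct f g = { 0 };
    g.num = v;
    return g.num;
}
```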
Operators
- Main article: Operators in C and C++
Control structures
C is a free-form language: whitespace and line breaks between tokens are generally not significant.
Note: bracing style varies from programmer to programmer and can be the subject of great debate ("flame wars"). See Indent style for more details.
Compound statements
In the items in this section, any <statement> can be replaced with a compound statement. In C89, compound statements in C have the form
{ <optional-declaration-list> <optional-statement-list> }
and are used as the body of a function or anywhere that a single statement is expected. The declaration-list declares variables to be used in that scope, and the statement-list is the sequence of actions to be performed. Note that braces define their own scope, and variables defined inside those braces will be automatically deallocated at the closing brace. C99 extends this syntax to allow declarations and statements to be freely intermixed within a compound statement (as does C++).
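The scope introduced by a compound statement can be seen when an inner declaration reuses an outer name. In this sketch (the function name is hypothetical) the inner x is a separate object that disappears at the closing brace.

```c
/* Returns the outer x times 10 plus the value seen inside the block. */
int outer_after_inner_block(void)
{
    int x = 1;
    int seen_inside;
    {
        int x = 2;          /* a new object that hides the outer x */
        seen_inside = x;
    }                       /* the inner x ceases to exist here */
    return x * 10 + seen_inside;
}
```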
Selection statements
C has two types of selection statements: the if statement and the switch statement.
The if statement is in the form:
if (<expression>) <statement1> else <statement2>
In the if statement, if the <expression> in parentheses is nonzero (true), control passes to <statement1>. If the else clause is present and the <expression> is false, control will pass to <statement2>. The "else <statement2>" part is optional, and if absent, a false <expression> will simply result in skipping over the <statement1>. An else always matches the nearest previous unmatched if; braces may be used to override this when necessary, or for clarity.
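The matching rule for else can be demonstrated with two nested if statements. Both function names are invented for this sketch, and some compilers issue a warning for the unbraced form precisely because it is easy to misread.

```c
/* Without braces, the else pairs with if (b), the nearest unmatched if. */
int nearest_if(int a, int b)
{
    int result = 0;
    if (a)
        if (b)
            result = 1;
        else
            result = 2;
    return result;
}

/* Braces close the inner if, so the else now pairs with if (a). */
int braced(int a, int b)
{
    int result = 0;
    if (a) {
        if (b)
            result = 1;
    } else {
        result = 2;
    }
    return result;
}
```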
The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:). The syntax is as follows:
switch (<expression>) {
    case <label1> :
        <statements 1>
    case <label2> :
        <statements 2>
        break;
    default :
        <statements 3>
}
No two of the case constants associated with the same switch may have the same value. There may be at most one default
label associated with a switch - if none of the case labels are equal to the expression in the parentheses following switch
, control passes to the default
label, or if there is no default
label, execution resumes just beyond the entire construct. Switches may be nested; a case
or default
label is associated with the innermost switch that contains it. Switch statements can "fall through", that is, when one case section has completed its execution, statements will continue to be executed downward until a break;
statement is encountered. Fall-through is useful in some circumstances, but is usually not desired. In the preceding example, if <label2> is reached, the statements <statements 2> are executed and nothing more inside the braces. However if <label1> is reached, both <statements 1> and <statements 2> are executed since there is no break
to separate the two case statements.
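As a concrete sketch of fall-through, the hypothetical function sections_run below mirrors the layout of the syntax example: case 1 has no break, case 2 does, and default handles everything else.

```c
/* Hypothetical example: reports how many case sections ran for a value. */
int sections_run(int value)
{
    int count = 0;

    switch (value) {
    case 1:
        count++;        /* no break here, so execution falls through */
        /* fall through */
    case 2:
        count++;
        break;          /* leaves the switch */
    default:
        count = -1;     /* reached only when no case label matches */
    }

    return count;
}
```

Calling sections_run(1) runs both sections and yields 2, sections_run(2) yields 1, and any other value reaches default and yields -1.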
[edit] Iteration statements
C has three forms of iteration statement:
do <statement> while ( <expression> ) ;
while ( <expression> ) <statement>
for ( <expression> ; <expression> ; <expression> ) <statement>
In the while
and do
statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero (true). With while
, the test, including all side effects from the expression, occurs before each execution of the statement; with do
, the test follows each iteration. Thus, a do
statement always executes its substatement at least once, whereas while
may not execute the substatement at all.
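The difference between the two forms shows up when the condition is false from the start. The following sketch uses two hypothetical functions, count_while and count_do, that count how many times the loop body runs.

```c
/* Counts body executions of a while loop: the test comes first. */
int count_while(int n)
{
    int runs = 0;

    while (n > 0) {
        runs++;
        n--;
    }
    return runs;
}

/* Counts body executions of a do loop: the body comes first. */
int count_do(int n)
{
    int runs = 0;

    do {
        runs++;
        n--;
    } while (n > 0);
    return runs;
}
```

For n equal to 0, count_while returns 0 but count_do returns 1; for positive n the two agree.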
If all three expressions are present in a for
, the statement
for (e1; e2; e3) s;
is equivalent to
e1; while (e2) { s; e3; }
except for the behavior of a continue;
statement (which in the for loop jumps to e3
instead of e2
).
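The continue caveat can be seen in a small sketch. The hypothetical functions sum_odd_for and sum_odd_while compute the same sum, but the while version has to repeat the increment before continue, which the for version gets from its third expression.

```c
/* Sums the odd numbers below limit; continue jumps to i++. */
int sum_odd_for(int limit)
{
    int sum = 0;
    int i;

    for (i = 0; i < limit; i++) {
        if (i % 2 == 0)
            continue;   /* control passes to i++, then to the test */
        sum += i;
    }
    return sum;
}

/* The while rewrite: continue jumps straight to the test. */
int sum_odd_while(int limit)
{
    int sum = 0;
    int i = 0;

    while (i < limit) {
        if (i % 2 == 0) {
            i++;        /* without this the loop would never advance */
            continue;
        }
        sum += i;
        i++;
    }
    return sum;
}
```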
Any of the three expressions in the for
loop may be omitted. A missing second expression makes the while
test always nonzero, creating a potentially infinite loop.
C99 generalizes the for
loop by allowing the first expression to take the form of a declaration (typically including an initializer). The declaration's scope is limited to the extent of the for
loop.
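A minimal sketch of the C99 form, assuming a C99 or later compiler; the function name sum_to is hypothetical.

```c
/* Adds the integers 1..n using a loop variable declared in the for. */
int sum_to(int n)
{
    int total = 0;

    for (int i = 1; i <= n; i++)    /* i exists only within the loop */
        total += i;

    /* i is out of scope here; referring to it would not compile */
    return total;
}
```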
[edit] Jump statements
Jump statements transfer control unconditionally. There are four types of jump statements in C: goto
, continue
, break
, and return
.
The goto
statement looks like this:
goto <identifier>;
The identifier must be a label (followed by a colon) located in the current function. Control transfers to the labeled statement.
A continue
statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the innermost enclosing iteration statement. That is, within each of the statements
while (expression) { /* ... */ cont: ; }
do { /* ... */ cont: ; } while (expression);
for (expr1; expr2; expr3) { /* ... */ cont: ; }
a continue
not contained within a nested iteration statement is the same as goto cont
.
The break
statement is used to end a for
loop, while
loop, do
loop, or switch
statement. Control passes to the statement following the terminated statement.
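A typical use of break is to stop a search as soon as a match is found. In this sketch the function index_of and its parameters are hypothetical names.

```c
/* Returns the index of the first element equal to target, or -1. */
int index_of(const int *values, int count, int target)
{
    int found = -1;
    int i;

    for (i = 0; i < count; i++) {
        if (values[i] == target) {
            found = i;
            break;      /* ends the for loop immediately */
        }
    }
    /* control arrives here after break or after the loop finishes */
    return found;
}
```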
A function returns to its caller by the return
statement. When return
is followed by an expression, the value is returned to the caller as the value of the function. Encountering the end of the function is equivalent to a return
with no expression. In that case, if the function is declared as returning a value and the caller tries to use the returned value, the result is undefined.
[edit] Storing the address of a label
GCC extends the C language with a unary && operator that returns the address of a label. This address can be stored in a variable of type void * and used later in a goto statement. For example, the following prints "hi " in an infinite loop:
void *ptr = &&J1;
J1:
    printf("hi ");
    goto *ptr;
This feature can be used to implement a jump table.
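One way such a jump table might look is sketched below. It relies on the GCC extension described above, so it is not standard C and needs GCC or a compatible compiler. The function run and its three opcodes (0 increments, 1 decrements, 2 halts) are invented for this illustration.

```c
/* Interprets a tiny hypothetical bytecode using a table of label addresses. */
int run(const unsigned char *code)
{
    /* Jump table: entry k holds the address of the handler for opcode k. */
    static void *table[] = { &&op_inc, &&op_dec, &&op_halt };
    int acc = 0;
    int pc = 0;

    goto *table[code[pc++]];    /* dispatch the first opcode */

op_inc:
    acc++;
    goto *table[code[pc++]];
op_dec:
    acc--;
    goto *table[code[pc++]];
op_halt:
    return acc;
}
```

The sketch assumes every byte in code is a valid opcode and that the sequence ends with the halt opcode; a real interpreter would validate its input.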
[edit] Functions
[edit] Syntax
A C function definition consists of a return type (void
if no value is returned), a unique name, a list of parameters in parentheses (void
if there are none), and various statements. A function with non-void
return type should include at least one return
statement:
<return-type> functionName( <parameter-list> ) { <statements> return <expression of type return-type>; }
where the <parameter-list> of n parameters is written as a comma-separated list of data type and variable name pairs:
<data-type> var1, <data-type> var2, ... <data-type> varN
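As a brief sketch of the template above, the two hypothetical functions below fill in the return type, name, and parameter list.

```c
/* The return type is int; the parameter list declares two int parameters. */
int larger(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

/* A function without parameters writes void as its parameter list. */
int answer(void)
{
    return 42;
}
```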
[edit] Function Pointers
A pointer to a function can be declared as follows:
<return-type> (*functionName)(<parameter-list>);
The following program shows use of a function pointer for selecting between addition and subtraction:
#include <stdio.h>

int (*operation)(int x, int y);

int add(int x, int y)
{
    return x + y;
}

int subtract(int x, int y)
{
    return x - y;
}

int main(int argc, char* args[])
{
    int foo = 1, bar = 1;

    operation = add;
    printf("%d + %d = %d\n", foo, bar, operation(foo, bar));
    operation = subtract;
    printf("%d - %d = %d\n", foo, bar, operation(foo, bar));
    return 0;
}
[edit] Global structure
After preprocessing, at the highest level a C program consists of a sequence of declarations at file scope. These may be partitioned into several separate source files, which may be compiled separately; the resulting object modules are then linked along with implementation-provided run-time support modules to produce an executable image.
The declarations introduce functions, variables and types. C functions are akin to the subroutines of Fortran or the procedures of Pascal.
A definition is a special type of declaration. A variable definition sets aside storage and possibly initializes it; a function definition provides the function's body.
An implementation of C providing all of the standard library functions is called a hosted implementation. Programs written for hosted implementations are required to define a special function called main
, which is the first function called when execution of the program begins.
Hosted implementations of C start program execution by invoking the main
function, which must be defined in a fashion compatible with one of the following prototypes:
int main(void)
int main(int argc, char *argv[])
In particular, the function main
must be declared as having an int
return type according to the C Standard. The C standard defines return values 0 and EXIT_SUCCESS
as indicating success and EXIT_FAILURE
as indicating failure. (EXIT_SUCCESS
and EXIT_FAILURE
are defined in <stdlib.h>
). Other return values have implementation-defined meanings.
Here is a minimal C program:
int main(void) { return 0; }
The main
function will usually call other functions to help it perform its job.
Some implementations are not hosted, usually because they are not intended to be used with an operating system. Such implementations are called freestanding in the C standard. A freestanding implementation is free to specify how it handles program startup; in particular it need not require a program to define a main
function.
Functions may be written by the programmer or provided by existing libraries. Interfaces for the latter are usually declared by including header files—with the #include
preprocessing directive—and the library objects are linked into the final executable image. Certain library functions, such as printf
, are defined by the C standard; these are referred to as the standard library functions.
A function may return a value to the environment that called it. This is usually another C function; however, the calling environment of the main
function is the parent process in Unix-like systems or the operating system itself in other cases. By definition, the return value zero (or the value of the EXIT_SUCCESS
macro) from main
signifies successful completion of the program. (There is also an EXIT_FAILURE
macro to signify failure.) The printf
function mentioned above returns how many characters were printed, but this value is often ignored.
[edit] Argument passing
In C, arguments are passed to functions by value while other languages may pass variables by reference. This means that the receiving function gets copies of the values and has no direct way of altering the original variables. For a function to alter a variable passed from another function, the caller must pass its address (a pointer to it), which can then be dereferenced in the receiving function (see Pointers for more info):
void incInt(int *y)
{
    (*y)++;  // Increase the value of 'x', in main, by one
}

int main(void)
{
    int x = 0;
    incInt(&x);  // pass the address of 'x'
    return 0;
}
The function scanf works the same way:
int x; scanf("%d", &x);
For a function to change a pointer that belongs to its caller, the caller must pass the address of that pointer, that is, a pointer to the pointer:
#include <stdio.h>
#include <stdlib.h>

void setInt(int **p, int n)
{
    // allocate a memory area, saving the pointer in the
    // location pointed to by the parameter "p"
    *p = malloc(sizeof(int));
    if (*p == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    // dereference the given pointer that has been assigned an address
    // of dynamically allocated memory and set the int to the value of n (42)
    **p = n;
}

int main(void)
{
    int *p;          // create a pointer to an integer
    setInt(&p, 42);  // pass the address of 'p'
    free(p);         // release the memory allocated by setInt
    return 0;
}
The parameter int **p
is a pointer to a pointer; in this case it holds the address of the pointer p
declared in main.
[edit] Array parameters
Function parameters of array type may at first glance appear to be an exception to C's pass-by-value rule. The following program will print 2, not 1:
#include <stdio.h>

void setArray(int array[], int index, int value)
{
    array[index] = value;
}

int main(void)
{
    int a[1] = {1};
    setArray(a, 0, 2);
    printf("a[0]=%d\n", a[0]);
    return 0;
}
However, there is a different reason for this behavior. In fact, a function parameter declared with an array type is treated almost exactly like one declared to be a pointer. That is, the preceding declaration of setArray
is equivalent to the following:
void setArray(int *array, int index, int value)
At the same time, C rules for the use of arrays in expressions cause the value of a
in the call to setArray
to be converted to a pointer to the first element of array a
. Thus, in fact this is still an example of pass-by-value, with the caveat that it is the address of the first element of the array being passed by value, not the contents of the array.
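That the pointer itself is passed by value can be checked with a short sketch. The functions zero_first and repoint are hypothetical: the first writes through the pointer, the second only reassigns its own local copy.

```c
/* Writes through the pointer, so the caller's array changes. */
void zero_first(int array[])
{
    array[0] = 0;
}

/* Reassigns the parameter, which is only a local copy of the pointer. */
void repoint(int array[])
{
    static int other[1] = {99};

    array = other;      /* the caller's array is no longer referenced */
    array[0] = 7;       /* modifies other, not the caller's array */
}
```

After repoint(a) the caller's array a is unchanged, whereas zero_first(a) sets a[0] to 0.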
[edit] Input/Output
In C, input and output are performed via a group of functions in the standard library. In ISO C, those functions are declared in the <stdio.h>
header.
[edit] Standard I/O
Three standard I/O streams are predefined:
- stdin: standard input
- stdout: standard output
- stderr: standard error
These streams are automatically opened and closed by the runtime environment; they need not and should not be opened explicitly.
The following example demonstrates how a filter program is typically structured:
#include <stdio.h>

int main(void)
{
    int c;
    int anErrorOccurs = 0;  /* placeholder: set to nonzero when processing fails */

    while ((c = getchar()) != EOF) {
        /* do various things to the characters */
        if (anErrorOccurs) {
            fputs("An error occurred\n", stderr);
            break;
        }
        /* ... */
        putchar(c);
        /* ... */
    }
    return 0;
}
[edit] File I/O
[edit] Miscellaneous
[edit] Reserved keywords
The following words are reserved, and may not be used as identifiers:
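auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while
C99 additionally reserves inline, restrict, _Bool, _Complex and _Imaginary.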
[edit] Case sensitivity
C is case sensitive. Some linkers may map external identifiers to a single case, although this is uncommon these days.
[edit] Comments
Text starting with /*
is treated as a comment and ignored. The comment ends at the next */
and can span multiple lines. Accidental omission of the comment terminator is problematic in that the next comment's properly constructed comment terminator will be used to terminate the initial comment, and all code in between the comments will be considered as a comment.
The C99 standard introduced C++ style line comments. These start with //
and extend to the end of the line.
// this line will be ignored by the compiler
/* these lines
   will be ignored
   by the compiler */
[edit] Command-line arguments
The parameters given on a command line are passed to a C program through two parameters of the main function: the count of the command-line arguments in argc
and the individual arguments as character strings in the pointer array argv
. So the command
myFilt p1 p2 p3
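results in something like argc being 4, argv[0] through argv[3] pointing to the strings "myFilt", "p1", "p2" and "p3", and argv[4] being a null pointer.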
(Note: While individual strings are contiguous arrays of char
, there is no guarantee that the strings are stored as a contiguous group.)
The name of the program, argv[0]
, may be useful when printing diagnostic messages. The individual values of the parameters may be accessed with argv[1]
, argv[2]
, and argv[3]
, as shown in the following program:
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;

    printf("argc\t= %i\n", argc);
    for (i = 0; i < argc; i++)
        printf("argv[%i]\t= %s\n", i, argv[i]);
    return 0;
}
[edit] Evaluation order
A conforming C compiler can evaluate expressions in any order between sequence points. Sequence points are defined by:
- Statement ends at semicolons.
- The sequencing operator: a comma. However, commas that delimit function arguments are not sequence points.
- The short-circuit operators: logical and (&&) and logical or (||).
- The conditional operator (?:): This operator evaluates its first sub-expression first, and then its second or third (never both of them) based on the value of the first.
- Entry to and exit from a function call (but not between evaluations of the arguments).
Expressions before a sequence point are always evaluated before those after a sequence point. In the case of short-circuit evaluation, the second expression may not be evaluated depending on the result of the first expression. For example, in the expression (a() || b())
, if the first operand evaluates to nonzero (true), the result of the entire expression is also true, so b()
is not evaluated.
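Short-circuit evaluation can be observed by counting how many operands are actually evaluated. In this sketch the counter calls and all function names are hypothetical.

```c
static int calls;   /* number of operand evaluations so far */

static int yes(void) { calls++; return 1; }
static int no(void)  { calls++; return 0; }

/* yes() is nonzero, so the right operand of || is never evaluated. */
int or_true_first(void)
{
    calls = 0;
    (void)(yes() || no());
    return calls;
}

/* no() is zero, so || must go on to evaluate its right operand. */
int or_false_first(void)
{
    calls = 0;
    (void)(no() || yes());
    return calls;
}

/* no() is zero, so the right operand of && is never evaluated. */
int and_false_first(void)
{
    calls = 0;
    (void)(no() && yes());
    return calls;
}
```

The three functions return 1, 2 and 1 respectively, matching the number of operands each expression needs to evaluate.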
[edit] Undefined behavior
An interesting (though certainly not unique) aspect of the C standard is that the behavior of certain code is said to be "undefined". In practice, this means that the program produced from this code can do anything, from working as intended, to crashing every time it is run.
For example, the following code produces undefined behavior, because the variable b
is modified more than once with no intervening sequence point:
#include <stdio.h>

int main(void)
{
    int a, b = 1;

    a = b++ + b++;
    printf("%d\n", a);
    return 0;
}
Because there is no sequence point between the modifications of b
in b++ + b++
, it is possible to perform the evaluation steps in more than one order, resulting in an ambiguous statement. This can be fixed by rewriting the code to insert a sequence point:
a = b++; a += b++;
[edit] References
- Kernighan, Brian W., and Dennis M. Ritchie. The C Programming Language. 2nd ed. Upper Saddle River, New Jersey: Prentice Hall PTR, 1988.