Naming convention (programming)

From Wikipedia, the free encyclopedia

In computer programming, a naming convention is a set of rules for choosing the character sequence to be used for identifiers which denote variables, types, functions, and other entities in source code and documentation.

Reasons for using a naming convention (as opposed to allowing programmers to choose any character sequence) include the following:

  • to reduce the effort needed to read and understand source code;[1]
  • to enhance source code appearance (for example, by disallowing overly long names or unclear abbreviations).

The choice of naming conventions can be an enormously controversial issue, with partisans of each holding theirs to be the best and others to be inferior. Colloquially, this is said to be a matter of dogma.[2] Many companies have also established their own set of conventions to best meet their interests.

Potential benefits

Some of the potential benefits that can be obtained by adopting a naming convention include the following:

  • to provide additional information (i.e., metadata) about the use to which an identifier is put;
  • to help formalize expectations and promote consistency within a development team;
  • to enable the use of automated refactoring or search and replace tools with minimal potential for error;
  • to enhance clarity in cases of potential ambiguity;
  • to enhance the aesthetic and professional appearance of work product (for example, by disallowing overly long names, comical or "cute" names, or abbreviations);
  • to help avoid "naming collisions" that might occur when the work product of different organizations is combined (see also: namespaces);
  • to provide meaningful data to be used in project handovers which require submission of program source code and all relevant documentation
  • to provide better understanding in case of code reuse after a long interval of time.

Challenges

The choice of naming conventions (and the extent to which they are enforced) is often a contentious issue, with partisans holding their viewpoint to be the best and others to be inferior. Moreover, even with known and well-defined naming conventions in place, some organizations may fail to consistently adhere to them, causing inconsistency and confusion. These challenges may be exacerbated if the naming convention rules are internally inconsistent, arbitrary, difficult to remember, or otherwise perceived as more burdensome than beneficial.

Business value

Although largely hidden from the view of most business users, well-chosen identifiers make it significantly easier for subsequent generations of analysts and developers to understand what the system is doing and how to fix or extend the source code for new business needs.

For example, although the following:

 a = b * c;

is syntactically correct, its purpose is not evident. Contrast this with:

 weekly_pay = hours_worked * pay_rate;

which implies the intent and meaning of the source code, at least to those familiar with the underlying context of the application.

Common elements

The exact rules of a naming convention depend on the context in which they are employed. Nevertheless, there are several common elements that influence most if not all naming conventions in common use today.

Length of identifiers

A fundamental element of all naming conventions are the rules related to identifier length (i.e., the finite number of individual characters allowed in an identifier). Some rules dictate a fixed numerical bound, while others specify less precise heuristics or guidelines.

Identifier length rules are routinely contested in practice, and subject to much debate academically.

Some considerations:

  • shorter identifiers may be preferred as more expedient, because they are easier to type
  • extremely short identifiers (such as 'i' or 'j') are very difficult to uniquely distinguish using automated search and replace tools
  • longer identifiers may be preferred because short identifiers cannot encode enough information or appear too cryptic
  • longer identifiers may be disfavored because of visual clutter

It is an open research issue whether some programmers prefer shorter identifiers because they are easier to type, or think up, than longer identifiers, or because in many situations a longer identifier simply clutters the visible code and provides no perceived additional benefit.

Brevity in programming could be in part attributed to:

  • early linkers which required variable names to be restricted to 6 characters to save memory. A later "advance" allowed longer variable names to be used for human comprehensibility, but where only the first few characters were significant. In some versions of BASIC such as TRS-80 Level 2 Basic, long names were allowed, but only the first two letters were significant. This feature permitted erroneous behaviour that could be difficult to debug, for example when names such as "VALUE" and "VAT" were used and intended to be distinct.
  • early source code editors lacking autocomplete
  • early low-resolution monitors with limited line length (e.g. only 80 characters)
  • much of computer science originating from mathematics, where variable names are traditionally only a single letter

Letter case and numerals

Some naming conventions limit whether letters may appear in uppercase or lowercase. Other conventions do not restrict letter case, but attach a well-defined interpretation based on letter case. Some naming conventions specify whether alphabetic, numeric, or alphanumeric characters may be used, and if so, in what sequence.

Multiple-word identifiers

A common recommendation is "Use meaningful identifiers." A single word may not be as meaningful, or specific, as multiple words. Consequently, some naming conventions specify rules for the treatment of "compound" identifiers containing more than one word.

As most programming languages do not allow whitespace in identifiers, a method of delimiting each word is needed (to make it easier for subsequent readers to interpret which characters belong to which word).

Delimiter-separated words

One approach is to delimit separate words with a nonalphanumeric character. The two characters commonly used for this purpose are the hyphen ("-") and the underscore ("_"); e.g., the two-word name "two words" would be represented as "two-words" or "two_words". The hyphen is used by nearly all programmers writing COBOL, Forth, and Lisp; it is also common for selector names in Cascading Style Sheets. Most other languages (e.g., languages in the C and Pascal families) reserve the hyphen for use as the subtraction infix operator, so it is not available for use in identifiers and underscores are therefore used instead. See snake case.

Letter-case separated words

Another approach is to indicate word boundaries using medial capitalization (also called "CamelCase" and many other names), thus rendering "two words" as either "twoWords" or "TwoWords". This convention is commonly used in Java, C#, and Visual Basic. Treatment of acronyms in identifiers (e.g. the "XML" and "HTTP" in XMLHttpRequest) varies. Some dictate that they be lowercased (e.g. XmlHttpRequest) to ease typing and readability, whereas others leave them uppercased (e.g. XMLHTTPRequest) for accuracy. A less popular option is to always expand any acronyms (e.g. ExtensibleMarkupLanguageHyperTextTransferProtocolRequest).

Metadata and hybrid conventions

Some naming conventions represent rules or requirements that go beyond the requirements of a specific project or problem domain, and instead reflect a greater overarching set of principles defined by the software architecture, underlying programming language or other kind of cross-project methodology.

Hungarian notation

Perhaps the most well-known is Hungarian notation, which encodes either the purpose ("Apps Hungarian") or the type ("Systems Hungarian") of a variable in its name.[3] For example, the prefix "sz" for the variable szName indicates that the variable is a zero-(that is null-)terminated string.

Positional notation

A style used for very short (8 characters and less) could be: LCCIIL01, where LC would be the application (Letters of Credit), C for COBOL, IIL for the particular process subset, and the 01 a sequence number.

This sort of convention is still in active use in mainframes dependent upon JCL and is also seen in the 8.3 (maximum 8 characters with period separator followed by 3 character file type) MS-DOS style.

Composite word scheme (OF Language)

One of the earliest published convention systems was IBM's "OF Language" documented in a 1980s IMS (Information Management System) manual [citation needed].

It detailed the PRIME-MODIFIER-CLASS word scheme, which consisted of names like "CUST-ACT-NO" to indicate "customer account number".

PRIME words were meant to indicate major "entities" of interest to a system.

MODIFIER words were used for additional refinement, qualification and readability.

CLASS words ideally would be a very short list of data types relevant to a particular application. Common CLASS words might be: NO (number), ID (identifier), TXT (text), AMT (amount), QTY (quantity), FL (flag), CD (code), W (work) and so forth. In practice, the available CLASS words would be a list of less than two dozen terms.

CLASS words, typically positioned on the right (suffix), served much the same purpose as Hungarian notation prefixes.

The purpose of CLASS words, in addition to consistency, was to specify to the programmer the data type of a particular data field. Prior to the acceptance of BOOLEAN (two values only) fields, FL (flag) would indicate a field with only two possible values.

Language-specific conventions

ActionScript

Adobe's Coding Conventions and Best Practices suggests naming standards for ActionScript that are mostly consistent with those of ECMAScript.[citation needed] The style of identifiers is similar to that of Java.

Ada

In Ada, the only recommended style of identifiers is Mixed_Case_With_Underscores.[4]

C and C++

In C and C++, keywords and standard library identifiers are mostly lowercase. In the C standard library, abbreviated names are the most common (e.g. isalnum for a function testing whether a character is alphanumeric), while the C++ standard library often uses an underscore as a word separator (e.g. out_of_range). Identifiers representing macros are, by convention, written using only upper case letters and underscores (this is related to the convention in many programming languages of using all-upper-case identifiers for constants). Names containing double underscore or beginning with an underscore and a capital letter are reserved for implementation (compiler, standard library) and should not be used (e.g. reserved__ or _Reserved).[5][6] This is superficially similar to stropping, but the semantics differ: the underscores are part of the value of the identifier, rather than being quoting characters (as is stropping): the value of __foo is __foo (which is reserved), not foo (but in a different namespace).

Java

In Java, naming conventions for identifiers have been established and suggested by various Java communities such as Sun Microsystems,[7] Netscape,[8] AmbySoft,[9] etc. A sample of naming conventions set by Sun Microsystems are listed below, where a name in "CamelCase" is one composed of a number of words joined without spaces, with each word's initial letter in capitals — for example "CamelCase".

Identifier type Rules for naming Examples
Classes Class names should be nouns in UpperCamelCase, with the first letter of every word capitalised. Use whole words — avoid acronyms and abbreviations (unless the abbreviation is much more widely used than the long form, such as URL or HTML).
  • class Raster;
  • class ImageSprite;
Methods Methods should be verbs in lowerCamelCase or a multi-word name that begins with a verb in lowercase; that is, with the first letter lowercase and the first letters of subsequent words in uppercase.
  • run();
  • runFast();
  • getBackground();
Variables Local variables, instance variables, and class variables are also written in lowerCamelCase. Variable names should not start with underscore (_) or dollar sign ($) characters, even though both are allowed. This is in contrast to other coding conventions that state that underscores should be used to prefix all instance variables.

Variable names should be short yet meaningful. The choice of a variable name should be mnemonic — that is, designed to indicate to the casual observer the intent of its use. One-character variable names should be avoided except for temporary "throwaway" variables. Common names for temporary variables are i, j, k, m, and n for integers; c, d, and e for characters.

  • int i;
  • char c;
  • float myWidth;
Constants Constants should be written in uppercase characters separated by underscores. Constant names may also contain digits if appropriate, but not as the first character.
  • final static int MAX_PARTICIPANTS = 10;

Java compilers do not enforce these rules, but failing to follow them may result in confusion and erroneous code. For example, widget.expand() and Widget.expand() imply significantly different behaviours: widget.expand() implies an invocation to method expand() in an instance named widget, whereas Widget.expand() implies an invocation to static method expand() in class Widget.

One widely used Java coding style dictates that UpperCamelCase be used for classes and lowerCamelCase be used for instances and methods.[7] Recognising this usage, some IDEs, such as Eclipse, implement shortcuts based on CamelCase. For instance, in Eclipse's content assist feature, typing just the upper-case letters of a CamelCase word will suggest any matching class or method name (for example, typing "NPE" and activating content assist could suggest NullPointerException).

Initialisms of three or more letters are CamelCase instead of upper case (e.g., parseDbmXmlFromIPAddress instead of parseDBMXMLFromIPAddress). One may also set the boundary at two or more letters (e.g. parseDbmXmlFromIpAddress).

JavaScript

The built-in JavaScript libraries use the same naming conventions as Java. Classes use upper camel case (RegExp, TypeError, XMLHttpRequest, DOMObject) and methods use lower camel case (getElementById, getElementsByTagNameNS, createCDATASection). In order to be consistent most JavaScript developers follow these conventions.[citation needed] See also: Douglas Crockford's conventions

Lisp

Common practice in most Lisp dialects is to use dashes to separate words in identifiers, as in with-open-file and make-hash-table. Global variable names conventionally start and end with asterisks: *map-walls*. Constants names are marked by plus signs: +map-size+.[10]

.NET

Microsoft .NET recommends UpperCamelCase (a.k.a. "Pascal Style") for most identifiers. (lowerCamelCase is recommended for parameters and variables) and is a shared convention for the .NET languages.[11] Microsoft further recommends that no type prefix hints (also known as Hungarian notation) are used.[12] Instead of using Hungarian notation it is recommended to end the name with the base class' name; LoginButton instead of LoginBtn.[13]

Perl

Perl takes some cues from its C heritage for conventions. Locally scoped variables and subroutine names are lowercase with infix underscores. Subroutines and variables meant to be treated as private are prefixed with an underscore. Package variables are title cased. Declared constants are all caps. Package names are camel case excepting pragmata—e.g., strict and mro—which are lowercase. [14] [15]

Python and Ruby

Python and Ruby both recommend UpperCamelCase for class names, CAPITALIZED_WITH_UNDERSCORES for constants, and lowercase_separated_by_underscores for other names. In Python, if a name is intended to be 'private', it is prefixed by an underscore.[16]

See also

References

External links

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.