Unified Code Count (UCC)

USC Unified CodeCount (UCC) v.201007, the first release to feature a GUI for the UCC utility
Original author(s): Vu Nguyen
Developer(s): USC CSSE
Initial release: 2009
Written in: C++
Operating system: Cross-platform
Available in: English
Type: File comparison tool
License: USC-CSSE Limited Public License
Website: http://sunset.usc.edu/research/CODECOUNT/

The Unified Code Counter (UCC) is a comprehensive software lines of code counter produced by the USC Center for Systems and Software Engineering. It is available to the general public as open source code and can be compiled with any ANSI standard C++ compiler.

Introduction

One of the major problems in software estimation is sizing, and size is also one of the most important attributes of a software product. It is not only the key indicator of software cost and time but also a base unit from which other metrics for project status and software quality are derived. Size is an essential input for most cost estimation models[1] such as COCOMO, SLIM, SEER-SEM, and Price-S. Although source lines of code (SLOC) is a widely accepted sizing metric, there is a general lack of standards that enforce consistency in what to count and how to count it.

The University of Southern California (USC) Center for Systems and Software Engineering (CSSE) has developed and released a code counting toolset called the Unified CodeCount (UCC), which ensures consistency across independent organizations in the rules used to count software lines of code. Its primary purpose is to support software sizing counts and metrics for historical data collection and reporting. It implements the popular code counting standards published by the Software Engineering Institute (SEI) and adapted by COCOMO. Logical and physical SLOC are among the metrics generated by the toolset.

SLOC (source lines of code) is a unit used to measure the size of a software program based on a set of rules[2]. SLOC is a key input for estimating project effort and is also used to calculate productivity and other measurements. There are two types of SLOC: physical and logical.

Physical SLOC (PSLOC): One physical SLOC corresponds to one line, starting with the first character and ending with a carriage return or an end-of-file marker on the same line. Blank and comment lines are not counted.

Logical SLOC (LSLOC): Lines of code intended to measure “statements”, which normally terminate with a semicolon (C/C++, Java, C#) or a carriage return (VB, Assembly). Logical SLOC are not sensitive to format and style conventions, but they are language dependent.
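The difference between the two measures can be illustrated with a minimal Python sketch. It is not UCC's implementation: the function names are hypothetical, and it assumes C-like input with only whole-line or trailing `//` comments, no string literals, and no `for` statements. The same three statements are laid out in two ways; the physical count changes with the layout, while the logical count does not.

```python
def physical_sloc(source: str) -> int:
    """Count lines that are neither blank nor whole-line // comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count


def logical_sloc(source: str) -> int:
    """Approximate logical SLOC as the number of terminating semicolons.

    Assumption: no string literals and no `for` headers in the input.
    """
    count = 0
    for line in source.splitlines():
        code = line.split("//", 1)[0]  # drop a trailing // comment
        count += code.count(";")
    return count


# The same three statements, formatted two different ways.
compact = "int a = 1; int b = 2;\n// sum\nint c = a + b;\n"
spread = "int a = 1;\n\nint b = 2;\n// sum\nint c =\n    a + b;\n"

for name, text in (("compact", compact), ("spread", spread)):
    print(name, "physical:", physical_sloc(text), "logical:", logical_sloc(text))
```

Here the compact layout has 2 physical lines and the spread layout has 4, but both contain 3 logical statements.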

The Unified CodeCount (UCC) differencing capability allows the user to count, compare, and collect logical differentials between two versions of the source code of a software product. It reports the number of added/new, deleted, modified, and unmodified logical SLOC in the current version compared with the previous version.

History

Many different code counting tools existed in the early 2000s. However, due to the lack of standard counting rules and software accessibility issues, the National Reconnaissance Office Cost Analysis and Improvement Group (NCAIG) identified the need for a new code counting tool to analyze software program costs. To avoid any industry bias, the CodeCount tool[3] was developed at the USC Center for Systems and Software Engineering (USC CSSE) under the direction of Dr. Barry Boehm, Merilee Wheaton, and A. Winsor Brown, with IV&V provided by The Aerospace Corporation. Many organizations, including Northrop Grumman and Boeing, donated code counting tools to the USC CSSE. The goal was to develop a public domain code counting tool that handles multiple languages and produces consistent results for large and small software systems.

Project plans are developed every semester, and USC graduate students doing directed research are assigned projects to update the code count tool. Vu Nguyen, a PhD student at USC, led several semesters of student projects. All changes are verified and validated by the Aerospace Corporation IV&V team, which works closely with the USC instructor on the projects. The beta versions are tested by industry affiliates and then released to the public as open source code.

In 2006, work began on a differencing tool that would compare two software system baselines to determine the differences between two versions of software. The CodeCount tool set, a precursor of UCC, was released in 2007. It was a collection of standalone programs written in a single language to measure source code written in languages like COBOL, Assembly, PL/1, Pascal, and Jovial.

Nguyen produced the Unified CodeCount (UCC) system design as a framework, and the existing code counters and differencing tool were merged into it. Additional features such as unified counting and differencing capabilities, duplicate file detection, and support for text and CSV output files were also added. A presentation titled “Unified Code Count with Differencing Functionality” was given at the 24th International Forum on COCOMO in October 2009[4].

The UCC tool has been released to the public under a license[5] that allows users to use and modify the code; if the modifications are distributed, the user must send a copy of them to USC CSSE. The US Government has made UCC its standard for code counting, and it has been specified in many government software contracts.

Importance

The Unified CodeCount (UCC) is used to analyze existing projects for physical and logical SLOC counts, which directly relate to work accomplished. The data collected can then be used by software cost estimation models to estimate the time and cost needed for similar projects to reach a successful conclusion. Many code count tools are available in the market; however, most have various drawbacks.

The University of Southern California Center for Systems and Software Engineering was approached by the NRO Cost Analysis and Improvement Group (NCAIG) to create a code counting solution that is developed by an unbiased, industry-respected institution and that provides the following features:

  • Consistent counting
  • Documented counting standards
  • Ability to easily add new languages
  • Support and maintenance
  • Determination of added, modified, and deleted code

The UCC is the result of that effort, and is available as open source to the general public.

Features

The Unified CodeCount Toolset with Differencing Functionality (UCC) is a collection of tools designed to automate the collection of source code sizing and change information. The UCC supports multiple programming languages and focuses on two Source Lines of Code (SLOC) definitions, physical and logical. The differencing functionality can be used to compare two baselines of a software system and determine change metrics: SLOC addition, deletion, modification, and non-modification counts.

The UCC toolset is copyright USC Center for Software Engineering but is made available with a Limited Public License which allows anyone to make modifications on the code. However, if they distribute that modified code to others, the person or agency has to return a copy to USC so the toolset can be improved for the benefit of all.

Uses of CodeCount

a) Logical SLOC
b) Physical SLOC
c) Comment
d) Executable, data declaration
e) Compiler directive SLOC
f) Keywords

Functionality of CodeCount

CodeCount is written in C/C++ and uses relatively simple algorithms to recognize comments and physical/logical lines. Testing has shown the UCC to run acceptably fast except in extreme situations. A number of switches are available to inhibit certain types of processing if needed. Users may be able to compile with optimization switches for faster execution; refer to the user's manual for the compiler being used.
CodeCount has been tested extensively in the laboratory, and is being used globally. There is a defect-reporting capability, and any defects reported are corrected promptly. It is not uncommon for users to add functionality or correct defects and notify the UCC managers along with providing the code for the changes.
The UCC open source distribution contains Release Notes, User’s Manual, and Code Counting Standards for the language counters. The source code contains file headers and in-line comments. The UCC Software Development Plan, Software Requirements Specification, and Software Test Plan are available upon request.
The UCC is a monolithic, object-oriented toolset which facilitates ease of maintenance.
The "CSCI" CodeCount flavor lends itself to ease of extension. Users are able to easily add another language counter on their own. Users may also specify which file extensions will select a particular language counter.
CodeCount is a natural choice when compatibility with the COCOMO estimation mechanism is required or desired, or when compatibility with companies already using CodeCount is desired.
CodeCount has been tested on a wide variety of operating systems and hardware platforms and found to be portable to any environment that has an ANSI standard C++ compiler.
Source code for CodeCount is available as a downloadable zip file.
Source code for CodeCount is provided under the terms of the USC-CSE Limited Public License, which allows anyone to make modifications on the code. However, if they distribute that modified code to others, the person or agency has to return a copy to USC so the toolset can be improved for the benefit of all. The full text of the license can be viewed at UCC License.

Standards for the Language

The main objective of the Unified CodeCount (UCC) is to provide counting methods that define a consistent and repeatable SLOC measurement. There are more than 20 SLOC counting applications, each of which produces different physical and logical SLOC counts, and some 75 commercially available software cost estimating tools in today’s market. The differences in cost results from the various tools show the deficiencies of current techniques for estimating code size. This is particularly true for large projects,[6] where cost estimation depends on automatic procedures to generate reasonably accurate predictions. This led to the need for a universal SLOC counting standard that would produce consistent results.

SLOC serves as a main factor in cost estimation techniques. Although it is not the sole contributor to software cost estimation, it provides the foundation for a number of metrics that are derived throughout the software development life cycle. The SLOC counting procedure can be automated, so producing the metrics requires less time and effort. A well-defined set of rules identifies what to include in and exclude from SLOC counting measures. The two most accepted measures of SLOC are the number of physical and the number of logical lines of code.

In the UCC, logical SLOC measures the total number of source statements in a block of code. The three types of statements are executable statements, declarations, and compiler directives. Executable statements are eventually translated into machine code to cause run-time actions, while declarations and compiler directives affect the compiler’s actions.

The UCC treats source statements as independent units at the source code level, where a programmer constructs a statement and its sub-statements completely. The UCC assumes that the source code will compile; otherwise the results are unreliable. A big challenge was deciding where each statement ends when counting logical SLOC. Using the semicolon may sound appealing, but not all popular languages use the semicolon as a statement terminator (for example SQL, JavaScript, and UNIX scripting languages). The Software Engineering Institute (SEI) at Carnegie Mellon University and COCOMO II SLOC defined a way to count ‘how many of what program elements’. Tables 1 and 2 summarize the physical and logical SLOC counting rules [7] for the C/C++, Java, and C# programming languages. The UCC Code Counting Rules for each language are distributed with the open source release.

Measurement Unit              Order of Precedence    Physical SLOC
Executable lines
  Statements                  1                      One per line
Non-executable lines
  Declaration (Data) lines    2                      One per line
  Compiler directives         3                      One per line
Table 1. Physical SLOC Counting Rules

Each entry below gives the structure, numbered by its order of precedence, followed by its logical SLOC rule and any comments.

1. SELECTION STATEMENTS: if, else if, else, “?” operator, try, catch, switch
   Logical SLOC: Count once per each occurrence.
   Comments: Nested statements are counted in the same fashion.

2. ITERATION STATEMENTS: for, while, do..while
   Logical SLOC: Count once per each occurrence.
   Comments: Initialization, condition, and increment within the “for” construct are not counted, i.e.
       for (i = 0; i < 5; i++)…
   In addition, any optional expressions within the “for” construct are not counted either, e.g.
       for (i = 0, j = 5; i < 5, j > 0; i++, j--)…
   Braces {…} enclosed in iteration statements and the semicolon that follows “while” in a “do..while” structure are not counted.

3. JUMP STATEMENTS: return, break, goto, exit, continue, throw
   Logical SLOC: Count once per each occurrence.
   Comments: Labels used with “goto” statements are not counted.

4. EXPRESSION STATEMENTS: function call, assignment, empty statement
   Logical SLOC: Count once per each occurrence.
   Comments: Empty statements do not affect the logic of the program, and usually serve as placeholders or to consume CPU for timing purposes.

5. STATEMENTS IN GENERAL: statements ending with a semicolon
   Logical SLOC: Count once per each occurrence.
   Comments: Semicolons within a “for” statement, or as stated in the comment section for the “do..while” statement, are not counted.

6. BLOCK DELIMITERS, BRACES
   Logical SLOC: Count once per pair of braces {..}, except where a closing brace is followed by a semicolon, i.e. };.
   Comments: Braces used with selection and iteration statements are not counted. A function definition is counted once since it is followed by a set of braces.

7. COMPILER DIRECTIVE
   Logical SLOC: Count once per each occurrence.

8. DATA DECLARATION
   Logical SLOC: Count once per each occurrence.
   Comments: Includes function prototypes, variable declarations, and “typedef” statements. Keywords like “struct” and “class” do not count.

Table 2. Logical SLOC Counting Rules for C/C++, Java, and C#
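
The rule that semicolons inside a “for” header are not counted can be sketched in a few lines of Python. This is an illustration under stated assumptions, not UCC's counter: the function name is hypothetical, comments and string literals are assumed to have been removed already, and only the “for” keyword is handled (other selection and iteration keywords, and “do..while”, are ignored). Each “for” counts once, and a semicolon counts only when it is outside all parentheses.

```python
import re

# Identifiers/keywords, parentheses, and semicolons are the only tokens needed.
TOKEN = re.compile(r"[A-Za-z_]\w*|[();]")


def count_logical_sloc(code: str) -> int:
    """Count one per `for` and one per semicolon outside parentheses."""
    count = 0
    depth = 0  # current parenthesis nesting depth
    for token in TOKEN.findall(code):
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif token == ";":
            if depth == 0:  # semicolons in a `for` header are skipped
                count += 1
        elif token == "for":
            count += 1  # the iteration statement itself counts once
    return count


sample = """
for (i = 0, j = 5; i < 5; i++, j--) {
    total += i;
}
return total;
"""

print(count_logical_sloc(sample))
```

For the sample, the “for” statement, the assignment, and the “return” each count once, giving 3; the two semicolons in the header and the braces add nothing.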

Software design

The Unified CodeCount (UCC) produces its counts by capturing the LSLOC strings from a file according to the Counting Standards Document created for each language. The differencing feature uses a common engine to compare the LSLOC strings captured from the two files during the counting process.

UCC Architecture

The main architecture of UCC can be seen as a hierarchical structure of the following components:

1. MainObject

The MainObject is the top-level class. It parses the command line to extract the list of files from the command parameters and then reads each file into memory for counting or differencing. The MainObject calls the CodeCounters in order to process the embedded languages. The counting function produces the following text (.txt) files for duplicate and counting/complexity results:

• <LANG>_outfile.txt, the file where Main writes the counting results for source files of <LANG>. <LANG> is the name of the language of the source files, e.g., C_CPP for C/C++ files and Java for Java files.
• outfile_cplx.txt, which shows the complexity results for the source files.
• Duplicates-<LANG>_outfile.txt, which lists the duplicate files for the language <LANG>.
• Duplicates-outfile_cplx.txt, which contains the complexity results for the duplicated files.
• DuplicatePairs.txt, a text file listing matches between a source file and its duplicate file.

2. DiffTool

DiffTool is derived from MainObject. It parses the command line parameters and processes the list of files for each baseline. The DiffTool class produces the following files (.txt, .csv) across baselines:

• Baseline-<A|B>-<LANG>_outfile.txt, the counting results for source files of <LANG> for Baseline A and Baseline B.
• Baseline-<A|B>-<LANG>_cplx.txt, the complexity results for Baseline A and Baseline B.
• MatchedPairs, a text file listing matches between files in Baseline A and Baseline B.
• outfile_diff_results.txt, the main differencing results in plain text format.
• outfile_diff_results.csv, the main differencing results in .csv format, which can be opened using MS Excel.

DiffTool performs the comparison between baselines, with the help of ‘CmpMngr’ class.

3. CmpMngr

CmpMngr calculates the differences by comparing two lists of LSLOC and determines the total LSLOC added, deleted, modified, and unmodified between the two lists.
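
A comparison of this kind can be sketched as follows. The text does not describe how CmpMngr matches statements or decides that one is "modified", so everything here is an assumption made for illustration: the function name `diff_lsloc`, the two-pass matching, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the 0.6 threshold.

```python
from difflib import SequenceMatcher


def diff_lsloc(baseline_a, baseline_b, threshold=0.6):
    """Classify LSLOC strings as added, deleted, modified, or unmodified.

    Hypothetical sketch: the threshold and matching order are assumptions.
    """
    remaining_b = list(baseline_b)
    unmatched_a = []
    counts = {"added": 0, "deleted": 0, "modified": 0, "unmodified": 0}

    # Pass 1: statements that appear identically in both lists are unmodified.
    for stmt in baseline_a:
        if stmt in remaining_b:
            remaining_b.remove(stmt)
            counts["unmodified"] += 1
        else:
            unmatched_a.append(stmt)

    # Pass 2: pair each leftover A statement with its most similar B statement.
    for stmt in unmatched_a:
        best, best_ratio = None, threshold
        for candidate in remaining_b:
            ratio = SequenceMatcher(None, stmt, candidate).ratio()
            if ratio >= best_ratio:
                best, best_ratio = candidate, ratio
        if best is None:
            counts["deleted"] += 1  # nothing similar enough in B
        else:
            remaining_b.remove(best)
            counts["modified"] += 1

    # Whatever is left in B has no counterpart in A.
    counts["added"] = len(remaining_b)
    return counts


old = ["int total = 0;", "total += price;", "free(buffer);"]
new = ["int total = 0;", "total += price * qty;", "return total;"]
print(diff_lsloc(old, new))
```

With these inputs, one statement is unchanged, one is treated as modified, one as deleted, and one as added. A different similarity measure or threshold would shift the boundary between "modified" and "deleted plus added".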

4. CCodeCounter

The CCodeCounter is used for pre-count processing, where it performs the following operations:

• Counts the blank lines and comments.
• Filters the literal strings.
• Counts the complexity of keywords, operators, etc.
• Counts the compiler directive SLOC (using the CountDirectiveSLOC method).
• Performs the language-specific processing (creates subclasses).
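
The first two pre-count steps can be illustrated with a short Python sketch. It is not CCodeCounter's code: the function name `precount` is hypothetical, and the sketch assumes C-like input with only whole-line `//` comments and double-quoted string literals (block comments and character literals are not handled). Emptying the string literals first means that a `;` or `//` inside a string cannot be mistaken for a statement terminator or a comment later on.

```python
import re

# A double-quoted literal, allowing backslash escapes such as \" inside it.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\\n])*"')


def precount(source: str):
    """Return (blank_lines, comment_lines, code_lines_with_strings_emptied)."""
    blank = 0
    comment = 0
    code_lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith("//"):
            comment += 1
        else:
            # Replace each literal with an empty one so its contents are ignored.
            code_lines.append(STRING_LITERAL.sub('""', stripped))
    return blank, comment, code_lines


sample = 'puts("a; b // c");\n\n// note\nx = 1;\n'
print(precount(sample))
```

After this step, the first line becomes `puts("");`, so a later semicolon count sees one statement terminator on that line instead of two.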

Future enhancements and release

Future plans for UCC include improving the computation of complexity metrics, supporting the existing code counters, adding counters for additional languages, improving reporting, and improving performance. Counters for text, assembly, COBOL, Jovial, Matlab, and Pascal are in development. A graphical user interface is also being produced, which may be used in place of the current command line interface.

System Requirements

A. Hardware

B. Software Operating Systems

C. Compilers Supported
