Clustering coefficient

From Wikipedia, the free encyclopedia

Example of the clustering coefficient on an undirected graph for the shaded node i. Black edges connect neighbours of i, and dotted red edges mark possible edges between neighbours of i that are absent.

Duncan J. Watts and Steven Strogatz (1998) introduced the clustering coefficient[1] as a graph measure to determine whether or not a graph is a small-world network.

First, let us define a graph in terms of a set of n vertices V = \{v_1, v_2, \dots, v_n\} and a set of edges E, where e_{ij} denotes an edge between vertices v_i and v_j. Below we assume v_i, v_j and v_k are members of V.

We define the neighbourhood N_i of a vertex v_i as the set of its immediately connected neighbours:

N_i = \{v_j : e_{ij} \in E\}.

The degree k_i of a vertex v_i is the number of vertices, |N_i|, in its neighbourhood N_i.
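A minimal Python sketch of these two definitions, assuming the graph is given as a set V of vertex labels and a set E of ordered pairs (i, j), each standing for e_{ij}. The helper names `neighbourhoods` and `degrees` are hypothetical, and ignoring self-loops is an assumption made here; for an undirected graph, each edge would be listed in both directions.

```python
def neighbourhoods(vertices, edges):
    """Return {v_i: N_i}, where N_i is the set of v_j with e_ij in E.

    Assumption: `edges` is a set of ordered pairs (i, j), one per e_ij;
    self-loops (i, i) are ignored.
    """
    nbrs = {v: set() for v in vertices}
    for i, j in edges:
        if i != j:
            nbrs[i].add(j)
    return nbrs


def degrees(nbrs):
    """Return {v_i: k_i}, with k_i = |N_i|."""
    return {v: len(n_i) for v, n_i in nbrs.items()}


# Usage: the undirected path 1 - 2 - 3, each edge listed in both directions.
V = {1, 2, 3}
E = {(1, 2), (2, 1), (2, 3), (3, 2)}
N = neighbourhoods(V, E)
k = degrees(N)  # k[2] == 2, k[1] == k[3] == 1
```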

The clustering coefficient C_i for a vertex v_i is the number of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, e_{ij} is distinct from e_{ji}, and therefore for each neighbourhood N_i there are k_i(k_i - 1) links that could exist among the vertices within the neighbourhood. Thus, the clustering coefficient is given as

C_i = \frac{|\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|}{k_i(k_i-1)}.
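The directed formula can be sketched in Python as follows, assuming E is a set of ordered pairs (j, k), one per e_{jk}, and that N_i is built from the edges leaving v_i as in the definition of N_i. The function name is hypothetical. The formula is undefined when k_i < 2 (the denominator is zero); returning 0 in that case is a convention assumed here, not part of the definition.

```python
def local_clustering_directed(i, edges):
    """C_i = |{e_jk : v_j, v_k in N_i, e_jk in E}| / (k_i (k_i - 1)).

    `edges` is a set of ordered pairs (j, k), one per directed edge e_jk.
    """
    # N_i: every v_j with e_ij in E (self-loops ignored)
    n_i = {j for (src, j) in edges if src == i and j != i}
    k_i = len(n_i)
    if k_i < 2:
        return 0.0  # assumed convention: C_i is undefined here
    # Count ordered pairs of distinct neighbours (v_j, v_k) with e_jk in E;
    # e_jk and e_kj are counted separately, matching the k_i(k_i - 1) bound.
    links = sum(1 for j in n_i for k in n_i if j != k and (j, k) in edges)
    return links / (k_i * (k_i - 1))


# Usage: v_0 points to v_1, v_2, v_3; among them only e_12, e_21, e_13 exist.
E = {(0, 1), (0, 2), (0, 3), (1, 2), (2, 1), (1, 3)}
print(local_clustering_directed(0, E))  # 3 / 6 = 0.5
```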

An undirected graph has the property that e_{ij} and e_{ji} are considered identical. Therefore, if a vertex v_i has k_i neighbours, \frac{k_i(k_i-1)}{2} edges could exist among the vertices within the neighbourhood. Thus, the clustering coefficient for undirected graphs can be defined as

C_i = \frac{2|\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|}{k_i(k_i-1)}.
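A sketch of the undirected formula, assuming each undirected edge is stored once as a Python `frozenset` {j, k}, so that e_{jk} and e_{kj} are literally the same element. The function name and the zero-for-k_i < 2 convention are assumptions made for illustration.

```python
def local_clustering_undirected(i, edges):
    """C_i = 2 |{e_jk : v_j, v_k in N_i, e_jk in E}| / (k_i (k_i - 1)).

    `edges` is a set of frozensets {j, k}; e_jk and e_kj are one element.
    """
    # N_i: every vertex sharing an edge with v_i (self-loops ignored)
    n_i = {v for e in edges if i in e for v in e if v != i}
    k_i = len(n_i)
    if k_i < 2:
        return 0.0  # assumed convention: C_i is undefined here
    # Each edge with both endpoints in N_i is counted once.
    links = sum(1 for e in edges if len(e) == 2 and e <= n_i)
    return 2 * links / (k_i * (k_i - 1))


# Usage: a 4-cycle 0-1-2-3-0 plus the chord 0-2.
E = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]}
print(local_clustering_undirected(0, E))  # N_0 = {1, 2, 3}, links 1-2, 2-3: about 0.667
print(local_clustering_undirected(1, E))  # N_1 = {0, 2}, link 0-2: 1.0
```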

Let \lambda_G(v) be the number of triangles on v \in V(G) for an undirected graph G. That is, \lambda_G(v) is the number of subgraphs of G with 3 edges and 3 vertices, one of which is v. Let \tau_G(v) be the number of triples on v \in V(G). That is, \tau_G(v) is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is v and such that v is incident to both edges. Then we can also define the clustering coefficient as

C_i = \frac{\lambda_G(v_i)}{\tau_G(v_i)}.
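The triangle/triple form can be sketched with an adjacency-set representation: a dict mapping each vertex to the set of its neighbours, assumed symmetric and free of self-loops (this representation and the function names are assumptions). A triple on v corresponds to an unordered pair of neighbours of v, and it closes into a triangle exactly when those two neighbours are adjacent.

```python
from itertools import combinations


def triangles_and_triples(v, adj):
    """Return (lambda_G(v), tau_G(v)) for an undirected graph.

    `adj` maps each vertex to the set of its neighbours (symmetric, no self-loops).
    """
    pairs = list(combinations(adj[v], 2))  # each pair {a, b} gives the triple a-v-b
    tau = len(pairs)
    lam = sum(1 for a, b in pairs if b in adj[a])  # triple closed by the edge a-b
    return lam, tau


def clustering_from_triangles(v, adj):
    lam, tau = triangles_and_triples(v, adj)
    return lam / tau if tau else 0.0  # assumed convention when tau_G(v) = 0


# Usage: the 4-cycle 0-1-2-3-0 plus the chord 0-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(triangles_and_triples(0, adj))  # (2, 3)
```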

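The two preceding definitions are the same. Each edge e_{jk} between two neighbours of v_i closes exactly one triangle on v_i, so \lambda_G(v_i) = |\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|, and each triple on v_i is determined by choosing two of its k_i neighbours, so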

\tau_G(v_i) = \binom{k_i}{2} = \frac{1}{2}k_i(k_i-1).
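As a worked check with illustrative numbers (not taken from the original article): suppose v_i has k_i = 4 neighbours and exactly 3 edges run between them. Then the triangle/triple form and the undirected formula give the same value:

```latex
\tau_G(v_i) = \binom{4}{2} = 6, \qquad \lambda_G(v_i) = 3,
\qquad
C_i = \frac{\lambda_G(v_i)}{\tau_G(v_i)} = \frac{3}{6}
    = \frac{2 \cdot 3}{4(4-1)} = \frac{1}{2}.
```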

These measures are 1 if every neighbour connected to v_i is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to v_i connects to any other vertex that is connected to v_i.
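The two extreme cases can be checked with a short Python sketch on two standard graphs: in a complete graph every pair of neighbours is adjacent, and in a star the leaves are never adjacent to each other. The helper names and the adjacency-set representation (symmetric, no self-loops) are assumptions for illustration.

```python
from itertools import combinations


def local_clustering(v, adj):
    """Undirected C_i = 2 |{e_jk}| / (k_i (k_i - 1)); `adj` is symmetric."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: C_i is undefined here
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def complete_graph(n):
    """Every vertex 0..n-1 is adjacent to every other vertex."""
    return {v: set(range(n)) - {v} for v in range(n)}


def star_graph(n_leaves):
    """Centre 0 joined to leaves 1..n_leaves; leaves are not joined to each other."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj


print(local_clustering(0, complete_graph(4)))  # 1.0
print(local_clustering(0, star_graph(4)))      # 0.0
```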

The clustering coefficient for the whole system is given by Watts and Strogatz as the average of the clustering coefficient for each vertex:

\overline{C} = \frac{1}{n}\sum_{i=1}^{n} C_i.
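A sketch of this network average in Python, reusing an undirected local coefficient on an adjacency-set representation (an assumption, as are the function names). Here vertices with k_i < 2 contribute 0 to the sum, which is one convention among several; the definition above does not say how to treat them.

```python
from itertools import combinations


def local_clustering(v, adj):
    """Undirected C_i = 2 |{e_jk}| / (k_i (k_i - 1)); `adj` is symmetric."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: vertices with k_i < 2 count as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """C-bar = (1/n) * sum of C_i over all n vertices."""
    return sum(local_clustering(v, adj) for v in adj) / len(adj)


# Usage: the 4-cycle 0-1-2-3-0 plus the chord 0-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(average_clustering(adj))  # (2/3 + 1 + 2/3 + 1) / 4 = 5/6, about 0.833
```

For larger graphs, NetworkX provides `networkx.clustering` and `networkx.average_clustering`; the latter's `count_zeros` parameter controls whether vertices with zero clustering are included in the average.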

References

1. Watts, D. J. and Strogatz, S. H. (1998). "Collective dynamics of 'small-world' networks." Nature 393: 440–442.
