NETWORK BASICS

Adjancency matrix

In graph/network theory: adjacency matrix is a square matrix used to represent a finite graph. Elements indicate whether pairs of vertices/nodes are adjacent/connected. Boolean. Simple finite graph: 1/0 representing vertices that are connected/not connected with an edge

Distance matrix

Same size but elements indicate the distance between vertices rather than just connectivity (more relevant to our data? ie 0-1 correlations = distance/weight?). Hierarchical clustering trees decomposes into distance matrix.


NETWORK CLUSTERS/ MODULES

One network feature that has been emphasized in recent work is community structure, the gathering of vertices into groups such that there is a higher density of edges within groups than between them

Modularity

First mentioned in Clauset et al 2004 were they describe a new algorithm for inferring community structure from network topology which works by greedily optimizing the modularity.

Modularity [21] is a property of a network and a specific proposed division of that network into communities. Girvan and Newman [20, 21] proposed a divisive algorithm that uses edge betweenness as a metric to identify the boundaries of such communities. Computationally expensive. It measures when the division is a good one, in the sense that there are many edges within communities and only a few between them:

  • will be large for good divisions of the network, in the sense of having many within-community edges
  • on its own, not a good measure of community structure (it takes its largest value of 1 in the trivial case where all vertices belong to a single community).
  • However, if we subtract from it the expected value of the same quantity in the case of a randomized network, we do get a useful measure.
  • Nonzero values represent deviations from randomness, and in practice it is found that a value above about 0.3 is a good indicator of significant community structure in a network.
  • While finding the global maximum modularity over all possible divisions seems hard in general, reasonably good solutions can be found with approximate optimization techniques. The algorithm proposed in (Newman 2004, Fast algorithm for detecting community structure in networks) uses a greedy optimization in which, starting with each vertex being the sole member of a community of one, we repeatedly join together the two communities whose amalgamation produces the largest increase in Q.
  • The entire process can be represented as a tree whose leaves are the vertices of the original network and whose internal nodes correspond to the joins. This dendrogram represents a hierarchical decomposition of the network into communities at all levels.



Identifying hierarchical structure in networks:

Hierarchical structure and the prediction of missing links in networks Aaron Clauset, Cristopher Moore & M. E. J. Newman

the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients and short path lengths.

Hierarchical structure goes beyond simple clustering, however, by explicitly including organiza- tion at all scales in a network simultaneously.

Sales-Pardo et al. 2007 Extracting the hierarchical organization of complex systems

A central idea in biology is that life processes are hierarchically organized (2–4). Additionally, it seems plausible that this hierarchical structure plays an important role in the system’s dynamics.

Complex networks are convenient representations of the interactions within complex systems. Here, we focus on the identification of inclusion hierarchies in complex networks, that is, to the unraveling of the nested organization of the nodes in a network into modules, which in turn are composed of submodules and so on.

Method must identify the different levels in the hierarchy as well as the number of modules and their composition at each level. Methods whose output is subject to interpretation excluded i.e. if it organizes nodes into a tree structure, but it is up to the researcher to find a ‘‘sensible’’ criterion to establish which are the different levels in that tree. Also unacceptable for method to yield a tree even for networks with no internal structure.

The idea of the paper extends the definition of community struture discussed above to internal structure within each higher level module if there are subgroups of nodes within the module (submodules) that are more interconnected than to other nodes in the same module.


Procedure


Extracting the Hierarchical Organization of Networks

Our method comprises two major steps (Fig. 1):

- i) estimating the ‘‘proximity’’ in the hierarchy between all pairs of nodes, which we call ‘‘node affinity’’;
- ii) uncovering the overall hierarchical organization of node affinities.


new affinity measure based on the surveying of the modularity landscape (18)

Node Affinity: ‘‘topological overlap’’ (11, 16, 17), which is defined as the ratio between the number of common neighbors of the two nodes and the minimum degree of the two nodes. In defining affinity between a pair of nodes, one wants nodes that are close in the hierarchy to have a large affinity and to be classified in the same module in partitions with large modularity. We formalize this requirement by defining the affinity of a pair of nodes as the probability that the two nodes are classified in the same module for partitions that are local maxima of the modularity landscape


calculate modularity

For each possible partition of the network into modules \(P\), modularity \(M\) can be calculated.

\[M(P)=\sum_{i=1}^m\left[\frac{l_i}{L}-\left(\frac{d_i}{2L}\right)^2\right]\]

where \(L\) is the total number of links in the network, \(l_i\) is the number of links within module \(i\), \(d_i\) is the sum of degrees of all of the nodes inside module \(i\), and the sum is over all of the \(m\) modules in partition \(P\) (Fig. 1A). The modularity of a partition is high when the number of intramodule links is much larger than expected for a random partition.

Better partition evaluation

  • multiple partitions \(P_{op}\) for which \(M(P_{op})\) is a global maximum of the modularity.
  • the formulation of \(M(P)\) is such that, depending on the density of connections and groups size, it prevents some desired grouping and forces some undesired grouping.
  • \(P_{op}\) may be in practice unreachable if its basin of attraction is too small.

propose that the affinity \(A_{ij}\) of nodes \(i\) and \(j\) is determined by those partitions that are local maxima in the landscape and by their basins of attraction.


Consider all \(P\in{P_{max}}\)

\(P_{max}\) = the set of partitions for which the modularity \(M\) is a local maxima, that is, partitions for which neither the change of a single node from one module to another nor the merging of two modules will yield a higher modularity \(A_{ij}\) = consider all partitions \(P\in{P_{max}}\) and find the fraction in which \((i, j)\) are placed in the same module.

However, such a procedure would not take into consid- eration the size of the basins of attraction of the different maxima, see below


calculate \(P_{max}\) basins of attraction

We would start by grouping the nodes into a randomly chosen partition; let us say, partition 13.