In graph/network theory: adjacency matrix is a square matrix used to represent a finite graph. Elements indicate whether pairs of vertices/nodes are adjacent/connected. Boolean. Simple finite graph: 1/0 representing vertices that are connected/not connected with an edge

Same size but elements indicate the distance between vertices rather than just connectivity (*more relevant to our data? ie 0-1 correlations = distance/weight?*). Hierarchical clustering trees decomposes into distance matrix.

One network feature that has been emphasized in recent work is community structure, the gathering of vertices into groups such that there is a higher density of edges within groups than between them

First mentioned in Clauset et al 2004 were they describe a new algorithm for inferring community structure from network topology which works by greedily optimizing the modularity.

Modularity [21] is a property of a network and a specific proposed division of that network into communities. Girvan and Newman [20, 21] proposed a divisive algorithm that uses **edge betweenness** as a metric to identify the boundaries of such communities. ** Computationally expensive**. It measures when the division is a good one, in the sense that there are many edges within communities and only a few between them:

- will be large for good divisions of the network, in the sense of having many within-community edges
- on its own, not a good measure of community structure (it takes its largest value of 1 in the trivial case where all vertices belong to a single community).
- However, if we subtract from it the expected value of the same quantity in the case of a randomized network, we do get a useful measure.
- Nonzero values represent deviations from randomness, and in practice it is found that a value above about
**0.3 is a good indicator of significant community structure**in a network. - While finding the
**global maximum modularity over all possible divisions**seems hard in general, reasonably good solutions can be found with approximate optimization techniques. The algorithm proposed in (Newman 2004, Fast algorithm for detecting community structure in networks) uses a**greedy optimization in which, starting with each vertex being the sole member of a community of one, we repeatedly join together the two communities whose amalgamation produces the largest increase in Q.** The entire process can be represented as a tree whose leaves are the vertices of the original network and whose internal nodes correspond to the joins. This dendrogram represents a hierarchical decomposition of the network into communities at all levels.

the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients and short path lengths.

Hierarchical structure goes beyond simple clustering, however, by explicitly including organiza- tion at all scales in a network simultaneously.

Sales-Pardo et al. 2007 *Extracting the hierarchical organization of complex systems*

A central idea in biology is that life processes are hierarchically organized (2â€“4). Additionally, it seems plausible that this hierarchical structure plays an important role in the systemâ€™s dynamics.

Complex networks are convenient representations of the interactions within complex systems. Here, we focus on the **identification of inclusion hierarchies in complex networks**, that is, to the **unraveling of the nested organization of the nodes in a network into modules, which in turn are composed of submodules and so on**.

Method must **identify the different levels in the hierarchy** as well as **the number of modules and their composition at each level**. Methods whose output is subject to interpretation excluded i.e.Â if it ** organizes nodes into a tree structure**, but it is

The idea of the paper extends the definition of community struture discussed above to internal structure within each higher level module if there are subgroups of nodes within the module (submodules) that are more interconnected than to other nodes in the same module.

Our method comprises two major steps (Fig. 1):

*Node Affinity:*** â€˜â€˜topological overlapâ€™â€™** (11, 16, 17), which is defined as the

For each possible partition of the network into modules \(P\), modularity \(M\) can be calculated.

\[M(P)=\sum_{i=1}^m\left[\frac{l_i}{L}-\left(\frac{d_i}{2L}\right)^2\right]\]

where \(L\) is the total number of links in the network, \(l_i\) is the number of links within module \(i\), \(d_i\) is the sum of degrees of all of the nodes inside module \(i\), and the sum is over all of the \(m\) modules in partition \(P\) (Fig. 1A). The modularity of a partition is high when the number of intramodule links is much larger than expected for a random partition.

- multiple partitions \(P_{op}\) for which \(M(P_{op})\) is a global maximum of the modularity.
- the formulation of \(M(P)\) is such that, depending on the density of connections and groups size, it prevents some desired grouping and forces some undesired grouping.
- \(P_{op}\) may be in practice unreachable if its basin of attraction is too small.

propose that the affinity \(A_{ij}\) of nodes \(i\) and \(j\) is determined by those partitions that are local maxima in the landscape and by their basins of attraction.

**\(P_{max}\)** = the set of partitions for which the **modularity \(M\) is a local maxima**, that is, partitions for which neither the change of a single node from one module to another nor the merging of two modules will yield a higher modularity \(A_{ij}\) = consider all partitions \(P\in{P_{max}}\) and find the fraction in which \((i, j)\) are placed in the same module.

**However, such a procedure would not take into consid- eration the size of the basins of attraction of the different maxima**, see below

We would start by grouping the nodes into a randomly chosen partition; let us say, partition 13.