# Math Insight

### The absurd high dimensionality of random graphs

#### Exponentially increasing dimension

If $X$ is a random variable described by the discrete probability distribution \begin{gather*} P_X(x) = \Pr(X =x), \end{gather*} then to completely characterize $X$, we just need to know all the probabilities $\Pr(X=x)$ for the different possibilities $x$ of $X$. For example, if $X$ were the result of rolling a die, then we just need to know the six numbers corresponding to the possible outcomes $x \in \{1,2,3,4,5,6\}$. If the die were a fair die, then $P_X(x)=\frac{1}{6}$ for each of those values. But, we shouldn't make such assumptions and should allow for possibilities like $P_X(1)=0.1$, $P_X(2)=0.01$, $P_X(3)=0.5$, $P_X(4)=0$, $P_X(5)=0.09$, and $P_X(6)=0.3$. We could summarize this probability distribution with a probability table.

| $X=1$ | $X=2$ | $X=3$ | $X=4$ | $X=5$ | $X=6$ |
|---|---|---|---|---|---|
| 0.1 | 0.01 | 0.5 | 0 | 0.09 | 0.3 |

To specify the probability distribution of $X$, one needs to specify five different values. (The sixth value can be determined from the other five by the constraint that the probabilities must sum to one.) We could think of the probability distribution being determined by five different parameters, so the dimension of the parameter space is five.
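As a quick check, we can encode this loaded-die distribution (using the illustrative probabilities from above) and recover the sixth value from the other five:

```python
# Probability table for the loaded die above.  Five free parameters;
# the sixth probability is determined by the constraint that they sum to one.
p = {1: 0.1, 2: 0.01, 3: 0.5, 4: 0.0, 5: 0.09}
p[6] = 1 - sum(p.values())
print(p[6])   # 0.3, up to floating-point rounding
```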

Now, imagine that you have two random variables $X$ and $Y$. To look at their joint behavior, you need to know the probability distribution \begin{gather*} P_{X,Y}(x,y) = \Pr(X=x,Y=y), \end{gather*} which gives a number for every possible combination of $X$ and $Y$. If both $X$ and $Y$ have six different possibilities, such as if they were dice, then you need a separate probability for each of the $6 \cdot 6 = 36$ possibilities of the ordered pair $(x,y)$. We can't assume that a pair of independent dice describes all the possibilities; these dice might be tricky. The random variable $X$ might behave quite differently from $Y$, and we have to allow for the possibility that $X$ and $Y$ are communicating with each other. For example, $X$ might only be small (1 or 2) or large (5 or 6) but never take a middle value (3 or 4). And $X$ and $Y$ might be more likely to have similar values than very different values, so that they are most likely to be both small or both large. For example, the 36 probabilities could be those of the following probability table. The table means that $P_{X,Y}(1,1)=0.1$, $P_{X,Y}(2,1)=0.05$, $P_{X,Y}(3,1)=0$, etc.

|  | $X=1$ | $X=2$ | $X=3$ | $X=4$ | $X=5$ | $X=6$ |
|---|---|---|---|---|---|---|
| $Y=1$ | 0.1 | 0.05 | 0 | 0 | 0 | 0 |
| $Y=2$ | 0.05 | 0.03 | 0 | 0 | 0 | 0 |
| $Y=3$ | 0.03 | 0.01 | 0 | 0 | 0.04 | 0.05 |
| $Y=4$ | 0.02 | 0.01 | 0 | 0 | 0.05 | 0.06 |
| $Y=5$ | 0 | 0 | 0 | 0 | 0.1 | 0.1 |
| $Y=6$ | 0 | 0 | 0 | 0 | 0.1 | 0.2 |

To specify the probability distribution of $X$ and $Y$, one needs to specify 35 different values (since the probabilities must sum to one). In this case, the parameter space has 35 dimensions.
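We can verify these properties directly by storing the table above as a list of rows:

```python
# Joint probability table P_{X,Y}(x, y) from above.
# table[i][j] holds P(X = j+1, Y = i+1): rows are Y = 1..6, columns X = 1..6.
table = [
    [0.10, 0.05, 0.00, 0.00, 0.00, 0.00],
    [0.05, 0.03, 0.00, 0.00, 0.00, 0.00],
    [0.03, 0.01, 0.00, 0.00, 0.04, 0.05],
    [0.02, 0.01, 0.00, 0.00, 0.05, 0.06],
    [0.00, 0.00, 0.00, 0.00, 0.10, 0.10],
    [0.00, 0.00, 0.00, 0.00, 0.10, 0.20],
]

# All 36 entries sum to one, which is why only 35 are free parameters.
total = sum(sum(row) for row in table)
print(total)   # 1.0, up to floating-point rounding

# Marginal distribution of X: sum each column over the Y values.
marginal_X = [sum(row[j] for row in table) for j in range(6)]
print(marginal_X)   # X never takes the middle values 3 or 4
```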

Adding another variable $Z$ to the mix greatly increases the dimension of the parameter space describing the distribution \begin{gather*} P_{X,Y,Z}(x,y,z) = \Pr(X=x,Y=y,Z=z). \end{gather*} If we were to write out a probability table, we could write the numbers in a $6 \times 6 \times 6$ cube containing all $6^3 = 216$ probabilities. If we don't do anything to simplify the situation, we'd have a 215-dimensional parameter space.

To go past three variables, let's change the variable names from $X$, $Y$, and $Z$ to $X_1$, $X_2$, and $X_3$. Then, we can imagine that we have $n$ such variables labeled $X_i$ for $i =1,2, \ldots, n$. We can denote them all together as the vector $\vc{X}$. The probability distribution of $\vc{X}$ is \begin{align*} P_{\vc{X}}(\vc{x}) &= \Pr(\vc{X}=\vc{x})\\ &= \Pr(X_1=x_1, X_2=x_2, \ldots, X_n=x_n). \end{align*} Specifying this probability distribution requires a table of $6^n$ different numbers, so the parameter space has $6^n-1$ dimensions.

The number of dimensions goes up very fast as we add more random variables. With four random variables, the dimension is up to $6^4-1=1295$, and we'd be past 60 million dimensions for 10 variables.

To bring the dimension down, let's imagine we had coins rather than dice, so that each random variable had only two possible values. We'll make each random variable be a Bernoulli random variable that is either 0 or 1. This simplification reduces the size of the probability table for $n$ variables to $2^n$ different numbers, giving $2^n-1$ dimensions.

For three Bernoulli random variables, we have only $2^3-1=7$ dimensions, but the size still increases exponentially with the number of variables. With $n=10$ different random variables, we are up to $2^{10}-1 = 1023$ dimensions, which is much less than the 60 million we had for the dice, but still pretty large. When $n=20$, $2^{20}-1 = 1,048,575$, and we're over a billion dimensions by the time $n=30$.
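A few lines of Python show how quickly both parameter counts blow up:

```python
# Dimension of the parameter space for n random variables:
# 6^n - 1 for six-sided dice, 2^n - 1 for coins (Bernoulli variables).
for n in [3, 4, 10, 20, 30]:
    print(n, 6 ** n - 1, 2 ** n - 1)
```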

#### The dimension of random networks

For a random network, the random variables are the entries of the adjacency matrix. These entries are 0 or 1, depending on whether or not the corresponding edge exists, so the probability distribution has the same form as the Bernoulli case above. The dimension of the parameter space describing the probability distribution will be $2^n-1$ if we have $n$ different random variables.

If we have a network of $N$ nodes, then the adjacency matrix is an $N \times N$ matrix. This matrix has $N^2$ entries, but if we exclude self connections and make the network undirected, then there are only $N(N-1)/2$ unique entries in the matrix. The probability distribution describing the resulting random network will be in terms of $n=N(N-1)/2$ different Bernoulli random variables. The dimension of the parameter space is $2^{N(N-1)/2}-1$.

Therefore, to determine a probability distribution defining an $N \times N$ unweighted and undirected random network, we just need to pick $2^{N(N-1)/2}-1$ different positive numbers (making sure they are all really small so that their sum is no larger than one). For each possible combination of these $2^{N(N-1)/2}-1$ different numbers, we have a different probability distribution, hence a different random network model.

If we have a network of $N=3$ nodes, then we have to pick $2^{3\cdot 2/2}-1 = 2^3-1=7$ different numbers to specify the network probability distribution. Increase the network size to $N=5$ nodes, and the dimension of the parameter space increases to $2^{10}-1=1023$. These numbers are getting large fast. If we stay with tiny networks and just increase the size to $N=10$ nodes, the dimension of the parameter space jumps to $2^{45}-1$, which is over 35 trillion. Increase the size to a small network of $N=100$, and we have to choose over $10^{1490}$ different parameters. We won't even think of what happens if you go beyond a medium-sized network of $N=1000$, where the parameter space weighs in at over $10^{150,364}$ dimensions. Given that the number of atoms in the observable universe is estimated to be around $10^{80}$, it's hard to comprehend the size of this number of dimensions.
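These counts are easy to reproduce. For larger $N$ the exact integer is unwieldy, so the sketch below counts its decimal digits instead, using the fact that $2^k$ has $\lfloor k \log_{10} 2 \rfloor + 1$ digits:

```python
import math

def dimension(N):
    """Dimension of the parameter space for an undirected,
    unweighted random network on N nodes: 2^(N(N-1)/2) - 1."""
    return 2 ** (N * (N - 1) // 2) - 1

print(dimension(3))    # 7
print(dimension(5))    # 1023
print(dimension(10))   # 35184372088831, over 35 trillion

def digits(N):
    """Number of decimal digits of dimension(N), via logarithms.
    (Subtracting 1 from 2^k never changes the digit count, since
    2^k is never a power of 10.)"""
    return math.floor((N * (N - 1) // 2) * math.log10(2)) + 1

print(digits(100))     # 1491 digits, i.e. over 10^1490
print(digits(1000))    # 150365 digits, i.e. over 10^150,364
```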

#### Reducing the dimension

To develop tractable random network models, it's clear we have to shave off a few dimensions from the parameter space describing the network probability distribution. Otherwise, it will take us a while to come up with the $10^{150,364}$ numbers describing the probability distribution for a network of $N=1000$ nodes.

##### Statistical homogeneity

In our above calculation, we made the probability distribution completely general. In particular, we allowed all 1000 nodes to be completely different. A dramatic simplification is to assume that all 1000 nodes are statistically identical, or that the network is statistically homogeneous. In other words, we can insist that the probability distribution doesn't change if we switch the order of the nodes. If we switch node one and node two, we want to make sure that the table of probabilities doesn't change.

For $N$ nodes, there are $N!=N(N-1)(N-2) \cdots 1$ different orderings of the nodes. So it seems we could reduce the dimension by a factor of $N!$ if we insist the probabilities don't change under any reordering. (It actually doesn't reduce it by quite this much due to complicated symmetries, but let's not worry about this.) Let's say the dimension of the parameter space for $N$ nodes is reduced to $2^{N(N-1)/2}/N!$.

The factorial $N!$ increases very rapidly with $N$, so dividing by $N!$ should help keep the dimension down. When $N=1000$, $N!$ is an astronomically large value of around $10^{2567}$. But, unfortunately, $2^{N(N-1)/2}$ grows much faster. Dividing by $N!$ only reduces the dimension from around $10^{150,364}$ to the not-so-small number of $10^{147,797}$.
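We can check these estimates by working with base-10 logarithms, since the numbers themselves are far too large to compute with directly:

```python
import math

N = 1000
k = N * (N - 1) // 2          # number of possible edges: 499,500

# log10 of the original dimension 2^(N(N-1)/2) and of N!
# (lgamma(N + 1) is the natural log of N!).
log10_dim = k * math.log10(2)
log10_fact = math.lgamma(N + 1) / math.log(10)

print(f"{log10_dim:.1f}")                # ~150364.5: dimension ~ 10^150,364
print(f"{log10_fact:.1f}")               # ~2567.6:   N! ~ 10^2567
print(f"{log10_dim - log10_fact:.1f}")   # ~147796.9: still ~ 10^147,797
```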

Enforcing statistical homogeneity on the network wasn't nearly enough to make the dimension tractable. The possible network structures are just too complicated. We need to do something much more drastic to obtain a workable random network model.

##### Independence

One simplification that will immediately bring down the dimension of the probability distribution parameter space is to assume that the random variables are independent. If the random variables $X_i$ are independent, then we can factor the probability distribution of the vector $\vc{X}=(X_1,X_2,\ldots,X_n)$ into many probability distributions for a single variable, \begin{align*} P_{\vc{X}}(\vc{x}) &= \Pr(\vc{X}=\vc{x})\\ &= \prod_{i=1}^n \Pr(X_i=x_i)\\ &= \prod_{i=1}^n P_{X_i}(x_i). \end{align*}

This description is much simpler. If we assume that each random variable $X_i$ is a Bernoulli random variable, then we need just one parameter (the probability $p_i$ that it is 1) to determine the probability distribution $P_{X_i}(x_i)$: \begin{gather*} P_{X_i}(x_i) = \begin{cases} p_i & \text{if $x_i=1$}\\ 1-p_i & \text{if $x_i=0$} \end{cases}. \end{gather*} For the probability distribution of the $n$ variables in $\vc{X}$, we just need $n$ different parameters, way fewer than the $2^n-1$ we needed without the independence assumption.
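To see the factorization at work, here is a small sketch with three hypothetical parameter values $p_i$; the $2^n$ joint probabilities all follow from just $n$ numbers:

```python
import math
from itertools import product

# Hypothetical parameters p_i for three independent Bernoulli variables.
p = [0.2, 0.7, 0.5]

def joint_prob(x, p):
    """P(X = x) factors into a product of single-variable Bernoulli terms."""
    return math.prod(pi if xi == 1 else 1 - pi for xi, pi in zip(x, p))

# The 2^3 = 8 joint probabilities automatically sum to one.
total = sum(joint_prob(x, p) for x in product([0, 1], repeat=len(p)))
print(total)   # 1.0, up to floating-point rounding
```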

This assumption makes it tractable to write down a probability distribution even for large networks. For an undirected network of $N$ nodes, we will have $n=N(N-1)/2$ independent Bernoulli random variables corresponding to each possible location for an edge. We'll need to pick just $N(N-1)/2$ parameters (the probabilities $p_{ij}$ that there is an edge between each pair of nodes $i$ and $j$). If $N=1000$, this is $499,500$ different numbers, a few less than the $10^{150,364}$ we had earlier.

Still, $499,500$ is a bit large. We can reduce this further by assuming statistical homogeneity, which in this case means assuming that the probability $p_{ij}$ of an edge does not depend on the node labels $i$ and $j$. Now we just need to pick a single parameter $p$, the probability that any edge exists. We've reduced the parameter space down to one dimension!

What we've come up with is the Erdös-Rényi random network model, the simplest type of random network model. This model is so natural that people often refer to these networks as simply “random networks,” and may view deviations from the Erdös-Rényi model as “non-randomness.”

This model stays one-dimensional even if we extend it to directed graphs and is easy to generate. We can also extend the model to populations of different types of nodes, allowing the probability of a connection to depend on the populations of the nodes involved. As long as we keep the independence assumption, the model is easy to understand, and it is a simple matter to generate the networks.
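As a concrete illustration, here is a minimal sketch of drawing one network from the Erdös-Rényi model (the function name and parameter values are our own choices):

```python
import random

def erdos_renyi(N, p, seed=None):
    """Sample an undirected Erdos-Renyi network: each of the N(N-1)/2
    possible edges exists independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(N) for j in range(i + 1, N)
            if rng.random() < p]

# One draw from the model; the expected number of edges is
# p * N(N-1)/2 = 0.1 * 4950 = 495.
edges = erdos_renyi(100, 0.1, seed=42)
print(len(edges))
```

The single parameter $p$ is all we had to specify; everything else about the network's structure is left to chance.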