For a recent university project we were analysing large networks. We used the Stanford Network Analysis Project (SNAP) to perform a number of operations on any dataset we wanted. We searched for a large dataset, as we believed it could yield some interesting results, and settled on the California Road Network. In it, nodes represent either an intersection or an end point (dead end), and edges represent the roads between them. The website states that the edges are undirected, but during the visualisation we found that there are in fact directed edges.
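A quick way to see how the edges are stored is to check, for each row of the edge list, whether the reversed pair also appears. The sketch below uses a small made-up edge list, and the names edges, hasReverse and oneWay are hypothetical. With the real data you would pass the two-column matrix loaded from roadNet-CA.txt instead.

```matlab
% Hypothetical edge list: 1->2 and 2->1 are a reciprocal pair, 2->3 is not.
edges = [1 2; 2 1; 2 3];

% For each row (s, t), check whether the reversed row (t, s) is also present.
hasReverse = ismember(edges, fliplr(edges), 'rows');

% Edges that are stored in one direction only.
oneWay = edges(~hasReverse, :);
```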
There are some quite interesting things to notice about the dataset when you perform network analysis on it. However, we were interested in producing some kind of visual for our presentation.
The dataset contains 1,965,206 nodes, so the adjacency matrix would be 1,965,206 by 1,965,206. If each element were stored as a 4-byte integer, the matrix would need roughly 15.4 TB (about 14,400 GiB). Even at 1 byte per element it would still need about 3,600 GiB, so that was out of the question. Using the SNAP tools, we could instead calculate the communities within the network.
We were able to extract a matrix of a few thousand nodes and visualise that.
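The memory estimate can be checked with a few lines of MATLAB. The sketch also shows the usual way around the problem, a sparse matrix, which stores only the non-zero entries. That matrix is built from a tiny hypothetical, 1-based edge list. If the real node IDs start at 0, as they do in many SNAP edge lists, you would add 1 to each ID before using it as an index.

```matlab
% Dense adjacency matrix size for roadNet-CA, assuming 4 bytes per element.
n = 1965206;                       % number of nodes
bytesPerElement = 4;
denseBytes = n^2 * bytesPerElement;
denseGiB = denseBytes / 2^30;      % roughly 14,387 GiB

% A sparse matrix stores only the non-zero entries, so its memory grows with
% the number of edges instead of n^2. Tiny hypothetical, 1-based edge list:
edges = [1 2; 2 3; 3 1; 3 4];
A = sparse(edges(:, 1), edges(:, 2), 1, 4, 4);
```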
Once you’ve downloaded and extracted the dataset from SNAP, navigate to that folder using the address bar just below the ribbon. You can import the data either with the Import Data button on the ribbon or by typing
roadNetCA = textread('roadNet-CA.txt');
MATLAB variable names can’t contain hyphens, so the variable is called roadNetCA rather than roadNet-CA. If you use this command method, you’ll need to delete the first few lines of the file, which begin with the hash symbol (#).
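Instead of deleting the header lines by hand, you can have textscan skip them with its CommentStyle option. This is a sketch assuming the file holds two whitespace-separated numeric columns after the # lines. It writes a tiny sample file to a temporary path (fname is a hypothetical name) so that it runs on its own. For the real data, open 'roadNet-CA.txt' instead.

```matlab
% Write a tiny sample file: '#' comment lines, then two node IDs per line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# Sample comment line\n# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t2\n');
fclose(fid);

% Read two numeric columns, ignoring everything after '#' on each line.
fid = fopen(fname, 'r');
C = textscan(fid, '%f %f', 'CommentStyle', '#');
fclose(fid);
delete(fname);

% Combine the two cell outputs into an N-by-2 edge list.
roadNetCA = [C{1} C{2}];
```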
To extract a specific community, you’ll also need to load the communities into MATLAB. Place the file generated by SNAP in the folder that is open in MATLAB. You can then load it just as above, either with the Import Data button or with
communities = textread('communities.txt');
The communities variable contains a list of nodes and the number of the community each one belongs to. We had nearly 2 million nodes, so we wanted to filter the data and generate networks based on smaller communities.
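To pick out the smaller communities, it helps to count how many nodes each community has. This sketch assumes column 1 of communities holds node IDs and column 2 holds community numbers. The sample values and the maxSize threshold are made up.

```matlab
% Hypothetical communities table: column 1 = node ID, column 2 = community.
communities = [0 7; 1 7; 2 3; 3 7; 4 3; 5 9];

% Map each community number to a group index, then count rows per group.
[ids, ~, g] = unique(communities(:, 2));
counts = accumarray(g, 1);

% Communities with at most maxSize nodes (the threshold is arbitrary).
maxSize = 2;
small = ids(counts <= maxSize);
```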
Filtering the Network
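We need to create a subset of the roadNetCA variable. In the commands below, # stands for the number of the community you’re filtering. For community 12, for example, you would write communities(:, 2) == 12, community12 and roadNetCA12. First, run a command that goes through the second column of the communities matrix and returns a 1 (true) for every row in that community:
communityIndice = communities(:, 2) == #;
Note the double equals sign. == compares values, whereas a single = is an assignment and produces an error here.
Next, we need a matrix holding only the rows of communities that belong to that community. You can create it by running
community# = communities(communityIndice, :);
This keeps the rows where communityIndice is 1 and stores them in the new variable.
The final stage is to get a logical matrix marking where the community’s node IDs appear in the whole network, which can be done by
inCommunity = ismember(roadNetCA, community#(:, 1));
Only column 1 (the node IDs) is passed in, because column 2 holds community numbers, which could otherwise match unrelated node IDs. Finally, get the edge list by running
roadNetCA# = roadNetCA(inCommunity(:, 1), :);
This keeps every edge whose first (source) node is in the community.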
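The whole filtering step can be sketched end to end on a tiny made-up edge list and community table. The names c, communityIdx, communityNodes, inCommunity, subEdges and internalEdges are hypothetical. The sketch also includes a stricter variant using all(..., 2), which keeps only the edges with both endpoints inside the community. That may suit a visual of a single community better, though it is not the method described above.

```matlab
% Tiny hypothetical data: an edge list and a node-to-community table.
roadNetCA   = [0 1; 1 2; 2 3; 3 4; 4 0];
communities = [0 5; 1 5; 2 5; 3 8; 4 8];

c = 5;                                          % community to extract
communityIdx = communities(:, 2) == c;          % true for member rows
communityNodes = communities(communityIdx, 1);  % their node IDs (column 1)

% Logical matrix, same size as roadNetCA: true where an ID is in the community.
inCommunity = ismember(roadNetCA, communityNodes);

% As in the text: keep edges whose source node is in the community.
subEdges = roadNetCA(inCommunity(:, 1), :);

% Stricter variant: keep edges whose two endpoints are both in the community.
internalEdges = roadNetCA(all(inCommunity, 2), :);
```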
Creating the Network Graph
We need to split the newly created roadNetCA# variable into two separate variables. Running
s = roadNetCA#(:, 1);
gets our first column (the source nodes), and
t = roadNetCA#(:, 2);
gets our second column (the target nodes). Before generating the graph, we then rotated both into row vectors by applying
s = rot90(s);
t = rot90(t);
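Rotating a column vector with rot90 gives the same result as transposing it, which the snippet below checks on a made-up edge list. The more common way to write this is t.'. As far as we can tell from MATLAB’s documentation, digraph accepts s and t as either row or column vectors, so this step may be optional.

```matlab
% Hypothetical community edge list (one edge per row).
subEdges = [10 11; 11 12; 12 10];
s = subEdges(:, 1);          % 3-by-1 column of source nodes
t = subEdges(:, 2);          % 3-by-1 column of target nodes

% On a column vector, rot90 yields the same 1-by-n row as a transpose,
% so element order, and therefore the s/t pairing, is unchanged.
sRow = rot90(s);
tRow = t.';
```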
To create the graph, run
G = digraph(s, t);
(use graph instead if you have an undirected network). To produce the figure, run
plot(G);
which will generate a nice visual of the network.
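One caveat is worth checking. When s and t are numeric, digraph uses them directly as node indices, so node IDs must be positive integers. The graph also gets a node for every index up to the largest ID. If the IDs start at 0 or are scattered across the 2-million range, relabelling them to 1..k first avoids an error or a plot full of isolated nodes. This sketch does the relabelling with unique and keeps the original IDs as node names; the data and variable names are hypothetical.

```matlab
% Hypothetical community edge list with 0-based, widely spread node IDs.
subEdges = [0 1; 1 2; 2 1500000];

% digraph treats numeric IDs as node indices: they must be positive, and
% it creates every node up to the largest ID. Relabel the IDs to 1..k.
[nodeIds, ~, idx] = unique(subEdges(:));  % sorted distinct IDs
idx = reshape(idx, size(subEdges));       % back to one edge per row
s = idx(:, 1);
t = idx(:, 2);

% Keep the original IDs as node names so nodes can still be identified.
names = arrayfun(@num2str, nodeIds, 'UniformOutput', false);
G = digraph(s, t, ones(size(s)), names);
```

Calling plot(G) on this graph draws only the k nodes that actually appear in the community, rather than every index up to the largest ID.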