For a recent university project we were analysing large networks. We used the Stanford Network Analysis Project (SNAP) to perform a number of operations on a dataset of our choice. We searched for a large dataset as we believed it could yield some pretty interesting results. We settled on the California Road Network, where nodes represent either an intersection or an end point (dead end) and the edges represent the roads between them. The website states that the network is undirected, but we found during the visualisation that the edges are in fact directed.

There are some quite interesting things to notice about the dataset when you perform some network analysis on it. However, we were interested in producing some kind of visual for our presentation.

# Visualisation

Given that the dataset contains 1965206 nodes, the adjacency matrix would be 1965206 × 1965206 elements. If each element is stored as an integer (4 bytes), the matrix would take around **14,387GB** (about 15.4 TB), and even at one byte per element it would still be about 3,597GB, so that was out of the question. Using the SNAP tools, we could calculate the communities within the network.
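The size estimate can be checked with a few lines of MATLAB. This is only a back-of-the-envelope sketch: it assumes a dense matrix with 4 bytes per element, and the variable names are made up for illustration.

```matlab
% Rough memory footprint of a dense adjacency matrix.
n = 1965206;                 % number of nodes in the dataset
bytesPerElement = 4;         % assumption: one 4-byte integer per entry

totalBytes = n^2 * bytesPerElement;   % exact in double precision at this size
sizeGiB = totalBytes / 2^30;          % binary gigabytes
sizeTB  = totalBytes / 1e12;          % decimal terabytes

fprintf('%.0f GiB (%.1f TB)\n', sizeGiB, sizeTB);
```

Since a road network has only a handful of edges per node, almost all of those entries would be zero, which is why working from the edge list is the practical route.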

We were able to extract a matrix of a few thousand nodes and visualise that.

## Using MATLAB

Once you’ve downloaded and extracted the dataset from SNAP, navigate to that folder using the bar just below the ribbon. You can import the data either with the **Import Data** button on the ribbon or by typing `roadNetCA = textread('roadNet-CA.txt')`. Note that if you use the command, you’ll first need to delete the first few lines of the file, which begin with the hash symbol (`#`). The variable is called `roadNetCA` because MATLAB variable names cannot contain hyphens.
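If you would rather not edit the file by hand, one alternative (not what we used at the time) is `textscan` with its `CommentStyle` option, which skips lines starting with a given marker. The sketch below writes a tiny stand-in file so it runs on its own; the file name, its contents, and the variable names are all hypothetical.

```matlab
% Create a small sample file laid out like a SNAP edge list (made-up data).
sampleFile = fullfile(tempdir, 'sample-edges.txt');
fid = fopen(sampleFile, 'w');
fprintf(fid, '# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t0\n');
fclose(fid);

% Read two numeric columns, ignoring every line that starts with '#'.
fid = fopen(sampleFile, 'r');
cols = textscan(fid, '%f %f', 'CommentStyle', '#');
fclose(fid);

edges = [cols{1}, cols{2}];   % N-by-2 matrix: [from, to]
```

For the real dataset you would point `fopen` at `roadNet-CA.txt` instead of the sample file.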

To extract a specific community you’ll also need to load the communities into MATLAB. If you place the file generated by SNAP in the directory that is open in MATLAB, you can load it just as above: either use the **Import Data** button or run `communities = textread('communities.txt')`. The `communities` variable contains a list of nodes and the number of the community each one belongs to. We had nearly 2 million nodes, so we wanted to filter the data and generate networks based on smaller communities.
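To decide which community to plot, it helps to count how many nodes each one contains. The following is a minimal sketch on toy data, assuming `communities` has the layout described above (node ID in column 1, community number in column 2); the size bounds are arbitrary.

```matlab
% Toy stand-in for the communities matrix: [nodeId, communityId]
communities = [0 1; 1 1; 2 2; 3 2; 4 2; 5 3];

% Count the nodes in each community.
[communityIds, ~, idx] = unique(communities(:, 2));
counts = accumarray(idx, 1);

% Keep the communities whose size falls in a chosen range.
minSize = 2;
maxSize = 3;
candidates = communityIds(counts >= minSize & counts <= maxSize);
```

On the real data you would raise the bounds to something like a few thousand nodes and pick one of the resulting community numbers.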

## Filtering the network

We need to create a subset of the `roadNetCA` variable. First, run a command that goes through the second column of the `communities` matrix and returns a `1` (or true) for every row that belongs to the chosen community: `communityIndice = communities(:, 2) == #`, replacing the **#** with the number of the community you’re filtering.

Next we need a matrix holding only the nodes in that community, which we get by running `community# = communities(communityIndice, :)`. This keeps the rows where `communityIndice` holds a 1 and stores them in the new variable.

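The final stage is to get a logical matrix marking where those node IDs appear in the whole network, which can be done with `inCommunity = ismember(roadNetCA, community#(:, 1))`. Only the first column of `community#` is passed in, because the second column holds the community number rather than a node ID. Finally, get an edge list by running `roadNetCA# = roadNetCA(inCommunity(:, 1), :)`, which keeps every edge whose source node is in the community.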
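The filtering steps above can be sketched end to end on toy data. The matrices and the names `k`, `communityK`, `inCommunity`, `edgesFromK` and `edgesWithinK` are made up for illustration; the sketch passes only the node ID column to `ismember`, and it also shows a stricter variant that requires both endpoints of an edge to be in the community.

```matlab
% Toy data: edge list [from, to] and communities [nodeId, communityId]
roadNetCA   = [0 1; 1 0; 1 2; 2 3; 3 2; 3 4; 4 5];
communities = [0 1; 1 1; 2 2; 3 2; 4 3; 5 3];
k = 2;   % community number to extract

communityIndice = communities(:, 2) == k;             % logical row mask
communityK = communities(communityIndice, :);         % rows for community k
inCommunity = ismember(roadNetCA, communityK(:, 1));  % membership per endpoint

% Keep edges whose source node is in the community.
edgesFromK = roadNetCA(inCommunity(:, 1), :);

% Stricter variant: keep edges with both endpoints in the community.
edgesWithinK = roadNetCA(all(inCommunity, 2), :);
```

The first variant can pull in nodes from neighbouring communities as edge targets; the second gives a self-contained subgraph. Which one you want depends on whether the links leaving the community matter for your picture.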

## Creating the Network Graph

We need to split the newly created `roadNetCA#` variable into two separate variables: `s = roadNetCA#(:, 1)` gets the first column and `t = roadNetCA#(:, 2)` gets the second. We then rotated both into row vectors with `s = rot90(s)` and `t = rot90(t)`, although `digraph` also accepts column vectors, so this step is optional.

To create the graph, run `G = digraph(s, t)` (use `graph` if you have an undirected network). To produce the figure, run `plot(G)`, which generates a nice visual of the graph.
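One thing to watch for: `digraph` treats numeric `s` and `t` as node indices, which must be positive integers, and it creates a node for every index up to the largest one. If your community's node IDs include 0 or are large and scattered, it may help to relabel them first. The sketch below shows one way to do that on a made-up edge list; `edgesK` and `nodeIds` are hypothetical names.

```matlab
% Toy community edge list using original (sparse, zero-based) node IDs.
edgesK = [1050 1051; 1051 1050; 1051 2000; 2000 0];

% Map the original IDs onto 1..N so the graph has no unused nodes.
[nodeIds, ~, idx] = unique(edgesK(:));   % nodeIds(idx) reproduces edgesK(:)
idx = reshape(idx, size(edgesK));        % back to the [from, to] layout
s = idx(:, 1);
t = idx(:, 2);

G = digraph(s, t);
G.Nodes.Name = cellstr(string(nodeIds)); % keep the original IDs as labels
plot(G);
```

With the original IDs stored as node names, the plot labels still match the dataset even though the internal indices have changed.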