Using Graph CNNs in Keras
GraphCNNs recently got interesting with some easy-to-use Keras implementations.
The basic idea of a graph-based neural network is that not all data comes in traditional table form. Instead, some data comes in, well, graph form. Other relevant forms are spherical data or any other type of manifold considered in geometric deep learning.
So what does graph data look like if not like a table? Here’s an example:
Let’s put some meaning into those variables, and no, I’m not gonna use a “citation network” example, which would be the default for graph-based neural networks. While that is easy to understand, I find more value in something closer to the data scientist’s day-to-day work: recommendation graphs.
- V_1 is a Netflix user who watched “House of Cards” and rated it with 5 stars.
- V_2 is “House of Cards”
- V_3 is another Netflix user who is friends with V_1
We can describe the rating as an edge, as well as the “friends” relation, depicted by the black lines.
- We label V_1 as a “House of Cards” lover, encoded by the label 1
- We label the Series V_2 as 2
- and we want to guess the label for V_3.
The final ingredient is features. The two users, vertices 1 and 3, have the features:
- x_1 = most liked genre
- x_2 = second most liked genre
- x_3 = country
The show has the related features:
- x_1 = genre
- x_2 = second closest genre of movie
- x_3 = most liked in which country
Let’s put those things into a more mathematical form. We will use X, A, and Y: X for the features, A for the graph, and Y for the labels. Features and labels are encoded numerically; the graph is encoded by its adjacency matrix.
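import numpy as np
from keras.utils import to_categorical
# One row of features per vertex (V_1, V_2, V_3)
X = np.array([[1,2,10], [4,2,10], [0,2,11]])
# Labels per vertex; the entry for V_3 is only a placeholder
Y = np.array([1,2,1])
Y = to_categorical(Y)
# Weighted adjacency matrix (note the outer brackets: one nested list)
A = np.array([[0,1,5],[1,0,0],[5,0,0]])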
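A quick sanity check can help before feeding these arrays to a model. The snippet below is a minimal sketch using only NumPy: it checks that A is symmetric (so the graph is undirected) and lists each edge once with its weight. The names `is_symmetric` and `edges` are mine, not from the original code.

```python
import numpy as np

# Same toy adjacency matrix as above; rows/columns 0, 1, 2 are V_1, V_2, V_3.
A = np.array([[0, 1, 5],
              [1, 0, 0],
              [5, 0, 0]])

# An undirected graph has a symmetric adjacency matrix.
is_symmetric = np.array_equal(A, A.T)

# Look only at the upper triangle so every undirected edge shows up once.
rows, cols = np.nonzero(np.triu(A))
edges = [(int(i) + 1, int(j) + 1, int(A[i, j])) for i, j in zip(rows, cols)]

print(is_symmetric)  # True
print(edges)         # [(1, 2, 1), (1, 3, 5)]
```

Each tuple reads as (vertex, vertex, edge weight), using the 1-based vertex numbers from the text.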
Now let’s try out a Keras-based GraphCNN implementation on this before we continue with a larger dataset.
Verma’s Graph Learning Implementation
We’re going to use this module which is not pip installable, but can be included as a git submodule.
git submodule add https://github.com/vermaMachineLearning/keras-deep-graph-learning.git
Then add the subfolder to sys.path so you can import it as if it were a regular pip-installable module. And don’t worry, all the code is available on GitHub.
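import os, sys
sys.path.append(os.path.join(os.getcwd(), "keras-deep-graph-learning"))  # assumes the default submodule folder name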
Now let’s load the data and the submodule and get started. Here’s the first part of the code:
Then we have to tell the GraphCNN to use just the labels from vertices 1 and 2 by setting the sample_weight to 0 on the last one. We’re also going to use a simple “filter” this time. A filter always has the same size as A. The filter is basically the way the edges are used in the training of the GraphCNN. So:
- If we set the filter to the identity matrix, we have a usual MLP, without the edge relations.
- If we set it to A, we are using the edge information in the most basic way.
- If we set it to concat[A, A*A^t] we are using the edges as well as putting special attention on large weights. This would be two filters, not one anymore btw.
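To get a feel for what these three choices do to the data, here is a small NumPy-only sketch on the toy graph from above. It is not the library's code: `propagate` is a hypothetical helper of mine, I read A*A^t as a matrix product, and stacking the two filters along the first axis is an assumption about the layout, so check the module's own examples for the shape it actually expects.

```python
import numpy as np

X = np.array([[1, 2, 10], [4, 2, 10], [0, 2, 11]], dtype=float)
A = np.array([[0, 1, 5], [1, 0, 0], [5, 0, 0]], dtype=float)
n = A.shape[0]


def propagate(graph_filter, features):
    """Mix the vertex features according to one (n, n) filter."""
    return graph_filter @ features


# Identity filter: every vertex keeps only its own features (plain MLP input).
mlp_input = propagate(np.eye(n), X)

# A as filter: every vertex gets the weighted sum of its neighbours' features.
neighbour_sum = propagate(A, X)

# Two filters stacked on top of each other, giving shape (2 * n, n).
two_filters = np.concatenate([A, A @ A.T], axis=0)

print(neighbour_sum)
print(two_filters.shape)  # (6, 3)
```

With the identity filter nothing moves between vertices, which is why that case behaves like an ordinary MLP. With A, the first row of `neighbour_sum` is already a blend of the features of V_2 and V_3.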
We then take a usual Keras Sequential model, add one layer, use categorical_crossentropy as the loss function, no fancy Laplacian, and fit the model to our data. Here’s the code:
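So what does a sample_weight of 0 actually do? Roughly, the loss of that vertex is multiplied by zero, so its placeholder label never influences training. The sketch below mimics this in plain NumPy rather than Keras. `masked_crossentropy` is a hypothetical helper, the predictions are made up, and dividing by the sum of the weights is my simplification; Keras may normalise differently.

```python
import numpy as np


def masked_crossentropy(y_true, y_pred, sample_weight):
    """Categorical cross-entropy per vertex, weighted and averaged."""
    y_pred = np.clip(y_pred, 1e-7, 1.0)  # avoid log(0)
    per_vertex = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.sum(per_vertex * sample_weight) / np.sum(sample_weight))


# One-hot labels for [1, 2, 1]; the last row is the placeholder for V_3.
y_true = np.array([[0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
sample_weight = np.array([1.0, 1.0, 0.0])

# Two made-up predictions that differ only in the row for V_3.
pred_a = np.array([[0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]])
pred_b = pred_a.copy()
pred_b[2] = [0.9, 0.05, 0.05]

loss_a = masked_crossentropy(y_true, pred_a, sample_weight)
loss_b = masked_crossentropy(y_true, pred_b, sample_weight)
print(np.isclose(loss_a, loss_b))  # True
```

Both losses are identical, so whatever the model predicts for V_3 has no effect on the training signal.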
Examples on More Data
The CORA dataset is a graph of scientific publications and citations. It contains around 5k citations and 2.7k publications. It’s provided by linqs.soe.ucsc.edu/data.
So this time we’ll use a bunch of technical things you can find in this paper. The authors also provided a keras and tensorflow implementation of graphCNNs which I link to below.
The first part is to load the CORA dataset from the repository, maybe using a helper function, and then split the data into train and test sets. Then we do some preprocessing called the “renormalization trick”. The renormalization trick transforms the edge matrix A in a way that keeps the “edge information” but makes it easier to compute with. In the paper it is introduced to avoid numerical instabilities and exploding or vanishing gradients when several layers are stacked.
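As a rough sketch of the renormalization trick as described in the Kipf and Welling paper linked below: add self-loops to A, then scale rows and columns by the inverse square root of the new degrees. `renormalize` is my own helper name, and the toy matrix from earlier stands in for the CORA adjacency matrix.

```python
import numpy as np


def renormalize(adjacency):
    """Return D~^(-1/2) (A + I) D~^(-1/2), where D~ holds the degrees of A + I."""
    a_tilde = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    degree = a_tilde.sum(axis=1)                      # at least 1 for non-negative weights
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2).
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


A = np.array([[0, 1, 5], [1, 0, 0], [5, 0, 0]], dtype=float)
A_hat = renormalize(A)

# The result stays symmetric and its eigenvalues stay within [-1, 1].
eigenvalues = np.linalg.eigvalsh(A_hat)
print(np.round(A_hat, 3))
```

Keeping the eigenvalues bounded is what makes repeated multiplication by the filter well behaved when layers are stacked.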
A considerable part of the graph magic is put into the so-called filters. The name comes, as far as I understand it, from spectral filtering of signals.
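For the curious, here is where the word comes from, following the Kipf and Welling paper linked below. A signal x on the vertices is filtered in the eigenbasis U of the normalized graph Laplacian L, and a first-order approximation then turns that into a plain matrix product. The notation (D for the degree matrix, θ for the learnable parameter) is taken from the paper, not from the code in this post.

```latex
L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}

g_\theta \star x = U \, g_\theta(\Lambda) \, U^{\top} x

g_\theta \star x \approx \theta \left( I_N + D^{-1/2} A D^{-1/2} \right) x
```

The matrix that multiplies x in the last line plays the role of what this post calls a filter.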
Let’s try out three different ones. Filters are basically the way we use the additional information from the edges in our propagation model.
Example 1, a Simple MLP
Let’s first not use the edges at all. For that we set the filter to the identity matrix. Then we run an evaluation on that.
An accuracy of 0.51 (51%), that’s not too good.
Example 2, Using “A” as Filter
Now here we get an accuracy of 0.76 (76%), which is already way higher than the 0.51 we got from using the features alone.
Example 3, two Filters
Finally we’re gonna use concat[A, A*A^t] as filter.
An accuracy of 0.80 (80%), and already 0.79 after 17 runs through the data.
That’s it! You’re ready to play around with graphCNNs. The framework also implements two other types of layers and you can play around with different kinds of representations of your graphs. I’ll try to put up another post working on an actual data science problem, not just the citation based ones as soon as I get around to it.
- https://github.com/sbalnojan/graphcnn-examples, GitHub code for this post.
- https://github.com/vermaMachineLearning/keras-deep-graph-learning, module used in this post.
- https://arxiv.org/abs/1609.02907, T. N. Kipf, M. Welling (2016), Semi-Supervised Classification with Graph Convolutional Networks, a great source for everything related. The authors created GCNs and a Keras & TensorFlow implementation.