Using Neo4j, Graph Algorithms, the Yelp public dataset, and React

The Applied Graph Algorithms online training course shows how to enhance the functionality of a web application using Neo4j Graph Algorithms.

When designing the Applied Graph Algorithms online training course, we thought it was important to show how to build applications that leverage graph algorithms to add smart features using real-world datasets. This blog post details the technique behind one of the exercises in the course: building a photo-based personalized recommendations application.

This is just one of the exercises. If it sounds interesting to you, check out the free Applied Graph Algorithms online course! You can learn more about the training in this overview video.

Graph Algorithms

Graph algorithms enable graph analytics and fall into 5 general categories: pathfinding & search, centrality, community detection, link prediction, and similarity. You can learn more about Neo4j Graph Algorithms here.

The five basic categories of graph algorithms available in Neo4j.

Yelp Open Dataset

The Yelp Open Dataset is a subset of all Yelp businesses, reviews and user data made publicly available by Yelp and can be modelled as a graph and imported into Neo4j. This is a great dataset for getting acquainted with building applications and working with graph data.

The Yelp Open Dataset is a subset of Yelp data released publicly. This data can be modelled as a graph in Neo4j.

A Business Reviews Application

We start with a simple business reviews web application with the goal of enhancing its features using Neo4j Graph Algorithms. The basic functionality of the application allows the user to search for businesses and view reviews of the businesses.

A business reviews application using the Yelp Open Dataset. It allows the user to search for businesses and view reviews. We explore how to enhance the functionality of this application using Neo4j Graph Algorithms.

The web application is a React application that uses the Neo4j JavaScript driver to execute Cypher queries against a Neo4j database instance and handle the results.

Basic architecture of the business reviews application: a React frontend uses the Neo4j JavaScript driver to send Cypher queries to a Neo4j database instance.

Personalized Recommendations

Personalized recommendations are useful across many industries and types of applications. When an application suggests items to purchase, articles to read, movies to watch, or songs to listen to, users get content relevant to their interests, and developers see more engagement with their application.

Content-based vs collaborative filtering

Content-based recommendations and collaborative filtering are the two basic approaches to implementing personalized recommendation systems.

The content-based approach uses attributes of the items being recommended (movie genre, type of cuisine, etc.) and compares those to the preferences of the user to make recommendations. Similarity metrics that measure how alike two items are can be useful for this approach.

Collaborative filtering, on the other hand, uses users’ interactions with items, such as purchases or ratings, to generate recommendations.

In this example we use a content-based approach, using photo labels to make personalized recommendations with Neo4j graph algorithms.

Photo-based recommendations

The Yelp Open Dataset includes an archive of 200,000 user-uploaded photos, connected to businesses. These photos will serve as the basis of our personalized recommendation approach.

Our application will show photos to the user at random, allowing the user to select photos they like. Then, we will find photos similar to the ones the user liked and recommend the businesses connected to those photos.

Specifically, the steps are:

  1. Identify similar photos using Jaccard similarity
  2. Cluster similar photos using Label Propagation
  3. Recommend businesses connected to photos in the same community by traversing the graph

Finding Similar Photos

How will we determine if photos are similar? The Yelp data includes minimal metadata about each photo, but we ran each photo through the Google Vision API, which uses machine learning to determine labels for a photo. Our script fetches the labels for each photo and creates Label nodes in the graph:

Adding photo labels to the graph.

Jaccard similarity is defined as the size of the intersection of two sets divided by the size of the union of the two sets.

We can use Jaccard similarity to compute how similar a given pair of photos is. Often used to recommend similar items, as well as in link prediction, Jaccard similarity measures the similarity between sets (in our case, the sets of labels attached to photos).

Jaccard similarity is a set comparison algorithm and is calculated by dividing the size of the intersection of the two sets by the size of their union. The Jaccard similarity algorithm is available in the Neo4j Graph Algorithms library. Here’s how we use it in a Cypher query, computing the similarity of all pairs of photos.

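// Collect each photo's label ids into one {item, categories} map per photo
MATCH (p:Photo)-[:HAS_LABEL]->(label)
WITH {item: id(p), categories: COLLECT(id(label))} AS photoData
WITH COLLECT(photoData) AS data
// Keep each photo's 3 most similar photos with a score of at least 0.9
CALL algo.similarity.jaccard(data,
{topK: 3, similarityCutoff: 0.9, write: true})
YIELD nodes, similarityPairs
RETURN nodes, similarityPairs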

This query will create SIMILAR relationships in the graph, storing the similarity value in a score property.
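To make the effect of the topK and similarityCutoff settings more concrete, here is a minimal plain-Python sketch of the same idea. It is an illustration only, not the library's implementation: the function names, photo ids, and labels are hypothetical, and it assumes that a pair is kept when its score is at least the cutoff and that each photo keeps at most top_k of its best-scoring neighbours.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(labels_by_photo, top_k=3, cutoff=0.9):
    """Return {photo: [(other_photo, score), ...]}, best first.

    Keeps at most top_k neighbours per photo, and only those whose
    score is at least `cutoff` (an assumed reading of the settings).
    """
    neighbours = {photo: [] for photo in labels_by_photo}
    for p, q in combinations(sorted(labels_by_photo), 2):
        score = jaccard(labels_by_photo[p], labels_by_photo[q])
        if score >= cutoff:
            neighbours[p].append((q, score))
            neighbours[q].append((p, score))
    return {
        photo: sorted(found, key=lambda pair: (-pair[1], pair[0]))[:top_k]
        for photo, found in neighbours.items()
    }


# Hypothetical photo ids and labels, for illustration only.
photos = {
    "photo_a": {"pizza", "cheese", "dish"},
    "photo_b": {"pizza", "cheese", "dish"},
    "photo_c": {"pizza", "cheese", "food"},
    "photo_d": {"coffee", "cup"},
}
result = similar_pairs(photos, top_k=3, cutoff=0.9)
```

With this toy data only photo_a and photo_b end up linked, because every other pair scores below the 0.9 cutoff.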

Let’s take a look at two photos with overlapping labels and see how the Jaccard similarity score is calculated. In the image below the two photos have 9 overlapping labels and one label connected to only one photo:

Therefore, the Jaccard similarity score is 9/10, or 0.9.
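The same arithmetic can be checked with Python sets. The label names below are placeholders, not the labels from the example photos; only the counts (nine shared labels, one label on a single photo) come from the description above.

```python
# Placeholder labels: nine shared by both photos, one attached to only the first.
shared = {f"label_{i}" for i in range(9)}
photo_1 = shared | {"label_extra"}
photo_2 = set(shared)

intersection = photo_1 & photo_2   # labels both photos have
union = photo_1 | photo_2          # labels either photo has
score = len(intersection) / len(union)

print(len(intersection), len(union), score)  # prints: 9 10 0.9
```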

Cluster Similar Photos Using Label Propagation

Label propagation is a community detection algorithm that segments the graph into partitions, or communities, by assigning partition values to each node. It works by seeding nodes with a community assignment then iterating, assigning the community to neighboring nodes at each iteration until the full graph is partitioned.
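As a rough illustration of that seed-and-iterate loop, here is a simplified, deterministic variant in plain Python. It is a sketch under stated assumptions rather than the Neo4j implementation: the graph is a hypothetical in-memory dictionary of photo ids, every node is seeded with its own id as its community, and ties are broken by picking the smallest community id so the result is repeatable.

```python
from collections import Counter


def label_propagation(neighbours, max_iterations=10):
    """Return {node: community} for an undirected graph given as
    {node: set_of_neighbour_nodes}."""
    # Seed every node with its own community.
    community = {node: node for node in neighbours}
    for _ in range(max_iterations):
        changed = False
        for node in sorted(neighbours):
            if not neighbours[node]:
                continue  # isolated nodes keep their seed community
            counts = Counter(community[n] for n in neighbours[node])
            best = max(counts.values())
            # Assumed tie-break: smallest id among the most common communities.
            choice = min(c for c, count in counts.items() if count == best)
            if choice != community[node]:
                community[node] = choice
                changed = True
        if not changed:
            break  # stable assignment reached
    return community


# Hypothetical similarity graph: a triangle, a pair, and an isolated photo.
graph = {
    "p1": {"p2", "p3"},
    "p2": {"p1", "p3"},
    "p3": {"p1", "p2"},
    "p4": {"p5"},
    "p5": {"p4"},
    "p6": set(),
}
communities = label_propagation(graph)
```

On this toy graph the triangle, the pair, and the isolated photo end up in three separate communities.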

To run Label Propagation in Neo4j, we pass the node labels and relationship types that define the subgraph on which we want to run the algorithm. In this case, we want to run the algorithm on the subgraph defined by the SIMILAR relationships:

CALL algo.labelPropagation("Photo", "SIMILAR", "BOTH")

This query will add a partition property to group photos by community. We can then query to see the distribution of Photo nodes by community:

We can validate our photos’ community assignments by selecting a community and viewing the photos assigned to the community. Here we take a community at random and compare the photos:

Looking at these photos, they all appear to be pizzas, so this seems to be a “pizza” cluster. Looks like our approach worked pretty well!

Recommend businesses connected to photos in the same community by traversing the graph

Now that we’ve grouped the photos into communities using the Label Propagation algorithm, we’re ready to generate our personalized business recommendations. Given some photos selected by the user, we traverse from those photos to their assigned communities, then to other photos in the same communities, and finally to the businesses connected to those photos, which become our recommendations. We define this traversal using a simple Cypher query:

Note how this approach combines global and local graph operations: graph algorithms compute similarity and communities across the whole graph, while the recommendation itself is a local traversal that benefits from the near real-time performance graph databases get from index-free adjacency.
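The shape of that traversal (liked photos, to their communities, to other photos in those communities, to businesses) can be sketched in plain Python with dictionaries standing in for the graph. Everything here is hypothetical: the function name, the photo ids, the business names, and the choice to rank businesses by how many matching photos they have, which is an assumption of this sketch and not something the application is stated to do.

```python
from collections import Counter


def recommend(liked_photos, community_of, business_of, limit=5):
    """Rank businesses by how many of their photos share a community
    with a liked photo, ignoring the liked photos themselves."""
    liked = set(liked_photos)
    # Step 1: liked photos -> their communities.
    liked_communities = {community_of[p] for p in liked if p in community_of}
    # Steps 2 and 3: other photos in those communities -> their businesses.
    counts = Counter(
        business_of[photo]
        for photo, community in community_of.items()
        if community in liked_communities
        and photo not in liked
        and photo in business_of
    )
    # Sort by count (descending), then by name, so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [business for business, _ in ranked[:limit]]


# Hypothetical data, for illustration only.
community_of = {"p1": 1, "p2": 1, "p3": 1, "p4": 2, "p5": 2, "p6": 3}
business_of = {
    "p1": "Pizza Place", "p2": "Slice Shop", "p3": "Slice Shop",
    "p4": "Coffee Bar", "p5": "Tea House", "p6": "Burger Joint",
}
recommendations = recommend(["p1", "p4"], community_of, business_of)
```

Because the community assignments are computed in advance, the per-user work is only this short lookup, which mirrors the split between global algorithms and local traversal described above.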

The final result

Here’s our new feature in action:

  1. The user is presented with a photo gallery of random photos
  2. The user selects 5 photos they find appealing
  3. Our application traverses the graph, finding the clusters of the selected photos, then traversing to other photos in the same clusters, and finally to the businesses connected to those photos, which become the recommendations.

Enroll In The Free Applied Graph Algorithms Course

The Neo4j Applied Graph Algorithms free online training teaches how to leverage graph algorithms to enhance the functionality of a web application, including this photo-based recommendations example. Enroll today to learn more about Neo4j Graph Algorithms like Personalized PageRank, similarity metrics, community detection and adding features such as personalization, recommendations, and enhancing search results.

You can find all the data and code mentioned in this post by enrolling in the Applied Graph Algorithms Course!