Easily build a low-code content-based recommender system
Recommender systems are a Machine Learning application widely used on e-commerce platforms. In this tutorial, I will present an easy way to build and deploy a recommender system model that you can integrate into your e-commerce platform. For that, we will use SmartPredict, a tool that helps you build Machine Learning models and deploy them with minimal effort. The dataset used for this project, along with the custom modules and flowcharts, can be downloaded from this repository.
How does it work?
In general, there are two approaches we can adopt when building a recommender system:
- content-based filtering: In this first approach, we measure the similarity between items. When a user reacts to an item (buying, rating, clicking, …), the recommender system finds the k items most similar to it and proposes them to the user. The similarity between each pair of items is measured with the cosine similarity formula. Of course, before we compute the cosine similarity matrix for all items, the features used must go through some feature engineering steps.
- collaborative filtering: This second approach is based on comparing users' tastes. The basic rule is that if user A and user B had similar opinions about items they have both seen, user B is likely to share user A's opinion about items B has not seen yet. So here, the comparison is made between users instead of items.
In this tutorial, we will implement a content-based recommender system from scratch.
Dataset
The dataset used for this tutorial contains movie genres. For this first tutorial, we will simply recommend movies to users based on the genres of the movies they like. For example, a user who has reacted to a Sci-fi movie would probably like other Sci-fi movies, so we recommend those to them. Of course, if your dataset has more features describing the items, you can fine-tune your recommender system by adding preprocessing steps for them. In this dataset, a movie can belong to multiple genres, separated by a “|” character.
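To make the dataset format concrete, here is a minimal sketch of how such rows could be read in plain Python. The column names (movieId, title, genres) and the sample rows are assumptions made for illustration; check the actual dataset in the repository for the real ones.

```python
# Hypothetical rows in the shape described above: each movie carries a
# genres string whose entries are separated by "|".
movies = [
    {"movieId": 1, "title": "Movie A", "genres": "Adventure|Sci-Fi"},
    {"movieId": 2, "title": "Movie B", "genres": "Comedy|Romance"},
    {"movieId": 3, "title": "Movie C", "genres": "Sci-Fi"},
]


def split_genres(genres):
    """Split a "|"-separated genres string into a list of genre names."""
    return [g.strip() for g in genres.split("|") if g.strip()]


# Map each movie id to its list of genres.
genres_by_movie = {m["movieId"]: split_genres(m["genres"]) for m in movies}
```

With these sample rows, movie 1 ends up with two genres and movie 3 with only one.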
Build flowchart
The first module of the build flowchart (after the dataset) is a custom module (movie dataset transformer) in which we replace each “|” character separating movie genres with a space. This step is necessary for the next module, a Term Frequency - Inverse Document Frequency (TF-IDF) module, which transforms movie genres into numeric vectors so that we can use them with an algorithm.
Then, the result is fed to a second custom module (word vectors to array). The output of the TF-IDF vectorizer module is a data frame in which the movie genres have been transformed into their corresponding vectors. To calculate the similarity between them in the next module, we need a single array that contains all the genre vector representations.
We use the cosine similarity algorithm to calculate the similarity between the genre vectors of every pair of movies, that is, between every pair of rows in the output of the previous module. The cosine similarity is given by this formula:
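Outside SmartPredict, these two steps could be sketched roughly as follows. This is not the code of the SmartPredict modules: the function names are hypothetical, and the weighting used here (raw term count multiplied by ln(N / df) + 1) is just one simple TF-IDF variant among several, so the numbers may differ from what the platform produces.

```python
import math
from collections import Counter


def genres_to_text(genres):
    """Replace the "|" separators with spaces so each genre becomes a token."""
    return genres.replace("|", " ")


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) for whitespace-tokenized documents.

    Weight of a token in a document = count * (ln(N / df) + 1), where N is
    the number of documents and df the number of documents containing it.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: count each token once per document.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocabulary}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[tok] * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


raw_genres = ["Adventure|Sci-Fi", "Comedy|Romance", "Sci-Fi"]
texts = [genres_to_text(g) for g in raw_genres]
vocabulary, vectors = tfidf_vectors(texts)
```

Each movie now has one vector with one position per genre in the vocabulary; genres the movie does not have get a weight of zero.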
cosine_similarity(A, B) = (A · B) / (||A|| * ||B||)
where A · B is the dot product of the two vectors and ||A|| is the Euclidean norm of A.
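As an illustration of the formula, here is a small plain-Python sketch that computes the cosine similarity of two vectors and then the full pairwise matrix. The sample vectors are made up, and returning 0.0 for an all-zero vector is a convention chosen here, not something the formula itself defines.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat all-zero vectors as dissimilar
    return dot / (norm_a * norm_b)


def similarity_matrix(rows):
    """Cosine similarity between every pair of rows."""
    return [[cosine_similarity(r, s) for s in rows] for r in rows]


# Made-up genre vectors, one row per movie.
genre_vectors = [
    [1.0, 0.0, 1.0],  # e.g. Adventure + Sci-Fi
    [0.0, 1.0, 0.0],  # e.g. Comedy
    [0.0, 0.0, 1.0],  # e.g. Sci-Fi
]
sim = similarity_matrix(genre_vectors)
```

Two movies that share no genre get a similarity of 0, and a movie compared with itself gets a similarity of 1 (up to floating-point rounding).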
Finally, we save the output of the cosine similarity module as an ML model, since we will use it for prediction in the deploy flowchart.
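SmartPredict takes care of storing the model for you. If you wanted to reproduce this step by hand, one possible approach, shown here only as a sketch, is to serialize the similarity matrix with Python's pickle module. The file name and the helper functions are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for the cosine similarity matrix computed in the previous step.
similarity = [[1.0, 0.0], [0.0, 1.0]]


def save_model(matrix, path):
    """Write the similarity matrix to disk."""
    with open(path, "wb") as f:
        pickle.dump(matrix, f)


def load_model(path):
    """Read the similarity matrix back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A temporary directory is used here so the example leaves no stray files
# in the working directory.
model_path = os.path.join(tempfile.mkdtemp(), "similarity_model.pkl")
save_model(similarity, model_path)
restored = load_model(model_path)
```

Only load pickle files that you created yourself, since unpickling untrusted data can execute arbitrary code.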
Deploy flowchart
We also use two custom modules in the deploy flowchart. The first one is an index matcher module: it matches the movie id to the corresponding row of the original dataset that was used to compute the cosine similarity matrix. The second one performs the inverse operation: it retrieves the movies referred to by the indexes output by the Model Predictor module. The Number of elements parameter of this module defines the number of suggested elements.
The original dataset is reused in the flowchart to construct the mapping between the movie ids and the corresponding indexes in the dataset.
The flowchart receives a movie id as input and returns the most similar movies, up to the number of elements defined above.
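The logic of the deploy flowchart can be sketched in a few lines of plain Python. The movie ids and similarity values below are made up, the function name is hypothetical, and excluding the queried movie from its own recommendations is a choice made in this sketch that the SmartPredict modules may or may not share.

```python
# Movie ids in the same order as the rows of the similarity matrix.
movie_ids = [10, 20, 30, 40]

# Made-up cosine similarity matrix (row i, column j = similarity of i and j).
similarity = [
    [1.0, 0.2, 0.9, 0.0],
    [0.2, 1.0, 0.1, 0.7],
    [0.9, 0.1, 1.0, 0.3],
    [0.0, 0.7, 0.3, 1.0],
]

# Index matcher: movie id -> row index in the original dataset.
id_to_index = {movie_id: i for i, movie_id in enumerate(movie_ids)}


def recommend(movie_id, number_of_elements):
    """Return the ids of the movies most similar to movie_id."""
    query = id_to_index[movie_id]
    row = similarity[query]
    # Rank every other movie by decreasing similarity to the query.
    candidates = [i for i in range(len(row)) if i != query]
    candidates.sort(key=lambda i: row[i], reverse=True)
    # Inverse mapping: row indexes -> movie ids.
    return [movie_ids[i] for i in candidates[:number_of_elements]]


suggestions = recommend(10, 2)
```

Here the number_of_elements argument plays the role of the Number of elements parameter: asking for 2 suggestions for movie 10 returns the two movies with the highest similarity in its row.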
Final notes
In this article, we built a basic content-based recommender system using SmartPredict with little code. In the next tutorials, I will show how to proceed with collaborative filtering, another common approach for building recommender systems. So stay in touch to see how SmartPredict can improve the visiting experience of your e-commerce site. Thank you :)