Personalized Recommendation on Sephora using Neural Collaborative Filtering

Himaeda
Dec 16, 2020

Introduction

Recommendation systems are popular across a wide variety of services, e.g., Amazon, Spotify, and Netflix. Sephora, one of the most influential beauty retailers, with its own private brand, is moving toward more personalized shopping experiences.

We therefore demonstrate machine learning techniques that use a user's past behavioral history to recommend suitable products. Specifically, we implement a recommender system based on Neural Collaborative Filtering (NCF), going beyond the best-known approach, collaborative filtering (CF).

However, services such as Amazon and Netflix typically do not explain why a product was recommended. We therefore also demonstrate the Local Interpretable Model-agnostic Explanations (LIME) method to present reasons for recommendations tailored to each user.

Additionally, we apply a clustering method, KModes, to tackle the cold start problem of recommending products to new users. New users receive recommendations based on their profile information, which is clustered to find the most similar existing users.

Dataset

We collected the review data directly from the Sephora website by web scraping; you can see our code in our GitHub repository. The dataset contains brands, product names, product descriptions, prices, and ratings ranging from 1 to 5. It also includes each user's review text, the time the review was submitted, and personal information such as skin type and skin tone.

Sample review dataset of skin care products

We performed exploratory data analysis and visualized the top 25 products; 'LANEIGE: Lip Sleeping Mask' is the most popular.

Top 25 products recommended

Then we illustrated the long-tail phenomenon in recommender systems across the reviewed product_ids. The first vertical line separates the top 20% of items by popularity, called the 'short head'; these are very popular items. The second vertical line divides the rest of the distribution into the 'long tail' (the items we focus on here) and the 'distant tail' (items that receive very few ratings).

Product distribution which has user’s review

Methodology

Preprocessing

We started by sampling users with more than five reviews to avoid an overly sparse matrix in the matrix-based recommender approaches. This left 54,169 reviews, 7,948 users, and 269 items from the whole dataset.

Then we preprocessed the price information into numerical data. Range-valued prices such as $18.00–$200.00 were converted to the mean of their minimum and maximum values, single prices simply had the $ sign removed, and the resulting amounts were grouped into price bands according to their distribution.
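As a rough illustration, here is a minimal pandas sketch of this preprocessing; the column names and the quartile-based bands are our assumptions, not the exact code from the repository:

import pandas as pd

# Hypothetical 'price' column holds strings such as '$18.00' or '$18.00 - $200.00'.
def to_numeric_price(s):
    values = [float(part.replace('$', '')) for part in s.split('-')]
    return sum(values) / len(values)  # midpoint of a range, or the single value itself

df['price_num'] = df['price'].apply(to_numeric_price)

# Group prices into bands following their distribution (quartiles here).
df['price_band'] = pd.qcut(df['price_num'], q=4, labels=['low', 'mid', 'high', 'luxury'])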

Collaborative Filtering and SVD

We first developed two fundamental baselines using the Surprise library: item-based CF and SVD, with a rating scale from 1 to 5. Item-based CF computes similarities between items from users' ratings, while SVD factorizes the rating matrix into a small number of latent dimensions.
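A minimal sketch of these baselines with Surprise, assuming a hypothetical ratings_df with user_id, product_id, and rating columns:

from surprise import Dataset, Reader, KNNBasic, SVD
from surprise.model_selection import cross_validate

# Load the (user, item, rating) triples on a 1-5 scale.
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings_df[['user_id', 'product_id', 'rating']], reader)

# Item-based CF: similarities are computed between items rather than users.
item_cf = KNNBasic(sim_options={'user_based': False})

# SVD: factorizes the rating matrix into low-dimensional latent factors.
svd = SVD(n_factors=50)

for model in (item_cf, svd):
    cross_validate(model, data, measures=['MAE'], cv=5, verbose=True)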

Neural Collaborative Filtering

Then we explored a more complex model, NCF, using the PyTorch library. Input layers were assigned to users and products, and an embedding tensor was created for each input. The user and product embedding tensors were then concatenated and passed through multiple hidden layers (the NCF layers) to learn ratings.

Framework of Neural Collaborative Filtering

The structure of this network is shown below; you can see the entire code here.
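As a rough guide, here is a minimal PyTorch sketch of this architecture; the embedding and hidden-layer sizes are illustrative assumptions, not our exact configuration:

import torch
import torch.nn as nn

class NCF(nn.Module):
    # User/item embeddings -> concatenation -> MLP -> predicted rating.
    def __init__(self, n_users, n_items, emb_dim=32, hidden=(64, 32)):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        layers, in_dim = [], 2 * emb_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # final rating output
        self.mlp = nn.Sequential(*layers)

    def forward(self, user_ids, item_ids):
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)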

Neural Collaborative Filtering adding Attributes

Additionally, we could easily add the user's attribute information (skin type, skin tone, skin concerns, etc.) and product features (benefits, price, brand, etc.) to the input layer.

Framework of NCF adding attributes

In addition to the NCF structure, we need to create embedding layers for these features. The rest of the structure is almost the same as NCF, and you can see the entire code here.
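A minimal sketch of this variant (a hypothetical AttrNCF standing in for the Addnet model saved below), showing only how the extra attribute embeddings enter the network:

import torch
import torch.nn as nn

class AttrNCF(nn.Module):
    # NCF plus one embedding table per categorical attribute (skin type, brand, ...).
    def __init__(self, n_users, n_items, attr_sizes, emb_dim=32, attr_dim=8):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.attr_embs = nn.ModuleList([nn.Embedding(n, attr_dim) for n in attr_sizes])
        in_dim = 2 * emb_dim + attr_dim * len(attr_sizes)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, user_ids, item_ids, attrs):
        # attrs: LongTensor of shape (batch, n_attributes), label-encoded categories.
        parts = [self.user_emb(user_ids), self.item_emb(item_ids)]
        parts += [emb(attrs[:, i]) for i, emb in enumerate(self.attr_embs)]
        return self.mlp(torch.cat(parts, dim=-1)).squeeze(-1)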

If you want to store the parameters of the neural network so that you do not need to re-train the model, you can run the code below. Note that if the model was trained on a GPU, you should call to('cpu') before saving so the parameters can be used on a CPU.

model_path = 'Addemmbednet.pth'
torch.save(Addnet.to('cpu').state_dict(), model_path)
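To reuse the saved parameters later, instantiate the same model class and load the state dict back; map_location='cpu' keeps this safe on a machine without a GPU:

Addnet.load_state_dict(torch.load(model_path, map_location='cpu'))
Addnet.eval()  # switch to inference mode before predicting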

LightGBM

LightGBM is rarely used for recommender systems; however, we applied it to our dataset as a regression task. Its main advantages are outstanding accuracy and, being a tree-based model, the ability to compute feature importances. We developed the LightGBM model with the LightGBM library, combined with the Optuna library for hyperparameter tuning.

LightGBM algorithm (LightGBM documentation)

LightGBM also exposes many hyperparameters, so finding the values at which the model performs best is a critical issue.
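A minimal Optuna tuning sketch, assuming hypothetical X_train/y_train and X_valid/y_valid splits of the encoded features and ratings; the searched parameters and their ranges are illustrative:

import lightgbm as lgb
import optuna
from sklearn.metrics import mean_absolute_error

def objective(trial):
    params = {
        'objective': 'regression',
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 16, 256),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'n_estimators': 500,
    }
    model = lgb.LGBMRegressor(**params)
    model.fit(X_train, y_train)
    preds = model.predict(X_valid)
    return mean_absolute_error(y_valid, preds)  # MAE, matching our evaluation metric

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=50)
print(study.best_params)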

Results and Discussions

Among the models, Neural Collaborative Filtering (NCF) and NCF with attributes performed best, with the lowest mean absolute error (MAE) scores. MAE is the average of the absolute differences between the predicted and observed values.
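Formally, over n predictions,

MAE = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{r}_i - r_i \rvert

where \hat{r}_i is the predicted rating and r_i the observed rating; lower is better.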

MAE of each model

As we expected, the model with additional attribute information was more accurate than the model using only ratings. The LightGBM predictions were concentrated at 4 and 5, whereas the distribution of the NCF predictions was similar to the distribution of the true values.

Rating distribution of NCF with attributes (left) and LightGBM (right) results

LIME for presenting reasons for recommended product

By applying the LIME method, we can use the feature weights it produces to explain why an item was chosen. In this case, skin type and brand were the main reasons for recommending products.

We now outline how we applied LIME to the recommendation model and presented the reason for a recommendation to the user via the derived feature importances (a minimal code sketch follows the figures below):

  • Input user’s characteristics such as skin type, skin tone, skin concerns and price range which the user desires for a product.
  • Compute ratings for all products and choose an item which has the highest rating as a recommended item.
  • Send input data and predicted rating to the LIME model.
  • Provide the product with feature importance based on LIME results.
Structure how to suggest reasons for recommendation
An example of suggested reason
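A minimal sketch of steps 3 and 4 using the lime library, assuming a hypothetical X_train of encoded features, matching feature_names, and a fitted rating model exposing a predict method:

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,                      # numpy array of training rows
    feature_names=feature_names,
    mode='regression',            # we explain a predicted rating, not a class
)

# Explain the predicted rating for the top recommended user/item pair.
exp = explainer.explain_instance(
    x_recommended,                # 1-D feature row for the chosen pair
    model.predict,                # rating predictor
    num_features=5,               # top reasons to show the user
)
for feature, weight in exp.as_list():
    print(f'{feature}: {weight:+.3f}')  # positive weights pushed the rating up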

In this example, we can see that the price range and the brand were the main reasons for recommending the product.

Cold Start Problem

We built a system that recommends products based on a user's preferences, e.g., skin_type, skin_tone, skin_concerns, and price_band. The cold start problem was the next issue: a brand-new user has no review history. To solve it, we proposed substituting the id of a similar existing user, found by clustering on the preferences the new user inputs.

Implementation of clustering

We applied KModes, a clustering method suited to categorical variables: it defines clusters based on the number of matching categories between data points.

Cost vs K plot for selection of optimal “K” clusters in KModes. We chose K = 3
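A minimal sketch with the kmodes library; user_df and the new user's preference values are hypothetical:

from kmodes.kmodes import KModes

# Categorical preference columns for existing users.
X = user_df[['skin_type', 'skin_tone', 'skin_concerns', 'price_band']].values

# Elbow search: cost is the total number of category mismatches within clusters.
costs = []
for k in range(1, 8):
    km = KModes(n_clusters=k, init='Huang', n_init=5, random_state=0)
    km.fit(X)
    costs.append(km.cost_)

# We chose K = 3 from the elbow plot; label every existing user.
km = KModes(n_clusters=3, init='Huang', n_init=5, random_state=0)
user_df['cluster'] = km.fit_predict(X)

# A new user's preferences map to a cluster, from which we substitute
# a similar existing user id for recommendation.
new_user = [['combination', 'light', 'acne', 'mid']]  # hypothetical input
cluster_id = km.predict(new_user)[0]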

Examples of the clustering results are shown in the figures below. Based on these results, we can find the existing users most similar to a new user.

Distribution of skin type and skin tone results

What‘s Next

To improve model performance, we plan to use further information from the product descriptions (text) within the review dataset.

Using deep learning methods, we could extract additional insights, e.g., product information from the descriptions, as well as product images via their URLs, to improve the system.

Finally, we recommend future research on reducing bias in rating predictions and in the tendency to recommend popular items. Currently, the distribution of ratings is left-skewed: most ratings are 5, and few are 1 or 2.

To manage this popularity bias, prior research has proposed many methods that recommend more long-tail items, which can be viewed as potential future 'short head' products, and that balance the accuracy/diversity tradeoff.

Reference

Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua. "Neural Collaborative Filtering." arXiv, August 26, 2017. [paper]

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." In Advances in Neural Information Processing Systems 30 (NIPS 2017). December 2017. [paper]

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. "'Why Should I Trust You?': Explaining the Predictions of Any Classifier." arXiv, August 9, 2016. [paper]

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama. "Optuna: A Next-generation Hyperparameter Optimization Framework." In KDD 2019. [paper]
