Neural Graph Collaborative Filtering

author: Xiang Wang, Xiangnan He, Meng Wang
from: National University of Singapore
published in SIGIR’19, July 21-25, 2019, Paris, France

Research Background:

Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user’s (or an item’s) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes.

Existing Question:

  • Collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect.
    • Question:
      • What is the meaning of collaborative signal?

Motivation

  • integrate the user-item interactions (more specifically, the bipartite graph structure) into the embedding process.

    • Question: what does the bipartite graph structure look like?
  • develop a new recommendation framework Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner.

    • Question:
      • How are embeddings propagated on it?
      • What is the difference between this and propagation over a knowledge graph (KG)?

Introduction

Collaborative filtering (CF) addresses this by assuming that behaviorally similar users exhibit similar preferences on items. The common paradigm is to parameterize users and items for reconstructing historical interactions, and to predict user preference based on the parameters.

  • Two key components in learnable CF models:

    • embeddings: transforms users and items to vectorized representations
    • interaction modeling: reconstructs historical interactions based on the embeddings.
Model explanation:

  • MF directly embeds the user/item ID as a vector and models the user-item interaction with an inner product.
  • Collaborative deep learning extends the MF embedding function by integrating deep representations learned from rich side information.
  • Neural collaborative filtering models replace the MF inner-product interaction function with nonlinear neural networks.
  • Translation-based CF models use a Euclidean distance metric as the interaction function.
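As a rough contrast of the two interaction functions at the ends of this spectrum (notation here is illustrative; $\mathbf{r}$ stands for the learnable relation vector used by LRML-style translation models):

$$\hat{y}_{\text{MF}}(u,i) = \mathbf{e}_u^{\top}\mathbf{e}_i \qquad \text{vs.} \qquad \hat{y}_{\text{trans}}(u,i) = -\lVert \mathbf{e}_u + \mathbf{r} - \mathbf{e}_i \rVert_2$$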

Existing Question:

  • most existing methods build the embedding function with the descriptive features only (e.g., ID and attributes), without considering the user-item interactions — which are only used to define the objective function for model training.

  • the scale of interactions can easily reach millions or even larger in real applications, making it difficult to distill the desired collaborative signal.

    • their method: exploiting the high-order connectivity from user-item interactions, a natural way that encodes collaborative signal in the interaction graph structure.
    [Figure: illustration of the user-item interaction graph and the high-order connectivity it induces]

Methodology

Architecture

[Figure: the overall architecture of the NGCF framework]

Embedding Layer

The embedding layer builds a parameter matrix as an embedding look-up table:
$$\mathbf{E} = [\underbrace{\mathbf{e}_{u_1}, \cdots, \mathbf{e}_{u_N}}_{\text{users' embeddings}}, \underbrace{\mathbf{e}_{i_1}, \cdots, \mathbf{e}_{i_M}}_{\text{items' embeddings}}]$$
where $\mathbf{e}_u, \mathbf{e}_i \in \mathbb{R}^d$ and $d$ denotes the embedding size. Unlike traditional CF models that feed $\mathbf{E}$ directly into the interaction layer, NGCF refines these embeddings by propagating them over the user-item graph.
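A minimal NumPy sketch of this look-up table (the sizes N, M, d and the helper names are hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical toy sizes: N users, M items, embedding size d.
N, M, d = 4, 5, 8
rng = np.random.default_rng(0)

# E stacks user embeddings (rows 0..N-1) and item embeddings (rows N..N+M-1),
# initialized randomly; the propagation layers below refine it.
E = rng.normal(scale=0.1, size=(N + M, d))

def user_embedding(u: int) -> np.ndarray:
    return E[u]          # e_u

def item_embedding(i: int) -> np.ndarray:
    return E[N + i]      # e_i
```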

Embedding Propagation Layers [important]

first-order propagation

theoretical basis: the users that consume an item can be treated as the item’s features and used to measure the collaborative similarity of two items.

  • two main processes: message construction & message aggregation
  • message construction:
    The message from $i$ to $u$ is defined as:
    $$m_{u \leftarrow i} = f(e_i, e_u, p_{ui})$$
    $m_{u \leftarrow i}$ is the message embedding (i.e., the information to be propagated). $f(\cdot)$ is the message encoding function, which takes embeddings $e_i$ and $e_u$ as input, and uses the coefficient $p_{ui}$ to control the decay factor on each propagation on edge $(u, i)$. NGCF instantiates it as:
    $$m_{u \leftarrow i} = \frac{1}{\sqrt{|N_u||N_i|}} \big( W_1 e_i + W_2 (e_i \odot e_u) \big)$$
    where $W_1, W_2 \in \mathbb{R}^{d' \times d}$ are the trainable weight matrices to distill useful information for propagation, and $d'$ is the transformation size. $\odot$ denotes the element-wise product.
    From this, $p_{ui} = 1/\sqrt{|N_u||N_i|}$, where $N_u$ and $N_i$ denote the first-hop neighbor sets of user $u$ and item $i$ (so $|N_u|$ and $|N_i|$ are the numbers of their neighbors).
    From the viewpoint of representation learning, $p_{ui}$ reflects how much the historical item contributes to the user preference.
    From the viewpoint of message passing, $p_{ui}$ can be interpreted as a discount factor, considering that the messages being propagated should decay with the path length.
  • Message Aggregation:
    Aggregate the messages propagated from $u$'s neighborhood to refine $u$'s representation:
    $$e_u^{(1)} = \mathrm{LeakyReLU}\Big( m_{u \leftarrow u} + \sum_{i \in N_u} m_{u \leftarrow i} \Big), \qquad m_{u \leftarrow u} = W_1 e_u$$
    $W_1$ is the weight matrix shared with the one used in the message-construction equation above; $m_{u \leftarrow u}$ retains the information of the original features.
    $e_u^{(1)}$ denotes the representation of user $u$ obtained after the first embedding propagation layer.

Analogously, we can obtain the representation $e_i^{(1)}$ for item $i$ by propagating information from its connected users.
Why LeakyReLU? It allows the messages to encode both positive and small negative signals. A minimal NumPy sketch of this first-order propagation follows.
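A sketch under toy assumptions (the interaction lists, sizes, and random initializations are hypothetical, and $d' = d$ for simplicity); it implements the message construction and aggregation above for a single user:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 4, 5, 8
E = rng.normal(scale=0.1, size=(N + M, d))      # initial embeddings (embedding layer)
W1 = rng.normal(scale=0.1, size=(d, d))         # shared with the self-message m_{u<-u}
W2 = rng.normal(scale=0.1, size=(d, d))

# Toy interaction lists: user_items[u] = items consumed by user u.
user_items = {0: [0, 1], 1: [1, 2], 2: [2, 3, 4], 3: [0, 4]}
item_users = {}                                  # users that consumed item i
for u, items in user_items.items():
    for i in items:
        item_users.setdefault(i, []).append(u)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def propagate_user(u):
    e_u = E[u]
    agg = W1 @ e_u                               # m_{u<-u} = W1 e_u
    for i in user_items[u]:
        e_i = E[N + i]
        p_ui = 1.0 / np.sqrt(len(user_items[u]) * len(item_users[i]))
        # m_{u<-i} = p_ui * (W1 e_i + W2 (e_i ⊙ e_u))
        agg += p_ui * (W1 @ e_i + W2 @ (e_i * e_u))
    return leaky_relu(agg)                       # e_u^{(1)}

e_u_1 = propagate_user(0)
print(e_u_1.shape)                               # (8,)
```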

High-order Propagation

Stack more embedding propagation layers to explore the high-order connectivity information. At layer $l$:
$$e_u^{(l)} = \mathrm{LeakyReLU}\Big( m_{u \leftarrow u}^{(l)} + \sum_{i \in N_u} m_{u \leftarrow i}^{(l)} \Big)$$
$$m_{u \leftarrow i}^{(l)} = p_{ui}\big( W_1^{(l)} e_i^{(l-1)} + W_2^{(l)} (e_i^{(l-1)} \odot e_u^{(l-1)}) \big), \qquad m_{u \leftarrow u}^{(l)} = W_1^{(l)} e_u^{(l-1)}$$
where $W_1^{(l)}, W_2^{(l)} \in \mathbb{R}^{d_l \times d_{l-1}}$ are the trainable transformation matrices and $d_l$ is the transformation size; $e_i^{(l-1)}$ is the item representation generated from the previous message-passing step, memorizing the messages from its $(l-1)$-hop neighbors.
Analogously, we can obtain the representation for item $i$ at layer $l$.

  • Propagation Rule in Matrix Form (see the original paper for the derivation):
    $$E^{(l)} = \mathrm{LeakyReLU}\big( (\mathcal{L} + I) E^{(l-1)} W_1^{(l)} + \mathcal{L} E^{(l-1)} \odot E^{(l-1)} W_2^{(l)} \big)$$
    where $\mathcal{L} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the Laplacian matrix of the user-item graph, $A$ is the adjacency matrix built from the user-item interaction matrix $R$ and its transpose, $I$ is the identity matrix, and $D$ is the diagonal degree matrix. A NumPy sketch of one such layer follows.
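A minimal sketch of one propagation layer in matrix form, on a toy random interaction matrix (all sizes and initializations are hypothetical; the zero-degree guard is my addition for the toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 4, 5, 8
R = rng.integers(0, 2, size=(N, M)).astype(float)        # toy user-item interactions

# Adjacency of the bipartite user-item graph: A = [[0, R], [R^T, 0]].
A = np.block([[np.zeros((N, N)), R], [R.T, np.zeros((M, M))]])
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1)))  # guard isolated nodes
L = D_inv_sqrt @ A @ D_inv_sqrt                          # L = D^{-1/2} A D^{-1/2}

E = rng.normal(scale=0.1, size=(N + M, d))
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))
I = np.eye(N + M)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

# E^{(l)} = LeakyReLU( (L + I) E^{(l-1)} W1 + (L E^{(l-1)} ⊙ E^{(l-1)}) W2 )
E1 = leaky_relu((L + I) @ E @ W1 + ((L @ E) * E) @ W2)
print(E1.shape)                                          # (9, 8)
```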

Model Prediction

After propagating through $L$ layers, concatenate the representations from all layers:
$$e_u^{*} = e_u^{(0)} \,\Vert\, \cdots \,\Vert\, e_u^{(L)}, \qquad e_i^{*} = e_i^{(0)} \,\Vert\, \cdots \,\Vert\, e_i^{(L)}$$
where $\Vert$ is the concatenation operation. All kinds of aggregators could be used here; concatenation is chosen because it is simple, quite effective, and corresponds to the layer-aggregation mechanism in graph neural networks.
Finally, we conduct the inner product to estimate the user's preference towards the target item:
$$\hat{y}_{\mathrm{NGCF}}(u, i) = {e_u^{*}}^{\top} e_i^{*}$$
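A sketch of the prediction step (the function and argument names are mine, not the paper's):

```python
import numpy as np

def predict(user_layers, item_layers):
    """user_layers / item_layers: lists [e^(0), ..., e^(L)] of per-layer
    embeddings for one user and one item."""
    e_u_star = np.concatenate(user_layers)   # e_u* = e_u^(0) || ... || e_u^(L)
    e_i_star = np.concatenate(item_layers)
    return float(e_u_star @ e_i_star)        # y_hat(u, i) = e_u*^T e_i*
```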

Optimization

NGCF is optimized with the pairwise BPR loss:
$$\mathrm{Loss} = \sum_{(u,i,j) \in O} -\ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda \lVert \Theta \rVert_2^2$$
where $O = \{(u,i,j) \mid (u,i) \in R^+, (u,j) \in R^-\}$ denotes the pairwise training data with observed interactions $R^+$ and sampled unobserved interactions $R^-$, $\sigma(\cdot)$ is the sigmoid function, $\Theta$ denotes all trainable parameters, and $\lambda$ controls the $L_2$ regularization strength. Mini-batch Adam is adopted as the optimizer. A sketch of the loss follows.
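A minimal sketch of the BPR objective on precomputed scores (names are hypothetical; gradients and the optimizer are omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(y_pos, y_neg, params, lam=1e-5):
    """y_pos / y_neg: arrays of predicted scores for observed items i and
    sampled unobserved items j of the same users; params: trainable tensors."""
    pairwise = -np.log(sigmoid(y_pos - y_neg)).sum()   # sum of -ln σ(ŷ_ui − ŷ_uj)
    l2 = lam * sum((p ** 2).sum() for p in params)     # λ‖Θ‖²
    return pairwise + l2

# e.g. bpr_loss(np.array([2.1, 0.3]), np.array([1.0, 0.5]), [E, W1, W2])
```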

Message and Node Dropout:

To prevent overfitting, NGCF adopts two dropout techniques: message dropout and node dropout.

  • message dropout: drops the messages being propagated in Equation (6) with probability $p_1$. As such, in the $l$-th propagation layer, only partial messages contribute to the refined representations. Message dropout endows the representations with more robustness against the presence or absence of single connections between users and items.
  • node dropout: randomly blocks a particular node and discards all its outgoing messages. For the $l$-th propagation layer, we randomly drop $(M + N)p_2$ nodes of the Laplacian matrix, where $p_2$ is the dropout ratio. Node dropout focuses on reducing the influence of particular users or items. A rough sketch of both follows this list.
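A rough NumPy sketch over the Laplacian from the matrix-form example above (the Bernoulli masking drops about $(M+N)p_2$ nodes in expectation rather than exactly, and the inverted-dropout rescaling is my assumption, not stated in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def message_dropout(L, p1):
    """Zero out individual entries of the Laplacian, i.e., individual
    edge messages, independently with probability p1."""
    mask = rng.random(L.shape) >= p1
    return L * mask / (1.0 - p1)              # inverted-dropout rescaling

def node_dropout(L, p2):
    """Block whole nodes: drop all messages going out of (and into) a node
    by masking its row and column of the Laplacian."""
    keep = rng.random(L.shape[0]) >= p2       # ~(M+N)*p2 nodes dropped
    return L * np.outer(keep, keep) / (1.0 - p2)
```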

Related Work

Model-based CF Methods

To enhance the embedding function, much effort has been devoted to incorporate side information. While inner product can force user and item embeddings of an observed interaction close to each other, its linearity makes it insufficient to reveal the complex and nonlinear relationships between users and items.
Recent efforts [11, 14, 15, 35] focus on exploiting deep learning techniques to enhance the interaction function, so as to capture the nonlinear feature interactions between users and items. For instance, neural CF models, such as NeuMF [14], employ nonlinear neural networks as the interaction function; meanwhile, translation-based CF models, such as LRML [28], instead model the interaction strength with Euclidean distance metrics.
The design of the embedding function is insufficient to yield optimal embeddings for CF, since the CF signals are only implicitly captured. there is no guarantee that the indirectly connected users and items are close in the embedding space. Without an explicit encoding of the CF signals, it is hard to obtain embeddings that meet the desired properties.

Graph-Based CF Methods

[This is a blind spot in my knowledge.]

Graph Convolutional Networks

Other methods apply GCNs to other tasks or do not make full use of the high-order connectivities.

Experiments

  • RQ1:How does NGCF perform as compared with state-of-the-art CF methods?
  • RQ2: How do different hyper-parameter settings (e.g., depth of layer, embedding propagation layer, layer-aggregation mechanism, message dropout, and node dropout) affect NGCF?
  • RQ3: How do the representations benefit from the high-order connectivity?

Dataset Description

[Table: statistics of the datasets used in the experiments]

Experimental Settings

  • Metrics: recall@K & ndcg@K, with K = 20 (a sketch of both metrics follows this list).

  • baseline models:

    • MF & NeuMF & CMN: factorization-, neural-, and memory-based models that mainly make recommendations by computing similarities between user and item representations.
    • HOP-Rec: a state-of-the-art graph-based model, where the high-order neighbors derived from random walks are exploited to enrich the user-item interaction data.
    • PinSage & GC-MC: GCN-based models.
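A common definition of the two metrics for one user's ranked list (a sketch; the paper's exact evaluation protocol may differ in details such as the IDCG truncation):

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, ground_truth, k=20):
    """ranked_items: item ids sorted by predicted score for one user;
    ground_truth: set of held-out items the user interacted with."""
    hits = [1.0 if item in ground_truth else 0.0 for item in ranked_items[:k]]
    recall = sum(hits) / len(ground_truth)
    dcg = sum(h / np.log2(rank + 2) for rank, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(rank + 2)
               for rank in range(min(k, len(ground_truth))))
    return recall, dcg / idcg

# e.g. recall_ndcg_at_k([3, 7, 1, 9], {7, 9, 4}) -> (0.667, 0.498)
```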

RQ1: Performance Comparison

Overall Comparison

[Table: overall performance comparison among all methods]

  • NeuMF consistently outperforms MF across all cases, demonstrating the importance of nonlinear feature interactions between user and item embeddings.
  • the performance of GC-MC verifies that incorporating the first-order neighbors can improve the representation learning.
  • attention mechanism can improve performance.
  • the positive effect of modeling the high-order connectivity or neighbors.
  • the importance of capturing collaborative signal in the embedding function.
  • different propagation layers encode different information in the representations

Performance Comparison w.r.t. Interaction Sparsity Levels

Q: does exploiting connectivity information help to alleviate the sparsity issue?

[Figure: performance comparison over user groups of different sparsity levels]

  • exploiting high-order connectivity greatly facilitates the representation learning for inactive users, as the collaborative signal can be effectively captured.

RQ2: Study of NGCF

Effect of Layer Numbers

[Table: effect of the number of embedding propagation layers]

  • applying a too deep architecture might introduce noise into the representation learning
  • conducting three propagation layers is sufficient to capture the CF signal

Effect of Embedding Propagation Layer and Layer-Aggregation Mechanism

[Table: performance of NGCF variants with different propagation and aggregation designs]

  • importance of messages passed by the nodes themselves and the nonlinear transformation.
  • the significance of layer-aggregation mechanism

Effect of Dropout

[Figure: effect of message dropout and node dropout ratios on performance]

  • node dropout can be an effective strategy to address overfitting of graph neural networks

Test Performance w.r.t. Epoch

[Figure: test performance w.r.t. training epochs]

  • better model capacity of NGCF and the effectiveness of performing embedding propagation in the embedding space.

RQ3: Effect of High-order Connectivity

Randomly select six users from the Gowalla dataset, as well as their relevant items, and observe how their representations are influenced w.r.t. the depth of NGCF.

[Figure: visualization of the representations derived from NGCF-1 and NGCF-3]

  • The connectivities of users and items are well reflected in the embedding space; that is, connected users and items are embedded into nearby regions of the space. In particular, the representations of NGCF-3 exhibit discernible clustering, meaning that points with the same color (i.e., the items consumed by the same user) tend to form clusters.
  • When stacking three embedding propagation layers, the embeddings of a user's historical items tend to be closer to the user. This qualitatively verifies that the proposed embedding propagation layer is capable of injecting the explicit collaborative signal (via NGCF-3) into the representations.

Conclusion and Future work

  • exploring the adversarial learning on user/item embedding and the graph structure for enhancing the robustness of NGCF.
  • This work represents an initial attempt to exploit structural knowledge with the message-passing mechanism in model-based CF and opens up new research possibilities.
  • the cross features [40] in context-aware and semantics-rich recommendation
  • item knowledge graph [31], and social networks [33].
  • by integrating item knowledge graph with user-item graph, we can establish knowledge-aware connectivities between users and items, which help unveil user decision-making process in choosing items.

My thoughts

  • This paper is an excellent example of paper writing: the logic is clear, the story is well told, and the experiments are sensibly organized.
  • It provides a new way of injecting the collaborative signal into CF. One could try fusing the collaborative signal with a knowledge graph, e.g., when computing embeddings, incorporating both the collaborative signal and the knowledge representations of the corresponding KG entities.
  • Conversely, the collaborative signal could serve as auxiliary information for knowledge graph completion; could it improve the accuracy of KG completion?
  • The experiments verify that an overly deep propagation mechanism leads to overfitting and noise; the optimal depth is around 3-4 layers.
  • Baseline models should be chosen to be relevant to one's own model, and they should be categorized.
  • A limitation of this paper is that, when combining the final embeddings, the representations from different depths are directly concatenated, without applying, e.g., an attention mechanism over different paths or nodes; this is a possible point of improvement.
  • The experimental section is very well done and worth learning from.
