Reading Notes: "Collaboration-Aware Graph Convolutional Network for Recommender Systems"
- Paper Overview
- Introduction and Motivation
- Methodology
- LightGCN Propagation
- CIR
- CAGCN
- Implementation
- Experiments
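Paper Overview
The paper "Collaboration-Aware Graph Convolutional Network for Recommender Systems" improves GNN-based recommendation: on top of the plain, unweighted aggregation with $\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$, it adds importance weights that reflect how neighbors relate to one another. The paper is by Yu Wang of Vanderbilt University and proposes the model CAGCN.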
Paper: https://dl.acm.org/doi/10.1145/3543507.3583229
Code repository: https://github.com/YuWVandy/CAGCN
Introduction and Motivation
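The paper introduces a concept called CIR (Common Interacted Ratio), which measures the overall influence that a neighbor has on the target node within the target's whole neighborhood.
The paper also contains a fair number of proofs. These notes mainly walk through the pipeline.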
Methodology
LightGCN Propagation
LightGCN propagates embeddings as follows:
$$
\begin{aligned}
\mathbf{e}_u^{l+1} &= d_u^{-0.5} \sum_{j \in \mathcal{N}_u^1} d_j^{-0.5}\, \mathbf{e}_j^l, \\
\mathbf{e}_i^{l+1} &= d_i^{-0.5} \sum_{v \in \mathcal{N}_i^1} d_v^{-0.5}\, \mathbf{e}_v^l,
\end{aligned}
\tag{1}
$$
After propagation, the layers are combined by mean pooling: the embeddings of all $L+1$ layers are averaged element-wise:
$$
\begin{aligned}
\mathbf{e}_u &= \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{e}_u^l, \\
\mathbf{e}_i &= \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{e}_i^l,
\end{aligned}
\quad \forall u \in \mathcal{U},\ \forall i \in \mathcal{I}
\tag{2}
$$
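Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is only an illustration on a dense toy graph, not the repository's code; the function name `lightgcn_embeddings` and the toy data are assumptions made here.

```python
import numpy as np

def lightgcn_embeddings(adj, emb0, num_layers):
    """Propagate with D^{-1/2} A D^{-1/2} (Eq. 1), then average all L+1 layers (Eq. 2)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # d^{-0.5}; isolated nodes get 0 so that no division by zero occurs
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    layers = [np.asarray(emb0, dtype=float)]
    for _ in range(num_layers):
        layers.append(norm_adj @ layers[-1])
    return np.mean(layers, axis=0)

# Toy bipartite graph, node order [u0, u1, i0, i1]; edges u0-i0, u0-i1, u1-i0.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])
emb0 = np.eye(4)  # one-hot initial embeddings make the mixing easy to read
final = lightgcn_embeddings(adj, emb0, num_layers=1)
```

With one-hot initial embeddings, row `final[1]` shows directly how much of each node's initial embedding ends up in `u1` after one layer.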
The loss is the BPR loss:
$$
\mathcal{L}_{\mathrm{BPR}} = \sum_{(u, i, i^{-}) \in O} -\ln \sigma\left(y_{ui} - y_{ui^{-}}\right),
\tag{3}
$$
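As a small worked example of Eq. (3), the loss can be computed from paired positive and negative scores. The helper name `bpr_loss` is an assumption made here; the only trick is rewriting $-\ln\sigma(x)$ as $\ln(1+e^{-x})$ and evaluating it in a numerically stable way.

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Sum of -ln sigma(y_ui - y_ui^-) over sampled triples (Eq. 3)."""
    total = 0.0
    for y_pos, y_neg in zip(pos_scores, neg_scores):
        x = y_pos - y_neg
        # -ln sigma(x) = ln(1 + exp(-x)) = max(-x, 0) + ln(1 + exp(-|x|))
        total += max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total

loss_tied = bpr_loss([0.0], [0.0])    # positive and negative scored equally
loss_good = bpr_loss([2.0], [1.0])    # positive ranked above negative
loss_bad = bpr_loss([1.0], [2.0])     # positive ranked below negative
```

The loss shrinks as the positive item is scored further above the sampled negative item.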
CIR
LightGCN propagates over the whole graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$. To capture the interactions and mutual influence among neighbors, the paper extracts the subgraph $\mathcal{S}_p=\left(\mathcal{V}_{\mathcal{S}_p}, \mathcal{E}_{\mathcal{S}_p}\right)$ centered at node $p$, where $\tilde{\mathcal{N}}_p^1=\mathcal{N}_p^1 \cup\{p\}$ is the set containing $p$ and its 1-hop neighbors.
The authors pose two key questions:
RQ1: How is the collaborative effect captured, and how does it improve ranking performance?
RQ2: When does the collaborative effect improve performance?
Combining the embeddings from all $L$ layers of LightGCN, the authors expand the predicted score for a pair $(u, i)$ into the following form:
$$
y_{ui}^{L} = \left(\sum_{l_1=0}^{L} \sum_{j \in \mathcal{N}_u^{l_1}} \sum_{l_2=l_1}^{L} \beta_{l_2}\, \alpha_{ju}^{l_2}\, \mathbf{e}_j^0\right)^{\top} \left(\sum_{l_1=0}^{L} \sum_{v \in \mathcal{N}_i^{l_1}} \sum_{l_2=l_1}^{L} \beta_{l_2}\, \alpha_{vi}^{l_2}\, \mathbf{e}_v^0\right),
\tag{4}
$$
Here $\alpha_{ju}^{l_2}=\sum_{P_{ju}^{l_2} \in \mathscr{P}_{ju}^{l_2}} \prod_{e_{pq} \in P_{ju}^{l_2}} d_p^{-0.5} d_q^{-0.5}$ (with $\alpha_{ju}^{l_2}=0$ if $\mathscr{P}_{ju}^{l_2}=\emptyset$) is the total weight of all paths of length $l_2$ between nodes $j$ and $u$, and $\beta_{l_2}$ is the weight contributed by the layer-$l_2$ embedding. The formula therefore splits into three parts, which are used to assess how CIR affects the result:
For the pair $(u, i)$ and the node pairs within $L$ hops of it, $\left\{(j, v) \mid j \in \bigcup_{l=0}^{L} \mathcal{N}_u^l,\ v \in \bigcup_{l=0}^{L} \mathcal{N}_i^l\right\}$, the prediction is mainly determined by three factors:
- $\mathbf{e}_j^{0\top}\mathbf{e}_v^0$: the similarity between initial embeddings
- $\left\{\alpha_{ju}^l\right\}_{l=0}^{L}$ $\left(\left\{\alpha_{vi}^l\right\}_{l=0}^{L}\right)$: the path weights
- $\left\{\beta_l\right\}_{l=0}^{L}$: the layer weights
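The path weight $\alpha_{ju}^{l}$ can be checked numerically. If paths are allowed to revisit nodes (an assumption made for this sketch), the sum over all length-$l$ paths of the product of $d_p^{-0.5} d_q^{-0.5}$ along the edges is the $(j, u)$ entry of $\left(\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right)^{l}$. The helper name `path_weight` and the toy graph are illustrative only.

```python
import numpy as np

def path_weight(adj, j, u, length):
    """alpha_{ju}^{length}: total weight of all length-`length` paths between j and u."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)  # assumes every node has at least one neighbor
    norm_adj = adj / np.sqrt(np.outer(deg, deg))
    return np.linalg.matrix_power(norm_adj, length)[j, u]

# Toy bipartite graph, node order [u0, u1, i0, i1]; edges u0-i0, u0-i1, u1-i0.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])

# The only length-2 path from u1 to u0 is u1 - i0 - u0, with weight
# (d_u1 * d_i0)^{-0.5} * (d_i0 * d_u0)^{-0.5} = (1*2)^{-0.5} * (2*2)^{-0.5}.
alpha_2 = path_weight(adj, j=1, u=0, length=2)
alpha_1 = path_weight(adj, j=1, u=0, length=1)  # no user-user edge, so 0
```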
CIR is then defined as follows. For any neighbor $j \in \mathcal{N}_u^1$ of user $u$, the common interacted ratio of $j$ around $u$, considering nodes within $\widehat{L}+1$ hops of $u$, is written $\phi_u^{\widehat{L}}(j)$. It is the average interaction ratio between $j$ and all neighbors of $u$ in $\mathcal{N}_u^1$, taken over paths of length at most $2\widehat{L}$:
$$
\phi_u^{\widehat{L}}(j) = \frac{1}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \sum_{l=1}^{\widehat{L}} \alpha^{2l} \sum_{P_{ji}^{2l} \in \mathscr{P}_{ji}^{2l}} \frac{1}{f\left(\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^{2l}\right\}\right)}, \quad \forall j \in \mathcal{N}_u^1,\ \forall u \in \mathcal{U}.
\tag{5}
$$
Here $\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^{2l}\right\}$ denotes the first-order neighborhoods of the nodes $k$ on the path $P_{ji}^{2l}$; $f(\cdot)$ is a normalization function that sets the weight of each path in $\mathscr{P}_{ji}^{2l}$; and $\alpha^{2l}$ is the coefficient for paths of length $2l$.
$\phi_u^{\widehat{L}}(j)$ is determined by paths of length $2$ through $2\widehat{L}$. By choosing different $\widehat{L}$ and $f(\cdot)$, $\phi_u^{\widehat{L}}(j)$ (abbreviated $\phi_u(j)$), through the term $\sum_{P_{ji}^{2l} \in \mathscr{P}_{ji}^{2l}} \frac{1}{f\left(\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^{2l}\right\}\right)}$, expresses different graph similarity measures.
Different instantiations give different concrete forms. The main purpose of the definition is to show that nodes with a larger CIR contribute more, which the authors also support with an experiment.
CAGCN
CAGCN assigns different weights to neighbor nodes during aggregation. First, define the weight matrix:
$$
\Phi_{ij} =
\begin{cases}
\phi_i(j), & \text{if } \mathbf{A}_{ij} > 0 \\
0, & \text{if } \mathbf{A}_{ij} = 0
\end{cases},
\quad \forall i, j \in \mathcal{V}
\tag{6}
$$
Accordingly, the aggregation function is:
$$
\mathbf{e}_i^{l+1} = \sum_{j \in \mathcal{N}_i^1} g\left(\gamma_i \frac{\Phi_{ij}}{\sum_{k \in \mathcal{N}_i^1} \Phi_{ik}},\ d_i^{-0.5} d_j^{-0.5}\right) \mathbf{e}_j^l, \quad \forall i \in \mathcal{V}
\tag{7}
$$
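Eqs. (6) and (7) can be sketched as a function that turns an adjacency matrix and a matrix of CIR scores into edge weights. These notes do not spell out $g$ or $\gamma_i$, so the defaults below (a plain sum for $g$ and $\gamma_i = 1$) are illustrative assumptions, as are the name `cagcn_weights` and the toy scores in `phi`.

```python
import numpy as np

def cagcn_weights(adj, phi, gamma=None, combine=lambda cir, gcn: cir + gcn):
    """Edge weights g(gamma_i * Phi_ij / sum_k Phi_ik, d_i^-0.5 d_j^-0.5), as in Eq. (7)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)  # assumes every node has at least one neighbor
    gcn = adj / np.sqrt(np.outer(deg, deg))       # LightGCN term
    phi = np.where(adj > 0, phi, 0.0)             # Eq. (6): keep CIR only on edges
    row_sum = phi.sum(axis=1, keepdims=True)
    cir = np.divide(phi, row_sum, out=np.zeros_like(phi), where=row_sum > 0)
    if gamma is None:
        gamma = np.ones(len(adj))                 # assumed default, not from the source
    cir = gamma[:, None] * cir
    return np.where(adj > 0, combine(cir, gcn), 0.0)

# Toy bipartite graph, node order [u0, u1, i0, i1], with made-up CIR scores.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]])
phi = np.array([[0.0, 0.0, 3.0, 1.0],
                [0.0, 0.0, 2.0, 0.0],
                [3.0, 2.0, 0.0, 0.0],
                [1.0, 0.0, 0.0, 0.0]])

w_cir_only = cagcn_weights(adj, phi, combine=lambda cir, gcn: cir)
w_sum = cagcn_weights(adj, phi)
emb_next = w_sum @ np.eye(4)  # one propagation step, e^{l+1} = W e^l
```

Passing a different `combine` callable is one way to compare a CIR-only variant against one that also keeps the LightGCN normalization.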
[Figure: CAGCN model architecture]
Implementation
Concretely, recall the definition of $\phi_u^{\widehat{L}}(j)$, with the path coefficient now written as $\beta^{2l}$:
$$
\phi_u^{\widehat{L}}(j) = \frac{1}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \sum_{l=1}^{\widehat{L}} \beta^{2l} \sum_{P_{ji}^{2l} \in \mathscr{P}_{ji}^{2l}} \frac{1}{f\left(\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^{2l}\right\}\right)},
\tag{8}
$$
The paper provides several similarity measures as instantiations:
- Jaccard similarity (JC):
$$
\mathrm{JC}(i, j) = \frac{\left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|}{\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|}
\tag{9}
$$
Setting $\widehat{L}=1$ and $f\left(\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^2\right\}\right)=\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|$ gives the result below. The second step uses the fact that every length-2 path between $j$ and $i$ passes through exactly one common neighbor, so $\left|\mathscr{P}_{ji}^2\right| = \left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|$:
$$
\begin{aligned}
\phi_u^1(j) &= \frac{1}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \beta^2 \sum_{P_{ji}^2 \in \mathscr{P}_{ji}^2} \frac{1}{\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|} \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \frac{\left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|}{\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|} \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \mathrm{JC}(i, j)
\end{aligned}
\tag{10}
$$
- Salton cosine similarity (SC):
$$
\mathrm{SC}(i, j) = \frac{\left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|}{\sqrt{\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|}}
\tag{11}
$$
Setting $\widehat{L}=1$ and $f\left(\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^2\right\}\right)=\sqrt{\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|}$:
$$
\begin{aligned}
\phi_u^1(j) &= \frac{1}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \beta^2 \sum_{P_{ji}^2 \in \mathscr{P}_{ji}^2} \frac{1}{\sqrt{\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|}} \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \frac{\left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|}{\sqrt{\left|\mathcal{N}_i^1 \cup \mathcal{N}_j^1\right|}} \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \mathrm{SC}(i, j)
\end{aligned}
\tag{12}
$$
- Common neighbors (CN):
$$
\mathrm{CN}(i, j) = \left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|
\tag{13}
$$
Setting $\widehat{L}=1$ and $f\left(\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^2\right\}\right)=1$:
$$
\begin{aligned}
\phi_u^1(j) &= \frac{1}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \beta^2 \sum_{P_{ji}^2 \in \mathscr{P}_{ji}^2} 1 \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right| \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \mathrm{CN}(i, j)
\end{aligned}
\tag{14}
$$
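For the common-neighbors case, Eq. (14) has a compact matrix form: with a binary user-item matrix $\mathbf{R}$, the entry $(\mathbf{R}^\top\mathbf{R})_{ij}$ counts the users shared by items $i$ and $j$, which equals the number of length-2 paths between them. The sketch below computes $\phi_u^1(j)$ for every observed user-item pair at once; the name `cir_common_neighbors` and the toy matrix are assumptions made here, and the sum includes $i = j$ exactly as Eq. (14) is written.

```python
import numpy as np

def cir_common_neighbors(R, beta=1.0):
    """phi_u^1(j) with f = 1 (Eq. 14) for every observed (user, item) pair."""
    R = np.asarray(R, dtype=float)
    co = R.T @ R                                # co[i, j] = |N_i ∩ N_j|
    deg_u = R.sum(axis=1, keepdims=True)        # |N_u^1| per user
    phi = beta ** 2 * (R @ co) / np.maximum(deg_u, 1.0)
    return phi * (R > 0)                        # keep only j in N_u^1

# Rows are users u0..u2, columns are items i0..i2.
R = np.array([[1, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
phi_cn = cir_common_neighbors(R)
```

For user `u0`, items `i0` and `i1` score higher than `i2` because they share more users with the rest of `u0`'s neighborhood.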
- Leicht-Holme-Newman index (LHN):
$$
\mathrm{LHN}(i, j) = \frac{\left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|}{\left|\mathcal{N}_i^1\right| \cdot \left|\mathcal{N}_j^1\right|}
\tag{15}
$$
Setting $\widehat{L}=1$ and $f\left(\left\{\mathcal{N}_k^1 \mid k \in P_{ji}^2\right\}\right)=\left|\mathcal{N}_i^1\right| \cdot \left|\mathcal{N}_j^1\right|$:
$$
\begin{aligned}
\phi_u^1(j) &= \frac{1}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \beta^2 \sum_{P_{ji}^2 \in \mathscr{P}_{ji}^2} \frac{1}{\left|\mathcal{N}_i^1\right| \cdot \left|\mathcal{N}_j^1\right|} \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \frac{\left|\mathcal{N}_i^1 \cap \mathcal{N}_j^1\right|}{\left|\mathcal{N}_i^1\right| \cdot \left|\mathcal{N}_j^1\right|} \\
&= \frac{\beta^2}{\left|\mathcal{N}_u^1\right|} \sum_{i \in \mathcal{N}_u^1} \mathrm{LHN}(i, j)
\end{aligned}
\tag{16}
$$
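The four instantiations differ only in the normalizer $f$, so they can share one routine. The sketch below is a minimal set-based version for $\widehat{L}=1$ that follows the formulas as written in these notes (including the $\sqrt{|\mathcal{N}_i^1 \cup \mathcal{N}_j^1|}$ denominator for SC); the names `NORMALIZERS` and `cir` and the toy graph are assumptions made here, not the repository's API.

```python
import math

# Normalizer f for each measure, given the neighbor sets of i and j.
NORMALIZERS = {
    "jc": lambda n_i, n_j: len(n_i | n_j),
    "sc": lambda n_i, n_j: math.sqrt(len(n_i | n_j)),
    "cn": lambda n_i, n_j: 1.0,
    "lhn": lambda n_i, n_j: len(n_i) * len(n_j),
}

def cir(neighbors, u, j, kind, beta=1.0):
    """phi_u^1(j): average over i in N_u of (#length-2 paths between j and i) / f."""
    f = NORMALIZERS[kind]
    n_j = neighbors[j]
    total = 0.0
    for i in neighbors[u]:
        n_i = neighbors[i]
        num_paths = len(n_i & n_j)  # one length-2 path per common neighbor
        total += num_paths / f(n_i, n_j)
    return beta ** 2 * total / len(neighbors[u])

# Toy bipartite graph given as neighbor sets.
neighbors = {
    "u0": {"i0", "i1", "i2"}, "u1": {"i0", "i1"}, "u2": {"i2"},
    "i0": {"u0", "u1"}, "i1": {"u0", "u1"}, "i2": {"u0", "u2"},
}
scores = {kind: cir(neighbors, "u0", "i0", kind) for kind in NORMALIZERS}
```

Keeping $f$ as a lookup table makes it easy to compare how each measure ranks the same neighbor.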