文献分享: Muvera多向量到单向量的转化方法——原理与理论保证
原论文
详细的理论证明,包含了所有细节
文章目录
- 1. MUVERA \textbf{1. MUVERA} 1. MUVERA的全过程
- 2. \textbf{2. } 2. 定理 2.1 \textbf{2.1} 2.1证明的思路
- 3.0. \textbf{3.0. } 3.0. 定理 2.1 \textbf{2.1} 2.1的主要内容
- 2.1. \textbf{2.1. } 2.1. 无投影无重复时 FDE \textbf{FDE} FDE相似度的上界
- 2.2. \textbf{2.2. } 2.2. 加上一定维度的投影后原上界近似成立
- 2.3. \textbf{2.3. } 2.3. 已给出无重复 FDE \textbf{FDE} FDE上界👉还需证明什么才能给出无重复 FDE \textbf{FDE} FDE的上下界
- 2.4. \textbf{2.4. } 2.4. 无投影无重复时 FDE \textbf{FDE} FDE相似度的子下界
- 2.5. \textbf{2.5. } 2.5. 加上一定维度的投影后原子下界近似成立
- 3. \textbf{3. } 3. 定理 2.2 \textbf{2.2} 2.2证明的思路
1. MUVERA \textbf{1. MUVERA} 1. MUVERA的全过程
![]()
1️⃣文本嵌入:对查询文本和段落文本分别应用嵌入器(如 ColBERTv2 \text{ColBERTv2} ColBERTv2),得到各自的多向量嵌入
- 查询嵌入 Q Q Q: { q 1 , q 2 , . . . , q m } \{q_1,q_2,...,q_m\} {q1,q2,...,qm},其中 q i ⊆ R d q_i\text{⊆}\mathbb{R}^{d} qi⊆Rd即为固定 d d d维
- 段落嵌入 P P P: { p 1 , p 2 , . . . , p n } \{p_1,p_2,...,p_n\} {p1,p2,...,pn},其中 p i ⊆ R d p_i\text{⊆}\mathbb{R}^{d} pi⊆Rd即为固定 d d d维
2️⃣向量分桶:用 SimHash \text{SimHash} SimHash将原有空间分为 2 k sim 2^{k_{\text{sim}}} 2ksim个桶,每个桶用长为 k sim k_{\text{sim}} ksim的定长二进制向量编码
- 法向抽取:从高斯分布中抽取 k sim ≥ 1 k_{\text{sim}}\text{≥}1 ksim≥1个向量 g 1 , … , g k sim ∈ R d g_{1},\ldots,g_{k_{\text{sim}}}\text{∈}\mathbb{R}^{d} g1,…,gksim∈Rd,作为 k sim k_{\text{sim}} ksim个超平面的法向量
- 空间划分: φ ( x ) = ( 1 ( ⟨ g 1 , x ⟩ > 0 ) , … , 1 ( ⟨ g k sim , x ⟩ > 0 ) ) \varphi(x)\text{=}\left(\mathbf{1}\left(\left\langle{}g_{1},x\right\rangle{}\text{>}0\right),\ldots,\mathbf{1}\left(\left\langle{}g_{k_{\text{sim}}},x\right\rangle{}\text{>}0\right)\right) φ(x)=(1(⟨g1,x⟩>0),…,1(⟨gksim,x⟩>0))
- 1 ( ⟨ g i , x ⟩ > 0 ) \mathbf{1}\left(\left\langle{}g_{i},x\right\rangle{}\text{>}0\right) 1(⟨gi,x⟩>0):当 ⟨ g i , x ⟩ > 0 \langle{}g_{i},x\rangle{}\text{>}0 ⟨gi,x⟩>0成立(即 x x x投影在超平面 g i g_i gi的正侧)时,将该位设为 1 1 1
- 向量分桶:让所有的 m + n m\text{+}n m+n个嵌入通过 φ ( ⋅ ) \varphi(\cdot) φ(⋅)得到长 k sim k_{\text{sim}} ksim的二进制编码,相同编码者(即桶编码)放入同一桶
3️⃣向量生成:按照如下三种情况,为每个桶 k k k都生成一个子向量 q ⃗ ( k ) , p ⃗ ( k ) ⊆ R d \vec{q}_{(k)},\vec{p}_{(k)}\text{⊆}\mathbb{R}^{d} q(k),p(k)⊆Rd
Case \textbf{Case} Case 桶 k \boldsymbol{k} k的情况 桶 k \boldsymbol{k} k子向量 q ⃗ ( k ) \boldsymbol{\vec{q}_{(k)}} q(k) 桶 k \boldsymbol{k} k子向量 p ⃗ ( k ) \boldsymbol{\vec{p}_{(k)}} p(k) Case-0 \text{Case-0} Case-0 无 p i → { q 1 , q 2 , . . . ∣ } p_i\text{→}\{q_1,q_2,...\mid{}\} pi→{q1,q2,...∣} q ⃗ ( k ) = ∑ q i \displaystyle{}\vec{q}_{(k)}=\sum{}q_i q(k)=∑qi p ⃗ ( k ) = \displaystyle{}\vec{p}_{(k)}= p(k)=与该桶海明距离最近的 p p p Case-1 \text{Case-1} Case-1 单 p i → { q 1 , q 2 , . . . ∣ p } p_i\text{→}\{q_1,q_2,...\mid{}p\} pi→{q1,q2,...∣p} q ⃗ ( k ) = ∑ q i \displaystyle{}\vec{q}_{(k)}=\sum{}q_i q(k)=∑qi p ⃗ ( k ) = p \displaystyle{}\vec{p}_{(k)}=p p(k)=p Case-n \text{Case-n} Case-n 多 p i → { q 1 , q 2 , . . . ∣ p 1 , p 2 , . . . } p_i\text{→}\{q_1,q_2,...\mid{}p_1,p_2,...\} pi→{q1,q2,...∣p1,p2,...} q ⃗ ( k ) = ∑ q i \displaystyle{}\vec{q}_{(k)}=\sum{}q_i q(k)=∑qi p ⃗ ( k ) = 1 # p ∑ p j \displaystyle{}\vec{p}_{(k)}=\cfrac{1}{\#p}\sum{}p_j p(k)=#p1∑pj( p p p的质心) 4️⃣向量压缩:对每个 q ⃗ ( k ) , p ⃗ ( k ) ⊆ R d \vec{q}_{(k)},\vec{p}_{(k)}\text{⊆}\mathbb{R}^{d} q(k),p(k)⊆Rd应用随机线性投影 ψ : R d → R d proj ( d proj ≤ d ) \psi\text{:}\mathbb{R}^{d}\text{→}\mathbb{R}^{d_{\text{proj}}}(d_{\text{proj}}\text{≤}d) ψ:Rd→Rdproj(dproj≤d)
- 投影函数: ψ ( x ) = ( 1 d proj ) S x \boldsymbol{\psi}(x)\text{=}\left(\cfrac{1}{\sqrt{d_{\text{proj}}}}\right)\mathbf{S}x ψ(x)=(dproj1)Sx,其中 S ∈ R d proj × d \mathbf{S}\text{∈}\mathbb{R}^{d_{\text{proj}}\text{×}d} S∈Rdproj×d为随机矩阵
- 当 d proj = d d_{\text{proj}}\text{=}d dproj=d时 S \mathbf{S} S的每个元素 s i j = 1 s_{ij}\text{=}1 sij=1,即 ψ ( x ) = x \boldsymbol{\psi}(x)\text{=}x ψ(x)=x
- 当 d proj < d d_{\text{proj}}\text{<}d dproj<d时 S \mathbf{S} S的每个元素 s i j s_{ij} sij满足离散均匀分布 P r [ s i j = 1 ] = P r [ s i j =– 1 ] = 1 2 \mathbb{Pr}\left[s_{ij}\text{=}1\right]\text{=}\mathbb{Pr}\left[s_{ij}\text{=}–1\right]\text{=}\cfrac{1}{2} Pr[sij=1]=Pr[sij=–1]=21
- 投影操作: { q ⃗ ( k ) , ψ ⊆ R d proj ← ψ ( q ⃗ ( k ) ) q ⃗ ( k ) ⊆ R d p ⃗ ( k ) , ψ ⊆ R d proj ← ψ ( p ⃗ ( k ) ) p ⃗ ( k ) ⊆ R d \begin{cases}\vec{q}_{(k),\psi}\text{⊆}\mathbb{R}^{d_{\text{proj}}}\xleftarrow{\psi\left(\vec{q}_{(k)}\right)}\vec{q}_{(k)}\text{⊆}\mathbb{R}^{d}\\\\\vec{p}_{(k),\psi}\text{⊆}\mathbb{R}^{d_{\text{proj}}}\xleftarrow{\psi\left(\vec{p}_{(k)}\right)}\vec{p}_{(k)}\text{⊆}\mathbb{R}^{d}\end{cases} ⎩ ⎨ ⎧q(k),ψ⊆Rdprojψ(q(k))q(k)⊆Rdp(k),ψ⊆Rdprojψ(p(k))p(k)⊆Rd
- 合并操作:将每个桶的压缩向量依次从左到右合并 → { q ⃗ ψ = ( q ⃗ ( 1 ) , ψ , … , q ⃗ ( B ) , ψ ) ⊆ R d proj 2 k s i m p ⃗ ψ = ( p ⃗ ( 1 ) , ψ , … , p ⃗ ( B ) , ψ ) ⊆ R d proj 2 k s i m \text{→}\begin{cases}\vec{q}_{\psi}\text{=}\left(\vec{q}_{(1),\psi},\ldots,\vec{q}_{(B),\psi}\right)\text{⊆}\mathbb{R}^{d_{\text{proj}}2^{k_{sim}}}\\\\\vec{p}_{\psi}\text{=}\left(\vec{p}_{(1),\psi},\ldots,\vec{p}_{(B),\psi}\right)\text{⊆}\mathbb{R}^{d_{\text{proj}}2^{k_{sim}}}\end{cases} →⎩ ⎨ ⎧qψ=(q(1),ψ,…,q(B),ψ)⊆Rdproj2ksimpψ=(p(1),ψ,…,p(B),ψ)⊆Rdproj2ksim
5️⃣重复生成:重复2️⃣ → \text{→} →4️⃣过程 R reps R_{\text{reps}} Rreps次,每次重复完成后生成 q ⃗ i , ψ , p ⃗ i , ψ \vec{q}_{i,\psi},\vec{p}_{i,\psi} qi,ψ,pi,ψ,拼接所有 q ⃗ i , ψ , p ⃗ i , ψ \vec{q}_{i,\psi},\vec{p}_{i,\psi} qi,ψ,pi,ψ
- Q Q Q最终生成的单向量: F q u e ( Q ) = ( q ⃗ 1 , ψ , … , q ⃗ R reps , ψ ) ⊆ R d proj 2 k s i m R reps \mathbf{F}_{\mathrm{que}}(Q)\text{=}\left(\vec{q}_{1,\psi},\ldots,\vec{q}_{R_{\text{reps}},\psi}\right)\text{⊆}\mathbb{R}^{d_{\text{proj}}2^{k_{sim}}R_{\text{reps}}} Fque(Q)=(q1,ψ,…,qRreps,ψ)⊆Rdproj2ksimRreps
- P P P最终生成的单向量: F d o c ( P ) = ( p ⃗ 1 , ψ , … , p ⃗ R reps , ψ ) ⊆ R d proj 2 k s i m R reps \mathbf{F}_{\mathrm{doc}}(P)\text{=}\left(\vec{p}_{1,\psi},\ldots,\vec{p}_{R_{\text{reps}},\psi}\right)\text{⊆}\mathbb{R}^{d_{\text{proj}}2^{k_{sim}}R_{\text{reps}}} Fdoc(P)=(p1,ψ,…,pRreps,ψ)⊆Rdproj2ksimRreps
2. \textbf{2. } 2. 定理 2.1 \textbf{2.1} 2.1证明的思路
3.0. \textbf{3.0. } 3.0. 定理 2.1 \textbf{2.1} 2.1的主要内容
👉前提 1 1 1:设定 ∀ ε , δ > 0 \forall{}\varepsilon,\delta\text{>}0 ∀ε,δ>0(其中 ε ≤ 1 2 \varepsilon{}\text{≤}\cfrac{1}{2} ε≤21且 δ ≤ ε \delta{≤}\varepsilon δ≤ε),给定单位向量集 P , Q ⊆ R d P,Q\text{⊆}\mathbb{R}^d P,Q⊆Rd并满足 m = ∣ Q ∣ + ∣ P ∣ m\text{=}|Q|\text{+}|P| m=∣Q∣+∣P∣
👉前提 2 2 2:选择参数 k sim = O ( 1 ε log ( m δ ) ) k_{\text{sim}}\text{=}O\left(\cfrac{1}{\varepsilon}\log{\left(\cfrac{m}{\delta}\right)}\right) ksim=O(ε1log(δm)), d proj = O ( 1 ε 2 log ( m ε δ ) ) d_{\text{proj}}\text{=}O\left(\cfrac{1}{\varepsilon^2}\log{\left(\cfrac{m}{\varepsilon\delta}\right)}\right) dproj=O(ε21log(εδm)), R reps = 1 R_{\text{reps}}\text{=}1 Rreps=1
👉结论: Chamfer ( Q , P ) – ε ≤FDE ( Q , P ) ≤Chamfer ( Q , P ) + ε \text{Chamfer}(Q,P)\text{–}\varepsilon\text{≤}\text{FDE}(Q,P)\text{≤}\text{Chamfer}(Q,P)\text{+}\varepsilon Chamfer(Q,P)–ε≤FDE(Q,P)≤Chamfer(Q,P)+ε以 Pr≥ 1 – δ \text{Pr}\text{≥}1\text{–}\delta Pr≥1–δ概率成立,即给出了 FDE \textbf{FDE} FDE相似度的上下界
2.1. \textbf{2.1. } 2.1. 无投影无重复时 FDE \textbf{FDE} FDE相似度的上界
➡️ Chamfer \text{Chamfer} Chamfer后期交互的本质:一套针对 q i q_i qi向量的映射策略
![]()
- f ( q i ) f(q_i) f(qi)操作为将 q i q_i qi映射到某一个 p q i ∈ P = { p 1 , p 2 , . . . , p m } p_{q_i}\text{∈}P\text{=}\{p_1,p_2,...,p_m\} pqi∈P={p1,p2,...,pm}向量
- 在 Chamfer \text{Chamfer} Chamfer后期交互中 f ( q i ) = p q i = arg p ∈ P MaxSim ( q i , P ) f(q_i)\text{=}p_{q_i}\text{=}\arg\limits_{p\text{∈}P}\text{MaxSim}(q_i,P) f(qi)=pqi=p∈PargMaxSim(qi,P),即遍历 p q i ∈ P = { p 1 , p 2 , . . . , p m } p_{q_i}\text{∈}P\text{=}\{p_1,p_2,...,p_m\} pqi∈P={p1,p2,...,pm},最终选取使内积 ⟨ q i , p q i ⟩ \langle{q_i,p_{q_i}}\rangle ⟨qi,pqi⟩最大的 p q i p_{q_i} pqi
- 最终合并所有的内积 ∑ i =1 n ⟨ q i , p q i ⟩ \displaystyle{\sum_{i\text{=1}}^n}\langle{q_i,p_{q_i}}\rangle i=1∑n⟨qi,pqi⟩是为 Chamfer \text{Chamfer} Chamfer相似度,即 ∑ i =1 n ⟨ q i , p q i ⟩ =Chamfer ( Q , P ) = ⟨ Q , P ′ ⟩ \displaystyle{\sum_{i\text{=1}}^n}\langle{q_i,p_{q_i}}\rangle\text{=}\text{Chamfer}(Q,P)\text{=}\langle{Q,P^\prime}\rangle i=1∑n⟨qi,pqi⟩=Chamfer(Q,P)=⟨Q,P′⟩
➡️ Muvera \text{Muvera} Muvera分桶的本质:另一套针对 q i q_i qi向量的映射策略
![]()
- 对于单个桶 Bucket-k \text{Bucket-k} Bucket-k中有 ⟨ q ⃗ ( k ) , p ⃗ ( k ) ⟩ = ⟨ ∑ k i q k i , p ⃗ ( k ) ⟩ = ∑ k i ⟨ q k i , p ⃗ ( k ) ⟩ \langle{\vec{q}_{(k)},\vec{p}_{(k)}}\rangle\text{=}\left\langle{\displaystyle\sum_{k_i}{}q_{k_i},\vec{p}_{(k)}}\right\rangle\text{=}\displaystyle\sum_{k_i}{}\left\langle{q_{k_i},\vec{p}_{(k)}}\right\rangle ⟨q(k),p(k)⟩=⟨ki∑qki,p(k)⟩=ki∑⟨qki,p(k)⟩,相当于为每个落入该桶 k k k的 q k i q_{k_i} qki映射了一个 p ⃗ ( k ) \vec{p}_{(k)} p(k)
- 合并 B B B个桶的结果有 ∑ k = 1 B ⟨ q ⃗ ( k ) , p ⃗ ( k ) ⟩ = ∑ q i ∈ Q ⟨ q i , p ⃗ ( k q i ) ⟩ \displaystyle\sum_{k\text{=}1}^{B}\langle{\vec{q}_{(k)},\vec{p}_{(k)}}\rangle\text{=}\displaystyle\sum_{q_i\text{∈}Q}{}\left\langle{q_{i},\vec{p}_{(k_{q_i})}}\right\rangle k=1∑B⟨q(k),p(k)⟩=qi∈Q∑⟨qi,p(kqi)⟩,相当于为每个 q i ∈ Q q_i\text{∈}Q qi∈Q都映射一个 p ⃗ ( k q i ) = p q i ∗ ∈ { p ⃗ ( 1 ) , . . . , p ⃗ ( B ) } \vec{p}_{(k_{q_i})}\text{=}p_{q_i}^*\text{∈}\{\vec{p}_{(1)},...,\vec{p}_{(B)}\} p(kqi)=pqi∗∈{p(1),...,p(B)}
➡️对两种映射方案的对比:有 ⟨ q i , p q i ∗ ⟩ ≤ ⟨ q i , p q i ⟩ \langle{q_i,p_{q_i}^*}\rangle\text{≤}\langle{q_i,p_{q_i}}\rangle ⟨qi,pqi∗⟩≤⟨qi,pqi⟩
![]()
- p q i p_{q_i} pqi的得到有且只有一种情况,就是 p q i = arg max p ∈ P ⟨ q i , p ⟩ = arg p ∈ P MaxSim ( q i , P ) \displaystyle{}p_{q_i}\text{=}\arg\max_{p\text{∈}P}\langle{q_i,p}\rangle\text{=}\arg\limits_{p\text{∈}P}\text{MaxSim}(q_i,P) pqi=argp∈Pmax⟨qi,p⟩=p∈PargMaxSim(qi,P)
- p q i ∗ p_{q_i}^* pqi∗的得到分为三种情况,即 Case-0/Case-n \text{Case-0/Case-n} Case-0/Case-n(其中视 Case-0 \text{Case-0} Case-0为 Case-n \text{Case-n} Case-n种 n=1 \text{n=1} n=1的情形)
- Case-0 \text{Case-0} Case-0时,假设 q i q_i qi所落入的是桶 k k k,则选取离桶 k k k编码最近的一点 p ^ k \hat{p}_{k} p^k,即 p q i ∗ = p ⃗ ( k ) = p ^ k p_{q_i}^*\text{=}\displaystyle{}\vec{p}_{(k)}\text{=}\hat{p}_{k} pqi∗=p(k)=p^k
- Case-n \text{Case-n} Case-n时,假设 q i q_i qi所落入的是桶 k k k,则 p q i ∗ = p ⃗ ( k ) = ∑ φ ( p j ) = k 1 ∣ P ∩ φ – 1 ( k ) ∣ p j p_{q_i}^*\text{=}\displaystyle{}\vec{p}_{(k)}\text{=}\displaystyle{}\sum_{\varphi(p_j)\text{=}k}\cfrac{1}{\left|P\text{∩}\varphi^{–1}(k)\right|}p_j pqi∗=p(k)=φ(pj)=k∑∣P∩φ–1(k)∣1pj,相当于桶 k k k种所有 p p p向量的质心
- 定义集合 Pdt= { ⟨ q i , p 1 ⟩ , ⟨ q i , p 2 ⟩ , . . . , ⟨ q i , p m ⟩ } \text{Pdt=}\{\langle{q_i,p_{1}}\rangle,\langle{q_i,p_{2}}\rangle,...,\langle{q_i,p_{m}}\rangle\} Pdt={⟨qi,p1⟩,⟨qi,p2⟩,...,⟨qi,pm⟩},根据 p q i p_{q_i} pqi的定义可知 ⟨ q i , p q 1 ⟩ \langle{q_i,p_{q_1}}\rangle ⟨qi,pq1⟩是 Pdt \text{Pdt} Pdt集合的最大值
- Case-0 \text{Case-0} Case-0时,显然有 ⟨ q i , p q i ∗ ⟩ = ⟨ q i , p ^ k ⟩ ∈Pdt \langle{q_i,p_{q_i}^*}\rangle\text{=}\langle{q_i,\hat{p}_{k}}\rangle\text{∈Pdt} ⟨qi,pqi∗⟩=⟨qi,p^k⟩∈Pdt,因此 ⟨ q i , p q i ∗ ⟩ ≤ ⟨ q i , p q i ⟩ \langle{q_i,p_{q_i}^*}\rangle\text{≤}\langle{q_i,p_{q_i}}\rangle ⟨qi,pqi∗⟩≤⟨qi,pqi⟩即小于集合最大值
- Case-n \text{Case-n} Case-n时,对每个 p j p_j pj都有 ⟨ q i , p q i ∗ ⟩ = ⟨ q i , ∑ φ ( p j ) = k 1 ∣ P ∩ φ – 1 ( k ) ∣ p j ⟩ = ∑ φ ( p j ) = k 1 ∣ P ∩ φ – 1 ( k ) ∣ ⟨ q i , p j ⟩ \langle{q_i,p_{q_i}^*}\rangle\text{=}\left\langle{q_i},\displaystyle{}\sum_{\varphi(p_j)\text{=}k}\cfrac{1}{\left|P\text{∩}\varphi^{–1}(k)\right|}p_j\right\rangle\text{=}\displaystyle{}\sum_{\varphi(p_j)\text{=}k}\cfrac{1}{\left|P\text{∩}\varphi^{–1}(k)\right|}\langle{q_i,p_j}\rangle ⟨qi,pqi∗⟩=⟨qi,φ(pj)=k∑∣P∩φ–1(k)∣1pj⟩=φ(pj)=k∑∣P∩φ–1(k)∣1⟨qi,pj⟩
- 集合 Pdt \text{Pdt} Pdt中 ∑ φ ( p j ) = k 1 ∣ P ∩ φ – 1 ( k ) ∣ ⟨ q i , p j ⟩ \displaystyle{}\sum_{\varphi(p_j)\text{=}k}\cfrac{1}{\left|P\text{∩}\varphi^{–1}(k)\right|}\langle{q_i,p_j}\rangle φ(pj)=k∑∣P∩φ–1(k)∣1⟨qi,pj⟩是子集均值, ⟨ q i , p q i ⟩ \langle{q_i,p_{q_i}}\rangle ⟨qi,pqi⟩是全集最大值,后者必然更大
- 所以 ⟨ q i , p q i ∗ ⟩ = ∑ φ ( p j ) = k 1 ∣ P ∩ φ – 1 ( k ) ∣ ⟨ q i , p j ⟩ ≤ ⟨ q i , p q i ⟩ \langle{q_i,p_{q_i}^*}\rangle\text{=}\displaystyle{}\sum_{\varphi(p_j)\text{=}k}\cfrac{1}{\left|P\text{∩}\varphi^{–1}(k)\right|}\langle{q_i,p_j}\rangle\text{≤}\langle{q_i,p_{q_i}}\rangle ⟨qi,pqi∗⟩=φ(pj)=k∑∣P∩φ–1(k)∣1⟨qi,pj⟩≤⟨qi,pqi⟩
➡️两种相似评分的对比
![]()
- Chamfer \text{Chamfer} Chamfer相似度:根据定义为 Chamfer ( Q , P ) = ∑ i =1 n ⟨ q i , p q i ⟩ = ⟨ Q , P ⟩ \text{Chamfer}(Q,P)\text{=}\displaystyle{\sum_{i\text{=1}}^n}\langle{q_i,p_{q_i}}\rangle\text{=}\langle{Q,P}\rangle Chamfer(Q,P)=i=1∑n⟨qi,pqi⟩=⟨Q,P⟩
- FDE \text{FDE} FDE相似度:根据定义没经过投影和重复的 FDE \text{FDE} FDE相似度可写作 FDE ( Q , P ) = ∑ k = 1 B ⟨ q ⃗ ( k ) , p ⃗ ( k ) ⟩ = ∑ i =1 n ⟨ q i , p q i ∗ ⟩ = ⟨ Q , P ′ ⟩ \text{FDE}(Q,P)\text{=}\displaystyle\sum_{k\text{=}1}^{B}\langle{\vec{q}_{(k)},\vec{p}_{(k)}}\rangle\text{=}\sum_{i\text{=1}}^n\langle{q_i,p_{q_i}^*}\rangle\text{=}\langle{Q,P^\prime}\rangle FDE(Q,P)=k=1∑B⟨q(k),p(k)⟩=i=1∑n⟨qi,pqi∗⟩=⟨Q,P′⟩
- 已证了 ⟨ q i , p q i ∗ ⟩ ≤ ⟨ q i , p q i ⟩ \langle{q_i,p_{q_i}^*}\rangle\text{≤}\langle{q_i,p_{q_i}}\rangle ⟨qi,pqi∗⟩≤⟨qi,pqi⟩故 FDE ( Q , P ) ≤Chamfer ( Q , P ) \text{FDE}(Q,P)\text{≤}\text{Chamfer}(Q,P) FDE(Q,P)≤Chamfer(Q,P)
2.2. \textbf{2.2. } 2.2. 加上一定维度的投影后原上界近似成立
➡️ Muvera \text{Muvera} Muvera分桶 + \text{+} +投影的过程
![]()
- 同样首先相当于为每个 q i ∈ Q q_i\text{∈}Q qi∈Q都映射一个 p ⃗ ( k q i ) = p q i ∗ ∈ { p ⃗ ( 1 ) , . . . , p ⃗ ( B ) } \vec{p}_{(k_{q_i})}\text{=}p_{q_i}^*\text{∈}\{\vec{p}_{(1)},...,\vec{p}_{(B)}\} p(kqi)=pqi∗∈{p(1),...,p(B)},但是后续参与交互(内积)的从原始向量变为了投影向量
- 所以原有的 FDE ( Q , P ) \text{FDE}(Q,P) FDE(Q,P)为直接将内积 ⟨ q i , p q i ∗ ⟩ \left\langle{q_i,p^*_{q_i}}\right\rangle ⟨qi,pqi∗⟩累加,现在变成将各自投影后的内积 ⟨ ψ ( q i ) , ψ ( p q i ∗ ) ⟩ \left\langle{\psi(q_i),\psi\left(p^*_{q_i}\right)}\right\rangle ⟨ψ(qi),ψ(pqi∗)⟩累加
- FDE \text{FDE} FDE相似度(无投影):可写作 FDE orgn ( Q , P ) = ∑ k = 1 B ⟨ q ⃗ ( k ) , p ⃗ ( k ) ⟩ = ∑ i =1 n ⟨ q i , p q i ∗ ⟩ \text{FDE}_{\text{orgn}}(Q,P)\text{=}\displaystyle\sum_{k\text{=}1}^{B}\langle{\vec{q}_{(k)},\vec{p}_{(k)}}\rangle\text{=}\sum_{i\text{=1}}^n\langle{q_i,p_{q_i}^*}\rangle FDEorgn(Q,P)=k=1∑B⟨q(k),p(k)⟩=i=1∑n⟨qi,pqi∗⟩,并且已证明 FDE orgn ( Q , P ) ≤Chamfer ( Q , P ) \text{FDE}_{\text{orgn}}(Q,P)\text{≤}\text{Chamfer}(Q,P) FDEorgn(Q,P)≤Chamfer(Q,P)
- FDE \text{FDE} FDE相似度(无投影):可写作 FDE proj ( Q , P ) = ∑ k = 1 B ⟨ ψ ( q ⃗ ( k ) ) , ψ ( p ⃗ ( k ) ) ⟩ = ∑ i =1 n ⟨ ψ ( q i ) , ψ ( p q i ∗ ) ⟩ \text{FDE}_{\text{proj}}(Q,P)\text{=}\displaystyle\sum_{k\text{=}1}^{B}\left\langle{\psi\left(\vec{q}_{(k)}\right),\psi\left(\vec{p}_{(k)}\right)}\right\rangle\text{=}\sum_{i\text{=1}}^n\left\langle{\psi(q_i),\psi\left(p^*_{q_i}\right)}\right\rangle FDEproj(Q,P)=k=1∑B⟨ψ(q(k)),ψ(p(k))⟩=i=1∑n⟨ψ(qi),ψ(pqi∗)⟩
➡️事实 A2 \text{A2} A2表明了:给定 d proj = O ( 1 ε 2 log m δ ) d_{\text{proj}}\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log\cfrac{m}{\delta}\right) dproj=O(ε21logδm)则 ⟨ x , y ⟩ – ε ≤ ⟨ ψ ( x ) , ψ ( y ) ⟩ ≤ ⟨ x , y ⟩ + ε \langle{}x,y\rangle\text{–}\varepsilon\text{≤}\langle\psi(x),\psi(y)\rangle{}\text{≤}\langle{}x,y\rangle\text{+}\varepsilon ⟨x,y⟩–ε≤⟨ψ(x),ψ(y)⟩≤⟨x,y⟩+ε以 Pr≥ 1 – δ \text{Pr}\text{≥}1\text{–}\delta Pr≥1–δ概率成立
- 将 d proj = O ( 1 ε 2 log m δ ) d_{\text{proj}}\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log\cfrac{m}{\delta}\right) dproj=O(ε21logδm)作为前提引入,代入结论有 ∑ i =1 n ⟨ q i , p q i ∗ ⟩ – ∣ Q ∣ ε ≤ ∑ i =1 n ⟨ ψ ( q i ) , ψ ( p q i ∗ ) ⟩ ≤ ∑ i =1 n ⟨ q i , p q i ∗ ⟩ + ∣ Q ∣ ε \displaystyle\sum_{i\text{=1}}^n\langle{q_i,p_{q_i}^*}\rangle\text{–}|Q|\varepsilon\text{≤}\sum_{i\text{=1}}^n\left\langle{\psi(q_i),\psi\left(p^*_{q_i}\right)}\right\rangle\text{≤}\sum_{i\text{=1}}^n\langle{q_i,p_{q_i}^*}\rangle\text{+}|Q|\varepsilon i=1∑n⟨qi,pqi∗⟩–∣Q∣ε≤i=1∑n⟨ψ(qi),ψ(pqi∗)⟩≤i=1∑n⟨qi,pqi∗⟩+∣Q∣ε
- 概率调整为,以 Pr≥ ( 1 – δ ) n ≥ 1 – ∣ Q ∣ δ \text{Pr}\text{≥}(1\text{–}\delta)^n\text{≥}1\text{–}|Q|\delta Pr≥(1–δ)n≥1–∣Q∣δ概率成立
- 也可写作 FDE orgn ( Q , P ) – ∣ Q ∣ ε ≤ FDE proj ( Q , P ) ≤ FDE orgn ( Q , P ) + ∣ Q ∣ ε \displaystyle\text{FDE}_{\text{orgn}}(Q,P)\text{–}|Q|\varepsilon\text{≤}\text{FDE}_{\text{proj}}(Q,P)\text{≤}\text{FDE}_{\text{orgn}}(Q,P)\text{+}|Q|\varepsilon FDEorgn(Q,P)–∣Q∣ε≤FDEproj(Q,P)≤FDEorgn(Q,P)+∣Q∣ε
- 稍作调整,则有 FDE proj ( Q , P ) ≤ FDE orgn ( Q , P ) + ∣ Q ∣ ε ≤Chamfer ( Q , P ) + ∣ Q ∣ ε \text{FDE}_{\text{proj}}(Q,P)\text{≤}\text{FDE}_{\text{orgn}}(Q,P)\text{+}|Q|\varepsilon\text{≤}\text{Chamfer}(Q,P)\text{+}|Q|\varepsilon FDEproj(Q,P)≤FDEorgn(Q,P)+∣Q∣ε≤Chamfer(Q,P)+∣Q∣ε,以 Pr≥ 1 – ∣ Q ∣ δ \text{Pr}\text{≥}1\text{–}|Q|\delta Pr≥1–∣Q∣δ概率成立
- 再按常数 Q Q Q的比例调整 δ \delta δ(注意 ε \varepsilon ε为与后续统一故不调整),则有 FDE proj ( Q , P ) ≤Chamfer ( Q , P ) + ∣ Q ∣ ε \text{FDE}_{\text{proj}}(Q,P)\text{≤}\text{Chamfer}(Q,P)\text{+}|Q|\varepsilon FDEproj(Q,P)≤Chamfer(Q,P)+∣Q∣ε,以 Pr≥ 1 – δ \text{Pr}\text{≥}1\text{–}\delta Pr≥1–δ概率成立
2.3. \textbf{2.3. } 2.3. 已给出无重复 FDE \textbf{FDE} FDE上界👉还需证明什么才能给出无重复 FDE \textbf{FDE} FDE的上下界
➡️不妨假设需要证明的是 Chamfer ( Q , P ) – ∣ Q ∣ ε ≤FDE ( Q , P ) ≤Chamfer ( Q , P ) + ∣ Q ∣ ε \text{Chamfer}(Q,P)\text{–}|Q|\varepsilon\text{≤}\text{FDE}(Q,P)\text{≤}\text{Chamfer}(Q,P)\text{+}|Q|\varepsilon Chamfer(Q,P)–∣Q∣ε≤FDE(Q,P)≤Chamfer(Q,P)+∣Q∣ε以 Pr≥ 1 – δ \text{Pr}\text{≥}1\text{–}\delta Pr≥1–δ概率成立
- 令下界成立为事件 F L / F_L/ FL/上界成立为事件 F U / F_U/ FU/上下界都成立为事件 F F F,原问题变为要证明 Pr [ F ] ≥ 1 – δ \text{Pr}[F]\text{≥}1\text{–}\delta Pr[F]≥1–δ
- 不难得到当 ε ≤ 1 2 \varepsilon{}\text{≤}\cfrac{1}{2} ε≤21时,同时满足 Pr [ F L ] ≥ 1 – ε δ \text{Pr}[F_L]\text{≥}1\text{–}\varepsilon\delta Pr[FL]≥1–εδ和 Pr [ F U ] ≥ 1 – ε δ \text{Pr}[F_U]\text{≥}1\text{–}\varepsilon\delta Pr[FU]≥1–εδ,可推导出 Pr [ F ] ≥ 1 – δ \text{Pr}[F]\text{≥}1\text{–}\delta Pr[F]≥1–δ
- 由已证明的上界结论可知设定 d proj = O ( 1 ε 2 log ( m ε δ ) ) d_{\text{proj}}\text{=}O\left(\cfrac{1}{\varepsilon^2}\log{\left(\cfrac{m}{\varepsilon\delta}\right)}\right) dproj=O(ε21log(εδm))则有 Pr [ F U ] ≥ 1 – ε δ \text{Pr}[F_U]\text{≥}1\text{–}\varepsilon\delta Pr[FU]≥1–εδ
- 所以增加前提 d proj = O ( 1 ε 2 log ( m ε δ ) ) d_{\text{proj}}\text{=}O\left(\cfrac{1}{\varepsilon^2}\log{\left(\cfrac{m}{\varepsilon\delta}\right)}\right) dproj=O(ε21log(εδm))以及 ε ≤ 1 2 \varepsilon{}\text{≤}\cfrac{1}{2} ε≤21后,原问题变为要证明 Pr [ F L ] ≥ 1 – ε δ \text{Pr}\left[F_{L}\right]\text{≥}1\text{–}\varepsilon\delta Pr[FL]≥1–εδ
➡️对 Chamfer ( Q , P ) – ∣ Q ∣ ε ≤FDE ( Q , P ) \text{Chamfer}(Q,P)\text{–}|Q|\varepsilon\text{≤}\text{FDE}(Q,P) Chamfer(Q,P)–∣Q∣ε≤FDE(Q,P)的进一步变换
![]()
- 根据 Chamfer \text{Chamfer} Chamfer的定义有 Chamfer ( Q , P ) = ∑ q i ∈ Q MaxSim ( q i , P ) \text{Chamfer}(Q,P)\text{=}\displaystyle{}\sum_{q_i\text{∈}Q}\text{MaxSim}(q_i,P) Chamfer(Q,P)=qi∈Q∑MaxSim(qi,P)
- 又由于 FDE \text{FDE} FDE对 q i q_i qi的处理式独立线性的,所以 FDE ( Q , P ) = ∑ q i ∈ Q FDE ( q i , P ) \text{FDE}(Q,P)\text{=}\displaystyle{}\sum_{q_i\text{∈}Q}\text{FDE}(q_i,P) FDE(Q,P)=qi∈Q∑FDE(qi,P)
- 所以原问题变为需证 ∑ q i ∈ Q ( MaxSim ( q i , P ) – ε ) ≤ ∑ q i ∈ Q FDE ( q i , P ) \displaystyle{}\sum_{q_i\text{∈}Q}\left(\text{MaxSim}(q_i,P)\text{–}\varepsilon\right)\text{≤}\sum_{q_i\text{∈}Q}\text{FDE}(q_i,P) qi∈Q∑(MaxSim(qi,P)–ε)≤qi∈Q∑FDE(qi,P),以 Pr≥ 1 – ε δ \text{Pr}\text{≥}1\text{–}\varepsilon\delta Pr≥1–εδ概率成立
- 去掉求和号后,原问题变为需证 MaxSim ( q i , P ) – ε ≤FDE ( q i , P ) \text{MaxSim}(q_i,P)\text{–}\varepsilon\text{≤}\text{FDE}(q_i,P) MaxSim(qi,P)–ε≤FDE(qi,P),以 Pr≥ 1 – ε δ ∣ Q ∣ \text{Pr}\text{≥}1–\cfrac{\varepsilon\delta}{|Q|} Pr≥1–∣Q∣εδ概率成立
➡️后续称 FDE ( q i , P ) \text{FDE}(q_i,P) FDE(qi,P)的上下界为 FDE \text{FDE} FDE的子上下界
2.4. \textbf{2.4. } 2.4. 无投影无重复时 FDE \textbf{FDE} FDE相似度的子下界
➡️结合之前的分析,对于 FDE ( q i , P ) = ⟨ q i , p q i ∗ ⟩ \displaystyle\text{FDE}(q_i,P)\text{=}\left\langle{q_i,p_{q_i}^*}\right\rangle FDE(qi,P)=⟨qi,pqi∗⟩,在无投影时总存在一个子集 S ⊆ P S\text{⊆}P S⊆P,使得 p q i ∗ = 1 ∣ S ∣ ∑ p j ∈ S p j = S center p_{q_i}^*\text{=}\displaystyle{}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}p_j\text{=}S_{\text{center}} pqi∗=∣S∣1pj∈S∑pj=Scenter
![]()
- 当单个 q i q_i qi落入的桶属于 Case-0 \text{Case-0} Case-0情形时, S S S是单个点 p ^ k = arg min p j ∈ P ∥ φ ( q i ) – φ ( p j ) ∥ 0 \displaystyle{}\hat{p}_k\text{=}\arg{}\min_{p_j\text{∈}P}\|\varphi(q_i)–\varphi(p_j)\|_{0} p^k=argpj∈Pmin∥φ(qi)–φ(pj)∥0,也就是离桶编码 φ ( q i ) \varphi(q_i) φ(qi)海明距离最近的点
- 当单个 q i q_i qi落入的桶属于 Case-n \text{Case-n} Case-n情形时, S S S元素需满足 φ ( q i ) = φ ( p j ) \varphi(q_i)\text{=}\varphi(p_j) φ(qi)=φ(pj),也就是所有落入该桶的 q j q_j qj点
➡️对事实 A4 \text{A4} A4给定 k sim = O ( 1 ε log ( m δ ) ) k_{\text{sim}}\text{=}O\left(\cfrac{1}{\varepsilon}\log{\left(\cfrac{m}{\delta}\right)}\right) ksim=O(ε1log(δm))时 ∀ p j ∈ P \forall{p_j}\text{∈}P ∀pj∈P有 ∣ ∥ φ ( q i ) – φ ( p j ) ∥ 0 k s i m – θ ( q i , p j ) π ∣ ≤ ε \left|\cfrac{\|\varphi(q_i)–\varphi(p_j)\|_0}{k_{\mathrm{sim}}}\text{–}\cfrac{\theta(q_i,p_j)}{\pi}\right|\text{≤}\sqrt{\varepsilon} ksim∥φ(qi)–φ(pj)∥0–πθ(qi,pj) ≤ε以 Pr> 1 – ε δ m 2 \text{Pr}\text{>}1–\cfrac{\varepsilon{}\delta{}}{m^2} Pr>1–m2εδ概率成立
![]()
- 对于 q i {q_i} qi和 p q 1 ∗ p^*_{q_1} pq1∗还有 p q 1 p_{q_1} pq1,总是会有 ∥ φ ( q i ) – φ ( p q 1 ∗ ) ∥ 0 k s i m ≤ ∥ φ ( q i ) – φ ( p q 1 ) ∥ 0 k s i m \cfrac{\|\varphi(q_i)–\varphi(p^*_{q_1})\|_0}{k_{\mathrm{sim}}}\text{≤}\cfrac{\|\varphi(q_i)–\varphi(p_{q_1})\|_0}{k_{\mathrm{sim}}} ksim∥φ(qi)–φ(pq1∗)∥0≤ksim∥φ(qi)–φ(pq1)∥0
- 对于 Case-0 \text{Case-0} Case-0情形, p q i ∗ p^*_{q_i} pqi∗为离编号为 φ ( q i ) \varphi(q_i) φ(qi)的桶海明距离最近的 p j p_j pj向量,所以 ∥ φ ( q i ) – φ ( p q i ∗ ) ∥ 0 ≤ ∥ φ ( q i ) – φ ( p q i ) ∥ 0 {\|\varphi(q_i)–\varphi(p^*_{q_i})\|_0}\text{≤}{\|\varphi(q_i)–\varphi(p_{q_i})\|_0} ∥φ(qi)–φ(pqi∗)∥0≤∥φ(qi)–φ(pqi)∥0
- 对于 Case-n \text{Case-n} Case-n情形, p q i ∗ p^*_{q_i} pqi∗为与 q i q_i qi同桶的所有 p j p_j pj向量的质心,即 S = { p j ∈ P ∣ φ ( q i ) = φ ( p j ) } S\text{=}\left\{p_j\text{∈}P|{\varphi}(q_i)\text{=}\varphi(p_j)\right\} S={pj∈P∣φ(qi)=φ(pj)}的质心 S center S_{\text{center}} Scenter
- 回忆 SimHash \text{SimHash} SimHash分桶的本质,就是用 k sim k_\text{sim} ksim个法平面将空间切割为 2 k sim 2^{k_\text{sim}} 2ksim个子空间
- 所有 p j ∈ S p_j\text{∈}S pj∈S都落入同一桶即在同一子空间,则其质心 p q i ∗ = S center p^*_{q_i}\text{=}S_{\text{center}} pqi∗=Scenter也一定在该子空间中,所以 φ ( q i ) = φ ( p j ) = φ ( p q i ∗ ) \varphi(q_i)\text{=}\varphi(p_j)\text{=}\varphi(p^*_{q_i}) φ(qi)=φ(pj)=φ(pqi∗)
- 所以 ∥ φ ( q i ) – φ ( p q i ∗ ) ∥ 0 = 0 ≤ ∥ φ ( q i ) – φ ( p q i ) ∥ 0 {\|\varphi(q_i)–\varphi(p^*_{q_i})\|_0}\text{=}0\text{≤}{\|\varphi(q_i)–\varphi(p_{q_i})\|_0} ∥φ(qi)–φ(pqi∗)∥0=0≤∥φ(qi)–φ(pqi)∥0
- p q 1 p_{q_1} pq1和 p q 1 ∗ p^*_{q_1} pq1∗都代入事实 A4 \text{A4} A4,则 θ ( q i , p q 1 ∗ ) π – ε ≤ ∥ φ ( q i ) – φ ( p q 1 ∗ ) ∥ 0 k s i m ≤ ∥ φ ( q i ) – φ ( p q i ) ∥ 0 k s i m ≤ θ ( q i , p q i ) π + ε \cfrac{\theta({q_i,p^*_{q_1}})}{\pi}–\sqrt{\varepsilon}\text{≤}\cfrac{\|\varphi(q_i)–\varphi(p^*_{q_1})\|_0}{k_{\mathrm{sim}}}\text{≤}\cfrac{\|\varphi(q_i)–\varphi(p_{q_i})\|_0}{k_{\mathrm{sim}}}\text{≤}\cfrac{\theta({q_i,p_{q_i}})}{\pi}\text{+}\sqrt{\varepsilon} πθ(qi,pq1∗)–ε≤ksim∥φ(qi)–φ(pq1∗)∥0≤ksim∥φ(qi)–φ(pqi)∥0≤πθ(qi,pqi)+ε以 Pr> 1 – ε δ m 2 \text{Pr}\text{>}1–\cfrac{\varepsilon{}\delta{}}{m^2} Pr>1–m2εδ概率成立
- 因此 ∣ θ ( q i , p q 1 ∗ ) – θ ( q i , p q i ) ∣ = O ( ε ) \left|\theta(q_i,p^*_{q_1})–\theta({q_i,p_{q_i}})\right|\text{=}O\left(\sqrt{\varepsilon}\right) θ(qi,pq1∗)–θ(qi,pqi) =O(ε)以 Pr> 1 – ε δ m 2 \text{Pr}\text{>}1–\cfrac{\varepsilon{}\delta{}}{m^2} Pr>1–m2εδ概率成立
- 上式可得 ∣ cos θ ( q i , p q 1 ∗ ) – cos θ ( q i , p q i ) ∣ < O ( ε ) \left|\cos\theta(q_i,p^*_{q_1})–\cos\theta({q_i,p_{q_i}})\right|\text{<}O(\varepsilon) cosθ(qi,pq1∗)–cosθ(qi,pqi) <O(ε)以 Pr> 1 – ε δ m 2 \text{Pr}\text{>}1–\cfrac{\varepsilon{}\delta{}}{m^2} Pr>1–m2εδ成立,其中 { cos θ ( q i , p q 1 ∗ ) = ⟨ q i , p q 1 ∗ ⟩ =FDE ( q i , P ) cos θ ( q i , p q i ) = ⟨ q i , p q i ⟩ =MaxSim ( q i , P ) \begin{cases}\cos\theta(q_i,p^*_{q_1})\text{=}\langle{q_i,p^*_{q_1}}\rangle\text{=}\text{FDE}(q_i,P)\\\\\cos\theta({q_i,p_{q_i}})\text{=}\langle{q_i,p_{q_i}}\rangle\text{=}\text{MaxSim}(q_i,P)\end{cases} ⎩ ⎨ ⎧cosθ(qi,pq1∗)=⟨qi,pq1∗⟩=FDE(qi,P)cosθ(qi,pqi)=⟨qi,pqi⟩=MaxSim(qi,P)
- 所以 MaxSim ( q i , P ) – O ( ε ) ≤FDE ( q i , P ) ≤MaxSim ( q i , P ) + O ( ε ) \text{MaxSim}(q_i,P)–O(\varepsilon)\text{≤}\text{FDE}(q_i,P)\text{≤}\text{MaxSim}(q_i,P)\text{+}O(\varepsilon) MaxSim(qi,P)–O(ε)≤FDE(qi,P)≤MaxSim(qi,P)+O(ε) 以 Pr> 1 – ε δ m 2 \text{Pr}\text{>}1–\cfrac{\varepsilon{}\delta{}}{m^2} Pr>1–m2εδ概率成立
➡️事实 A4 \text{A4} A4给出了 FDE ( q i , P ) \text{FDE}(q_i,P) FDE(qi,P)的子上下界
- 取结论的左半边 MaxSim ( q i , P ) – O ( ε ) ≤FDE ( q i , P ) \text{MaxSim}(q_i,P)–O(\varepsilon)\text{≤}\text{FDE}(q_i,P) MaxSim(qi,P)–O(ε)≤FDE(qi,P)并给概率放水 Pr> 1 – ε δ m 2 > 1 – ε δ ∣ Q ∣ \text{Pr}\text{>}1–\cfrac{\varepsilon{}\delta{}}{m^2}\text{>}1–\cfrac{\varepsilon\delta}{|Q|} Pr>1–m2εδ>1–∣Q∣εδ,便得到了一个更弱的结论
- 再对 O ( ε ) O(\varepsilon) O(ε)进行常数缩放为 ε \varepsilon ε,该更弱的结论就变成了所要证的 FDE \text{FDE} FDE相似度的子下界
2.5. \textbf{2.5. } 2.5. 加上一定维度的投影后原子下界近似成立
➡️投影前后的对比:用 FDE orgn \text{FDE}_{\text{orgn}} FDEorgn和 FDE proj \text{FDE}_{\text{proj}} FDEproj表示区分
![]()
- 投影前为 FDE orgn ( q i , P ) = ⟨ q i , p q i ∗ ⟩ = ⟨ q i , 1 ∣ S ∣ ∑ p j ∈ S p j ⟩ = 1 ∣ S ∣ ∑ p j ∈ S ⟨ q i , p j ⟩ \displaystyle{}\text{FDE}_{\text{orgn}}(q_i,P)\text{=}\left\langle{q_i,p_{q_i}^*}\right\rangle\text{=}\left\langle{}q_i,\cfrac{1}{|S|}\sum_{p_j\text{∈}S}p_j\right\rangle{}\text{=}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}{\langle{}q_i,p_j\rangle} FDEorgn(qi,P)=⟨qi,pqi∗⟩=⟨qi,∣S∣1pj∈S∑pj⟩=∣S∣1pj∈S∑⟨qi,pj⟩
- 投影后为 FDE proj ( q i , P ) = ⟨ ψ ( q i ) , ψ ( p q i ∗ ) ⟩ = ⟨ ψ ( q i ) , ψ ( 1 ∣ S ∣ ∑ p j ∈ S p j ) ⟩ = 1 ∣ S ∣ ∑ p j ∈ S ⟨ ψ ( q i ) , ψ ( p j ) ⟩ \displaystyle{}\text{FDE}_{\text{proj}}(q_i,P)\text{=}\left\langle{\psi(q_i),\psi\left(p_{q_i}^*\right)}\right\rangle\text{=}\left\langle{}{\psi}(q_i),{\psi}\left(\cfrac{1}{|S|}\sum_{p_j\text{∈}S}p_j\right)\right\rangle{}\text{=}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}\left\langle{\psi(q_i),\psi\left(p_j\right)}\right\rangle FDEproj(qi,P)=⟨ψ(qi),ψ(pqi∗)⟩=⟨ψ(qi),ψ ∣S∣1pj∈S∑pj ⟩=∣S∣1pj∈S∑⟨ψ(qi),ψ(pj)⟩
➡️应用事实 A2 \text{A2} A2:用 ε δ ∣ Q ∣ \cfrac{\varepsilon\delta}{|Q|} ∣Q∣εδ替代 δ \delta δ以设定 d proj = O ( 1 ε 2 log ( ∣ Q ∣ ε δ ) ) = O ( 1 ε 2 log ( m ε δ ) ) d_{\text{proj}}\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log\left(\cfrac{|Q|}{\varepsilon\delta}\right)\right)\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log\left(\cfrac{m}{\varepsilon\delta}\right)\right) dproj=O(ε21log(εδ∣Q∣))=O(ε21log(εδm))
- 则有内积的偏差 ⟨ x , y ⟩ – ε ≤ ⟨ ψ ( x ) , ψ ( y ) ⟩ ≤ ⟨ x , y ⟩ + ε \langle{}x,y\rangle\text{–}\varepsilon\text{≤}\langle\psi(x),\psi(y)\rangle{}\text{≤}\langle{}x,y\rangle\text{+}\varepsilon ⟨x,y⟩–ε≤⟨ψ(x),ψ(y)⟩≤⟨x,y⟩+ε以 Pr≥ 1 – ε δ ∣ Q ∣ \text{Pr}\text{≥}1\text{–}\cfrac{\varepsilon\delta}{|Q|} Pr≥1–∣Q∣εδ概率成立概率成立
- 将事实 A2 \text{A2} A2代回则有 1 ∣ S ∣ ∑ p j ∈ S ⟨ q i , p j ⟩ – ε ≤ 1 ∣ S ∣ ∑ p j ∈ S ⟨ ψ ( q i ) , ψ p j ⟩ ≤ 1 ∣ S ∣ ∑ p j ∈ S ⟨ q i , p j ⟩ + ε \displaystyle{}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}{\langle{}q_i,p_j\rangle}\text{–}\varepsilon\text{≤}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}\left\langle{\psi(q_i),\psi{}p_j}\right\rangle\text{≤}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}{\langle{}q_i,p_j\rangle}\text{+}\varepsilon ∣S∣1pj∈S∑⟨qi,pj⟩–ε≤∣S∣1pj∈S∑⟨ψ(qi),ψpj⟩≤∣S∣1pj∈S∑⟨qi,pj⟩+ε,以 Pr≥ 1 – ε δ ∣ Q ∣ \text{Pr}\text{≥}1\text{–}\cfrac{\varepsilon\delta}{|Q|} Pr≥1–∣Q∣εδ概率成立概率成立
- 其中 FDE orgn ( q i , P ) = 1 ∣ S ∣ ∑ p j ∈ S ⟨ q i , p j ⟩ \displaystyle{}\text{FDE}_{\text{orgn}}(q_i,P)\text{=}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}{\langle{}q_i,p_j\rangle} FDEorgn(qi,P)=∣S∣1pj∈S∑⟨qi,pj⟩,以及 FDE proj ( q i , P ) = 1 ∣ S ∣ ∑ p j ∈ S ⟨ ψ ( q i ) , ψ ( p j ) ⟩ \displaystyle{}\text{FDE}_{\text{proj}}(q_i,P)\text{=}\cfrac{1}{|S|}\sum_{p_j\text{∈}S}\left\langle{\psi(q_i),\psi\left(p_j\right)}\right\rangle FDEproj(qi,P)=∣S∣1pj∈S∑⟨ψ(qi),ψ(pj)⟩
- 所以 FDE orgn ( q i , P ) – ε ≤ FDE proj ( q i , P ) ≤ FDE orgn ( q i , P ) + ε \displaystyle{}\text{FDE}_{\text{orgn}}(q_i,P)\text{–}\varepsilon\text{≤}\text{FDE}_{\text{proj}}(q_i,P)\text{≤}\text{FDE}_{\text{orgn}}(q_i,P)\text{+}\varepsilon FDEorgn(qi,P)–ε≤FDEproj(qi,P)≤FDEorgn(qi,P)+ε,以 Pr≥ 1 – ε δ ∣ Q ∣ \text{Pr}\text{≥}1\text{–}\cfrac{\varepsilon\delta}{|Q|} Pr≥1–∣Q∣εδ概率成立概率成立
- 已证 MaxSim ( q i , P ) – O ( ε ) ≤ FDE proj ( q i , P ) ≤MaxSim ( q i , P ) + O ( ε ) \text{MaxSim}(q_i,P)–O(\varepsilon)\text{≤}\text{FDE}_{\text{proj}}(q_i,P)\text{≤}\text{MaxSim}(q_i,P)\text{+}O(\varepsilon) MaxSim(qi,P)–O(ε)≤FDEproj(qi,P)≤MaxSim(qi,P)+O(ε)以 Pr≥ 1 – ε δ ∣ Q ∣ \text{Pr}\text{≥}1–\cfrac{\varepsilon\delta}{|Q|} Pr≥1–∣Q∣εδ概率成立
- 代入得 FDE proj ( q i , P ) ≥ FDE orgn ( q i , P ) – ε ≥ MaxSim ( q i ) – ε – O ( ε ) \displaystyle\text{FDE}_{\text{proj}}(q_i,P)\text{≥}\text{FDE}_{\text{orgn}}(q_i,P)\text{–}\varepsilon\text{≥}\displaystyle{}\text{MaxSim}(q_i)\text{–}\varepsilon–O(\varepsilon) FDEproj(qi,P)≥FDEorgn(qi,P)–ε≥MaxSim(qi)–ε–O(ε)
- 对 ε \varepsilon ε进行常数因子缩放后,即证毕
3. \textbf{3. } 3. 定理 2.2 \textbf{2.2} 2.2证明的思路
3.0. \textbf{3.0. } 3.0. 定理 2.2 \textbf{2.2} 2.2的主要内容
👉条件 1 1 1:给定单个查询 Q Q Q以及多个段落 P = { P 1 , … , P n } P\text{=}\left\{P_{1},\ldots,P_{n}\right\} P={P1,…,Pn}并且 Q , ∀ P i ⊆ R d Q,\forall{}P_i\text{⊆}\mathbb{R}^{d} Q,∀Pi⊆Rd,并令 m = ∣ Q ∣ + max i ∈ [ n ] ∣ P i ∣ \displaystyle{}m\text{=}|Q|\text{+}\max_{i\text{∈}[n]}\left|P_{i}\right| m=∣Q∣+i∈[n]max∣Pi∣
👉条件 2 2 2:给定 ∀ ε > 0 \forall\varepsilon\text{>}0 ∀ε>0,设置参数 k sim = O ( log m ε ) , d proj = O ( 1 ε 2 log ( m ε ) ) , R reps = O ( 1 ε 2 log n ) k_{\text{sim}}\text{=}O\left(\cfrac{\log{m}}{\varepsilon}\right),d_{\text {proj}}\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log\left(\cfrac{m}{\varepsilon}\right)\right),R_{\text{reps}}\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log{n}\right) ksim=O(εlogm),dproj=O(ε21log(εm)),Rreps=O(ε21logn)
👉条件 3 3 3:令 i ∗ = arg max i ∈ [ n ] FDE ( Q , P i ) \displaystyle{}i^{*}\text{=}\arg\max_{i\text{∈}[n]}\text{FDE}(Q,P_i) i∗=argi∈[n]maxFDE(Q,Pi),即 P i ∗ P_{i^*} Pi∗是通过 Muvera \text{Muvera} Muvera方法找到的,与查询 Q Q Q最相似的段落
👉结论 1 1 1: 1 ∣ Q ∣ Chamfer ( Q , P i ∗ ) ≥ max i ∈ [ n ] 1 ∣ Q ∣ Chamfer ( Q , P i ) – ε \displaystyle{}\cfrac{1}{|Q|}\text{Chamfer}\left(Q, P_{i^{*}}\right)\text{≥}\max_{i\text{∈}[n]}\cfrac{1}{|Q|}\text{Chamfer}\left(Q, P_{i}\right)–\varepsilon ∣Q∣1Chamfer(Q,Pi∗)≥i∈[n]max∣Q∣1Chamfer(Q,Pi)–ε以 Pr= 1 – 1 poly ( n ) \text{Pr=}1\text{–}\cfrac{1}{\text{poly}(n)} Pr=1–poly(n)1概率成立
3.1. \textbf{3.1. } 3.1. 定理 2.2 \textbf{2.2} 2.2的主要内容
➡️在考虑重复 R reps R_{\text{reps}} Rreps次的情况下,对于每个重复 k ∈ [ R reps ] k\text{∈}\left[R_{\text{reps}}\right] k∈[Rreps],设定每次重复对最终相似度的贡献为 FDE k ( Q , P α ) \text{FDE}^k(Q,P_\alpha) FDEk(Q,Pα)
- 对于最终相似度,有 FDE ( Q , P α ) = ∑ k = 1 R reps FDE k ( Q , P α ) \text{FDE}(Q,P_\alpha)\text{=}\displaystyle\sum_{k\text{=}1}^{R_{\text{reps}}}\text{FDE}^k(Q,P_\alpha) FDE(Q,Pα)=k=1∑RrepsFDEk(Q,Pα),不妨设定随机变量 X k = 1 ∣ Q ∣ FDE k ( Q , P α ) X_k\text{=}\cfrac{1}{|Q|}\text{FDE}^k(Q,P_\alpha) Xk=∣Q∣1FDEk(Q,Pα)
➡️最关键的一步在于,对 X k X_k Xk尝试运用 Chernoff \text{Chernoff} Chernoff界限,即 ∀ X i ∈ [ a , b ] \forall{X_i}\text{∈}[a,b] ∀Xi∈[a,b]有 Pr [ ∣ 1 R ∑ i = 1 R X i – μ ∣ ≥ ε ] ≤ 2 e ( – 2 R ε 2 ( b – a ) 2 ) \displaystyle{}\text{Pr}\left[\left|\frac{1}{R}\sum_{i=1}^RX_i–\mu\right|\text{≥}\varepsilon\right]\text{≤}2e^{\left(–\frac{2R\varepsilon^2}{(b–a)^2}\right)} Pr[ R1i=1∑RXi–μ ≥ε]≤2e(–(b–a)22Rε2)
- 对 Chernoff \text{Chernoff} Chernoff界限中参数的确定
- 将上式中 R R R视作 R reps R_{\text{reps}} Rreps,并将 R reps = O ( 1 ε 2 log n ) R_{\text{reps}}\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log{n}\right) Rreps=O(ε21logn)作为前提引入
- 对于 μ \mu μ即均值,根据定理 2.1 \text{2.1} 2.1引入前提 k sim = O ( log m ε ) , d proj = O ( 1 ε 2 log ( m ε ) ) k_{\text{sim}}\text{=}O\left(\cfrac{\log{m}}{\varepsilon}\right),d_{\text {proj}}\text{=}O\left(\cfrac{1}{\varepsilon^{2}}\log\left(\cfrac{m}{\varepsilon}\right)\right) ksim=O(εlogm),dproj=O(ε21log(εm))后,有 E [ X k ] ∈ 1 ∣ Q ∣ Chamfer ( Q , P α ) ± ε \mathbb{E}[X_k]\text{∈}\cfrac{1}{|Q|}\text{Chamfer}(Q,P_\alpha)\text{±}\varepsilon E[Xk]∈∣Q∣1Chamfer(Q,Pα)±ε
- 对于 [ a , b ] [a,b] [a,b]即 X k X_k Xk的范围,不难得到 X k ∈ [ – m , m ] X_k\text{∈}[–m,m] Xk∈[–m,m]
- 将以上参数套用到 Chernoff \text{Chernoff} Chernoff界限则有
- 概率:以 Pr≥ 1 – 2 e ( – R reps ε 2 2 m 2 ) \text{Pr}\text{≥}1–2e^{\left(–\frac{R_{\text{reps}}\varepsilon^2}{2m^2}\right)} Pr≥1–2e(–2m2Rrepsε2)概率成立,可以进一步转化为 Pr≥ 1 – 2 e ( – R reps ε 2 2 m 2 ) ≥ 1 – 2 n C = 1 – 1 poly ( n ) \text{Pr}\text{≥}1–2e^{\left(–\frac{R_{\text{reps}}\varepsilon^2}{2m^2}\right)}\text{≥}1–\cfrac{2}{n^C}\text{=}1\text{–}\cfrac{1}{\text{poly}(n)} Pr≥1–2e(–2m2Rrepsε2)≥1–nC2=1–poly(n)1
- 事件: ∣ ∑ k = 1 R reps X k R reps – 1 ∣ Q ∣ Chamfer ( Q , P α ) ∣ ≤ 2 ε \displaystyle{}\left|\sum_{k\text{=}1}^{R_{\text{reps}}}\cfrac{X_k}{R_{\text{reps}}}–\cfrac{1}{|Q|}\text{Chamfer}(Q,P_\alpha)\right|\text{≤}2\varepsilon k=1∑RrepsRrepsXk–∣Q∣1Chamfer(Q,Pα) ≤2ε
- 代入 FED \text{FED} FED最相似文档 P α ∗ P_{\alpha^*} Pα∗后,有 1 ∣ Q ∣ Chamfer ( Q , P α ∗ ) ≥ 1 ∣ Q ∣ R reps FDE ( Q , P α ∗ ) – 2 ε = max α ∈ [ n ] 1 ∣ Q ∣ R reps FDE ( Q , P α ) – 2 ε \cfrac{1}{|Q|}\text{Chamfer}(Q,P_\alpha^*)\text{≥}\cfrac{1}{|Q|R_{\text{reps}}}\text{FDE}(Q,P_\alpha^*)–2\varepsilon\text{=}\displaystyle{}\max_{\alpha\text{∈}[n]}\cfrac{1}{|Q|R_{\text{reps}}}\text{FDE}(Q,P_\alpha)–2\varepsilon ∣Q∣1Chamfer(Q,Pα∗)≥∣Q∣Rreps1FDE(Q,Pα∗)–2ε=α∈[n]max∣Q∣Rreps1FDE(Q,Pα)–2ε
- 根据定理 2.1 \text{2.1} 2.1又可以知道 max α ∈ [ n ] 1 ∣ Q ∣ R reps FDE ( Q , P α ) ≥ max α ∈ [ n ] 1 ∣ Q ∣ Chamfer ( Q , P α ) – ε \displaystyle{}\max_{\alpha\text{∈}[n]}\cfrac{1}{|Q|R_{\text{reps}}}\text{FDE}(Q,P_\alpha)\text{≥}\max_{\alpha\text{∈}[n]}\cfrac{1}{|Q|}\text{Chamfer}(Q,P_\alpha)\text{–}\varepsilon α∈[n]max∣Q∣Rreps1FDE(Q,Pα)≥α∈[n]max∣Q∣1Chamfer(Q,Pα)–ε
- 所以最终原事件可转化为 1 ∣ Q ∣ Chamfer ( Q , P α ∗ ) ≥ max α ∈ [ n ] 1 ∣ Q ∣ Chamfer ( Q , P α ) – 3 ε \displaystyle{}\cfrac{1}{|Q|}\text{Chamfer}(Q,P_\alpha^*)\text{≥}\max_{\alpha\text{∈}[n]}\cfrac{1}{|Q|}\text{Chamfer}(Q,P_\alpha)\text{–}3\varepsilon ∣Q∣1Chamfer(Q,Pα∗)≥α∈[n]max∣Q∣1Chamfer(Q,Pα)–3ε,对 ε \varepsilon ε做常数倍变换即证毕