当前位置：首页 > article >正文

[论文精读]Pisces: Efficient Federated Learning via GuidedAsynchronous Training

article 2025/2/28 9:59:33

论文网址：Pisces | Proceedings of the 13th Symposium on Cloud Computing

论文代码：GitHub - SamuelGong/Pisces: [ACM SoCC'22] Pisces: Efficient Federated Learning via Guided Asynchronous Training

英文是纯手打的！论文原文的summarizing and paraphrasing。可能会出现难以避免的拼写错误和语法错误，若有发现欢迎评论指正！文章偏向于笔记，谨慎食用

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Background and motivation

2.3.1. Synchronous federated learning

2.3.2. Participant selection

2.4. Pisces overview

2.5. Participant selection

2.5.1. Controlled concurrency

2.5.2. Utility profiling

2.6. Model aggregation

2.6.1. Bounded staleness

2.6.2. Adaptive pace control

2.7. Theoretical analysis

2.7.1. Convergence guarantee

2.7.2. Performance analysis

2.8. Implementation

2.9. Evaluation

2.9.1. Methodology

2.9.2. End-to-end performance

2.9.3. Sensitivity analysis

2.10. Discussion and future work

2.11. Related work

2.12. Conclusion

3. 知识补充

3.1. Oort

3.2. FedBuff

4. Reference

1. 心得

（1）至少从眼缘来看...这篇文章...是比其他某些基础文章要友好很多的...

（2）这是ccf b???吗？？？我感觉写的挺好的呢还是我搜索错了？

2. 论文逐段精读

2.1. Abstract

①Slow clients can slow down the overall server updating speed

stale adj.不新鲜的；陈腐的；(空气)污浊的；老掉牙的；厌倦的，腻烦的；(烟味)难闻的；没有新意的 n.（牛马、骆驼的）尿 v.使陈旧；使没有味道；用坏；变陈旧

2.2. Introduction

①An extreme scenario is that clients with small amounts of data have extremely fast transmission speeds, but clients with rich data transmit very slowly

②Pisces firstly consider the clients with high data quality based on the loss

③Platform: Plato

orchestrate vt.编排；策划；编配(或创作管弦乐曲)；精心安排；密谋

2.3. Background and motivation

2.3.1. Synchronous federated learning

①Sychronous and asynchronous FL, the subset of clients might be randomly chosen:

②Solutions of synchronization performance (time costs) bottleneck: a) periodic aggregation: limits the deadline of transmission, b) over-selection: choose more clients based on time. However, they two ignore the rich information in slow clients

straggler n.掉队者；落伍者；落在最后的人(或动物)

2.3.2. Participant selection

①Goal: balance time and data quality

②Relevant researches on trade off: FedCS, TiFL, Oort, and AutoFL

③The score calculated in Oort:

$U_{i}^{Oort}=\underbrace{|B_{i}|\sqrt{\frac{1}{|B_{i}|}\sum_{k\in B_{i}}Loss(k)^{2}}}_{Data \, quality}\times\underbrace{(\frac{T}{t_{i}})^{\mathbf{1}(T<t_{i})\times\alpha}}_{System \, speed}$

the second term penalize the delay with exponent $\alpha > 0$

④The authors executed experiment and identified that the penalty is over heavy, and the opportunity of being chosen of slow client is much less than fast one

⑤The time records that the accuracy achieves 95% under different penlaty coefficient $\alpha$ :

2.4. Pisces overview

①The workflow of Pisces:

②Coordinator’s control loop algorithm:

2.5. Participant selection

2.5.1. Controlled concurrency

①The time to achieve specific accuracy and the accumulated updating clients number curve under different maximum concurrency:

就是作者觉得最大并发得限制一下，少了会导致没几个模型更新而且达到好的精度耗时很长，但是多了又导致模型频繁更新然后又更难训练好了

2.5.2. Utility profiling

①The authors reckon that the harm is more than good for strictly penalizing

②Stale updating of high quality data might not be so useful

③The updating method of Pisces:

$U_{i}^{Pisces}=\underbrace{|B_{i}|\sqrt{\frac{1}{|B_{i}|}\sum_{k\in B_{i}}Loss(k)^{2}}}_{Data \, quality}\times\underbrace{\frac{1}{(\tilde{\tau_{i}}+1)^{\beta}}}_{Staleness}$

where $\tilde{\tau}_i\geq 0$ denotes the estimated staleness of the client’s updates, $\beta > 0$ is the staleness penalty factor

④The authors will blacklist clients who have lost outliers exceeding a certain time threshold: each client obtain $r$ reliability credits. DBSCAN clusters the updating pool in $\left \{ w_{t-k},w_{t-k+1},...,w_t \right \},k> 0$ based on associated losses. If a client experiences an outlier once, deduct 1 credit score. When the credit score is used up, this client is removed from the training pool（不过我不太知道怎么去界定异常值？）

⑤The staleness calculation:

$\tilde{\tau_i}=\frac{1}{k}\sum_{j=t-k+1}^t\tau_{i,j}$

⑥The staleness figure:

2.6. Model aggregation

2.6.1. Bounded staleness

①Buffered aggregation (BA) and Pisces's methods:

where FedBuff updates when the aggregation goal $K$ but it ignores the slowest client

2.6.2. Adaptive pace control

①Different from FedBuff, Pisces considers the most delayed client (with grey shadow) and adjusts aggregation interval

②Within 2 updating operators, all the runned clients have the same staleness bound

③Examination of aggregations in the control loop:

at beginning time $T_l$ , in the $l$ loop step, Pisces obtains the profiled latency $L_{max}$ of the slowest client. And define the aggregation interval $I=\frac{L_{max}}{b}$ and calculate elapsed time $e=T_{l}-t_{\tau_{j-1}}$ after the last aggregation. Only when $e> I$ , Pisces executes aggregation（感觉这个设计还是挺妙的啊，如果是首次提出的话）

2.7. Theoretical analysis

2.7.1. Convergence guarantee

（1）Notation

$T_l$	the start time of a loop step $l$
$I_l$	aggregation interval at step $l$
$n$	number of clients
$b$	staleness bound
$\triangledown F_i\left ( w \right )$	gradient of model $w$ with respect to the loss of client $i$ 's data
$f(w)=\frac{1}{n}\sum_{i=1}^{n}p_{i}F_{i}(w)$	global learning objective with $p_i> 0$ weighting each client’s contributio
$f^*$	the optimum of $f\left ( w \right )$
$g_i(w;\xi_i)$	stochastic gradient on client $i$ with randomness $\xi_i$
$Q$	number of steps in local SGD

（2）Why does Pisces achieve bounded staleness?

（3）What is the convergence guarantee?

①....参考原文吧，我不在这证明了

2.7.2. Performance analysis

①Computation cost: $O(n^2)$ , mostly taken by DBSCAN

②Communication cost: $O(n)$ , collected from $n$ clients

③Storage cost: $O(n^2)$ , DBSCAN（搁这全是DBSCAN来的？干嘛不换一个聚类）

2.8. Implementation

①Platform: Plato

2.9. Evaluation

2.9.1. Methodology

（1）Datasets and models

①Image classification datasets: CIFAR-10 with 60k colored images, MNIST with 60k grescale image in 10 classes, and FEMNIST with 805k grescale images in 63 classes

②Model on images: LeNet5

③Language modeling dataset: StackOverflow on 0.3 million clients

④Model on text: Albert

（2）Experimental Setup

①Simulated latency: based on Zipf distribution with a=1.2

（3）Hyperparameters

①Concurrency limit $C=20$

②Staleness penalty factor $\beta = 0.5$

③Optimizer: SGD with 0.9 momentum except for StackOverflow on Adam

④Other hyperparameters:

（4）Baseline Algorithms

①Compared baselines: Oort and FedBuff

（5）Metrics

①Evaluation metrics: time-to-acc, aggregation times, and network footprint

2.9.2. End-to-end performance

①Time-to-accuracy comparison table:

②Model aggregation times:

③Dataset scale perference:

④Convergence behaviours curve:

⑤Optimizations on participant selection are effective:

⑥Optimizations on aggregation adapt to various settings:

⑦Competitive in network overhead (total network footprint (GB)):

2.9.3. Sensitivity analysis

①Impact of participation scale:

②Impact of training loss outliers:

they manually insert inverse all the labels in subset of clients, which caused high loss. The corruption rate in (a) is set to 5%

③Impact of staleness penalty factor:

④Impact of optimizers:

⑤Impact of model architectures:

2.10. Discussion and future work

①Privacy compliance: the data might be disclosed in average training loss and local update

②They aimed to further introduce differential privacy (DP) to protect their data

2.11. Related work

①Lists regularized to local optimization methods such as FedAsynv and ASO-Fed; aggregation optimization methods such as HySync, Port and FedBuff; or principled participant selection（？） methods FeadAR and works by Chen

2.12. Conclusion

Pisces is speedy, robust and flexible

3. 知识补充

3.1. Oort

（1）参考学习1：【文献阅读】Oort:通过引导参与者选择实现高效联合学习_oort联邦学习-CSDN博客

（2）参考学习2：一种新型的联邦分布式架构（Oort）及未来研究方向 - 知乎

3.2. FedBuff

（1）参考学习1：【论文笔记 | 异步联邦】 FedBuff-CSDN博客

4. Reference

Jiang, Z. et al. (2022) 'Pisces: Efficient Federated Learning via Guided Asynchronous Training', ACM SoCC. doi: Pisces | Proceedings of the 13th Symposium on Cloud Computing

查看全文

http://www.kler.cn/a/370114.html