[Paper Deep-Dive] Pisces: Efficient Federated Learning via Guided Asynchronous Training
Paper link: Pisces | Proceedings of the 13th Symposium on Cloud Computing
Paper code: GitHub - SamuelGong/Pisces: [ACM SoCC'22] Pisces: Efficient Federated Learning via Guided Asynchronous Training
The English is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammatical mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This article is more of a set of notes, so read with care.
Contents
1. Thoughts
2. Section-by-Section Reading
2.1. Abstract
2.2. Introduction
2.3. Background and motivation
2.3.1. Synchronous federated learning
2.3.2. Participant selection
2.4. Pisces overview
2.5. Participant selection
2.5.1. Controlled concurrency
2.5.2. Utility profiling
2.6. Model aggregation
2.6.1. Bounded staleness
2.6.2. Adaptive pace control
2.7. Theoretical analysis
2.7.1. Convergence guarantee
2.7.2. Performance analysis
2.8. Implementation
2.9. Evaluation
2.9.1. Methodology
2.9.2. End-to-end performance
2.9.3. Sensitivity analysis
2.10. Discussion and future work
2.11. Related work
2.12. Conclusion
3. Supplementary Knowledge
3.1. Oort
3.2. FedBuff
4. Reference
1. Thoughts
(1) Judging by first impressions at least... this paper... is much friendlier than certain other foundational papers...
(2) Is this really CCF-B??? I feel it is quite well written; or did I look it up wrong?
2. Section-by-Section Reading
2.1. Abstract
①Slow clients can drag down the server's overall update speed
stale adj. not fresh; trite; (of air) stuffy; hackneyed; weary, bored; (of smoke) foul-smelling; lacking originality n. urine (of horses, camels, etc.) v. to make stale; to lose flavor; to wear out; to become stale
2.2. Introduction
①An extreme scenario is that clients with small amounts of data have extremely fast transmission speeds, but clients with rich data transmit very slowly
②Pisces gives priority to clients with high data quality, estimated from their training loss
③Platform: Plato
orchestrate vt. to arrange; to mastermind; to orchestrate (compose or arrange for orchestra); to carefully organize; to conspire
2.3. Background and motivation
2.3.1. Synchronous federated learning
①Synchronous and asynchronous FL, where the subset of clients might be chosen at random:
②Solutions to the synchronization performance (time-cost) bottleneck: a) periodic aggregation, which puts a deadline on transmission; b) over-selection, which chooses extra clients to hedge against stragglers. However, both ignore the rich information held by slow clients
straggler n. straggler; one who falls behind; the last person (or animal) to arrive
2.3.2. Participant selection
①Goal: balance time and data quality
②Related research on this trade-off: FedCS, TiFL, Oort, and AutoFL
③The score calculated in Oort:

$$\mathrm{Util}(i)=|B_i|\sqrt{\frac{1}{|B_i|}\sum_{k\in B_i}\mathrm{Loss}(k)^2}\times\left(\frac{T}{t_i}\right)^{\mathbb{1}(T<t_i)\times\alpha}$$

where the first term is the statistical utility over client $i$'s batch $B_i$, and the second term penalizes clients whose latency $t_i$ exceeds the preferred round duration $T$, with exponent $\alpha$
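A toy Python sketch of this score (the function name and the default $\alpha$ are my illustrative choices, not from the paper):

```python
import math

def oort_utility(losses, t_i, T, alpha=2.0):
    """losses: per-sample training losses over client i's batch B_i;
    t_i: client i's round latency; T: developer-preferred round duration."""
    stat = len(losses) * math.sqrt(sum(l * l for l in losses) / len(losses))
    # Only clients slower than the preferred duration T are penalized.
    penalty = (T / t_i) ** alpha if T < t_i else 1.0
    return stat * penalty
```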
④The authors ran experiments and found that this penalty is too harsh: slow clients get far fewer chances of being chosen than fast ones
⑤Time recorded to reach 95% accuracy under different penalty coefficients $\alpha$:
2.4. Pisces overview
①The workflow of Pisces:
②Coordinator’s control loop algorithm:
2.5. Participant selection
2.5.1. Controlled concurrency
①Time to reach a given accuracy, and the cumulative number of client updates, under different maximum concurrency limits:
In short, the authors argue that the maximum concurrency needs a cap: if it is too small, only a few clients contribute updates and reaching good accuracy takes very long; if it is too large, the model is updated too frequently, which again makes it hard to train well
2.5.2. Utility profiling
①The authors reckon that a strict staleness penalty does more harm than good
②Stale updates, even from high-quality data, might not be that useful
③The client utility used by Pisces:

$$\mathrm{Util}(i)=|B_i|\sqrt{\frac{1}{|B_i|}\sum_{k\in B_i}\mathrm{Loss}(k)^2}\times\beta^{\hat{\tau}_i}$$

where $\hat{\tau}_i$ denotes the estimated staleness of the client's updates and $\beta\in(0,1]$ is the staleness penalty factor
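A minimal sketch of this utility under the exponential-discount reading above (the function name and the default $\beta$ are illustrative assumptions):

```python
import math

def pisces_utility(losses, est_staleness, beta=0.9):
    """losses: per-sample losses on the client; est_staleness: the client's
    estimated staleness; beta: staleness penalty factor in (0, 1]."""
    stat = len(losses) * math.sqrt(sum(l * l for l in losses) / len(losses))
    return stat * (beta ** est_staleness)  # exponentially discount stale clients
```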
④The authors blacklist clients whose losses are flagged as outliers more than a certain number of times: each client starts with reliability credits. DBSCAN clusters the clients in the update pool by their associated losses; each time a client's loss is an outlier, one credit is deducted, and when the credits are used up the client is removed from the training pool (how are outliers delineated? presumably as the points DBSCAN labels as noise, i.e., those falling in no dense cluster)
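A minimal sketch of this credit mechanism using sklearn's DBSCAN; the eps/min_samples values and the initial credit of 3 are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

credits = {cid: 3 for cid in range(10)}  # reliability credits per client (3 is illustrative)
blacklist = set()

def update_credits(losses_by_client):
    """losses_by_client: {client_id: reported average training loss}."""
    ids = list(losses_by_client)
    X = np.array([[losses_by_client[i]] for i in ids])
    labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)  # -1 marks noise
    for cid, label in zip(ids, labels):
        if label == -1:               # loss is an outlier: deduct one credit
            credits[cid] -= 1
            if credits[cid] <= 0:
                blacklist.add(cid)    # credits exhausted: remove from pool
```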
⑤The staleness calculation:
⑥The staleness figure:
2.6. Model aggregation
2.6.1. Bounded staleness
①Buffered aggregation (BA) in FedBuff versus Pisces's method:
where FedBuff aggregates once the buffer reaches its aggregation goal of $K$ client updates, but this ignores the slowest clients
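A minimal sketch of FedBuff-style buffered aggregation (the goal $K=10$ and the names are illustrative):

```python
buffer, K = [], 10  # buffered client updates and the aggregation goal K

def on_client_update(update, aggregate_fn):
    buffer.append(update)
    if len(buffer) >= K:             # aggregate as soon as K updates arrive,
        aggregate_fn(list(buffer))   # however stale the remaining clients are
        buffer.clear()
```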
2.6.2. Adaptive pace control
①Unlike FedBuff, Pisces tracks the most delayed client (grey-shaded in the figure) and adjusts the aggregation interval accordingly
②Between any two aggregation operations, all running clients share the same staleness bound
③Examination of aggregations in the control loop:
at the beginning time $t$ of each loop step, Pisces obtains the profiled latency $d_{\max}$ of the slowest running client, derives the aggregation interval $\Delta$ from it, and computes the elapsed time $e$ since the last aggregation. Only when $e\ge\Delta$ does Pisces execute an aggregation (I find this design quite clever, if it was indeed first proposed here)
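A sketch of this check, assuming the interval is set as $\Delta=d_{\max}/\tau_{\max}$ so that the slowest client overlaps at most $\tau_{\max}$ aggregations (my reading; the helper names are illustrative):

```python
import time

def maybe_aggregate(last_agg_time, slowest_latency, tau_max, aggregate_fn):
    """One pace-control check inside the coordinator's control loop."""
    interval = slowest_latency / tau_max   # Delta: slowest client sees <= tau_max aggregations
    elapsed = time.time() - last_agg_time  # e: time since the last aggregation
    if elapsed >= interval:                # aggregate only when e >= Delta
        aggregate_fn()
        return time.time()
    return last_agg_time
```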
2.7. Theoretical analysis
2.7.1. Convergence guarantee
(1)Notation
| Symbol | Meaning |
| --- | --- |
| $t_j$ | the start time of loop step $j$ |
| $\Delta_j$ | aggregation interval at step $j$ |
| $N$ | number of clients |
| $\tau_{\max}$ | staleness bound |
| $\nabla F_i(w)$ | gradient of model $w$ with respect to the loss of client $i$'s data |
| $F(w)=\sum_{i=1}^{N}p_iF_i(w)$ | global learning objective, with $p_i$ weighting each client's contribution |
| $F^*$ | the optimum of $F$ |
| $g_i(w;\zeta_i)$ | stochastic gradient on client $i$ with randomness $\zeta_i$ |
| $K$ | number of steps in local SGD |
(2)Why does Pisces achieve bounded staleness?
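Here is my one-line sketch of why pace control (§2.6.2) yields the bound, using the notation above (my own reasoning, not quoted from the paper):

```latex
% A client's staleness = number of aggregations that happen while it trains.
% Pace control keeps every interval \Delta_j \ge d_{\max} / \tau_{\max}, so for
% a client whose round takes d_i \le d_{\max}:
\tau_i \;\le\; \frac{d_i}{\min_j \Delta_j}
       \;\le\; \frac{d_{\max}}{d_{\max} / \tau_{\max}}
       \;=\; \tau_{\max}
```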
(3)What is the convergence guarantee?
①.... see the original paper; I will not reproduce the proof here
2.7.2. Performance analysis
①Computation cost: mostly taken by DBSCAN
②Communication cost: from the training losses collected from clients
③Storage cost: again from DBSCAN (is everything here due to DBSCAN? why not swap in a different clustering algorithm?)
2.8. Implementation
①Platform: Plato
2.9. Evaluation
2.9.1. Methodology
(1)Datasets and models
①Image classification datasets: CIFAR-10 with 60k colored images, MNIST with 60k grayscale images in 10 classes, and FEMNIST with 805k grayscale images in 62 classes
②Model on images: LeNet5
③Language modeling dataset: StackOverflow on 0.3 million clients
④Model on text: ALBERT
(2)Experimental Setup
①Simulated latency: drawn from a Zipf distribution with $a=1.2$
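A sketch of drawing such latencies with NumPy (the client count and the scaling into seconds are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.zipf(a=1.2, size=100).astype(float)  # heavy-tailed Zipf draws, a = 1.2
latencies = 1.0 + raw / raw.max() * 59.0       # scale into [1, 60] seconds (illustrative)
```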
(3)Hyperparameters
①Concurrency limit
②Staleness penalty factor
③Optimizer: SGD with momentum 0.9, except Adam for StackOverflow
④Other hyperparameters:
(4)Baseline Algorithms
①Compared baselines: Oort and FedBuff
(5)Metrics
①Evaluation metrics: time-to-accuracy, number of aggregations, and network footprint
2.9.2. End-to-end performance
①Time-to-accuracy comparison table:
②Model aggregation times:
③Performance across dataset scales:
④Convergence behaviour curves:
⑤Optimizations on participant selection are effective:
⑥Optimizations on aggregation adapt to various settings:
⑦Competitive in network overhead (total network footprint (GB)):
2.9.3. Sensitivity analysis
①Impact of participation scale:
②Impact of training loss outliers:
they manually invert all the labels on a subset of clients, which causes persistently high losses; the corruption rate in (a) is set to 5%
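A sketch of this corruption, assuming "inverting" maps class $c$ to $C-1-c$ (my reading; all names here are illustrative):

```python
import numpy as np

def corrupt_labels(client_labels, num_classes, rate=0.05, seed=0):
    """client_labels: {client_id: np.ndarray of integer labels}; inverts every
    label on a `rate` fraction of clients, yielding persistently high losses."""
    rng = np.random.default_rng(seed)
    ids = list(client_labels)
    bad = rng.choice(ids, size=max(1, int(rate * len(ids))), replace=False)
    for cid in bad:
        client_labels[cid] = (num_classes - 1) - client_labels[cid]
    return set(bad)  # ids of corrupted clients
```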
③Impact of staleness penalty factor:
④Impact of optimizers:
⑤Impact of model architectures:
2.10. Discussion and future work
①Privacy compliance: client data might be disclosed through the reported average training loss and the local updates
②They plan to further introduce differential privacy (DP) to protect this data
2.11. Related work
①Lists related work: regularized local-optimization methods such as FedAsync and ASO-Fed; aggregation-optimization methods such as HySync, Port, and FedBuff; and principled participant-selection methods such as FedAR and the work by Chen et al.
2.12. Conclusion
Pisces is speedy, robust and flexible
3. Supplementary Knowledge
3.1. Oort
(1) Reference 1: 【文献阅读】Oort:通过引导参与者选择实现高效联合学习_oort联邦学习-CSDN博客
(2) Reference 2: 一种新型的联邦分布式架构(Oort)及未来研究方向 - 知乎
3.2. FedBuff
(1) Reference 1: 【论文笔记 | 异步联邦】 FedBuff-CSDN博客
4. Reference
Jiang, Z. et al. (2022) 'Pisces: Efficient Federated Learning via Guided Asynchronous Training', in Proceedings of the 13th ACM Symposium on Cloud Computing (SoCC '22)