Citation: DONG Shaopeng, YANG Chenyang, LIU Tingting. Federated Learning Based Wireless Tasks: Will Non-IID Data Have an Impact on Performance?[J]. JOURNAL OF SIGNAL PROCESSING, 2021, 37(8): 1365-1377. DOI: 10.16798/j.issn.1003-0530.2021.08.003


Federated Learning Based Wireless Tasks: Will Non-IID Data Have an Impact on Performance?

Abstract: As a distributed training framework, federated learning has broad application prospects in wireless communications, but it also faces many technical challenges. One of them stems from the non-independent and identically distributed (non-IID) datasets of the users participating in training. Many solutions have been proposed in the literature to mitigate the performance loss of federated learning caused by non-IID user datasets. This paper analyzes the impact of non-IID user datasets on the performance of federated learning, taking two wireless tasks, average channel gain prediction and the demodulation of quadrature amplitude modulated signals, as well as two image classification tasks as examples. By visualizing the loss functions of the neural networks and analyzing the deviations of the model parameters obtained with the federated averaging algorithm, we attempt to explain why non-IID datasets affect different tasks to different degrees. The results show that non-IID user datasets do not necessarily degrade the performance of federated learning. The model parameters trained by federated averaging on different datasets deviate to different degrees, and the shapes of the loss surfaces differ among tasks; together, these two factors determine how strongly each task is affected by non-IID data. For the same regression problem, whether non-IID data affect the performance of federated learning depends on the specific factors that cause the data to be non-IID.
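The analysis summarized above rests on two measurable quantities: how far the model parameters produced by federated averaging drift from a centrally trained reference, and how the loss behaves along the path between the two solutions. The sketch below is a minimal illustration of both on a toy linear-regression problem with synthetic non-IID client data; the data generator, dimensions, learning rate, and variable names are illustrative assumptions, not the paper's actual tasks, networks, or datasets.

```python
# Minimal sketch: federated averaging (FedAvg) with non-IID clients on a
# toy linear-regression problem, plus the two diagnostics the abstract
# mentions (parameter deviation and a loss-surface slice). Hypothetical
# setup, not the paper's experiments.
import numpy as np

rng = np.random.default_rng(0)
DIM, CLIENTS, ROUNDS, LOCAL_STEPS, LR = 5, 4, 50, 10, 0.01
w_true = rng.normal(size=DIM)

def make_client_data(n, shift):
    """Each client draws features around a different mean ('shift'),
    so the local datasets are non-IID across clients."""
    X = rng.normal(loc=shift, size=(n, DIM))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

clients = [make_client_data(200, s) for s in np.linspace(-2, 2, CLIENTS)]

def local_gd(w, X, y, steps, lr):
    """A few full-batch gradient-descent steps on the local MSE loss."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# FedAvg: broadcast the global model, train locally, average with weights
# proportional to the local dataset sizes.
w_fed = np.zeros(DIM)
for _ in range(ROUNDS):
    local_models = [local_gd(w_fed.copy(), X, y, LOCAL_STEPS, LR) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    w_fed = np.average(local_models, axis=0, weights=sizes)

# Centralized baseline: train on the pooled data for the same number of steps.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
w_cen = local_gd(np.zeros(DIM), X_all, y_all, ROUNDS * LOCAL_STEPS, LR)

# Diagnostic 1: deviation of the federated parameters from the centralized ones.
print("||w_fed - w_cen|| =", np.linalg.norm(w_fed - w_cen))

# Diagnostic 2: one-dimensional loss-surface slice along the line connecting
# the centralized and federated solutions (in the spirit of loss visualization).
for a in np.linspace(0.0, 1.0, 5):
    w = (1 - a) * w_cen + a * w_fed
    print(f"alpha={a:.2f}  pooled MSE={np.mean((X_all @ w - y_all) ** 2):.4f}")
```

The same two quantities, the parameter deviation and the loss along the interpolation path, are what the paper inspects for its wireless and image classification tasks to explain why non-IID data hurt some tasks and not others.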

     
