Abstract:
Federated learning, as a distributed training framework, has broad applications in wireless communications but still faces many challenges. One of these challenges stems from the non-independent and non-identically distributed (non-IID) datasets held by the participating users. Many solutions have been proposed in the literature to mitigate the performance loss that federated learning suffers on non-IID datasets. This paper analyzes the impact of non-IID datasets on the performance of federated learning, taking average channel gain prediction, demodulation of quadrature amplitude modulated signals, and image classification as example tasks. By visualizing the loss function and analyzing the deviations of the model parameters obtained by the federated averaging (FedAvg) algorithm, we attempt to explain why the impact of non-IID datasets on federated learning differs among tasks. Our results show that non-IID datasets do not always degrade the performance of the FedAvg algorithm. The model parameters trained by FedAvg on different datasets deviate to different degrees, and the shape of the loss surface differs among tasks; both factors contribute to the differing impact of non-IID datasets on the performance of federated learning. For the same regression problem, whether non-IID data affects federated learning depends on the specific factors that make the data non-IID.
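For readers unfamiliar with the FedAvg algorithm referenced above, a minimal sketch of its server-side aggregation step (a weighted average of client model parameters, with weights proportional to local dataset sizes), together with one possible way to quantify the per-client parameter deviation the abstract mentions, might look as follows. The function names and the NumPy-array representation of model parameters are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """FedAvg server step: weighted average of client model parameters.

    client_params: list (one entry per client) of lists of np.ndarray layers
    client_sizes:  number of local training samples at each client
    """
    total = sum(client_sizes)
    num_layers = len(client_params[0])
    aggregated = []
    for layer in range(num_layers):
        # Each client's layer is weighted by its share of the total data.
        layer_avg = sum(
            (n / total) * params[layer]
            for params, n in zip(client_params, client_sizes)
        )
        aggregated.append(layer_avg)
    return aggregated

def parameter_deviation(client_params, global_params):
    """L2 distance of each client's parameters from the aggregated model --
    one illustrative measure of the 'deviation' discussed in the paper."""
    return [
        np.sqrt(sum(np.sum((p - g) ** 2)
                    for p, g in zip(params, global_params)))
        for params in client_params
    ]
```

Under non-IID data, clients with dissimilar local distributions tend to produce larger deviations from the aggregated model; how much this hurts the global model depends on the task's loss surface, which is the comparison the paper develops.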