Incomplete Multi-View Clustering Based on Fairness Perception
Abstract
Incomplete multi-view clustering processes multi-source data by identifying the consistent and complementary information across views and partitioning the samples into distinct clusters. Because it supports unsupervised analysis of multi-source data in complex environments, it has attracted considerable attention. Existing incomplete multi-view clustering algorithms nevertheless have notable shortcomings. They often overlook differences in the data that arise from sensitive attributes associated with specific groups, which can bias the clustering against those groups and cause fairness issues. In addition, missing samples that are recovered may lose their distinctive characteristics. To tackle these challenges, this paper presents a fairness-perception-based incomplete multi-view clustering method. The method aims to reduce the unfair treatment of underrepresented groups in unsupervised clustering while also handling multi-view consistency and missing-data recovery. First, an autoencoder is trained for each view so that the embedded features can be fused coherently under an information-theoretic criterion. Simultaneously, a generative network is trained to recover the missing view data. When the embedded features are used for clustering, we constrain the distribution of sensitive groups within each cluster so that it closely mirrors their distribution in the entire dataset, which promotes fairness. We compared our method with five state-of-the-art incomplete multi-view clustering techniques on three widely used multi-view datasets. On the Bank dataset with a missing rate of 0.5, our method achieved a 0.82% increase in Normalized Mutual Information (NMI) and a 3.03% increase in Balance over the second-best method. On the Credit Card dataset with a missing rate of 0, it achieved a 3.53% increase in NMI and a 5.62% increase in Balance over the second-best method. Visualization experiments on the Credit Card dataset further confirmed the performance and fairness of our clustering algorithm, and ablation studies demonstrated the effectiveness of the proposed multi-view consistency fusion and missing-view recovery mechanisms. Our method thus addresses fairness concerns in unsupervised clustering of incomplete multi-view data while also improving clustering performance.
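The fairness constraint described above, in which each cluster's sensitive-group distribution should mirror that of the whole dataset, can be illustrated with a minimal sketch. The abstract does not give the exact loss or the exact definition of Balance, so both functions below are assumptions: `fairness_gap` uses a mean KL divergence as one plausible way to measure the mismatch, and `balance` uses a commonly used definition (the smallest ratio between the least and most represented group within any cluster). The function names and the toy data are hypothetical.

```python
import numpy as np

def group_distribution(sensitive, n_groups):
    """Empirical distribution of sensitive-group labels (integers 0..n_groups-1)."""
    counts = np.bincount(sensitive, minlength=n_groups).astype(float)
    return counts / counts.sum()

def fairness_gap(labels, sensitive, n_clusters, n_groups, eps=1e-12):
    """Assumed penalty: mean KL divergence between each non-empty cluster's
    group distribution and the dataset-wide group distribution."""
    labels = np.asarray(labels)
    sensitive = np.asarray(sensitive)
    overall = group_distribution(sensitive, n_groups)
    gaps = []
    for k in range(n_clusters):
        members = sensitive[labels == k]
        if members.size == 0:
            continue  # skip empty clusters
        local = group_distribution(members, n_groups)
        gaps.append(np.sum(local * np.log((local + eps) / (overall + eps))))
    return float(np.mean(gaps)) if gaps else 0.0

def balance(labels, sensitive, n_clusters, n_groups):
    """Assumed metric: minimum over non-empty clusters of
    (smallest group count / largest group count)."""
    labels = np.asarray(labels)
    sensitive = np.asarray(sensitive)
    scores = []
    for k in range(n_clusters):
        counts = np.bincount(sensitive[labels == k], minlength=n_groups)
        if counts.sum() == 0:
            continue
        scores.append(counts.min() / counts.max())
    return float(min(scores)) if scores else 0.0

# Toy data: 8 samples, two sensitive groups, two clusters.
sensitive = np.array([0, 0, 1, 1, 0, 0, 1, 1])
fair_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])    # both groups in every cluster
unfair_labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # clusters coincide with groups

print(fairness_gap(fair_labels, sensitive, 2, 2), balance(fair_labels, sensitive, 2, 2))
print(fairness_gap(unfair_labels, sensitive, 2, 2), balance(unfair_labels, sensitive, 2, 2))
```

In this toy case the fair assignment gives a gap of zero and a balance of one, while the assignment that separates the groups gives a gap near ln 2 and a balance of zero. In a trainable model the hard cluster labels would presumably be replaced by soft assignments so that the penalty is differentiable; that detail is not stated in the abstract.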