Generate pairwise constraints from unlabeled data for semi-supervised clustering. Data & Knowledge Engineering, 101715
Pairwise constraint selection methods often rely on the label information of data to generate pairwise constraints. This paper proposes a new method that selects pairwise constraints from unlabeled data for semi-supervised clustering, with the aim of improving clustering accuracy. A dataset without any label information is first clustered with the I-nice method into a set of initial clusters. From each initial cluster, a dense group of objects is obtained by removing the faraway objects. Then, within each dense group, the most informative object and the informative objects are identified with a local density estimation method. The identified objects are used to form a set of pairwise constraints, which are incorporated into the semi-supervised clustering algorithm to guide the clustering process toward a better solution. The advantage of this method is that no label information is required to select pairwise constraints. Experimental results demonstrate that the new method improved clustering accuracy and outperformed four state-of-the-art pairwise constraint selection methods, namely random, FFQS, min–max, and NPU, on both synthetic and real-world datasets.
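The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the I-nice step is not reproduced, so the initial cluster labels are assumed to be given; the "faraway" filter is assumed to be a distance-to-centroid cutoff; the local density estimate is assumed to be an inverse mean k-nearest-neighbour distance; and the rule for turning selected objects into must-link and cannot-link pairs is also an assumption. The names `dense_group`, `knn_density`, `select_constraints`, `keep_ratio`, `k`, and `n_informative` are hypothetical.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def dense_group(points, idx, keep_ratio=0.8):
    """ASSUMPTION: drop 'faraway' objects by keeping the keep_ratio
    fraction of the cluster that lies closest to its centroid."""
    c = centroid([points[i] for i in idx])
    ranked = sorted(idx, key=lambda i: math.dist(points[i], c))
    return ranked[:max(2, int(len(ranked) * keep_ratio))]

def knn_density(points, group, i, k=2):
    """ASSUMPTION: local density = inverse mean distance to the k
    nearest neighbours of object i inside its dense group."""
    nearest = sorted(math.dist(points[i], points[j]) for j in group if j != i)[:k]
    if not nearest:  # a group with a single object has no neighbours
        return 0.0
    return 1.0 / (sum(nearest) / len(nearest) + 1e-12)

def select_constraints(points, initial_labels, keep_ratio=0.8, k=2, n_informative=2):
    """Return (must_link, cannot_link) lists of index pairs.
    initial_labels stands in for the output of I-nice (not implemented here)."""
    clusters = {}
    for i, lab in enumerate(initial_labels):
        clusters.setdefault(lab, []).append(i)

    chosen = {}
    for lab, idx in clusters.items():
        group = dense_group(points, idx, keep_ratio)
        ranked = sorted(group, key=lambda i: knn_density(points, group, i, k), reverse=True)
        # First entry plays the role of the "most informative" object,
        # the following entries the other "informative" objects.
        chosen[lab] = ranked[:1 + n_informative]

    # ASSUMPTION: must-link inside a dense group, cannot-link between
    # the top objects of different groups.
    must_link = [(objs[0], o) for objs in chosen.values() for o in objs[1:]]
    cannot_link = [(chosen[a][0], chosen[b][0]) for a, b in combinations(sorted(chosen), 2)]
    return must_link, cannot_link

# Toy data: two tight groups, each with one faraway object (indices 4 and 9).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (3, 3),
          (10, 10), (10.1, 10), (10, 10.1), (10.1, 10.1), (14, 14)]
initial_labels = [0] * 5 + [1] * 5
must_link, cannot_link = select_constraints(points, initial_labels)
print("must-link:", must_link)
print("cannot-link:", cannot_link)
```

On the toy data, the two faraway objects are filtered out before any constraint is formed, so every pair involves only objects from the dense groups. The resulting pairs could then be passed to any constraint-aware clustering algorithm; which one the paper uses is not stated in the abstract.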