Enhanced Processing of Input Data in Clustering Techniques of Data Mining Algorithms
Data mining techniques and their applications have become significant in almost all domains, and each technique has its own significance in a given problem context. Clustering, for example, is used to group elements with similar properties. The objective of this research is to apply a multithreading approach to grouping elements into clusters. The multithreaded technique determines the target cluster for multiple elements simultaneously, which improves the overall performance of clustering, a process that requires many iterations to find each element's target cluster. The implementation partitions the input data elements and associates each partition with a thread. Each thread picks up input elements from its partition and determines their destination clusters. The implementation uses a master-slave design. The master component partitions the input data elements, creates instances of the slave components, and invokes the services offered by the slaves. Each slave runs in a separate thread of computation and determines the target cluster of each input element in its partition. A slave terminates once it has determined the target clusters for all data elements in its partition. We observed that this design performs well in terms of computational time. The number of partitions and the number of data points in each partition are decided based on the number of cores available for multithreading. Because the partitions are processed simultaneously on different cores, the technique consumes less computational time, thereby avoiding the large computational time that existing techniques need. We ran experiments on sample datasets from the open-source data mining library spfm; the datasets are close to 50 MB in size. Our test environment created a maximum of four simultaneous threads to process all input data points. Compared with current approaches, the experimental results show that our approach provides an average performance gain of 30%.
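The master-slave partitioning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`make_partitions`, `worker`, `assign_clusters`), the use of Euclidean distance to fixed centroids as the "target cluster" rule, and the choice of `ThreadPoolExecutor` are all assumptions made for illustration. The master splits the input into one partition per available core, and each worker thread determines the target cluster for every element of its own partition.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from math import dist


def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance).

    Assumption: the target cluster is the nearest centroid, as in k-means.
    """
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))


def make_partitions(points, n_parts):
    """Split `points` into at most `n_parts` contiguous, near-equal partitions."""
    size, extra = divmod(len(points), n_parts)
    partitions, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty partitions when there are few points
            partitions.append(points[start:end])
        start = end
    return partitions


def worker(partition, centroids):
    """Worker (slave) task: find the target cluster of each element in one partition."""
    return [nearest_cluster(p, centroids) for p in partition]


def assign_clusters(points, centroids, n_workers=None):
    """Master: partition the input, run one worker per partition, merge the labels."""
    n_workers = n_workers or os.cpu_count() or 1
    partitions = make_partitions(points, n_workers)
    if not partitions:
        return []
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # map() returns results in partition order, so labels line up with points.
        results = pool.map(worker, partitions, [centroids] * len(partitions))
        return [label for part in results for label in part]


if __name__ == "__main__":
    points = [(0, 0), (1, 1), (9, 9), (10, 10), (0, 1)]
    centroids = [(0, 0), (10, 10)]
    print(assign_clusters(points, centroids, n_workers=4))
```

Note that this sketch shows the structure only. In CPython, threads running pure-Python, CPU-bound code are serialized by the global interpreter lock, so a measurable speedup would likely require a process pool or a runtime whose threads execute in parallel on separate cores.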