Document Clustering Using the Sequential Information Bottleneck Method

Fattoni Aji Purwanto

Basic Information

Catalog No.

113070139

Classification

005.1

Catalog type

Scientific Work - Undergraduate Thesis (S1) - Reference

Abstract

ABSTRACT: A recurring problem in document clustering is choosing an algorithm or method suited to a given number of documents. Methods that produce reasonably accurate results sometimes require long processing times. Clustering is commonly approached in two ways, partitional clustering and hierarchical clustering, and each has its own strengths and weaknesses. One of the many available clustering methods is the sequential information bottleneck (sIB) method [5]. In document clustering, the sequential information bottleneck algorithm is guaranteed to find a solution that is a local maximum of the target function.

In this final project, the sequential information bottleneck algorithm is applied as a clustering method. The accuracy of the resulting clusters is measured with micro-averaged precision and micro-averaged recall while the input parameters are varied: the maximum loop count (maxL), the number of random cluster initializations, and the error value. The tests show that the sequential information bottleneck algorithm performs very well as a clustering method: across the experiments, the accuracy achieved averaged above 70%.

Cluster accuracy increases as the maximum loop parameter is raised, up to the point where the iteration already stops because the other stopping parameter (the error value) has been satisfied. Because documents processed with the sequential information bottleneck method have very high precision, the clustering results can be used as a training set for a supervised classification method [5]. The supervised classification method used here is the naive Bayes classification method. The accuracy of a naive Bayes classifier that uses the sIB clustering results as its training set is then compared with that of a naive Bayes classifier that does not.

Keywords: Sequential Information Bottleneck Method, Naive Bayes classification method, clustering, classification
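
The abstract evaluates clusters with micro-averaged precision and recall but does not spell out how they are computed. The following is a minimal sketch of one common convention, not necessarily the one used in the thesis: each cluster is assumed to be labeled with its most frequent true category, and correct and incorrect assignments are then pooled over all categories. The function name `micro_averaged_scores` and its arguments are hypothetical.

```python
from collections import Counter, defaultdict


def micro_averaged_scores(true_labels, cluster_ids):
    """Return (precision, recall), micro-averaged over categories."""
    if not true_labels or len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must be non-empty and of equal length")

    # Collect the true labels of the documents in each cluster.
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)

    # Assumption: a cluster is assigned its most frequent true category.
    cluster_category = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in members.items()
    }

    correct = Counter()   # documents correctly assigned to a category
    wrong_in = Counter()  # documents incorrectly assigned to a category
    missed = Counter()    # documents incorrectly not assigned to a category
    for label, cluster in zip(true_labels, cluster_ids):
        predicted = cluster_category[cluster]
        if predicted == label:
            correct[label] += 1
        else:
            wrong_in[predicted] += 1
            missed[label] += 1

    total_correct = sum(correct.values())
    precision = total_correct / (total_correct + sum(wrong_in.values()))
    recall = total_correct / (total_correct + sum(missed.values()))
    return precision, recall


if __name__ == "__main__":
    labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
    clusters = [0, 0, 1, 1, 1, 0]
    print(micro_averaged_scores(labels, clusters))
```

Under this convention, when every document has exactly one true category, each misplaced document counts once against precision and once against recall, so the two values coincide. They differ only for multi-labeled collections or other labeling rules.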