Clustering scientific articles based on the k_means algorithm. Case Study: Iranian Research Institute for Information Science and Technology (IranDoc)
Adel Soleimani Nezhad, Mozhdeh Salajegheh, Elham Tayyebi Nia
Assistant Professor, Shahid Bahonar University of Kerman, Kerman, Iran
Abstract:
With the rapid growth of Web-based resources and articles, fast and inexpensive ways of retrieving texts from these vast document collections have become important. The main objective of this research is to cluster the article database of the Iranian Research Institute for Information Science and Technology (IranDoc) using text mining techniques, dividing the articles into several clusters so that articles in different clusters differ as much as possible and articles within each cluster are as similar as possible. Articles in fields related to information technology were selected. To this end, the keywords of information technology fields were first selected according to their frequency in the database articles, and the articles for each keyword were then extracted from the IranDoc database. The dataset was then created using the Notepad++ software. Clustering was performed with the k_means algorithm, and the Euclidean distance function was used as the criterion for measuring the similarity of clusters. The clustering results were then analyzed to find similarities and patterns among the papers. The pattern showed that the greatest similarity was between the articles in the data mining and neural network clusters, with a Euclidean distance of 1.365, and the least similarity was between the articles in the optimization and image processing clusters, with a distance of 1.387. The knowledge gained from this research comprises the clustering of the articles, the article groups with the highest and lowest degrees of similarity to one another, a new pattern for quick and easy access to similar articles, and the discovery of hidden relationships among different subjects. This knowledge helps researchers access articles related to their specialization and identify the subject of their study more effectively.
Keywords: text mining, clustering, k_means algorithm, Euclidean distance function criterion, IranDoc database.
Type of Study: Research | Subject: Information Technology
Received: 2017/07/19 | Accepted: 2018/03/19 | Published: 2018/04/24
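
The procedure summarized in the abstract (represent each article as a vector, cluster with k_means, then compare clusters by Euclidean distance) can be illustrated with a minimal sketch. The abstract does not state how the texts were vectorized, how many clusters were used, or which software performed the clustering, so everything below is an assumption made for illustration: the term-frequency representation, the toy documents, the choice of k = 2, the deterministic initialization, and all function names are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def term_frequency_vectors(docs):
    """Map each document to a relative term-frequency vector (assumed representation)."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        total = sum(counts.values())
        vectors.append([counts[word] / total for word in vocab])
    return vocab, vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(vectors, k, max_iter=100):
    """Plain k-means; the first k vectors seed the centroids (an assumption,
    chosen only so that the example is reproducible)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def centroid_distances(centroids):
    """Pairwise Euclidean distances between cluster centroids."""
    return {
        (i, j): euclidean(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    }


# Invented toy documents standing in for keyword-grouped article texts.
docs = [
    "data mining association rules data",
    "image processing edge detection image",
    "data mining decision tree",
    "image processing segmentation filter",
]

vocab, vectors = term_frequency_vectors(docs)
labels, centroids = k_means(vectors, k=2)
distances = centroid_distances(centroids)

print(labels)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

In a sketch like this, a smaller distance between two centroids means the two clusters are more alike, which is how the abstract's figures of 1.365 and 1.387 are read. The distances printed here come from the toy data only and are not comparable to the values reported in the study.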
پژوهشنامه پردازش و مدیریت اطلاعات Journal of Information processing and Management