Authors
1 Shiraz University; Shiraz, Iran; Head of the Library, Faculty of Nursing and Midwifery; Qazvin University of Medical Sciences; Qazvin, Iran
2 Department of Knowledge and Information Science; Faculty of Psychology and Educational Sciences; Shiraz University; Shiraz, Iran
3 Department of Computer Science, Engineering and Information Technology; Shiraz University; Shiraz, Iran
4 Department of Knowledge and Information Science; Faculty of Psychology and Educational Sciences; Shiraz University; Shiraz, Iran
5 Ferdowsi University of Mashhad; Mashhad, Iran; Test and Development Specialist; Pars Azarakhsh Company; Tehran, Iran
6 Department of Community Medicine; Faculty of Medicine; Qazvin University of Medical Sciences; Qazvin, Iran
7 Faculty of Computer Engineering; Amirkabir University of Technology; Tehran, Iran
Abstract
Keywords
Article Title [English]
Authors [English]
Keyword and phrase extraction is a prerequisite for many natural language processing tasks. However, a review of the related Persian and English literature showed that only a few studies have addressed the extraction of keywords and phrases from Persian texts. To shed light on the state of research in this area, the present study reviews the Persian and English publications that have evaluated their research ideas on Persian texts. We also examine each study critically, focusing on its methodology, implementation, and evaluation methods and measures.
To our knowledge, 14 Persian and 6 English papers have addressed the extraction of Persian keywords and phrases. Examination of these papers revealed that they relied mostly on statistical and linguistic information. The majority lacked either an appropriate methodology or a clear explanation of their research ideas. They generally used non-standard datasets and vague or problematic metrics to evaluate the experimental systems. Overall, except for 3 papers that reported their proposed methods appropriately, the papers lacked reproducibility and generalizability. Hence, their results cannot be used with confidence as benchmarks for evaluating future work, and their proposed ideas cannot be employed in developing applications for extracting keywords and phrases from Persian texts.
Keywords [English]