Conference:
Accepted for publication in the Proceedings of the 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2017)
Location:
Guilin, China
Date:
Thursday, June 15, 2017
Abstract:
Building robust information retrieval systems demands efficient natural language processing and morphological analysis techniques. These techniques are commonly used to find syntactic and semantic matches between users’ queries and the corresponding documents. Word stemming is one of those techniques and has been widely employed in information retrieval systems, mainly to increase their recall. Much research has been conducted to evaluate English stemming techniques. However, little attention has been given to Arabic stemmers. In this work, we present a comprehensive review of state-of-the-art Arabic stemming techniques and compare them according to a variety of criteria. In addition, we classify existing Arabic stemmers into four categories: root-based, affix removal, rule-based, and context-based techniques. We review seven of the most commonly used Arabic stemming algorithms that fall under these categories, and provide a comparative analysis and evaluation of them according to the goal, input, employed approach, and output of each technique. We conclude this study by proposing a hybrid Arabic stemming approach that combines multiple stemmers and exploits a new set of rules to better stem Arabic words.