AN IMPROVED ALGORITHM FOR THE EXTRACTION OF TRILITERAL ARABIC ROOTS

  • Raed Kanaan, Management Information Systems, Amman Arab University, Faculty of Computer Sciences and Informatics, Amman, Jordan
  • Ghassan Kanaan, Computer Science, Amman Arab University, Faculty of Computer Sciences and Informatics, Amman, Jordan

Abstract

Stemming in the Arabic language is the process of extracting the root form of the verb by removing all affixes, both inflectional affixes and derivational morphemes, from a word. Stemming is a common form of language processing in information retrieval systems, where it is used to reduce variant word forms to a common form. It is similar to the morphological processing used in natural language processing, but to some extent has different aims. This paper describes a stemming algorithm that has been developed for the Arabic language. The algorithm utilizes an important morphological aspect of the Arabic language. It examines the word letter by letter, starting from the end of the word, i.e., from the last letter to the first, and extracts its root. The algorithm correctly stems most Arabic words that are derived from roots and achieves a high rate of accuracy. The algorithm has been tested on a corpus of 242 abstracts of Arabic documents from the Proceedings of the Saudi Arabian National Conference.
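
To illustrate the general idea of end-first affix removal described in the abstract, the following is a minimal sketch only. The abstract does not give the algorithm's actual rules, so the function name `extract_root_sketch` and the two letter sets below are hypothetical assumptions chosen for illustration; this is not the authors' method, and it does not handle infixes or weak roots.

```python
# Illustrative assumption: letters that often act as suffix or prefix letters.
# These sets are NOT taken from the paper; they are hypothetical.
SUFFIX_LETTERS = set("ونايةته")
PREFIX_LETTERS = set("البتيسفمنو")


def extract_root_sketch(word: str) -> str:
    """Reduce a word to three letters by trimming assumed affix letters.

    The word is examined from its last letter toward its first; once no
    more suffix letters can be dropped, assumed prefix letters are trimmed
    from the front. Trimming stops as soon as three letters remain.
    """
    letters = list(word)

    # Pass 1: start at the end of the word and drop assumed suffix letters.
    while len(letters) > 3 and letters[-1] in SUFFIX_LETTERS:
        letters.pop()

    # Pass 2: drop assumed prefix letters from the beginning of the word.
    while len(letters) > 3 and letters[0] in PREFIX_LETTERS:
        letters.pop(0)

    return "".join(letters)


if __name__ == "__main__":
    for sample in ["يكتبون", "المعلمون", "كتب"]:
        print(sample, "->", extract_root_sketch(sample))
```

Under these assumed letter sets, `extract_root_sketch("يكتبون")` yields the triliteral root `كتب`. A purely positional heuristic like this fails on many words (for example, those with infixed letters), which is why a real stemmer relies on morphological knowledge rather than fixed letter sets.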

Published
2014-01-31
How to Cite
Kanaan, R., & Kanaan, G. (2014). AN IMPROVED ALGORITHM FOR THE EXTRACTION OF TRILITERAL ARABIC ROOTS. European Scientific Journal, ESJ, 10(3). https://doi.org/10.19044/esj.2014.v10n3p%p