Noor Analysis: A Benchmark Dataset for Evaluating Morphological Analysis Engines
Persian title: تحلیل نور: یک دادگان معیار برای ارزیابی روش‌های برچسب‌گذاری صرفی

Authors: Huda AlShuhayeb (هدی الشهیب), Iran University of Science and Technology; Behrouz Minaei (بهروز مینایی); Mohammad Ebrahim Shenassa (محمد ابراهیم شناسا); Sayyed Ali Hossayni

Journal: Journal of Information and Communication Technology of Iran (فصلنامه فناوری اطلاعات و ارتباطات ایران), ISSN 2717-0411, vol. 15, no. 57, pp. 153-164
Publisher: Regional Information Center for Science and Technology (مرکز منطقه ای اطلاع رسانی علوم و فناوری)
Dates: 2023-09-19; 2022-10-26
Language: fa

Abstract: Arabic has a very rich and complex morphology, and analyzing it is very useful for processing the language, especially traditional Arabic texts such as historical and religious works, where it helps in understanding the meaning of the text. In a morphological dataset, a greater variety of labels and a larger number of samples make the dataset more useful for evaluating morphological analysis methods. In this research we present a morphological dataset of about 223,690 words from the book Shara'i' al-Islam (شرائع الاسلام), labeled by experts. In both volume and variety of labels, it surpasses the other datasets available for Arabic morphological analysis. To evaluate the dataset, we applied the Farasa system to the texts, and we report the annotation quality through four evaluation measures on the Farasa system.

http://jour.aicti.ir/fa/Article/Download/44113