Fuzzy Cross Language Plagiarism Detection (Arabic-English) using WordNet in a Big Data environment
In: Proceedings of the 2018 2nd International Conference on Cloud and Big Data Computing, 2018-08-03
Cross-language plagiarism is the unacknowledged reuse of a text that has been translated from one natural language into another without proper reference to the original source. Efficient large-scale text comparison, especially semantic similarity, is a common problem in data processing, and it grows with the number of publications and the rate of suspicious documents and plagiarism sources. Cross-language plagiarism detection (CLPD) can be more complicated than simple copy, translate, and paste, so the detection process calls for vague concepts and fuzzy-set techniques in a big data environment to reveal dishonest practices in Arabic documents. In this paper, we propose a new cross-language plagiarism detection method based on fuzzy-semantic similarity using WordNet and two semantic approaches, Wu & Palmer (WuP) and Lin. The work is done in parallel using Apache Hadoop with its distributed file system HDFS and the MapReduce programming model. The experimental results show that Fuzzy Wu & Palmer performs better than Fuzzy Lin.
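As a rough illustration of the two base measures the abstract names, the sketch below computes Wu & Palmer and Lin similarity on a tiny hand-built taxonomy. The taxonomy `PARENT`, the concept counts `COUNT`, and all function names are hypothetical stand-ins for WordNet and a real corpus. The sketch does not reproduce the paper's fuzzy aggregation or its Hadoop/MapReduce pipeline, which the abstract does not specify.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
}

# Hypothetical corpus counts; each count includes the concept's descendants.
COUNT = {"entity": 100, "animal": 60, "dog": 30, "cat": 30, "vehicle": 40, "car": 40}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_a:
            return node
    raise ValueError("no common ancestor")


def wup_similarity(a, b):
    """Wu & Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def information_content(node):
    """IC(c) = -log p(c), with p(c) estimated from the toy counts."""
    return math.log(COUNT["entity"] / COUNT[node])


def lin_similarity(a, b):
    """Lin: 2 * IC(lcs) / (IC(a) + IC(b))."""
    denom = information_content(a) + information_content(b)
    if denom == 0:  # both concepts are the root, which carries no information
        return 1.0
    return 2 * information_content(lcs(a, b)) / denom
```

On this toy data both measures score the pair `dog`/`cat` above `dog`/`car`, since the first pair shares the more specific ancestor `animal`. Wu & Palmer depends only on taxonomy depth, while Lin also depends on corpus frequencies.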
Title: | Fuzzy Cross Language Plagiarism Detection (Arabic-English) using WordNet in a Big Data environment |
---|---|
Author / Contributor: | Oukessou, Mohamed ; Ezzikouri, Hanane ; Youness, Madani ; Erritali, Mohamed |
Published in: | Proceedings of the 2018 2nd International Conference on Cloud and Big Data Computing, 2018-08-03 |
Publication: | ACM, 2018 |
Media type: | unknown |
DOI: | 10.1145/3264560.3264562 |