Hi, following this community discussion, we would like to request that arwiki be added to the CopyPatrol tool. There have been no objections for two weeks. Greetings to you all.
Description
Related Objects
- Mentioned In
  - T333595: Request for arwiki to be added to CopyPatrol
  - T273017: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
- Mentioned Here
  - T273017: UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
  - T244665: CopyPatrol incorrectly encodes non-ASCII letters (with diacritics) in article titles, so the links do not work
Event Timeline
Just a note that this has been running for a few days now; there apparently just haven't been any copyvios yet. Once there are, https://copypatrol.toolforge.org/ar should magically start working.
While iThenticate, our plagiarism detection service, explicitly states they support Arabic, it's possible that it simply isn't that good. But from the logs and all indications, the bot is running. So let's just wait and see if anything shows up.
Thanks @MusikAnimal
Is there an expected time when https://copypatrol.toolforge.org/ar will start working? So far (after 7 days) it still gives 404 Page Not Found: "The page you are looking for could not be found. Check the address bar to ensure your URL is spelled correctly. If all else fails, you can visit our home page at the link below."
Okay, I think there is a bug with the bot. Here is a stack trace of a recent error:
Traceback (most recent call last):
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 861, in <module>
    main()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 856, in main
    bot.run()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 617, in run
    self.report_uploads() # report checked edits
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 510, in report_uploads
    self.report_log.add_report(rep['new'], rep['diff_date'], rep['title_no_ns'], rep['ns'], rep['report_id'], rep['source'])
  File "/mnt/nfs/labstore-secondary-tools-project/eranbot/gitPlagiabot/plagiabot/report_logger.py", line 90, in add_report
    report))
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 207, in execute
    args = tuple(map(db.literal, args))
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 304, in literal
    s = self.escape(o, self.encoders)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 222, in unicode_literal
    return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
The bot was calling the add_report method, meaning it had found a possible copyright violation, before it errored out due to an apparent encoding issue while writing the report to the database. That would explain why https://copypatrol.toolforge.org/ar still doesn't work!
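For anyone curious about the error class itself, here is a minimal sketch that reproduces it outside the bot. It is not plagiabot's code: the helper `encode_or_error` and the four-character Arabic title are made up for illustration. It only shows that Arabic text cannot be encoded as latin-1 (the charset named in the last frame of the traceback), while UTF-8 handles it fine.

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the class of error in the traceback,
# not plagiabot's actual code.

def encode_or_error(text, charset):
    """Return (encoded_bytes, None) on success or (None, exception) on failure."""
    try:
        return text.encode(charset), None
    except UnicodeEncodeError as exc:
        return None, exc

# Hypothetical four-character Arabic title, written with escapes so the
# snippet does not depend on the file's own encoding.
title = u"\u0648\u064a\u0643\u064a"

# latin-1 can only represent code points below 256, so this fails.
latin1_bytes, latin1_error = encode_or_error(title, "latin-1")

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes, utf8_error = encode_or_error(title, "utf-8")

print(latin1_error)
print(len(utf8_bytes))
```

With MySQLdb, the charset used for this encoding step comes from the connection, and `MySQLdb.connect` accepts a `charset` argument (for example `charset='utf8'`). Whether that is the right fix here is an assumption on my part and needs to be checked against how the bot opens its connection.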
There are other known encoding issues with CopyPatrol (T244665) that may or may not be related.
For the time being, it seems Arabic is not supported due to this bug. I have filed T273017 to investigate this further.
Sorry for the long wait!
@alaa Sorry for the long wait. The fix for T273017 appears to have worked :) There is now an Arabic feed: https://copypatrol.toolforge.org/ar
I will wait for confirmation from you that all looks good before resolving this task.
Thanks a lot @MusikAnimal, it's working well.
@alaa @Mohnd_Kh There has only been one case closed since we enabled Arabic CopyPatrol: https://copypatrol.toolforge.org/ar/leaderboard
Is anyone using it? Or perhaps you are forgetting to mark cases as "Fixed" or "No action needed"? CopyPatrol is expensive to maintain, so if you are not using it we would prefer to turn it off. Thanks for your understanding!
I have disabled the cron job for Arabic Wikipedia, so no more cases will be added to the feed unless someone comes forward and says they are actively using CopyPatrol. I'll wait another week or so before removing the Arabic feed altogether.