
Fix Plagiabot to use new way of fetching API tokens
Closed, Resolved, Public

Description

Problem

Plagiabot (which runs as User:EranBot on the wikis) requires Python 2. Around October 16, it stopped adding change tags to revisions that were copyvios, so those revisions are no longer surfaced in Special:NewPagesFeed (aka Page Curation / Page Triage). This appears to be because it uses the now-obsolete method of fetching tokens from the MediaWiki API. For now we have commented out the relevant part of the code, but we'd like to get the Page Curation integration working again.

Proposed solution

This task proposes a quick fix: implement the new method of fetching tokens directly in Plagiabot, since we are stuck on Python 2 for now and can't upgrade to the latest pywikibot. The real fix is to port the bot to Python 3, tracked at T293688: CopyPatrol: port Plagiabot to Python 3.
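For reference, current MediaWiki versions hand out tokens through `action=query&meta=tokens` instead of the removed `action=tokens` module. The sketch below only builds the request parameters and parses a JSON reply, so it runs on Python 2 or 3 without pywikibot. The helper names are hypothetical; the reply shape (`query.tokens.<type>token`, e.g. `csrftoken`) follows the MediaWiki API:Tokens documentation.

```python
import json


def token_query_params(token_types=('csrf',)):
    """Build parameters for action=query&meta=tokens (hypothetical helper)."""
    return {
        'action': 'query',
        'meta': 'tokens',
        # Several token types can be requested at once, separated by '|'.
        'type': '|'.join(token_types),
        'format': 'json',
    }


def extract_tokens(response_text):
    """Map each token type to its value, e.g. {'csrf': '...+\\'}."""
    data = json.loads(response_text)
    tokens = data.get('query', {}).get('tokens', {})
    result = {}
    for key, value in tokens.items():
        # The API names each entry '<type>token', e.g. 'csrftoken'.
        if key.endswith('token'):
            result[key[:-len('token')]] = value
    return result
```

In a bot, the parameters would be sent with whatever HTTP layer is already in use, and `extract_tokens(...)['csrf']` would replace the value previously obtained from the legacy module.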


From T292343:

WARNING: API warning (paraminfo): The module "main" does not have a submodule "tokens".

This traces back to report_logger.py#L31. The code looks like it's doing things the right way, but the timing of the error makes me think it might be related to MediaWiki 1.37/Deprecation of legacy API token parameters.

When I start the enwiki job, I see that it's going through recent changes and querying iThenticate just fine. It's when it finds a copyvio and tries to add a tag to the revision that we hit the above warning.
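The warning is consistent with a request still aimed at the legacy `tokens` module. To illustrate what any fix has to do, here is a sketch that rewrites a legacy `action=tokens` parameter set into the `meta=tokens` form. The function name is hypothetical, and the set of legacy per-action types folded into `csrf` is an assumption based on the MediaWiki API:Tokens documentation, not something taken from the pywikibot patch.

```python
# Assumption: these legacy per-action token types are all served by the
# single csrf token under action=query&meta=tokens.
LEGACY_CSRF_TYPES = frozenset([
    'edit', 'delete', 'protect', 'move', 'block',
    'unblock', 'email', 'import', 'options',
])


def modernize_token_request(params):
    """Return a copy of params using meta=tokens instead of action=tokens."""
    if params.get('action') != 'tokens':
        return dict(params)  # Not a legacy token request; leave it alone.
    new_params = dict(params)
    # Assume csrf when no type was given.
    legacy_types = new_params.pop('type', 'csrf').split('|')
    new_types = []
    for token_type in legacy_types:
        mapped = 'csrf' if token_type in LEGACY_CSRF_TYPES else token_type
        if mapped not in new_types:  # Avoid requesting csrf twice.
            new_types.append(mapped)
    new_params['action'] = 'query'
    new_params['meta'] = 'tokens'
    new_params['type'] = '|'.join(new_types)
    return new_params
```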

Maybe pywikibot needs to be updated? Hopefully we can do this and still stay on Python 2. Pinging @eranroz @JJMC89 in case you have any ideas.

rPWBCc0cf17c196ca: [mw 1.37] Fix for removed action API token parameters is included in pywikibot 6.6.1; however, pywikibot hasn't supported python2 since 3.0.20200703.

We may need to patch in the new way of fetching tokens ourselves. I take it this is the only interaction with the wiki that needs a token. If I were any good at Python/pywikibot, I'd attempt this myself.

For now I have commented out the code that adds the revision tag, so hopefully the feed will start populating soon. The drawback is Special:NewPagesFeed will no longer indicate copyvios. I believe a good handful of users came to CopyPatrol from Special:NewPagesFeed, but surely having it recorded as a copyvio somewhere (in this case, the CopyPatrol web interface) is better than nowhere.

I've also gone ahead and filed T293688: CopyPatrol: port Plagiabot to Python 3 about porting the bot to Python 3, or rewriting it.

Event Timeline

Restricted Application added subscribers: Cyberpower678, Aklapper.

I can try to backport c0cf17c onto 09e2747, which is the version in /data/project/eranbot/pywikibot/core, for a quick fix.

Apply /data/project/jjmc89-bot-dev/eranbot-pwb/T293692-1.patch to /data/project/eranbot/pywikibot/core.

Test in the Python 2 interpreter with:

>>> import pywikibot
>>> site = pywikibot.Site('en', 'wikipedia')
>>> site.login()  # if not already logged in
>>> site.tokens['csrf']
u'yourcsrftokenshownhere+\\'

You may need to clear the cache in /data/project/eranbot/.pywikibot.
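When running the interpreter check above, the token value also shows whether the login took effect: MediaWiki gives anonymous sessions the bare placeholder `+\`, while a logged-in session gets a longer string ending in `+\`. A minimal helper for that check (the name is hypothetical):

```python
# MediaWiki ends every CSRF token with this suffix; an anonymous session
# receives the suffix alone.
TOKEN_SUFFIX = u'+\\'


def is_logged_in_token(token):
    """Return True if token looks like a logged-in user's CSRF token."""
    return token.endswith(TOKEN_SUFFIX) and token != TOKEN_SUFFIX
```

For example, `is_logged_in_token(site.tokens['csrf'])` should be `True` after `site.login()` succeeds.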

Thanks @JJMC89! I've applied the patch. I don't know if it's related but now I'm seeing:

Traceback (most recent call last):
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 861, in <module>
    main()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 856, in main
    bot.run()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 629, in run
    self.process_changes()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 429, in process_changes
    pywikibot.output("Error occurred - skipping: %s" % str(e))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 34: ordinal not in range(128)

You may need to clear the cache in /data/project/eranbot/.pywikibot

This may or may not be the reason why... Pardon my ignorance, but how do I clear the cache? Should I delete the contents of .pywikibot/apicache-py2/?
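If deleting the cached API responses is what's needed, something like the following would do it. It assumes, as the question suggests, that the cache is the set of files in `apicache-py2/` under the `.pywikibot` directory, and it removes only plain files inside that directory; the function name is hypothetical.

```python
import os


def clear_api_cache(pywikibot_dir, cache_name='apicache-py2'):
    """Delete cached API response files; return how many were removed."""
    cache_dir = os.path.join(pywikibot_dir, cache_name)
    if not os.path.isdir(cache_dir):
        return 0  # No cache directory, nothing to clear.
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        # Only plain files are removed; the directory itself is kept.
        if os.path.isfile(path):
            os.remove(path)
            removed += 1
    return removed
```

Usage would be `clear_api_cache('/data/project/eranbot/.pywikibot')`, ideally while the bot is not running.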

While we're at it, with all the help you've been providing, I see no reason not to add you to the eranbot Toolforge account. So... done! Feel free to login yourself and tweak things as needed. Thanks again!

That is an unrelated preexisting issue. I've applied this patch as a quick workaround.

diff --git a/plagiabot.py b/plagiabot.py
index 3f7dfb9..2654b76 100644
--- a/plagiabot.py
+++ b/plagiabot.py
@@ -426,7 +426,10 @@ class PlagiaBot(object):
                 if ignore_regex.match(comment):
                     continue
             except Exception as e:
-                pywikibot.output("Error occurred - skipping: %s" % str(e))
+                try:
+                    pywikibot.output("Error occurred - skipping: %s" % str(e))
+                except UnicodeDecodeError:
+                    pywikibot.output("Error occurred - skipping: Unknown [cannot decode]")
                 continue

             diffy = difflib.SequenceMatcher()
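The `try`/`except` workaround falls back to a fixed message. An alternative, sketched here on the assumption that exception arguments are either text or UTF-8 byte strings, is to decode bytes with replacement characters so the original message survives. The helper name is hypothetical; the code runs on both Python 2 and 3.

```python
def describe_error(exc):
    """Return a text description of exc that never raises on non-ASCII."""
    parts = []
    for arg in getattr(exc, 'args', ()):
        if isinstance(arg, bytes):
            # Byte strings (plain str on Python 2) are decoded as UTF-8;
            # undecodable bytes become U+FFFD instead of raising.
            parts.append(arg.decode('utf-8', 'replace'))
        else:
            parts.append(u'%s' % (arg,))
    return u', '.join(parts)
```

The `except` branch could then call `pywikibot.output(u"Error occurred - skipping: %s" % describe_error(e))`.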

I've re-enabled the page triage tagging by reverting the change below. If it goes down again, please disable it again.

diff --git a/report_logger.py b/report_logger.py
index 8d39e31..51dfcc6 100644
--- a/report_logger.py
+++ b/report_logger.py
@@ -28,18 +28,7 @@ class ReportLogger(object):
             self.page_triage_copyvio(diff)

     def page_triage_copyvio(self, diff):
-        token = self.site.tokens['csrf']
-        params = {
-            'action': 'pagetriagetagcopyvio',
-            'token': token,
-            'revid': diff
-        }
-        request = self.site._request(parameters=params, use_get=False)
-        try:
-            response = request.submit()
-        except APIError as e:
-            # silently drop it
-            pywikibot.output('Triage triage {}: {}'.format(diff, str(e)))
+        pywikibot.output('Page triage integration temporarily disabled; see T292343#7435933')

     @property
     def page_triage(self):
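For comparison with the pywikibot code in the diff, here is the same two-step flow (fetch a CSRF token from `meta=tokens`, then POST `action=pagetriagetagcopyvio` with `revid`) written against a `requests`-style session. This is a sketch only: the function name is hypothetical, and the session is assumed to be logged in already and to expose `get`/`post` methods whose results have a `json()` method, as `requests.Session` does. Error handling is reduced to raising when the reply contains an `error` key.

```python
def tag_copyvio(session, api_url, revid):
    """Mark revid as a potential copyvio via PageTriage; return the reply."""
    # Step 1: fetch a CSRF token the current way (meta=tokens).
    reply = session.get(api_url, params={
        'action': 'query', 'meta': 'tokens', 'type': 'csrf', 'format': 'json',
    }).json()
    token = reply['query']['tokens']['csrftoken']
    # Step 2: send the write action as a POST, like use_get=False above.
    result = session.post(api_url, data={
        'action': 'pagetriagetagcopyvio',
        'revid': revid,
        'token': token,
        'format': 'json',
    }).json()
    if 'error' in result:
        error = result['error']
        raise RuntimeError('pagetriagetagcopyvio failed: %s'
                           % error.get('info', error))
    return result
```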
JJMC89 claimed this task.
00:52, 20 October 2021 EranBot talk contribs marked revision 1050806727 on Edward Gabkwet as a potential copyright violation Tag: PageTriage
23:40, 19 October 2021 EranBot talk contribs marked revision 1050798808 on Austin Powers (pinball) as a potential copyright violation Tag: PageTriage
22:37, 19 October 2021 EranBot talk contribs marked revision 1050791048 on Draft:Jehsayan The Man Himself as a potential copyright violation Tag: PageTriage
22:30, 19 October 2021 EranBot talk contribs marked revision 1050790054 on Asharaf (Seville) as a potential copyright violation Tag: PageTriage
21:33, 19 October 2021 EranBot talk contribs marked revision 1050781429 on Draft:Lea Castle, England as a potential copyright violation Tag: PageTriage
21:26, 19 October 2021 EranBot talk contribs marked revision 1050780995 on 2021 British Academy Scotland Awards as a potential copyright violation Tag: PageTriage
21:22, 19 October 2021 EranBot talk contribs marked revision 1050780462 on Draft:Cybersecurity Canon as a potential copyright violation Tag: PageTriage