Copyvio: Make Eranbot call PageTriage with copyvio info
Closed, Resolved (Public)

Description

When Eranbot detects a copyright violation for a revision, it should call PageTriage to let it know.

The specific API to call is created as part of T202041.

Eranbot will need a bot account that is a member of a specific group, also defined in T202041.

Consider changing the 50% threshold currently used by Eranbot (ping @MMiller_WMF). Note: This can be moved out to a separate task if we're still unsure by the time this task is being done.

Event Timeline

MMiller_WMF renamed this task from Copyvio: Erabot to Copyvio: Eranbot. Aug 2 2018, 7:32 PM
SBisson renamed this task from Copyvio: Eranbot to Copyvio: Make Eranbot call PageTriage with copyvio info. Aug 16 2018, 7:59 PM
SBisson updated the task description.

Change 454285 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Raise tolerance for time-related unit tests to 10 seconds

https://gerrit.wikimedia.org/r/454285

@eranroz The API to call is merged and should be deployed to enwiki tomorrow. It needs to be posted with a token and a revid. You previously expressed interest in making the change. Let us know if you plan to do it. Thanks!

The bot will need an account and have the pagetriage-copyvio user-right. @Catrope: who can put that in place?

Change 455739 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/PageTriage@master] Add user group for copyvio bots

https://gerrit.wikimedia.org/r/455739

After the attached patch is merged and deployed, there will be a new user group on English Wikipedia called "Copyright violation bots". A bureaucrat on enwiki can then put EranBot in that group.

Change 455739 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Add user group for copyvio bots

https://gerrit.wikimedia.org/r/455739

Thanks! I'll get to expanding the bot in a few days or a week.

Great! I'll re-assign to you. I started looking into it but you are the best person for the job.

FWIW, I was able to call the API using the following code but it was my first time working with pywikibot so there's probably a much better way to do that. If so, please ignore this.

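import pywikibot

site = pywikibot.Site('en', 'wikipedia')  # assumed target wiki; adjust as needed
site.login()
token = site.tokens['csrf']
revid = 0  # placeholder: replace with the diff (revision) ID to tag
params = {'action': 'pagetriagetagcopyvio', 'token': token, 'revid': revid}
# POST the request; note that _request is a private pywikibot helper
request = site._request(parameters=params, use_get=False)
response = request.submit()
print(response['pagetriagetagcopyvio']['result'] == 'success')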

@Etonkovidova can you please grant EranCopyvioTest this right on https://en.wikipedia.beta.wmflabs.org? Thanks!

The copyright violation bot right was granted to EranCopyvioTest on https://en.wikipedia.beta.wmflabs.org. Let me know if anything else is needed.

Hi @eranroz, how are things going? Let us know if you hit any blocker or if we can help in any way. Thanks!

Done (commit) and updated the tool on labs.

To enable it technically:

  1. The bot needs to be in the copyviobot group
  2. Add the extra parameter -pagetriagetag to the command line

PS: Note that any suspected diff that is written to the DB is also reported to the API.
If the diff doesn't belong to a page in PageTriage (PageTriage covers new pages rather than recent changes), the API returns an error, which the bot ignores.
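
The report-and-ignore behaviour described above could look roughly like the following sketch. It is not the bot's actual code: the function name `report_to_pagetriage` and the injected `submit` callable are hypothetical stand-ins for however the bot sends its API request. Only the `pagetriagetagcopyvio` action and its `token` and `revid` parameters come from this task.

```python
def report_to_pagetriage(submit, token, revid):
    """Report a suspected diff to PageTriage and swallow API errors.

    `submit` is a hypothetical callable that POSTs the parameters and
    returns the decoded JSON response, or raises on an API error.
    Returns True only when the API reports success.
    """
    params = {
        'action': 'pagetriagetagcopyvio',
        'token': token,
        'revid': revid,
    }
    try:
        response = submit(params)
    except Exception:
        # Assumption: a diff whose page is not in the PageTriage queue
        # makes the API fail; the bot ignores that and carries on.
        return False
    result = response.get('pagetriagetagcopyvio', {}).get('result')
    return result == 'success'
```

Passing the request function in as an argument keeps the sketch independent of any particular client library; with pywikibot it would wrap the request shown earlier in this task.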

Hi @eranroz,

I tried testing the -pagetriagetag flag but I got the following error:

Traceback (most recent call last):
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 843, in <module>
    main()
  File "/data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py", line 831, in main
    report_log.page_triage = page_triage
  File "/mnt/nfs/labstore-secondary-tools-project/eranbot/gitPlagiabot/plagiabot/report_logger.py", line 50, in page_triage
    if val and not site.has_group('copyviobot'):
NameError: global name 'site' is not defined

Is it possible that this line needs self.site instead of just site?

I also just realized that the bot user doesn't belong to the right group yet. @Catrope is it something you can do?

Is it possible that this line needs self.site instead of just site?

Yes, fixed.
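
For readers following along, the fix amounts to reading the site from the instance instead of from an undefined global. A minimal sketch, assuming `page_triage` is a property setter on a logger class (the traceback shows the assignment `report_log.page_triage = page_triage` landing in a function of that name). The class name `ReportLogger`, the `_page_triage` attribute, and the `ValueError` are assumptions, not the actual plagiabot code.

```python
class ReportLogger:
    """Hypothetical stand-in for the logger in report_logger.py."""

    def __init__(self, site):
        self.site = site  # the site object whose has_group() is checked
        self._page_triage = False

    @property
    def page_triage(self):
        return self._page_triage

    @page_triage.setter
    def page_triage(self, val):
        # Use self.site here; a bare `site` raises NameError because
        # no global of that name exists in this module.
        if val and not self.site.has_group('copyviobot'):
            # Assumption: how the real bot reacts to a missing group
            # is not shown in this task.
            raise ValueError('bot account is not in the copyviobot group')
        self._page_triage = val
```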

I also just realized that the bot user doesn't belong to the right group yet. @Catrope is it something you can do?

According to the enwiki bot policy, the bot should first pass review: https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval/EranBot_3#Discussion
It seems T204455 was a blocker task for getting that done.

Hi @eranroz,

In order to test and move the BRFA forward, I've enabled EranBot on testwiki using the following crontab entry:

# run on testwiki (temporarily)
*/10 * * * * jsub -N testwiki -mem 500m -l h_rt=4:05:00 -once -quiet -o /data/project/eranbot/outs python /data/project/eranbot/gitPlagiabot/plagiabot/plagiabot.py -lang:test -blacklist:User:EranBot/Copyright/Blacklist -live:on -reportlogger -pagetriagetag

It's been running for a while, and there's even an outs/testwiki-0xxxx file generated, but so far it's empty and I can't really tell whether it's picking up any of the "bad" revisions I've created on testwiki.

Please let me know if I did something wrong with the command above or if there's another log to look at to know what is going on.

Thanks!

I wouldn't run it as a continuous job against test wiki, as that means every diff there will be checked, and I'm not sure there are copyvios there (specifically in new pages that appear in the new pages feed). It can be run on a specific page (or with other pywikibot generators) this way:
plagiabot.py -lang:test -page:"Los Angeles Lakers B" -pagetriagetag
(I made a minor fix to support this.)

Note also that I had never run the bot on test wiki, so the user config (~/.pywikibot/user-config.py) had to be updated to include test as well (added now).

With the new long page (Los Angeles Lakers B), we can see the log here:
https://test.wikipedia.org/wiki/Special:Log?type=pagetriage-copyvio&user=&page=&wpdate=&tagfilter=

@eranroz FYI we've enabled -pagetriagetag for enwiki but haven't seen anything being reported yet.

Looks like reporting on enwiki is working now: https://en.wikipedia.org/wiki/Special:Log?type=pagetriage-copyvio
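
For anyone who prefers to check the log programmatically rather than through Special:Log, here is a small sketch that builds an equivalent API query. It assumes the standard MediaWiki `list=logevents` module with its `letype` filter; the log type `pagetriage-copyvio` is taken from the Special:Log URL above. The helper name `copyvio_log_url` and the default limit of 10 are made up for illustration.

```python
from urllib.parse import urlencode


def copyvio_log_url(host, limit=10):
    """Build an API URL listing recent pagetriage-copyvio log entries."""
    query = urlencode({
        'action': 'query',
        'list': 'logevents',
        'letype': 'pagetriage-copyvio',  # log type from the Special:Log URL
        'lelimit': limit,
        'format': 'json',
    })
    return 'https://{}/w/api.php?{}'.format(host, query)


# Print the URL for enwiki; fetch it with any HTTP client.
print(copyvio_log_url('en.wikipedia.org'))
```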

We can probably turn it off for test wiki now.

It's not running on testwiki anymore.

I'm moving this task into QA given that the code is working as intended in production.