Create cronjob for regular captcha regeneration
Closed, Resolved, Public

Description

https://wikitech.wikimedia.org/wiki/Generating_CAPTCHAs

Running the following in a cronjob should be mostly trivial:

sudo -u apache mwscript extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php aawiki --wordlist=/home/aaron/words --font=/usr/share/fonts/truetype/freefont/FreeMonoBoldOblique.ttf --blacklist /home/aaron/badwords --fill=10000 --verbose
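
As a rough illustration of what the cron entry could look like, here is a minimal sketch that builds an /etc/cron.d-style line from the command above. The function name, the schedule, and the use of the cron user field in place of `sudo -u apache` are assumptions for illustration; the actual puppet definition may differ.

```shell
# Sketch only: print an /etc/cron.d-style line for the command above.
# captcha_cron_line and the example schedule are hypothetical; the paths and
# flags are copied from the task description. Assumes mwscript is on cron's PATH.
captcha_cron_line() {
    schedule="$1"   # five cron time fields, e.g. "0 3 * * 1"
    fill="$2"       # target number of captchas passed to --fill
    printf '%s apache mwscript extensions/ConfirmEdit/maintenance/GenerateFancyCaptchas.php aawiki --wordlist=/home/aaron/words --font=/usr/share/fonts/truetype/freefont/FreeMonoBoldOblique.ttf --blacklist /home/aaron/badwords --fill=%s --verbose\n' "$schedule" "$fill"
}

# Example: an assumed weekly run, Mondays at 03:00.
captcha_cron_line "0 3 * * 1" 10000
```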
Reedy created this task. Nov 4 2016, 4:42 PM
Restricted Application added a subscriber: Aklapper. Nov 4 2016, 4:42 PM
Reedy added a parent task: Restricted Task. Nov 4 2016, 4:43 PM

Change 319892 had a related patch set uploaded (by Reedy):
Add cronjob for regenerating captchas

https://gerrit.wikimedia.org/r/319892

greg awarded a token. Nov 4 2016, 4:58 PM
Florian added a subscriber: Florian.
Reedy added a comment. Nov 4 2016, 7:07 PM

Related to the other tasks: T125132 and T141490.

Is a fill of 10,000 enough?

Do we need to delete captchas somehow? If so, how do we do so?

https://gerrit.wikimedia.org/r/#/c/319900 and https://gerrit.wikimedia.org/r/#/c/319901 need merging into the extension before this can be merged too

Reedy added a subscriber: aaron. Nov 16 2016, 2:53 PM

@aaron Do we need to delete the old ones etc too?

aaron added a comment. Nov 16 2016, 6:37 PM

AFAIK this only makes sense in one of two cases:
a) We use the delete-on-solve config variable (we'd have to make sure the cron runs often enough).
b) The script is modified to list the storage (destination) files and delete them after making the new ones. In this case, we wouldn't use --fill though.
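
Option (b) can be sketched roughly as follows. This is a minimal illustration on a plain local directory, not the real implementation: production captchas live in a MediaWiki file backend, so the directory argument, the generator callback, and the function name `rotate_captchas` are all hypothetical stand-ins.

```shell
# Sketch of option (b): list the existing files, create the new set, then
# delete only the files that were listed beforehand.
# Assumes new files never reuse an old file name and names contain no newlines.
rotate_captchas() {
    dir="$1"        # hypothetical captcha directory
    generate="$2"   # hypothetical command that writes new files into "$dir"
    old_list=$(mktemp) || return 1

    # 1. Record what exists before generating anything.
    find "$dir" -type f > "$old_list"

    # 2. Create the new set; if that fails, leave the old set untouched.
    if ! "$generate" "$dir"; then
        rm -f "$old_list"
        return 1
    fi

    # 3. Delete only the files recorded in step 1.
    while IFS= read -r f; do
        rm -f -- "$f"
    done < "$old_list"
    rm -f "$old_list"
}
```

Deleting only after a successful generation run means a failed run never leaves the pool empty, which is the reason for recording the old list first.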

Reedy added a comment. Nov 16 2016, 7:21 PM

I believe the original idea was to defeat people who have answers for the current list precomputed.

The fact that they've not been regenerated in a while makes this easier for them

When Tim's changes are decided "ok" by the community, we will want to regenerate with the newer python script

T150049 is for delete on solve. Making it run much more regularly on the cronjob is just a case of choosing the frequency and adjusting the puppet definition

Will need some stats on how quickly/frequently they are used to work that out. Or we just run daily with --fill. And/or we precompute even more than 10k.
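
Once usage stats exist, the frequency question reduces to simple arithmetic. The sketch below assumes delete-on-solve is enabled, so each solve removes one captcha from the pool; the function name is hypothetical and the solve rate in the example is a placeholder, not a measured value.

```shell
# Hypothetical helper: how many whole hours a pool lasts at a given solve rate.
# Integer division rounds down, which errs on the side of running cron sooner.
hours_until_empty() {
    pool="$1"       # captchas available after a --fill run
    per_hour="$2"   # solves per hour (placeholder until real stats exist)
    if [ "$per_hour" -le 0 ]; then
        echo "solve rate must be positive" >&2
        return 1
    fi
    echo $(( pool / per_hour ))
}

# With a 10,000 pool and an assumed 250 solves per hour:
hours_until_empty 10000 250   # prints 40
```

The cron interval would need to stay below the printed value for the pool never to run dry between runs.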

So basically, for the time being, I need to modify the script to optionally delete the old ones after it has successfully created the new ones

Reedy added a comment. Nov 21 2016, 8:59 PM

T151244 and https://gerrit.wikimedia.org/r/322735 for adding a --delete option to delete all the current captchas after putting new ones in place

fgiunchedi triaged this task as "Normal" priority. Nov 29 2016, 11:57 PM

Change 319892 merged by Dzahn:
mediawiki: Add cronjob for regenerating captchas

https://gerrit.wikimedia.org/r/319892

Change 327059 had a related patch set uploaded (by Dzahn):
mediawiki: disable 'generate captcha' maintenance job

https://gerrit.wikimedia.org/r/327059

Dzahn added a subscriber: Dzahn. Dec 13 2016, 9:39 PM

merged, has been created on terbium (not created on wasat as configured).

follow-up https://gerrit.wikimedia.org/r/#/c/327057/

disable for now until Jan 2017 https://gerrit.wikimedia.org/r/#/c/327059/

Change 327059 merged by Dzahn:
mediawiki: disable 'generate captcha' maintenance job

https://gerrit.wikimedia.org/r/327059

Ok, puppet code merged and done for now. We have the desired situation now, which is:

  • both terbium and wasat have the word files in /etc/fancycaptcha/ (so they can be used for manual runs of the maintenance script, and they never hurt)
  • puppet code for cron exists but is disabled
    • on terbium because hieradata/role/eqiad/mediawiki/maintenance.yaml says so and we want to activate it after January 1st 2017 per Reedy
    • on wasat because hieradata/role/codfw/mediawiki/maintenance.yaml says so for _all_ the maintenance cron jobs unless we switch over data centers
    • (none of this is hardcoded by hostname, but strictly per role and dc)

@Reedy go ahead ^ /me sets reminder for Jan 2nd (?)

Change 330250 had a related patch set uploaded (by Reedy):
Revert "mediawiki: disable 'generate captcha' maintenance job"

https://gerrit.wikimedia.org/r/330250

Change 330250 merged by Dzahn:
Revert "mediawiki: disable 'generate captcha' maintenance job"

https://gerrit.wikimedia.org/r/330250

Mentioned in SAL (#wikimedia-operations) [2017-01-03T17:44:40Z] <mutante> terbium - Notice: /Stage[main]/Mediawiki::Maintenance::Generatecaptcha/Cron[generatecaptcha]/ensure: created(T150029)

Reedy closed this task as "Resolved". Feb 9 2017, 8:52 PM
Reedy claimed this task.