Page MenuHomePhabricator

arclamp-generate-svgs overlaps itself
Closed, DuplicatePublic

Description

It starts every 15min from cron. Back in 2015 it took only a minute or two to complete, even if it was down for an hour and all files were in need of refreshing SVGs it wouldn't take more than 5min.

Now, three major things have changed:

  • More traffic overall.
  • Switched from HHVM/Xenon to PHP7/Excimer, during which we decided to significant shorten the sampling interval (about 10X more data now).
  • Parsoid/JS was ported to Parsoid/PHP and the new /w/rest.php entry point from the wtp* servers receive quite a significant amount of traffic (about 7-10% overall in total).

All that adds up to the .log files being much larger to begin with and thus take longer to process.

[17:00 UTC] krinkle at webperf1002.eqiad.wmnet in /srv/xenon/svgs/daily
$ ps aux  | grep generate
xenon    15805  0.0  0.0   4276   712 ?        Ss   16:45   0:00 /bin/sh -c /usr/local/bin/arclamp-generate-svgs > /dev/null
xenon    15807  0.6  0.0  14176  6108 ?        S    16:45   0:06 /bin/bash /usr/local/bin/arclamp-generate-svgs
krinkle  18146  0.0  0.0  12780   980 pts/0    S+   17:01   0:00 grep --color=auto generate
xenon    27138  0.0  0.0   4276   720 ?        Ss   17:00   0:00 /bin/sh -c /usr/local/bin/arclamp-generate-svgs > /dev/null
xenon    27139  1.1  0.0  14168  5948 ?        S    17:00   0:00 /bin/bash /usr/local/bin/arclamp-generate-svgs

Here you can see that a second cron started at 17:00 while the previous was still running.

By itself this doesn't conflict per se, given that we check the mtime within the loop (not outside) and it merely means we get parallel processing, which isn't bad per se.

Although 1) it makes it harder to predict how it behaves given we don't control the concurrency explicitly, and 2) I believe the script has a small potential for both to be chrunching the same log file, which we should avoid at the very least.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

The older instance was done after about ~ 20 minutes, so by 17:05 it was back to just one:

$ ps aux  | grep generate
krinkle   4998  0.0  0.0  12780   968 pts/0    S+   17:05   0:00 grep --color=auto generate
xenon    27138  0.0  0.0   4276   720 ?        Ss   17:00   0:00 /bin/sh -c /usr/local/bin/arclamp-generate-svgs > /dev/null
xenon    27139  0.2  0.0  14168  5948 ?        S    17:00   0:00 /bin/bash /usr/local/bin/arclamp-generate-svgs

From what I can see, it didn't needlessly regenerate files that were already finished by the old instance unless they changed again since. So all good for now. This task is as TODO for making sure two instances can't process the same file, and maybe to limit in some way how much concurrency we want to allow.

I think the consensus is that this should be running via a better resource scheduler than cron in a VM.

This is being tracked in T227026; the first step is moving to Swift for storing the data.