It starts every 15min from cron. Back in 2015 it took only a minute or two to complete, even if it was down for an hour and all files were in need of refreshing SVGs it wouldn't take more than 5min.
Now, three major things have changed:
- More traffic overall.
- Switched from HHVM/Xenon to PHP7/Excimer, during which we decided to significant shorten the sampling interval (about 10X more data now).
- Parsoid/JS was ported to Parsoid/PHP and the new /w/rest.php entry point from the wtp* servers receive quite a significant amount of traffic (about 7-10% overall in total).
All that adds up to the .log files being much larger to begin with and thus take longer to process.
[17:00 UTC] krinkle at webperf1002.eqiad.wmnet in /srv/xenon/svgs/daily $ ps aux | grep generate xenon 15805 0.0 0.0 4276 712 ? Ss 16:45 0:00 /bin/sh -c /usr/local/bin/arclamp-generate-svgs > /dev/null xenon 15807 0.6 0.0 14176 6108 ? S 16:45 0:06 /bin/bash /usr/local/bin/arclamp-generate-svgs krinkle 18146 0.0 0.0 12780 980 pts/0 S+ 17:01 0:00 grep --color=auto generate xenon 27138 0.0 0.0 4276 720 ? Ss 17:00 0:00 /bin/sh -c /usr/local/bin/arclamp-generate-svgs > /dev/null xenon 27139 1.1 0.0 14168 5948 ? S 17:00 0:00 /bin/bash /usr/local/bin/arclamp-generate-svgs
Here you can see that a second cron started at 17:00 while the previous was still running.
By itself this doesn't conflict per se, given that we check the mtime within the loop (not outside) and it merely means we get parallel processing, which isn't bad per se.
Although 1) it makes it harder to predict how it behaves given we don't control the concurrency explicitly, and 2) I believe the script has a small potential for both to be chrunching the same log file, which we should avoid at the very least.