Running this on Terbium would probably give mw:thumbor access to the containers of all private wikis:
For Beta I'm going to use Wikisource for testing: https://upload.beta.wmflabs.org/wikisource/en/thumb/6/62/Wind_in_the_Willows_%281913%29.djvu/page1-862px-Wind_in_the_Willows_%281913%29.djvu.jpg
legacy HTTP/1 UAs may suffer due to UA limits
It looks like this might be achievable with a combination of Apache, MPM prefork and mod_wsgi. Apache would run a fixed number of child processes (e.g. as many as there are cores) without worker threads, and mod_wsgi would run a copy of the thumbor app in each process. The number of processes would be capped. Once the cap is reached, Apache should wait for the next available process before handling the next request.
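To illustrate the queueing behaviour I'm after, here is a minimal sketch in Python. It is only a model of a capped pool, not Apache or mod_wsgi configuration; the function name `schedule` and the assumption that all requests arrive at once are mine.

```python
import heapq


def schedule(durations, workers):
    """Model a fixed pool of `workers` processes, one request per process.

    All requests are assumed to arrive at t=0, in order (an assumption made
    for illustration). Each request waits until a process is free, then
    occupies it for its whole duration. Returns a list of (start, finish).
    """
    # Time at which each process becomes free again.
    free_at = [0.0] * workers
    heapq.heapify(free_at)

    timeline = []
    for duration in durations:
        # The next request goes to whichever process frees up first.
        start = heapq.heappop(free_at)
        finish = start + duration
        heapq.heappush(free_at, finish)
        timeline.append((start, finish))
    return timeline


if __name__ == "__main__":
    # Two processes, four requests: the third and fourth have to wait.
    for start, finish in schedule([3, 1, 2, 1], workers=2):
        print(f"start={start:.0f}s finish={finish:.0f}s")
```

With two processes and request durations of 3, 1, 2 and 1 seconds, the third request only starts once the second finishes, and the fourth waits until a process frees up at 3 seconds.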
Mon, Feb 19
I can't find the definition of last visual change. Does it include things below the fold?
Now that HTTP/2 is a thing and Zero won't be anymore, could we put connection coalescing for upload.wikimedia.org back on the table?
I'll add a note to T123582: Use "preconnect" resource hint for thumbnail host to state that the header variant should be used if this gets implemented.
Wed, Feb 14
The problem is that EventLogging doesn't collect just aggregate counts; it keeps individual records about requests. If the virtual pageviews were being sent to a backend that only keeps aggregate counts, incrementing on each hit, then that might be fine. You essentially want to turn a blind eye to the detailed data EventLogging records because you won't use it, but it's there, it's being recorded. The definition of DNT is loose enough that people keep making compromises on it. But IMHO in this case, if you want to record virtual pageviews and only care about aggregate counts, you should be sending the data to a backend that only records aggregates.
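As a rough sketch of what I mean by an aggregates-only backend (hypothetical code, not an existing service; the class name and the choice of page and day as keys are assumptions): each hit increments a counter and nothing about the individual request is retained.

```python
from collections import Counter


class AggregateOnlyCounter:
    """Hypothetical backend that stores counts only, never request records."""

    def __init__(self):
        # Keyed by (page, day). No timestamps, IPs, user agents or session
        # identifiers are stored anywhere in this object.
        self._counts = Counter()

    def record(self, page, day):
        # Increment on each hit; the hit itself is not kept.
        self._counts[(page, day)] += 1

    def count(self, page, day):
        return self._counts[(page, day)]


if __name__ == "__main__":
    backend = AggregateOnlyCounter()
    backend.record("Main_Page", "2018-02-14")
    backend.record("Main_Page", "2018-02-14")
    print(backend.count("Main_Page", "2018-02-14"))
```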
- Use a separate Swift user for access to private containers. This avoids access leaks where a new private wiki is created but the whitelist in Puppet is not updated.
- Check that the wiki creation maintenance script doesn't give access to the regular Thumbor user when a private wiki is being created. Instead, it should give access to the private-specific Thumbor user.
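The two points above boil down to a selection rule, which could look roughly like this. This is a sketch only: the function name and the `mw:thumbor-private` account name are placeholders I made up, and the real maintenance script is not shown here.

```python
# Regular Thumbor Swift user, as referred to earlier in this feed.
PUBLIC_THUMBOR_USER = "mw:thumbor"
# Placeholder name for the private-specific user (an assumption).
PRIVATE_THUMBOR_USER = "mw:thumbor-private"


def thumbor_user_for_wiki(is_private):
    """Return the Swift user that should get access to a wiki's containers.

    Private wikis never get the regular Thumbor user, so forgetting to
    update a whitelist cannot expose their containers to it.
    """
    return PRIVATE_THUMBOR_USER if is_private else PUBLIC_THUMBOR_USER


if __name__ == "__main__":
    print(thumbor_user_for_wiki(is_private=True))
    print(thumbor_user_for_wiki(is_private=False))
```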
Tue, Feb 13
Fix copyright year and variable name
I've confirmed via testing on Vagrant that this phenomenon is real.
I think this concludes the review, I've filed a task for the real solution to that problem: https://phabricator.wikimedia.org/T187203
OK, it's definitely not the nginx timeouts. 404s and 200s are affected, now that I can see 200s. And I'm seeing now that it's consistently 59 seconds, not 60 seconds.
And another one where it's a minute: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.02.13/logback?id=AWGOuxL5WMYqG9UiQ1bY&_g=(refreshInterval:('$$hashKey':'object:5953',display:'5%20seconds',pause:!f,section:1,value:5000),time:(from:now-15m,mode:quick,to:now))
Now that 200s log the Thumbor-Request-Date, I've found an example of a 200 with 40s of delay between nginx and thumbor: https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.02.13/logback?id=AWGOuA7cWMYqG9UiQxzA&_g=()
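For reference, the delay can be computed from the two timestamps along these lines. This is a minimal sketch that assumes both values are available as HTTP-date strings; I haven't checked that Thumbor-Request-Date is actually logged in that format, and the function name is made up.

```python
from email.utils import parsedate_to_datetime


def delay_seconds(nginx_date, thumbor_request_date):
    """Seconds elapsed between nginx receiving the request and thumbor starting it.

    Both arguments are assumed to be HTTP-date strings such as
    "Tue, 13 Feb 2018 10:00:00 GMT" (an assumption about the log format).
    """
    received = parsedate_to_datetime(nginx_date)
    started = parsedate_to_datetime(thumbor_request_date)
    return (started - received).total_seconds()


if __name__ == "__main__":
    # Made-up timestamps for illustration, not taken from the logs.
    print(delay_seconds("Tue, 13 Feb 2018 10:00:00 GMT",
                        "Tue, 13 Feb 2018 10:00:40 GMT"))
```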
This cleanup is no longer necessary, as thumb.php requests are now proxied to Thumbor on public wikis.
Fixed on Beta; I had just modified the wrong PrivateSettings.php.
Mon, Feb 12
Deployment was successful on prod, but it seems to have broken thumb.php on Beta. I'll look into that tomorrow. It's not high priority: thumb.php requests are highly unusual because they need to be manually crafted (and would thus be even more unusual on Beta).
I'm tempted to bump the only 2 nginx timeouts that currently default to 60s (we override proxy_read_timeout to 180s):
Actually doesn't show up for 200s yet, but I think that's because the swift proxy filtering kicks in only in this case (which is interesting in itself - error responses aren't subject to the header filtering).