
Thumbnail scaling broken on beta
Closed, Resolved (Public)

Error retrieving thumbnail from scaling server: couldn't connect to host

Seems to affect all thumbnails of local files unless there is a cached thumbnail for that size.

Event Timeline

Tgr raised the priority of this task to Needs Triage.
Tgr updated the task description.
Tgr added a subscriber: Tgr.

@Tgr: do you know where this might be breaking down? We recently rebuilt the tmh host in Beta (deleted the deployment-videoscaler01 instance and rebuilt it as deployment-tmh01 on Trusty/HHVM; see T110707). Could that have any effect?

Tgr claimed this task.

So apparently a thumbnail request on beta goes like this: deployment-cache-upload04 (dual layer of varnish) -> deployment-upload nginx -> deployment-upload 404 handler at /data/project/upload7/scripts/thumb-handler.php -> curl request to thumb.php on some hardcoded IP which once belonged to deployment-cache-text02 (which is another dual layer of varnish) -> pass to thumb.php on one of the deployment-mediawiki instances.
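To make the chain above easier to reason about, here is a minimal Python sketch that lists the hops in order and reports the first one a probe cannot reach. The hop labels are taken from the description above; `first_unreachable_hop` and the probe callback are hypothetical names for illustration, not part of any existing script on beta.

```python
from typing import Callable, List, Optional

# Hops in the order described above. The labels come from the comment,
# not from any actual configuration file.
THUMB_REQUEST_CHAIN: List[str] = [
    "deployment-cache-upload04 (varnish)",
    "deployment-upload nginx",
    "deployment-upload thumb-handler.php (404 handler)",
    "text varnish (hardcoded IP)",
    "deployment-mediawiki thumb.php",
]


def first_unreachable_hop(
    chain: List[str], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Return the first hop the probe reports as unreachable, or None."""
    for hop in chain:
        if not is_reachable(hop):
            return hop
    return None


# Simulate the failure in this task: the hardcoded IP no longer answers.
broken = first_unreachable_hop(
    THUMB_REQUEST_CHAIN, lambda hop: "hardcoded IP" not in hop
)
print(broken)
```

In a real check the probe would be an actual connection attempt per hop; here it is a stand-in so the sketch runs without network access.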

So this is a bit fragile (and now I remember going through the same debugging steps a while ago and opening T84950 about it). For now, I fixed the immediate problem by updating the IP address for the text varnish. It would be nice to put the customized thumb handler script in version control so that the next person googling for the error message can save themselves an hour of config file digging (and also because accidental deletion of the file would be a big problem right now). Do you know who would be the right person to ask to look through deployment-upload:/data/project/upload7/scripts/ and tell if there is anything private in there? (There is a readme file there saying "Refer to Antoine/Ariel for more details", so that might be a good start.) At a glance I don't see passwords or anything like that.
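One way to make this less fragile would be to look the backend up by hostname at request time instead of hardcoding its IP. The Python sketch below only illustrates that idea; the real handler is a PHP script whose contents are not shown here, and `thumb_backend_url`, the `/w/thumb.php` path, and the hostname used in the example are all assumptions.

```python
import socket
from typing import Callable


def resolve_ipv4(hostname: str) -> str:
    """Resolve a hostname to an IPv4 address using the system resolver."""
    return socket.gethostbyname(hostname)


def thumb_backend_url(
    hostname: str,
    path: str = "/w/thumb.php",  # assumed path, for illustration only
    resolver: Callable[[str], str] = resolve_ipv4,
) -> str:
    """Build the backend URL from a hostname rather than a fixed IP.

    Raises RuntimeError with a descriptive message if resolution fails,
    so the failure points at the hostname instead of a bare
    "couldn't connect to host".
    """
    try:
        ip = resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        raise RuntimeError(
            f"cannot resolve thumbnail backend {hostname!r}: {exc}"
        ) from exc
    return f"http://{ip}{path}"


# Example with a stub resolver (192.0.2.10 is a documentation-only address).
print(thumb_backend_url("deployment-cache-text02", resolver=lambda h: "192.0.2.10"))
```

With this shape, rebuilding the varnish instance under the same hostname would not require editing the handler, and a resolution failure names the host that went missing.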