Page MenuHomePhabricator

ThumbnailRender jobs not including the width in the fetch URL
Closed, ResolvedPublic

Description

Starting on 2018-08-01, ThumbnailRender jobs started failing because the transformation width stopped being included in the fetch URL even though they are being as seen in logstash.

Event Timeline

mobrovac created this task.
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

A quick investigation of merged commits for includes/jobqueue/jobs/ThumbnailRender.php, includes/file/File.php and related includes/media/* files (which are used in the generation of the URL) did not turn up any recent changes.

Weird. ImageHandler::makeParamString would throw an exception if the width wouldn't be specified, and that's not happening.

Where did you find out the reason is specifically that the width is not included in the final request URI? I can't find this info anywhere

The logstash errors (the error field, not the message) are all along the lines of ThumbnailRenderJob::run: incorrect HTTP status 404 when hitting //ms-fe.svc.eqiad.wmnet/wikipedia/commons/thumb/1/1c/Foo.tif/-Foo.tif.png

[...] //ms-fe.svc.eqiad.wmnet/wikipedia/commons/thumb/1/1c/Foo.tif/-Foo.tif.png

... which should be //ms-fe.svc.eqiad.wmnet/wikipedia/commons/thumb/1/1c/Foo.tif/${width}-Foo.tif.png. Kudos to @fgiunchedi for noticing the missing piece.

Weird. ImageHandler::makeParamString would throw an exception if the width wouldn't be specified, and that's not happening.

Weird indeed. I went through all the code paths that generate the file name and couldn't find one that would either omit or silently drop the width from the file name.

The problem persists, btw. There were 8k+ failures in the last 24h.

Dinoguy1000 renamed this task from ThambnailRender jobs not including the width in the fetch URL to ThumbnailRender jobs not including the width in the fetch URL.Aug 7 2018, 11:13 AM

One more data point: most of these failures come from wmf-15 code (less than 1% of the messages come from wmf-14 and wmf-13 combined), which seems to suggest a change in wmf-15 is causing this. Alas, after going through the diff, I couldn't find any (obvious) candidates.

Running the job manually from mwscript works fine. Does this failure happen on every job run or just some fraction of the attempts?

@Tgr

Does this failure happen on every job run or just some fraction of the attempts?

According to Grafana we've got ~1 job/s so it's almost 1000 jobs per 15 minutes for TumbnailRender job, according to kibana we get 50 failures per 15 minutes - so no, not all of them fail definitely.

Interesting fact is that trying to do the failed job again manually seems to succeed:

curl -i -XPOST -H 'content-type: application/json' --data '{"database": "commonswiki", "mediawiki_signature": "82391e33b714f7fc1408c298730cb2a6cc25cb75e5cb16e51b38dcc1cbf5fdda", "meta": {"domain": "commons.wikimedia.org", "dt": "2018-08-07T17:41:45+00:00", "id": "2784b115-9a69-11e8-86e2-14187761316a", "request_id": "W2nZ1gpAAEMAAFSG1csAAACT", "schema_uri": "mediawiki/job/2", "topic": "mediawiki.job.ThumbnailRender", "uri": "https://commons.wikimedia.org/wiki/File:BSicon_lINTACC-M.svg"}, "page_namespace": 6, "page_title": "File:BSicon_lINTACC-M.svg", "params": {"requestId": "W2nZ1gpAAEMAAFSG1csAAACT", "transformParams": {"width": 1280}}, "type": "ThumbnailRender"}' https://jobrunner.discovery.wmnet/rpc/RunSingleJob.php

HTTP/2 200

Change 451332 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Add temporary debug logging to ThumbnailRenderJob

https://gerrit.wikimedia.org/r/451332

Change 451332 abandoned by Gergő Tisza:
Add temporary debug logging to ThumbnailRenderJob

Reason:
should be done on a deploy branch instead

https://gerrit.wikimedia.org/r/451332

Change 451663 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@wmf/1.32.0-wmf.16] Add temporary debug logging to ThumbnailRenderJob

https://gerrit.wikimedia.org/r/451663

Change 451666 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Add temporary debug logging to ThumbnailRenderJob

https://gerrit.wikimedia.org/r/451666

Change 451663 merged by Niharika29:
[mediawiki/core@wmf/1.32.0-wmf.16] Add temporary debug logging to ThumbnailRenderJob

https://gerrit.wikimedia.org/r/451663

Mentioned in SAL (#wikimedia-operations) [2018-08-09T18:58:11Z] <niharika29@deploy1001> Synchronized php-1.32.0-wmf.16/includes/jobqueue/jobs/ThumbnailRenderJob.php: Add temporary debug logging T201305 (duration: 01m 11s)

Change 451666 merged by jenkins-bot:
[operations/mediawiki-config@master] Add temporary debug logging to ThumbnailRenderJob

https://gerrit.wikimedia.org/r/451666

Mentioned in SAL (#wikimedia-operations) [2018-08-09T19:02:18Z] <niharika29@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Add temporary debug logging to ThumbnailRenderJob T201305 (duration: 00m 57s)

Change 450033 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] ThumbnailRenderJob: normalize parameters before generating thumb URL

https://gerrit.wikimedia.org/r/450033

Change 450033 merged by jenkins-bot:
[mediawiki/core@master] ThumbnailRenderJob: normalize parameters before generating thumb URL

https://gerrit.wikimedia.org/r/450033

Pchelolo claimed this task.

After the fix was deployed, the original issue have disappeared from the logs, although now we still have some occasional timeouts, which are retried, and sometimes we get 429 from Thumbor.

I will close this task and open a new one regarding 429 responses as it seems unrelated