
JPEG thumbnails being generated in a corrupted state with horizontal lines across them
Closed, Resolved · Public

Description

I just uploaded some pictures, and most thumbnails have horizontal black lines. Original size is not affected.

Concrete examples (including one of mine) have already been reported here: https://commons.wikimedia.org/wiki/Commons:Upload_help#Thumbs_have_horizontal_lines

Many of the new files seem to be affected as well: https://commons.wikimedia.org/wiki/Special:NewFiles (interestingly, most of the thumbnails on that page show no horizontal lines, but if you pick a picture at random, the bigger thumbnail shown on the file's page will likely have them).

A few hours ago, I uploaded a picture without any problem.

Event Timeline

Fouky created this task. · Aug 10 2016, 9:09 PM
Restricted Application added subscribers: Poyekhali, Steinsplitter, Aklapper. · Aug 10 2016, 9:09 PM
Fouky removed Fouky as the assignee of this task. · Aug 10 2016, 9:15 PM
Josve05a triaged this task as Unbreak Now! priority. · Aug 10 2016, 10:12 PM
Josve05a awarded a token.
Josve05a added a project: Operations.
Restricted Application added subscribers: Luke081515, TerraCodes. · Aug 10 2016, 10:12 PM
tomasz added a subscriber: tomasz.
Josve05a added a subscriber: Josve05a. · Edited · Aug 10 2016, 10:17 PM
  1. Add new image (with this issue) to random Wikipedia article
  2. A "new thumb size" of that file is generated
  3. Note lines in that version of the file in preview
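The reproduction steps above rely on asking for a thumbnail width that has not been rendered yet. A minimal sketch of building such a thumbnail URL, assuming the standard hashed upload layout on upload.wikimedia.org and a file name that is already normalized (first letter capitalized); the helper name `thumb_url` and the example file name are hypothetical:

```python
import hashlib
from urllib.parse import quote


def thumb_url(file_name: str, width: int, site: str = "commons") -> str:
    """Build a thumbnail URL for file_name at the given pixel width.

    Assumes the usual layout: the first one and two hex digits of the
    MD5 of the underscore-separated file name form the directory path.
    """
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    encoded = quote(name)
    return (
        f"https://upload.wikimedia.org/wikipedia/{site}/thumb/"
        f"{digest[0]}/{digest[:2]}/{encoded}/{width}px-{encoded}"
    )


# An unusual width is unlikely to be cached, so the scaler has to render it.
print(thumb_url("Example image.jpg", 799))
```

Requesting an uncommon width is only a way to force a fresh render; it does not by itself tell you which scaler produced the result.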
Restricted Application added a project: Multimedia. · Aug 10 2016, 10:44 PM

From server admin log
13:54 moritzm: depooling image scaler mw1298 for some local tests with huge SVGs
12:26 moritzm: depooling image scalers mw2086-mw2089 for reimaging with jessie

Restricted Application added a subscriber: Matanya. · Aug 10 2016, 10:45 PM
jrbs added a subscriber: jrbs. · Aug 10 2016, 10:46 PM
greg moved this task from To Triage to Active Situation on the Wikimedia-Incident board.
greg added subscribers: Gilles, greg.

I've added Wikimedia-Incident as this is pretty bad.

@MoritzMuehlenhoff @Gilles help asap please. What can we revert to get us back to a good state?

After a fix has been made/patch reverted, we will need to purge/reparse all thumbnails generated during this time which may be "corrupted".

@Bawolff or @brion can either of you help out here, per chance?

Jdforrester-WMF renamed this task from Commons bug: Thumbnail generation with horizontal lines to Thumbnails being generated in a corrupted state with horizontal lines across them. · Aug 10 2016, 11:03 PM

@Dereckson in T142638#2542157, depooling a scaler shouldn't matter, and mw2086-mw2089 supposedly are dormant.

I expect the current scalers are mw129[3-8].eqiad.wmnet, which have been on jessie since 2016-07-04.

Mentioned in SAL [2016-08-10T23:17:08Z] <reedy@tin> rebuilt wikiversions.php and synchronized wikiversions files: Revert to .13 to attempt to fix T142638

OK, the rollback looks to have fixed this. Meh.

greg added a comment. · Aug 10 2016, 11:20 PM

After a fix has been made/patch reverted, we will need to purge/reparse all thumbnails generated during this time which may be "corrupted".

Someone: How can we generate that list?

Yann removed a subscriber: Yann. · Aug 10 2016, 11:26 PM

Mentioned in SAL [2016-08-10T23:40:50Z] <reedy@tin> Synchronized php-1.28.0-wmf.14/extensions/VipsScaler: Remove old broken config causing T142638 (duration: 00m 50s)

Update from IRC:

  • We rolled back the train (wmf.14) to wmf.13, which stopped the issue from getting worse.
  • We're running a maintenance script, PurgeChangedFiles, to delete the bad thumbnails.
  • We found that this happened because MediaWiki-extensions-VipsScaler (which handles JPEG scaling) overrides the image scaling configuration with its own very poor defaults. The new MediaWiki train changed the code that uses that config, and as a result the config was merged very poorly. This is why thumbnailing of JPEGs was broken but other types were fine.
  • We've removed the defaults from VipsScaler and showed that that worked on test. Now rolling the train back out.
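To illustrate the kind of failure described in the update above, here is a toy sketch of how extension defaults can leak through once a consumer switches from replacing a config entry to merging it. It is not the actual VipsScaler or MediaWiki code; every key, value, and function name below is hypothetical.

```python
from copy import deepcopy

# Hypothetical settings, for illustration only. They are not real
# VipsScaler or MediaWiki options.
EXTENSION_DEFAULTS = {
    "jpeg": {"scaler": "hypothetical-old", "legacy_flag": True},
}
SITE_CONFIG = {
    "jpeg": {"scaler": "hypothetical-new"},
    "png": {"scaler": "im"},
}


def resolve_replace(defaults, site):
    """Old behaviour in this sketch: a site entry replaces the default wholesale."""
    merged = deepcopy(defaults)
    merged.update(deepcopy(site))
    return merged


def resolve_deep_merge(defaults, site):
    """New behaviour in this sketch: entries merge key by key, so stale defaults survive."""
    merged = deepcopy(defaults)
    for key, value in site.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = resolve_deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


print(resolve_replace(EXTENSION_DEFAULTS, SITE_CONFIG)["jpeg"])
print(resolve_deep_merge(EXTENSION_DEFAULTS, SITE_CONFIG)["jpeg"])
```

In this sketch only the `jpeg` entry picks up the stale `legacy_flag`, because it is the only type for which the extension ships defaults. That mirrors the observation that JPEG thumbnails broke while other types were fine, and it shows why removing the extension's defaults resolves the mismatch.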

Mentioned in SAL [2016-08-10T23:46:37Z] <reedy@tin> rebuilt wikiversions.php and synchronized wikiversions files: Reinstate .14 as T142638 is fixed

Jdforrester-WMF renamed this task from Thumbnails being generated in a corrupted state with horizontal lines across them to JPEG thumbnails being generated in a corrupted state with horizontal lines across them. · Aug 10 2016, 11:47 PM
Jdforrester-WMF assigned this task to Reedy.
Jdforrester-WMF lowered the priority of this task from Unbreak Now! to High.
Reedy closed this task as Resolved. · Aug 10 2016, 11:54 PM

This is confirmed fixed. If you come across any images with the problem, please purge them in the first instance.
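For anyone purging more than a handful of files, a minimal sketch of preparing a purge call against the MediaWiki action API (`action=purge`, sent as POST). It only builds the request and does not send it; the helper name `build_purge_request` and the file titles are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_purge_request(file_titles):
    """Return a POST request asking the API to purge the given File: pages."""
    params = {
        "action": "purge",
        "titles": "|".join(file_titles),  # the API separates titles with "|"
        "format": "json",
    }
    return Request(API_URL, data=urlencode(params).encode("utf-8"), method="POST")


request = build_purge_request(["File:A.jpg", "File:B.jpg"])
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The API caps the number of titles per request, so long lists need to be split into batches.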

Mentioned in SAL [2016-08-10T23:59:03Z] <AaronSchulz> Running purgeChangedFiles.php on all wikis on a terbium screen (T142638)

Fouky removed a subscriber: Fouky. · Aug 10 2016, 11:59 PM
Tgr added a subscriber: Tgr. · Aug 11 2016, 12:32 AM
  • We're running a maintenance script, PurgeChangedFiles, to delete the bad thumbnails.

Which will only affect files that have been changed, not ones that have been manually purged or just did not have a thumbnail in the requested size. I don't think there is a way to identify those though.

matmarex raised the priority of this task from High to Unbreak Now!. · Aug 11 2016, 2:08 AM
  • We're running a maintenance script, PurgeChangedFiles, to delete the bad thumbnails.

Which will only affect files that have been changed, not ones that have been manually purged or just did not have a thumbnail in the requested size. I don't think there is a way to identify those though.

Is there a way from Swift?

Other than that, page_touched value between some dates, and NS = NS_FILE?

If there is no thumbnail in the requested size, aren't they just served the original anyway?
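The page_touched idea above could be sketched as the query below, run here against an in-memory SQLite stand-in for MediaWiki's `page` table so it is self-contained. The column names and the NS_FILE value of 6 follow the MediaWiki schema; the sample rows and the start of the time window are assumptions, since the thread does not say when wmf.14 first reached the affected wikis. The end of the window is the rollback recorded in the SAL.

```python
import sqlite3

NS_FILE = 6  # MediaWiki's namespace number for File: pages

QUERY = """
SELECT page_title
FROM page
WHERE page_namespace = ?
  AND page_touched BETWEEN ? AND ?
ORDER BY page_title
"""


def files_touched_between(conn, start, end):
    """Return File: page titles whose page_touched falls within [start, end]."""
    return [row[0] for row in conn.execute(QUERY, (NS_FILE, start, end))]


# Stand-in table with made-up rows; timestamps use MediaWiki's YYYYMMDDHHMMSS form.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_namespace INTEGER, page_title TEXT, page_touched TEXT)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        (6, "Inside_window.jpg", "20160810220000"),
        (6, "Before_window.jpg", "20160809120000"),
        (0, "Article_in_window", "20160810221500"),
    ],
)

# Start is an assumed placeholder; end is the 2016-08-10T23:17:08Z rollback.
WINDOW = ("20160810180000", "20160810231708")
print(files_touched_between(conn, *WINDOW))
```

page_touched changes for many reasons other than thumbnail rendering, so a list built this way would over-include and would still miss files whose thumbnails were rendered without the page being touched.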

Gilles added a subscriber: fgiunchedi. · Edited · Aug 11 2016, 11:39 AM

I don't know if we log things at the swift level that would let us know which images were pushed to swift during a certain period. @fgiunchedi might know. He's on vacation this week and back on Monday.

Varnish logs might be another place to look, namely purge requests and Varnish backend misses. But I don't know if we keep a durable record of those. @BBlack might know.

tomasz removed a subscriber: tomasz. · Aug 11 2016, 11:40 AM
Gilles added a subscriber: BBlack. · Aug 11 2016, 12:12 PM

I don't know if we log things at the swift level that would let us know which images were pushed to swift during a certain period. @fgiunchedi might know. He's on vacation this week and back on Monday.

we keep swift access logs on ms-fe machines, though only for 5 days given their size :(