bzimport added a project: Wikimedia-Media-storage.Via ConduitNov 22 2014, 3:17 AM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz64622.
Yann created this task.Via LegacyApr 29 2014, 7:30 PM
Billinghurst added a comment.Via ConduitApr 29 2014, 11:33 PM

Comments:

  • Purging files at Commons has no effect
  • Clicking the "Other resolutions:" gives the error

Error generating thumbnail
Error creating thumbnail: File missing

  • Full image appears to display okay

https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Revue_des_Deux_Mondes_-_1843_-_tome_3.djvu/page970-2840px-Revue_des_Deux_Mondes_-_1843_-_tome_3.djvu.jpg

gerritbot added a comment.Via ConduitApr 30 2014, 6:53 AM

Change 130563 had a related patch set uploaded by Aaron Schulz:
Removed "GetLocalFileCopy" pool counter entry

https://gerrit.wikimedia.org/r/130563

gerritbot added a comment.Via ConduitApr 30 2014, 6:53 AM

Change 130563 merged by jenkins-bot:
Removed "GetLocalFileCopy" pool counter entry

https://gerrit.wikimedia.org/r/130563

Aklapper added a comment.Via ConduitApr 30 2014, 2:54 PM

Aaron: You are fast. Thank you!

Yann added a comment.Via ConduitMay 3 2014, 3:38 PM

A similar issue is happening again now. On at least one page in three, I get this message:

"Error generating thumbnail

As an anti-spam measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit. Please try again in a few minutes."

Nemo_bis added a comment.Via ConduitMay 3 2014, 6:13 PM
*** Bug 64801 has been marked as a duplicate of this bug. ***
Nemo_bis added a comment.Via ConduitMay 3 2014, 6:14 PM

Changing summary; the error is widespread across all sorts of users of Commons.

Aklapper added a comment.Via ConduitMay 3 2014, 6:19 PM

555: Resetting blocker and immediate; see [[mw:Bugzilla/Fields#Priority]]

bzimport added a comment.Via ConduitMay 3 2014, 6:22 PM

mail wrote:

I am also currently getting frequent HTTP 500 errors after requesting a thumbnail image:
"Error generating thumbnail - As an anti-spam measure, you are limited from performing this action too many times in a short space of time, and you have exceeded this limit. Please try again in a few minutes."
This happens quite fast (I requested perhaps around 100 thumbnails in the last few hours), but it also resolves quite fast. Retrying shortly afterwards usually results in a 200 OK.

Yann added a comment.Via ConduitMay 4 2014, 3:54 PM

It seems this is getting more and more frequent. When is a fix expected? Thanks.

Ciencia_Al_Poder added a comment.Via ConduitMay 4 2014, 3:59 PM

This seems to be hitting $wgRateLimits['renderfile']. See [1]. Those rate limits are disabled by default, so maybe WMF has set them up recently.


[1] https://www.mediawiki.org/wiki/Manual:$wgRateLimits

Nemo_bis added a comment.Via ConduitMay 4 2014, 5:20 PM

(In reply to Jesús Martínez Novo (Ciencia Al Poder) from comment #13)

This seems to be hitting $wgRateLimits['renderfile']. See [1]. Those rate
limits are disabled by default, so maybe WMF has set them up recently.

$ git blame InitialiseSettings.php | grep -A 4 renderfile
c78a54c9 (Aaron Schulz 2013-10-16 16:14:35 -0700 6390) 'renderfile' => array(
02f3863a (Aaron Schulz 2014-01-21 12:40:42 -0800 6391) // 1400 new thumbnails per minute
02f3863a (Aaron Schulz 2014-01-21 12:40:42 -0800 6392) 'ip' => array( 700, 30 ),
02f3863a (Aaron Schulz 2014-01-21 12:40:42 -0800 6393) 'user' => array( 700, 30 ),
c78a54c9 (Aaron Schulz 2013-10-16 16:14:35 -0700 6394) ),
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6395) 'renderfile-nonstandard' => array(
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6396) // 140 new thumbnails per minute
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6397) 'ip' => array( 70, 30 ),
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6398) 'user' => array( 70, 30 ),
9643d682 (Aaron Schulz 2014-04-21 09:30:53 -0700 6399) ),
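
To make the numbers in the blame output concrete: each pair reads as (maximum actions, window in seconds), so 700 per 30 seconds is 1400 per minute and 70 per 30 seconds is 140 per minute. The Python below is only a rough sketch of a fixed-window counter using those values. The class `FixedWindowLimiter` and the helper `per_minute` are hypothetical names made up for illustration; this is not MediaWiki's limiter code.

```python
import time

# Limits copied from the blame output above: (max actions, window in seconds).
RATE_LIMITS = {
    "renderfile": {"ip": (700, 30), "user": (700, 30)},
    "renderfile-nonstandard": {"ip": (70, 30), "user": (70, 30)},
}


def per_minute(limit):
    """Convert a (max actions, window seconds) pair to actions per minute."""
    max_hits, window = limit
    return max_hits * 60 // window


class FixedWindowLimiter:
    """Illustrative fixed-window counter (hypothetical, not MediaWiki's code)."""

    def __init__(self, limits, clock=time.monotonic):
        self.limits = limits
        self.clock = clock
        self.counters = {}  # (action, kind, key) -> (window start, hit count)

    def hit(self, action, kind, key):
        """Record one action for `key`; return True if the limit is tripped."""
        max_hits, window = self.limits[action][kind]
        now = self.clock()
        start, count = self.counters.get((action, kind, key), (now, 0))
        if now - start >= window:
            start, count = now, 0  # the old window expired, start a new one
        count += 1
        self.counters[(action, kind, key)] = (start, count)
        return count > max_hits
```

With these values, whatever key the limiter counts under gets at most 70 non-standard thumbnail renders per 30 seconds, so the key chosen (a single user's IP versus an address shared by many users) matters a great deal.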

555 added a comment.Via ConduitMay 5 2014, 11:47 AM

Why would an experienced developer set such a very low limit in an environment the size of Wikimedia, where each category view lists tons of media files, and at the exact time that an international upload contest (Wiki Loves Earth) is running?

bzimport added a comment.Via ConduitMay 5 2014, 6:46 PM

wieralee wrote:

It makes our work on wikisource.pl twice as slow. It crashes our work :-(
% of proofread pages load without the scans. Very, very tiring.

bzimport added a comment.Via ConduitMay 5 2014, 6:47 PM

wieralee wrote:

(In reply to wieralee from comment #16)
40 %

555 added a comment.Via ConduitMay 5 2014, 9:49 PM

50 hours since the initial report and not a single action directly related to fixing it.

Why isn't the change that is *breaking all Wikisource wikis* (we *really* rely on ProofreadPage, and ProofreadPage relies on image resizing!) simply reverted until a sysadmin finds the desired setup? Is a config intended only to optimize server usage (I am unable to find any report mentioning that this change is really needed at this moment) really necessary if it breaks features that have been working for years?

hashar added a comment.Via ConduitMay 6 2014, 8:46 AM

From a mail sent to MediaWiki core list:

By looking at the udp2log limiter.log file, the renderfile-nonstandard
limit is reached by:

$ fgrep renderfile limiter.log |cut -d\: -f4|sort|uniq -c|sort -n

378  10.64.0.168 tripped! mediawiki
405  10.64.0.167 tripped! mediawiki
476  10.64.32.92 tripped! mediawiki
498  10.64.16.150 tripped! mediawiki

$

They are the media server frontends ms-fe1001 to ms-fe1004. We probably
want to restrict the end user IP instead.
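
For anyone who wants to repeat the count without the shell pipeline, here is a rough Python equivalent. It is a sketch only: it assumes nothing about the limiter.log layout beyond what the pipeline itself implies (the fourth colon-separated field holds the "IP tripped! wiki" part), and the function name `count_tripped` is made up for illustration.

```python
from collections import Counter


def count_tripped(lines, needle="renderfile"):
    """Count the 4th colon-separated field of every line containing `needle`.

    Roughly mirrors: fgrep renderfile | cut -d: -f4 | sort | uniq -c | sort -n
    Unlike cut, surrounding whitespace is stripped and lines with fewer than
    four fields are skipped.
    """
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        fields = line.rstrip("\n").split(":")
        if len(fields) >= 4:
            counts[fields[3].strip()] += 1
    # Ascending by count, like the trailing `sort -n`.
    return sorted(counts.items(), key=lambda item: item[1])
```

Calling it as `count_tripped(open("limiter.log"))` would return (field, count) pairs ordered like the output above, assuming the file is readable from the current directory.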

I suspect the media servers are not properly passing the X-Forwarded-For
header down to the thumbnail renderer. Seems the logic is in
operations/puppet.git file ./files/swift/SwiftMedia/wmf/rewrite.py

Would need someone with more information about Swift/thumb handling
than me :-(


I have poked Faidon about it. The X-Forwarded-For header seems to be passed by the Swift proxies, so we need their IPs to be trusted by MediaWiki.
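
To illustrate why trusting the proxies matters, here is a minimal Python sketch of the usual way a client IP is derived from X-Forwarded-For: start from the connecting address and, as long as that address is a trusted proxy, step back to the next hop listed in the header. The function `client_ip` is hypothetical and is not MediaWiki's code. 10.64.0.168 is one of the frontends from the log above; 203.0.113.7 is a documentation-range address standing in for an end user.

```python
import ipaddress


def client_ip(remote_addr, xff_header, trusted_proxies):
    """Resolve the effective client IP (illustrative sketch only).

    Walks X-Forwarded-For from right to left, skipping addresses that
    belong to the trusted proxy list (single IPs or CIDR ranges).
    """
    trusted = [ipaddress.ip_network(p) for p in trusted_proxies]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in trusted)

    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    current = remote_addr
    while is_trusted(current) and hops:
        current = hops.pop()  # rightmost remaining hop is the closest one
    return current


# Frontend not trusted: every request is attributed to the frontend itself.
print(client_ip("10.64.0.168", "203.0.113.7", []))
# Frontend trusted: the request is attributed to the end user.
print(client_ip("10.64.0.168", "203.0.113.7", ["10.64.0.168"]))
```

In the untrusted case all thumbnail requests arriving through one frontend share a single per-IP counter, which would explain why the four frontend addresses are the ones showing up as tripped in limiter.log.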

gerritbot added a comment.Via ConduitMay 6 2014, 8:48 AM

Change 131669 had a related patch set uploaded by Hashar:
Trust Swift proxies XFF headers

https://gerrit.wikimedia.org/r/131669

gerritbot added a comment.Via ConduitMay 6 2014, 8:49 AM

Change 131670 had a related patch set uploaded by Faidon Liambotis:
Add Swift frontends to squid.php

https://gerrit.wikimedia.org/r/131670

gerritbot added a comment.Via ConduitMay 6 2014, 8:52 AM

Change 131669 abandoned by Hashar:
Trust Swift proxies XFF headers

Reason:
Abandoned in favor of Faidon change https://gerrit.wikimedia.org/r/#/c/131670/

https://gerrit.wikimedia.org/r/131669

gerritbot added a comment.Via ConduitMay 6 2014, 8:54 AM

Change 131671 had a related patch set uploaded by Hashar:
Mention ms-fe servers need to be XFF trusted by MW

https://gerrit.wikimedia.org/r/131671

gerritbot added a comment.Via ConduitMay 6 2014, 8:56 AM

Change 131670 merged by jenkins-bot:
Add Swift frontends to squid.php

https://gerrit.wikimedia.org/r/131670

gerritbot added a comment.Via ConduitMay 6 2014, 8:56 AM

Change 131671 merged by Faidon Liambotis:
Mention ms-fe servers need to be XFF trusted by MW

https://gerrit.wikimedia.org/r/131671

faidon added a comment.Via ConduitMay 6 2014, 9:07 AM

Hashar was correct in identifying the root cause. This was a long-standing (~2 years) configuration error that in combination with the recent per-IP thumb limits broke generation for many users.

The above changes have been merged and deployed, so this should be working for everyone now. The logs suggest so, but let's give it some time..

TheDJ added a comment.Via ConduitMay 6 2014, 9:39 AM

Can we do anything to make the cause of such incidents more easily visible/debuggable in the future?

TheDJ added a comment.Via ConduitMay 6 2014, 9:40 AM

Perhaps by including the IP being limited in the error?

Aklapper added a comment.Via ConduitMay 6 2014, 11:40 AM

Hashar / Faidon: Thanks for your work and investigation!

Yann added a comment.Via ConduitMay 6 2014, 5:40 PM

Works for me now.
However, as 555 said, I wish that an issue like this, which breaks all Wikisource work, would be better handled in the future. Thanks for fixing this.

hashar added a comment.Via ConduitMay 6 2014, 8:55 PM

Derk-Jan Hartman: to customize the error message, I guess you want to file another bug :-) It will be easier to handle.

Yann Forget: the bug did get escalated to the mw-core weekly meeting (Monday 10pm UTC). Got fixed whenever we managed to wake up. If the issue is critical, your best bet is to raise it on wikitech-l which most people with cluster access read even during week-ends.

If there are no more suspicious entries in limiter.log, I guess we can finally mark this bug as fixed.

hashar added a comment.Via ConduitMay 7 2014, 9:30 AM

I posted a rather long postmortem describing:

  • the timeline for the resolution
  • the root cause analysis and how we caused the issue
  • suggested improvements

http://lists.wikimedia.org/pipermail/mediawiki-core/2014-May/000068.html

The media servers are no longer being rate-limited according to limiter.log. Whitelisting them as trusted XFF proxies solved the issue.

Gilles added a project: Multimedia.Via WebDec 4 2014, 9:26 AM
Gilles moved this task to Closed on the Multimedia workboard.Via WebDec 4 2014, 10:10 AM
