
InstantCommons should cache remote images
Open, Medium · Public

Description

Per T145496, caching of remote images (from Commons) has been disabled permanently. The side effect of this is that page load time can increase from ~800ms to over 22000ms, because files have to be fetched from Commons. This causes problems for non-Wikimedia setups and I could also imagine it affects Wikimedia's cache_upload cluster.

Unless someone can explain why it is so important to keep caching disabled and simply accept this performance hit, the requested solution is to revert https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/336675/

Event Timeline


Performance has no opposition to enabling caching here for third parties. Makes sense to me. I've tagged CPT to further look at what the InstantCommons feature's needs are and whether it is functionally safe to re-enable, and what other (not perf-related) reasons there may be for keeping it off.

If this is expected to impact prod traffic for WMF (besides reducing traffic), feel free to move back to our Inbox.

So, it seems to me and @CCicalese_WMF that this has an impact on our Commons servers, and that caching would be helpful for remote MediaWiki sites using InstantCommons. It's not clear why turning off caching was helpful for anyone. @Tgr can you clear this up for us? I think we'd like to see this improved for remote sites and for Commons.

To quote the commit message of that patch,

There is no point in local thumb caching when we set apibase and thumbUrl to the remote wiki. It will only confuse MediaWiki into creating local thumbnails for certain LinksUpdate tasks even though the rendered page will reference thumbnails hosted on Commons.

@Tgr that sounds like a good reason to decline this request to revert https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/336675/. Is there an alternative that would reduce load time from Commons on non-Wikimedia wikis without introducing the issue you describe?

Why would the load time be larger for non-Wikimedia wikis than for Wikimedia ones? And why would it be larger than loading from your own wiki (which probably has a less performant image server than our large and somewhat geographically distributed reverse proxy farms)? I expect that's either a misconfiguration or a misunderstanding (i.e. the initial rendering of thumbnails that don't exist yet in Swift can take a long time, but there isn't really a way around that).

Thank you for the additional information, @Tgr. @Southparkfan, based upon this information, I am going to decline this task.

I feel my point was completely missed, regardless of the cause or fix: this change has a performance impact of roughly 28x. And since I have new questions, reopening.

To quote the commit message of that patch,

There is no point in local thumb caching when we set apibase and thumbUrl to the remote wiki. It will only confuse MediaWiki into creating local thumbnails for certain LinksUpdate tasks even though the rendered page will reference thumbnails hosted on Commons.

Eh? I may not understand this, but with caching disabled I definitely see more and more traffic going to Wikimedia servers, on our MediaWiki servers. If hitting the Wikimedia services is completely unnecessary there is a serious bug somewhere, regardless of the cause (be it Miraheze or MediaWiki).

Why would the load time be larger for non-Wikimedia wikis than for Wikimedia ones?

Because traffic goes internally for you (even more so if you are hitting a DC also hosting the Swift cluster) instead of third-party -> Wikimedia. Consider this MTR:

 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. ???
 2. 2001:978:2:6e::19:1               0.0%    72    1.9   9.4   1.0  86.2  14.5
 3. te0-2-0-1.rcr21.rtm01.atlas.coge  0.0%    72    1.8  11.3   1.3  97.5  20.1
 4. be3385.ccr42.ams03.atlas.cogentc 19.4%    72    2.8  12.7   2.1 165.1  25.9
 5. be2440.agr21.ams03.atlas.cogentc 11.1%    72    2.9   7.6   2.1  47.8   9.6
 6. 2001:978:2:2c::66:2               0.0%    72    2.1  17.1   1.9 558.3  65.8
 7. ams13-peer-1.ae5-unit0.tele2.net  0.0%    72    2.8   9.5   2.1 104.8  15.7
 8. 2a00:800:0:10::3:2                0.0%    72   35.8  25.5  13.0 171.1  26.6
 9. ae1-403.cr2-esams.wikimedia.org   0.0%    72   10.2  12.6   3.8  55.6  12.2
10. text-lb.esams.wikimedia.org       0.0%    71    3.8  14.6   3.7 150.4  24.1

compared to our current file server that is also hosted somewhere else (reasons for that are out of scope for this ticket, though I agree you would have this in-DC ideally):

 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. ???
 2. te0-0-2-3.nr11.b061555-0.rtm01.a  0.0%    71    1.0   3.9   0.8  59.4   8.7
 3. te0-5-0-17.rcr21.rtm01.atlas.cog  0.0%    71    1.1   3.1   1.0  66.0   8.0
 4. be3384.ccr41.ams03.atlas.cogentc  5.6%    71    2.4   3.6   1.9  21.5   3.3
 5. be2519.rcr21.b015960-1.ams03.atl  0.0%    71    2.9   4.8   2.5  42.8   7.0
 6. ams-5-a9.nl.eu                    0.0%    70    4.2   5.5   3.6  29.0   4.3
 7. be104.rbx-g2-nc5.fr.eu            0.0%    70    9.5  14.5   9.3  54.4   9.4
 8. be100-1042.ldn-5-a9.uk.eu         0.0%    70   12.4  13.9  12.3  25.9   2.5
 9. be101.lon1-eri1-g2-nc5.uk.eu      0.0%    70   12.8  15.6  12.6 136.3  14.8
10. ???
11. ???
12. ???
13. ???
14. ns3102680.ip-54-36-165.eu         0.0%    70   12.4  13.2  12.3  22.8   1.7

and our old file server, i.e. the one in use when I reported this, was inside our own DC and performs even better:

Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev
1. ???
2. <old filesrv>                     0.0%    70    0.5   1.3   0.3  13.9   1.9

For the record, this MediaWiki server is hosted in NL, just like esams. Only about 80 km away. There is a lot of extra work to be carried out when requesting thumbnails a thousand times instead of just one time and serving the cached result.

And why would it be larger than loading from your own wiki (which probably has a less performant image server than our large and somewhat geographically distributed reverse proxy farms)?

That is an assumption. In our case we definitely do not have the best storage infrastructure, but then the buck stops with us. We cannot control Wikimedia's infrastructure. Also, if our image servers are less performant than Wikimedia's (which may be excellent I/O-, CPU- and memory-wise, but not so much latency-wise, as you can see above), why do I see a huge performance improvement after enabling caching? Reading this, I would expect an increase in response time, not a decrease.

I expect that's either a misconfiguration or a misunderstanding

That's definitely possible.

I definitely see more and more traffic going to Wikimedia servers, on our MediaWiki servers.

That does sound like a bug, yes. Can you tell what exactly that traffic is? Do thumbnails on your webpage link to Commons or to your wiki?

Hi, yes, some of our users' wikis link to Commons thumbnails.

For the traffic question I'll need to ask around.

So, it seems to me and @CCicalese_WMF that this has an impact on our Commons servers, and that caching would be helpful for remote MediaWiki sites using InstantCommons. It's not clear why turning off caching was helpful for anyone.

I don't know why it was done, but I know one reason to do it: legal implications. If a 3rd party wiki pulls an image from Commons under the assumption that it's under a free license, but it's not, and it gets deleted from Commons because of that, it would also have to be deleted from the 3rd party site. Afaik, we have no mechanism for that. If the 3rd party site is sued and found liable, they may hold WMF accountable.

At least, that's my understanding as I remember it from discussions in the past.

daniel triaged this task as Medium priority. · Nov 11 2019, 8:05 PM

I also see this problem on wikis on Patchdemo (see T279119#6968347). It seems that we disabled not only local image thumbnails, but also caching of the URLs of image thumbnails on Commons. So on every page parse, for every image on the page, MediaWiki makes 3 prop=imageinfo API requests (for 1x, 1.5x and 2x image sizes) to acquire the upload.wikimedia.org URL of the image thumbnail.
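For anyone who wants to see what that adds up to, here is a rough Python sketch that builds the three lookups for one image and counts the lookups for a whole page. The exact parameters MediaWiki sends are an assumption on my part: I am only using the public `action=query&prop=imageinfo` API with `iiprop=url` and `iiurlwidth`, and the Commons `api.php` address, the file name, the 220px base width and the 20-image page are made-up example values.

```python
from typing import List
from urllib.parse import urlencode

# Assumed API endpoint for Commons; not taken from this task.
API_BASE = "https://commons.wikimedia.org/w/api.php"
DENSITIES = (1.0, 1.5, 2.0)  # the 1x, 1.5x and 2x sizes mentioned above


def imageinfo_urls(file_name: str, base_width: int) -> List[str]:
    """Build one prop=imageinfo lookup per pixel density for a single image."""
    urls = []
    for density in DENSITIES:
        params = {
            "action": "query",
            "format": "json",
            "prop": "imageinfo",
            "titles": "File:" + file_name,
            "iiprop": "url",
            "iiurlwidth": round(base_width * density),
        }
        urls.append(API_BASE + "?" + urlencode(params))
    return urls


def requests_per_parse(image_count: int) -> int:
    """Uncached lookups needed for one parse of a page with image_count images."""
    return image_count * len(DENSITIES)


for url in imageinfo_urls("Example.jpg", 220):
    print(url)
print(requests_per_parse(20))  # 20 images -> 60 API requests per parse
```

Since nothing is cached, that count is paid again on every parse of the page.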