
Special:Search: "429 Too Many Requests" when number of returned search results set to more than 50
Closed, ResolvedPublic

Description

Three cases of thumbnails not being displayed were reported in the comment below; all three reported cases are on Commons:

I'm reporting a comment from User:Pelikana (original diff on Commons):

To see if special search works well enough, I will search for "SVG" in some combinations like svg+paris and notice that a page with a file list of 500 results may take 30 seconds to load. list 500 svg results, Svg+Paris. That's too long. A more realistic search would be 500 jpg list Here the page loads fast enough but it fails to show all thumbs in one go.

(1) Search terms: SVG* ("list 500 svg results")
(2) Search terms: SVG+Paris ("Svg+Paris")
(3) Search terms: Breitner+"RP-T" ("500 jpg list")

Steps to reproduce:

  • follow the links above with the number of search results per page set to 500 (or, if all results display properly, click "Next 500")
  • while the results page is loading, there is no indication to the user that the request was submitted
  • when the results page loads, some thumbnail images are not displayed

e.g. for the above example (1)

Screen Shot 2022-10-17 at 1.28.46 PM.png (1×3 px, 533 KB)

  • the Console will show
GET
scheme https
host upload.wikimedia.org
filename /wikipedia/commons/thumb/7/71/Handbook_of_ornament%3B_a_grammar_of_art%2C_industrial_and_architectural_designing_in_all_its_branches%2C_for_practical_as_well_as_theoretical_use_%28IA_handbookoforname1900meye%29.pdf/page1-120px-thumbnail.pdf.jpg
Status 429 Too Many Requests
Version HTTP/2
Transferred 2.88 KB (1.78 KB size)
Referrer Policy origin-when-cross-origin
Request Priority Low
NOTE: The issue is not specific to commons wiki. The steps above (with the same search terms) on enwiki, for example, would result in the same 429 Too Many Requests error.
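The 429 pattern above is consistent with per-client rate limiting on the thumbnail host: a page that fires ~500 thumbnail requests at once exhausts the allowed burst, and the remainder are rejected. The sketch below illustrates this with a simple token-bucket limiter; the capacity and refill numbers are invented for illustration and do not describe the actual limits on upload.wikimedia.org.

```python
# Illustrative token-bucket limiter, as a CDN edge might apply per client.
# Capacity/refill values are made up; real limits are not documented here.

class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200   # request served
        return 429       # Too Many Requests

bucket = TokenBucket(capacity=100, refill_per_sec=10)
# 500 thumbnail requests arriving almost simultaneously: roughly the first
# hundred succeed, the rest get 429.
statuses = [bucket.allow(now=0.001 * i) for i in range(500)]
```

With a burst capacity well below 500, a single results page is enough to trigger rejections, which matches the missing thumbnails in the screenshots.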

Event Timeline

I re-ran the steps in the task description for some of the search queries reported in the task.

The tests were run on Commons (wmf.6) Special:Search, with no network throttling, for the search term SVG*:

Results per page (selected on Special:Search) | Time for results page to load
20 (default)                                  | 5.48s
100                                           | 14.9s
250                                           | 22.6s

When the number of results is 500, the search times out.

Screen Shot 2022-10-24 at 1.51.40 PM.png (282×1 px, 35 KB)

I got the timeout error twice - when I set the number of returned search results in user Preferences and when I set it on Special:Search page.

  • [202bf565-85cf-48cd-92d7-0581310d4ad2] 2022-10-24 20:32:37: Fatal exception of type "Wikimedia\RequestTimeout\RequestTimeoutException"
  • [9406839b-812c-4fad-af05-ab07f9d4d8b0] 2022-10-24 20:49:32: Fatal exception of type "Wikimedia\RequestTimeout\RequestTimeoutException"

For the search side of things, we can attach &cirrusDumpResult to any Special:Search request; this short-circuits the request once elasticsearch responds and tells us how long the search engine took. In my quick testing:

It does seem like something after the search engine is taking significant time. Editors in particular have use cases for long result lists. I might suggest a cutoff for thumbnailing: only show thumbnails for the first 20 or so results?
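The suggested cutoff can be sketched as follows: attach thumbnails only to the first N results and leave the rest without one. This is an illustrative sketch only; `attach_thumbnails` and `fetch_thumbnail` are hypothetical names, not MediaWiki APIs.

```python
# Hypothetical sketch of the proposed cutoff: fetch thumbnails only for the
# first THUMBNAIL_LIMIT results; the remaining rows get no thumbnail.

THUMBNAIL_LIMIT = 20

def attach_thumbnails(results, fetch_thumbnail):
    out = []
    for i, title in enumerate(results):
        thumb = fetch_thumbnail(title) if i < THUMBNAIL_LIMIT else None
        out.append({"title": title, "thumbnail": thumb})
    return out

# Count how many (expensive) thumbnail fetches actually happen:
calls = []
def fake_fetch(title):
    calls.append(title)
    return f"thumb-of-{title}"

rows = attach_thumbnails([f"File:{n}.jpg" for n in range(500)], fake_fetch)
```

Under this scheme a 500-result page costs only 20 thumbnail lookups, which matches the "less thumbs (20)" timings measured below.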

I put together a sample bit of code at P36132 to test the timing of different bits of the query process. This measures the parts independently, with no overlap between them. The search portion is longer than in the earlier example with &cirrusDumpResult, as it includes various post-processing cirrus does that was skipped before.

500 results, SVG*

search            | 1.17s
thumbs            | 32s
fewer thumbs (20) | 0.95s

500 results, SVG Paris

search            | 1.33s
thumbs            | 28s
fewer thumbs (20) | 1.24s
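The measurement pattern used by the P36132 paste (timing each phase on its own, with no overlap) can be sketched like this. The phase names and sleep durations are stand-ins, not the actual measured code.

```python
# Minimal sketch of timing query-processing phases independently, in the
# spirit of the P36132 paste: each phase is wrapped in its own timer.
import time

def time_phases(phases):
    """phases: dict of name -> zero-arg callable. Returns name -> seconds."""
    timings = {}
    for name, fn in phases.items():
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings

timings = time_phases({
    "search": lambda: time.sleep(0.02),  # stand-in for the elasticsearch call
    "thumbs": lambda: time.sleep(0.05),  # stand-in for thumbnail generation
})
```

Timing phases separately, rather than only measuring end-to-end, is what isolated the thumbnailing step as the dominant cost here.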

Digging a bit more into this...

If I replace this line in SearchResultThumbnailProvider::buildSearchResultThumbnailFromFile:

$localPath = $thumb->getLocalCopyPath();

with

$localPath = false;

then the thumbnailing runs in ~300ms instead of 30s. Is that part necessary? I suspect it's not worth all the time involved.
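The fix that was ultimately merged ("Don't calculate size of thumb until requested") applies the classic lazy-getter pattern: the expensive local-copy-and-stat work only runs if the size is actually asked for. A sketch of the pattern, in Python for illustration (the real change is in PHP, and these names are not the actual MediaWiki classes):

```python
# Sketch of the lazy-getter pattern behind the fix: defer the expensive
# getLocalCopyPath()/filesize() work until get_size() is actually called.

class LazySearchResultThumbnail:
    def __init__(self, compute_size):
        self._compute_size = compute_size  # expensive: e.g. fetch from Swift and stat
        self._size = None
        self._computed = False

    def get_size(self):
        if not self._computed:
            self._size = self._compute_size()
            self._computed = True
        return self._size

computations = []
def expensive_size():
    computations.append(1)  # stands in for the 30s of thumbnail downloads
    return 12345

thumb = LazySearchResultThumbnail(expensive_size)
# Rendering Special:Search never calls get_size(), so nothing expensive runs:
assert computations == []
size = thumb.get_size()  # first access triggers the computation, exactly once
```

Since Special:Search never reads the size, the deferred computation simply never happens there, which is why the patch removes the 30s cost without changing the REST API's behaviour for callers that do read it.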

Change 849022 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/core@master] Don't calculate size of thumb until requested

https://gerrit.wikimedia.org/r/849022

Change 849022 merged by jenkins-bot:

[mediawiki/core@master] Don't calculate size of thumb until requested

https://gerrit.wikimedia.org/r/849022

It seems the only caller to SearchResultThumbnail::getSize is the Rest/SearchHandler exposing it as thumbnail.size. If I understand correctly, the way it works is by server-side downloading every thumbnail out of Swift into a tmp directory on the appserver (or if missing, generating a thumbnail on-demand in PHP, possibly bypassing Thumbor/wgThumbProxyUrl?) and then computing the file size of that thumbnail.

I assume with the above this lazy getter is not invoked within Special:Search, but I'm still curious what it is or was for. What use case do REST API clients have for the byte size of the thumbnail? This notion is somewhat incompatible with the way thumbnails work (as evidenced by the cost it took to compute): we generally perceive thumbnails as ephemeral and referenced by 404-rendered URL only, not as something we can query and measure.

If we wanted to offer this officially we'd have to rethink some of that, and e.g. offer a FileBackend primitive for fetching/estimating (without in-process downloading) the thumbnail size. In the case of Swift this could be implemented as an HTTP call that asks the Swift cluster for the size of that file, similar to what we do already when we fetch metadata for the original file size. Ideally, this would then also be batched as our (currently imagined) latency budgets and strategy don't really account for multiple Swift downloads in the same web request. Similar to Parsoid invocations, we generally expect one of those in a given web request in terms of resource cost and latency. We'd want this lookup to be batched (which Swift supports) and perhaps also made optional so that the majority of API clients that don't need it don't incur the cost.
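A batched size lookup of the kind described (asking the store for sizes over HTTP rather than downloading each thumbnail in-process) could be sketched as below. This is purely hypothetical: none of these names correspond to real MediaWiki/FileBackend APIs, and the transport is injected as a callable so the sketch runs without network.

```python
# Hypothetical sketch of a batched thumbnail-size lookup: ask the storage
# backend for Content-Length via HEAD-style requests, batched into one
# round-trip, instead of downloading each thumbnail to a tmp directory.

def batch_thumbnail_sizes(paths, head_many):
    """head_many(paths) -> {path: size_in_bytes, or None if missing}."""
    sizes = head_many(paths)
    return {p: sizes.get(p) for p in paths}

# Stub standing in for a single batched round-trip to the object store:
def fake_head_many(paths):
    store = {"thumb/a.jpg": 2048, "thumb/b.jpg": 512}
    return {p: store.get(p) for p in paths}

sizes = batch_thumbnail_sizes(
    ["thumb/a.jpg", "thumb/b.jpg", "thumb/missing.jpg"],
    fake_head_many,
)
```

The key properties are the ones named in the comment above: one batched call per web request rather than N downloads, and a None result for thumbnails that don't currently exist rather than rendering them on demand.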

As an example, when we expect Parser-derived metadata to be needed in bulk, we don't use the ParserCache, not even in a bulk lookup, as ParserCache may miss and is fairly costly to fetch. Instead, that data would be stored in page_props or other derived tables for high-volume access.

I recognise this is all still new and plans may already be in place. I'm only asking as a curiosity after the fact, to maintain a mental model of what we have and why it exists. An answer like "it's experimental, don't worry yet" is totally fine :)

This particular part is not "new" or "experimental" - far from it.

Some brief background:
  • class SearchResultThumbnail (which includes the getSize method) has existed since 2020, and was introduced along with the thumbnails hook (then in the Rest namespace) in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/588999/
  • At the same time, PageImages started subscribing to that hook and immediately doing the filesize check, in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/591050
  • Since 2020, thumbnails (via the hook + PageImages) have been available via the Rest API. We now wanted to start showing these same thumbnails on Special:Search, so a bunch of that (until then) Rest-specific code got moved around recently (and also started affecting Special:Search)

I'm not sure why that filesize check exists - I've only briefly looked at the commits & related phab tickets and couldn't immediately find anything about it.
Perhaps it's unused and we could remove it (it's been around and part of the public API since 2020, though, so it's going to be a little painful), but one would have to dig into that in a bit more detail than I have to figure out whether there are (known) clients relying on this data.

Checked on wmf.8 deployment - the issue is fixed.

This particular part is not "new" or "experimental" - far from it. […]

Perhaps it's unused and we could remove it (it's been around and part of the public API since 2020, though, so it's going to be a little painful), but one would have to dig into that in a bit more detail than I have to figure out whether there are (known) clients relying on this data.

2020 is relatively new, but point taken. I thought of it as part of the Vector 2022 skin, which is opt-in on most wikis. I don't know how much we promote this API outside WMF yet; I assumed it was effectively internal at this point. But yeah, it is in a stable rest.php/v1 endpoint. Thanks for clarifying, I had forgotten it was v1.

Swift thumbnail size is, imho, fundamentally information we cannot serve in a batch format like search, and more generally should not expose anywhere, even for a single file, as it isn't information that logically exists.

Thumbnails are architected as an ephemeral concept, and for scaling and security reasons we allow their rendering and serving to be decoupled from MediaWiki at scale, such that e.g. Thumbor and Swift can serve and (re)generate them as needed based on demand and 404 status. Akin to the Parser, this is information that can be accessed in a web request if it is the subject of that page, but an API endpoint that would require accessing N thumbnails, or parsing N wiki pages, goes contrary to the current architecture and is thus likely an attack vector in the worst case, or in the best case something with latency behaviour we can't commit to very well. For the Parser, this is why we store derived information in page_props and link tables. We don't iterate and query ParserCache entries, since those may be missing.

The size of original files is primary data and indexed in the database. The size of thumbnails is not deterministic and can vary over time, by client, and by configuration. For example, do we measure JPG or WebP? With or without gzip? If the thumbnail was recently purged, do we now render it on-demand in the API call? If so, how? We no longer provision the full imagescaler packages on appservers, and for security reasons aren't meant to generate images and write to Swift during high-traffic API calls.

Going back to it being v1-stable, we'd probably have an easier time pulling off a quick deprecation now than 2-3 years in the future.

I'll leave it at this and consider your team sufficiently informed. Happy to help further :)

I had a quick look and AFAICT, no consumer of this API even uses size.
It's also already documented to be nullable (and for most thumbnails, null ends up being the value).
I think we can safely stop calculating size and always return null since, afaict, there is no consumer, and in the unlikely case that there is one, it should already expect null.
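Since the field is documented as nullable, a well-behaved client of the REST search endpoint should already tolerate a null size. An illustrative sketch of defensive client-side handling (the response shape here is abbreviated and the helper name is hypothetical):

```python
# Illustrative client-side handling of a nullable thumbnail.size field in a
# REST search response. The payload is abbreviated; real responses contain
# more fields.

response = {
    "title": "Example.jpg",
    "thumbnail": {"url": "https://upload.example/thumb.jpg", "size": None},
}

def thumbnail_size_label(result):
    # Tolerate both a missing thumbnail object and a null size.
    size = (result.get("thumbnail") or {}).get("size")
    return f"{size} bytes" if size is not None else "size unknown"

label = thumbnail_size_label(response)
```

A client written this way is unaffected whether the server computes the size, returns null, or eventually drops the field entirely.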

I created T323125 (to remove the computation & always return null - should be quick and painless) and T323126 (complete removal)
I'll get started on the former since that one's quick and has the biggest impact.
I may come back to the latter at some point, but this is not actually in my/our wheelhouse and we have other fish to fry ATM - but at least there is a ticket for follow-up.