Page MenuHomePhabricator

Performance review of Improved Commons Search
Closed, ResolvedPublic

Description

Description

(Please provide the context of the performance review, and describe how the feature or service works at a high level technically and from a user point of view, or link to documentation describing that.)

The structured data team is working to improve search in Commons as the first phase of the Structured Data across Wikimedia (SDAW) project, and we hope to release a new search page in Commons in Q4. The work may spill into Q1 of next year. We are still working through the design research and design phases and don't have details about implementation yet. We have an initial concept design in the first slide here but we are not likely to implement it exactly this way.

We also have an initial prototype/proof of concept here: T216625

Preview environment

We will be releasing this on production with a whitelist, since it cannot be tested elsewhere.

(Insert one or more links to where the feature can be tested, e.g. on Beta Cluster.)

Hosting the changes on Beta Cluster is a requirement prior to performance review. Please ensure that the feature can be used directly on the link(s) provided, without any data entry such as having to create an article. Any sample content needed should already be present.

If the changes cannot be hosted on Beta Cluster, explain why and provide links to an alternate public environment instead where the Performance Team can also SSH into. Links to code only is insufficient for a performance review.

TBD

Which code to review

(Provide links to all proposed changes and/or repositories. It should also describe changes which have not yet been merged or deployed but are planned prior to deployment. E.g. production Puppet, wmf config, or in-flight features expected to complete prior to launch date, etc.).

T216625

TBD

Performance assessment

Please initiate the performance assessment by answering the below:

  • What work has been done to ensure the best possible performance of the feature? TBD
  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance? TBD
  • Are there potential optimisations that haven't been performed yet? TBD
  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org. TBD

Event Timeline

in Commons in Q4. The work may spill into Q1 of next year

Hi @CBogen, just to clarify: Does that mean October 2020 to March 2021? Or is this implicitly about some financial year and means April to September 2020?

@Aklapper I'm referring to the WMF financial year, so Q4 is April-June 2020, and Q1 of next year is July-September 2020. I was following the language from the e-mail Gilles sent out asking for performance review requests for Q4 to be submitted by March 10, 2020.

Thanks for the clarification!

Krinkle renamed this task from Performance review of <improved Commons search> to Performance review of Improved Commons Search.Mar 10 2020, 5:22 PM
Krinkle added a project: Discovery-Search.
Gilles triaged this task as Medium priority.May 4 2020, 11:09 AM

@CBogen is the prototype running somewhere where I can try it out? Or not yet?

Gilles changed the task status from Open to Stalled.May 4 2020, 11:44 AM

@Gilles the prototype is running at https://commons.wikimedia.org/wiki/Special:MediaSearch but it is currently behind a whitelist. If you give me your WMF/Commons account I can add you to the whitelist.

There are significant improvements that should go out on this week's train, so it likely doesn't make sense to do any review until next week.

How does the performance review process work? As in, at what point in development do you usually perform the review?

@CBogen sounds good! Let know when you think the live version is in a good state to review.

We prefer to be involved early and we are happy to do multiple performance reviews of the same product at different stages of its development. Our recommendations are sometimes painful to apply to a completely finished product that might have made hard-to-undo decisions early on in their creation. Involving us when things are still malleable is better and generally makes a follow-up review prior to wide deployment a fast formality.

@Gilles Now that last week's train is once again rolling, the live version is in a good (early!) state to begin review.

https://commons.wikimedia.org/wiki/Special:MediaSearch

This is behind a whitelist, so let me know which Commons usernames you need access for review and I'll have them added. Thanks!

Gilles changed the task status from Stalled to Open.May 12 2020, 11:19 AM

@Gilles However, the team is going to overhaul the way this works after some discussions with the search team yesterday, so it may not make sense to do a performance review just yet. I'll update when I have more information on the timeline for those changes.

Any update? By the way, I don't see a difference in how the page behaves when I'm logged in or not.

Gilles changed the task status from Open to Stalled.Jun 12 2020, 1:35 PM

This is blocked on @CBogen pointing me to/giving me access to the new version of the feature to be reviewed.

Hi @Gilles - the page is now public so you don't need to log in to see it anymore.

We are waiting for T252685 before we're ready to do the performance review, since this is a significant rewrite of the feature. The last estimate I got was about two more weeks of work.

@Gilles - looks like we'll continue to work on this next quarter (Q1 2020-2021), and now we're also going to re-write the front end into vue.js (T251940). It probably doesn't make sense to do a performance review until that is done, so it's looking like we won't be ready until next quarter. I'll keep you updated!

Hi @Gilles! We're expecting to be ready for the performance review for this by the end of this month, August 2020. Does that work with your timeline?

Yes, that works for us, thanks!

Media search is essentially 2 separate parts:

  1. there as a new, media-focused, UI which uses the search API
  2. there is a multimedia-specific search profile to alter the actual elastic query

After next deployment train hits Commons (Wed 16 Sept, if things go well), both will be wired together & can be tested at https://commons.wikimedia.org/wiki/Special:MediaSearch?quickview=1

1/ UI
Page renders in PHP initially by executing the exact same API call (FauxRequest) that the frontend will (https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/src/Special/SpecialMediaSearch.php#L205)
Initial render is based off of this template: https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/templates/mediasearch/SERPWidget.mustache%2Bdom
Once the wikibase.mediainfo.mediasearch.vue JS module (https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/tree/master/resources/mediasearch-vue) is ready, a Vue component will take over the DOM and replace the initial render (based on templates embedded in the Vue components.)
Vue will re-use the existing results (via mw.config.get( 'wbmiInitialSearchResults' ) ) so won't perform additional API calls until the user starts to interact.

When a user clicks on a result, a side-popup will open with more image details, for which an additional API request (props info|imageinfo|pageterms; imageinfo includes extmetadata) is needed.

When a user scroll down, the frontend will execute a similar (as was performed serverside) API to fetch the next batch of results. It will only do so automatically a few times, after which it becomes a button to load more.

2/ Search profile
A custom cirrussearch profile is being registered for searches in NS_FILE only: https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/src/WikibaseMediaInfoHooks.php#L706
Right now, in addition to searching only NS_FILE, the url must also include ?mediasearch=1 when searching via Special:Search or the search API.
Only plain full text searches are currently handled; phrase queries and AND/OR/negation are not yet supported and will fall back to the default search profile.

The actual search is pretty similar to plain cirrussearch (albeit different weights) except this also includes captions & SDC statements.
A search using this profile will first perform an API request to wikidata to find entities that match the term (https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/src/Search/MediaQueryBuilder.php#L414)
This results in an array of entity ids (["Q123", "Q456", ...]) that will be included in the elasticsearch query, in addition to the fulltext search.
Note 1: This might eventually end up being internalised as another query directly to elastic (cutting out the cross-wiki mediawiki API call)
Note 2: These results are not currently cached - they probably should (filed T263238)
Note 3: In the future, we might expand the list of matching statement ids to include "children" (e.g. "british shorthair" being a subclass of "cat" would also be a valid statement) - this would likely require a separate index

Gilles changed the task status from Stalled to Open.Sep 15 2020, 7:32 AM

Small update - we've removed the need for the ?quickview=1URL flag, which should be on this week's train - see T262860.

The current "fluid" image layout implementation is highly problematic in terms of performance. Completely arbitrary thumbnail widths are being requested by the feature (eg. 135px, 147px, etc.).

Thumbnails are rendered on demand when a size that hasn't been encountered before is requested. This means that using this feature, since thumbnail width values are pretty arbitrary to make them fit the screen, the likelihood of requesting an uncached thumbnail is very high. That size won't have been requested by anyone else before, unless someone with the same screen size searched for the same terms as you using this feature before.

Since you're already doing client-side resizing on top of the requested image size when resizing the page, you might as well stick to "standard" thumbnail sizes and downscale them to the dimensions you need. While the images will be slightly heavier in terms of file weight it will ensure that:

  1. they are more likely to be "hot" and served as a much smaller webp version for browsers that support it. We only serve webp variants for thumbnails that get a certain amount of traffic. By requesting arbitrary sizes, you're almost never requesting a popular one.
  2. latency is greatly improved overall, as the likelihood of cache hit will be orders of magnitude higher. Getting a thumbnail from our edge caches is a whole lot faster than generating a thumbnail from scratch.
  3. users will hit our thumbnailing limits less often. Currently I've had several thumbnails hit the throttling limit and returning an HTTP 429 just casually using the feature. I'm sure it's very common when someone searches for something new.

While there's no standard set of thumbnail sizes to use, I advise picking from the most common ones seen in thumbnail storage. Last time we looked at this in 2018, the top 25 most common thumbnail widths were:

180 120 320 800 640 1024 240 1280 300 150 200 48 100 2880 220 1920 250 500 75 600 400 1200 80 450 160

They're ordered by most common first in the list above.

If you round the requested thumbnail widths up to those values and downscale them client side, you should see a drastic improvement in performance in regards to how fast images appear.

Not a performance concern, but a UX one. First of all, great job on making this work without JS using server-side rendering! It's important that features for readers are developed in a progressive enhancement fashion. It also ensures a better performance for the initial paint of the page.

While using it with JS disabled, I saw that the only way to navigate results is with a "load more" button. While this works fine, browser history seems to be the only way to go back to previous results. I think it should be trivial to have at least a previous page/next page set of buttons in the no-JS experience instead of "load more"?

I'm seeing a lot of requests to:

https://www.wikidata.org/w/api.php?action=query&format=json&origin=https%3A%2F%2Fcommons.wikimedia.org&meta=userinfo%7Ctokens

For every keystroke in the search box, basically. What does this do? What do you need these tokens for? I don't see the tokens returned by this API call used in subsequent API requests.

I think that you could use some debouncing on the search requests triggered by keystrokes. While the in-flight requests are correctly aborted when more typing happens, you could probably avoid a lot of requests altogether by having a debounce of a few milliseconds on keydown events (which I presume is what you're using here). While it does delay the ultimate request slightly, users don't expect anything to happen until they released the key, and as such there are already a few milliseconds between keydown and keyup that aren't considered to be a waiting period and could be leveraged here. By avoiding requests that get cancelled anyway, you're also ensuring that there's no network concurrency issue for the request you actually want to make when users are done typing (or make a small pause).

While the image lazy loading is a great idea, it seems like it's not doing enough "look ahead", images only start loading when they get into the viewport. By using an Intersection Observer with a properly set rootMargin parameter, you can actually have images that are slightly below the viewport get loaded. This would ensure a lot smoother scrolling experience for users with reasonably fast internet connections, who would never see a loading thumbnail.

Using mw.track, for example, I would advise setting up some custom performance metrics for this feature. This will allow you to get a better sense of actual performance in the wild and discover potential outliers.

For your feature, examples of important user-centric timings might be:

  • time between the typing settling and the corresponding suggestion dropdown being displayed
  • time between a confirmed search (clicked suggestion or enter pressed) and all images initially above the fold being loaded

And if you set up metrics like that before the changes I've suggested previously, you will be able to see the magnitude of the changes they bring. Since the feature is already out in the wild, you can see if the image dimension change is a net gain overall ,etc.

It seems like search API requests aren't cached at all, neither server-side or client-side. Is this deliberate? Even caching for a few minutes would be beneficial.

I will do another pass of review looking at the code (hopefully tomorrow), so far it was all black box testing by browsing the site. There are already a lot of things to consider, discuss and potentially file tasks about in what I've found so far.

adding @egardner @AnneT @Cparle to read @Gilles comments above and follow along :)

Change 631170 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Rewrite thumbnail urls to serve more common widths

https://gerrit.wikimedia.org/r/631170

Change 631170 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Rewrite thumbnail urls to serve more common widths

https://gerrit.wikimedia.org/r/631170

In the MediaSearchWidget we use an API call to action=query&meta=siteinfo to read the thumblimits config.

https://en.wikipedia.org/w/api.php?action=query&format=json&meta=siteinfo
... "thumblimits":[120,150,180,200,220,250,300,400] ...

I think this is a better approach than hard-coding an installation-specific list of sizes

Bear in mind that in practice we serve 1.5x and 2.x versions of these sizes as well on articles to devices with the corresponding pixel density/zoom level, which makes them warm too.

There are already a lot of things to consider, discuss and potentially file tasks about in what I've found so far.

Correct. All of this feedback has either been addressed or is currently being worked on.

I will do another pass of review looking at the code (hopefully tomorrow), so far it was all black box testing by browsing the site.

Did this happen, or should we expect more to come?

I'm noticing an unthrottled listener to "resize" events: https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/resources/mediasearch-vue/components/SearchResults.vue#L490

That's generally a bad idea, it's going to be very resource-intensive and slow down the browser on slow devices. That event listener should be debounced.

It would be even better to find a CSS solution to whatever layout issue it's trying to solve, to avoid a resize listener altogether.

Likewise, for other layout purposes you're using the expensive getBoundingClientRect() function https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/resources/mediasearch-vue/components/SearchResults.vue#L337

Each call to this function (and others) causes a reflow of the browser.

You should seek pure CSS solutions or tradeoffs for whatever it's doing as well, if possible. Some of that logic seems to be trying to figure out if something is in the viewport, which you can use an IntersectionObserver for. Which you use in https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/resources/mediasearch-vue/components/SearchFilters.vue already.

offsetWidth is used here: https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/blob/master/resources/mediasearch-vue/components/QuickView.vue#L556 which is also a reflow-inducing call. Like before, CSS solutions should be favoured if possible.

Thanks, @Gilles – I'm going to create a few new tickets based on these suggestions and will investigate each scenario in turn. In general we have tried to rely on CSS (flexbox) as much as possible for the image grid layout, but there are always edge cases or at least trade-offs that need to be made. Debouncing the resize listener should be an easy fix though.

It sounds like we may be doing some layout thrashing when we open or close the "quickview" panel in the search results – this will naturally cause the grid to re-flow (we change the amount of space available for it), but if we're causing those expensive operations to happen more than they need then that's not ideal. Sounds like getting a performance profile during that process may be the next step.

I want this page to serve as a model for how to develop UIs using Vue in our projects, so I'd like to optimize things as much as possible.

Also, if we know that certain functions will always cause layout thrashing (like getClientBoundingRect), would it make sense to update our linters so that developers get a warning (not an error) to the effect of: "this method causes layout thrashing, use it sparingly"?

Yes, it would be a great idea for calls to these functions to issue warnings like that! Either during local linting and/or CI. I've actually suggested to browser vendors that they should also be more visible in dev tools as something that should be avoided if possible.