Performance review of Wikidata Bridge
Closed, ResolvedPublic

Description

The Wikidata Bridge (formerly known as “client editing”) is a project aiming to make it possible to edit Wikidata’s data directly from Wikipedia. This will be achieved by an interface, connected to the infobox, that users can access directly from their local wiki. The development will be made in several phases, working together with the communities to understand their needs and build a tool that will connect the data and the communities in a better way.

Preview environment

https://en.wikipedia.beta.wmflabs.org/wiki/Wikidata_Bridge_Showcase
https://en.wikipedia.beta.wmflabs.org/wiki/Data_bridge

Which code to review

(Provide links to all proposed changes and/or repositories. It should also describe changes which have not yet been merged or deployed but are planned prior to deployment. E.g. production Puppet, wmf config, or in-flight features expected to complete prior to launch date, etc.).

  1. Everything in the client/data-bridge directory in the Wikibase extension (the file paths below are relative to this directory)
  2. The modules that contain data-bridge or DataBridge in client/resources/Resources.php in the Wikibase extension
  3. Config changes (git log -G DataBridge):

Performance assessment

Please initiate the performance assessment by answering the below:

What work has been done to ensure the best possible performance of the feature?

a) Splitting the code to be loaded into a thin wrapper module, “init”, that checks whether there is actually a bridge-enabled link on the page, and a second, conditionally and lazily loaded module, “app”, containing the actual application:

  • the “init” module is built for target app
  • the “app” module is built as library (commonjs) to allow for runtime dispatching by other code (“init” module)
    • The vue dependency is externalized
    • Currently not tree-shaken or minified (see “potential optimisations”)
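
A minimal sketch of the init/app split described in (a), in TypeScript. All names here (BridgeTarget, parseBridgeHref, createInit, loadApp) and the link pattern are hypothetical and chosen for illustration only; the real init module works on DOM elements and loads the app through ResourceLoader rather than through an injected function.

```typescript
// Hypothetical description of what a bridge-enabled link points at.
interface BridgeTarget {
  entityId: string;
  propertyId: string;
}

// Hypothetical surface of the lazily loaded "app" module.
interface BridgeApp {
  launch(target: BridgeTarget): void;
}

// Assumed link shape ".../wiki/Q42?uselang=en#P31"; the real pattern may differ.
const BRIDGE_HREF = /\/wiki\/(Q[1-9]\d*)(?:\?[^#]*)?#(P[1-9]\d*)$/;

// Returns the edit target for a bridge-enabled href, or null for any other link.
function parseBridgeHref(href: string): BridgeTarget | null {
  const match = BRIDGE_HREF.exec(href);
  const entityId = match?.[1];
  const propertyId = match?.[2];
  return entityId && propertyId ? { entityId, propertyId } : null;
}

// The "init" part: cheap detection on every page, app loaded only on demand.
function createInit(loadApp: () => Promise<BridgeApp>) {
  let app: Promise<BridgeApp> | null = null; // cached so the app loads at most once

  return {
    // Cheap check that can run on every article impression.
    findTargets(hrefs: string[]): BridgeTarget[] {
      return hrefs
        .map(parseBridgeHref)
        .filter((t): t is BridgeTarget => t !== null);
    },
    // Called from a click handler; triggers the lazy load on first use.
    async open(target: BridgeTarget): Promise<void> {
      if (app === null) {
        app = loadApp();
      }
      (await app).launch(target);
    },
  };
}
```

Caching the promise means the app is requested at most once per page view, and a page without bridge-enabled links never requests it at all.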

b) Combining API requests as much as possible (code)

c) Load items from Special:EntityData rather than via the API (amendment T240223)

  • There is a trade-off here. The output of Special:EntityData is cached in Varnish, reducing load on the app servers; on the other hand, the wbgetentities API makes it possible to limit the amount of data one receives (e.g. “I only need statements, not labels or sitelinks”). See also the related bullet point of the “weak areas” section. (Note, however, that wbgetentities currently does not support limiting the data to only the statements for a certain property.)
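
To make the trade-off in (b) and (c) concrete, here is a small TypeScript sketch of the two request shapes. The helper names (entityDataUrl, wbgetentitiesUrl) and the base URLs passed in are assumptions for illustration, not the Bridge's actual code; the URL layouts follow the public Special:EntityData and wbgetentities interfaces.

```typescript
// Special:EntityData: one cacheable URL per entity, always the full entity.
function entityDataUrl(repoBase: string, entityId: string): string {
  return `${repoBase}/wiki/Special:EntityData/${encodeURIComponent(entityId)}.json`;
}

// wbgetentities: several ids can share one request, and "props" trims the
// response (e.g. only "claims"), but not down to a single property.
function wbgetentitiesUrl(apiUrl: string, ids: string[], props: string[]): string {
  const query = [
    ["action", "wbgetentities"],
    ["format", "json"],
    ["ids", ids.join("|")],
    ["props", props.join("|")],
  ]
    .map(([key, value]) => `${key}=${encodeURIComponent(value ?? "")}`)
    .join("&");
  return `${apiUrl}?${query}`;
}
```

For example, wbgetentitiesUrl("https://www.wikidata.org/w/api.php", ["P31", "P569"], ["labels"]) asks for both property labels in a single request, which is the kind of combining meant in (b).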

What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?

  • A thin wrapper, dist/data-bridge.init.js, has to check whether there is a bridge-enabled link on the page; it runs client-side on every article impression
  • There is a delay between the page becoming interactive and our init code being called. That code attaches listeners to bridge-enabled links that suppress the default behavior, i.e. the navigation to Wikidata. If the user clicks a bridge-enabled link during this delay, expecting to see the bridge app, they are instead sent to Wikidata
  • It loads, when “app” is dispatched, the entire entity from Wikidata, which might be quite large, but uses only the statements of a single property
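
The last point can be illustrated with a short TypeScript sketch. The types are a simplified, assumed subset of the entity JSON, and statementsFor and usedFraction are hypothetical helpers; the sketch only shows that the app keeps one property's statements out of a potentially much larger document.

```typescript
// Simplified, assumed subset of a statement in the entity JSON.
interface Statement {
  id: string;
  mainsnak: { property: string; snaktype: string };
}

// Simplified, assumed subset of an entity; real entities carry more fields.
interface Entity {
  id: string;
  claims: Record<string, Statement[]>;
  labels?: unknown;
  sitelinks?: unknown;
}

// The only part of the loaded entity that is needed for one edit.
function statementsFor(entity: Entity, propertyId: string): Statement[] {
  return entity.claims[propertyId] ?? [];
}

// Rough share of the serialized entity that is actually used (0..1).
function usedFraction(entity: Entity, propertyId: string): number {
  const used = JSON.stringify(statementsFor(entity, propertyId)).length;
  return used / JSON.stringify(entity).length;
}
```

A measure like usedFraction is only a rough serialized-size ratio, but it could help judge how much a per-property filter would save for large items.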

Are there potential optimisations that haven't been performed yet?

  • Reducing the size of the dist/data-bridge.common.js file/module that is loaded if a bridge-enabled link is detected on the page T228857
    • One possible idea would be to use uglify.js in the build step (mediawiki/extensions/Wikibase/+/571482)
      • Optional: allow debug=true (cf.) for unminified version
    • Apply tree shaking
    • Externalize more dependencies shared with other micro frontends (e.g. vue-class-component - also used in termbox)
  • Reducing the data actually transmitted when saving by sending only the statement(s) for the actually edited property instead of all statements for all properties T230343
  • Using svgo via cssnano to further decrease the size of our assets T234070
  • Enable storing ResourceLoader modules on Firefox LocalStorage again once the next gen LocalStorage is enabled and stable in Firefox mediawiki/core/+/544183
  • Reuse vue ResourceLoader module from MediaWiki instead of shipping our own copy (T247519)
  • Reuse vuex ResourceLoader module from MediaWiki instead of shipping our own copy (T250264)
  • Explore differential loading to further reduce payload sizes for modern browsers (again, the difference is in the build step, not the implementation)
  • Determine and integrate a performance budget into project CI
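
As an illustration of the "send only the statement(s) for the edited property" idea from the list above (T230343), the following TypeScript sketch compares the two save payloads. The function names are hypothetical, and it is an assumption here that the edit API accepts a partial claims object as the data payload in the way the Bridge would need; the sketch only shows the difference in what is transmitted.

```typescript
// Simplified, assumed subset of a statement in the entity JSON.
interface Statement {
  id: string;
  mainsnak: { property: string; snaktype: string };
}

type StatementMap = Record<string, Statement[]>;

// Payload carrying every statement of the entity (the behavior described above).
function editDataForAll(claims: StatementMap): string {
  return JSON.stringify({ claims });
}

// Payload carrying only the statements of the property that was edited.
function editDataForProperty(claims: StatementMap, propertyId: string): string {
  const edited = claims[propertyId] ?? [];
  return JSON.stringify({ claims: { [propertyId]: edited } });
}
```

For an item with many properties the second payload stays roughly constant in size, while the first grows with the whole entity.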

Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far.

  • There are two performance-related measurements in place so far, both for the performance of the initialization step:
  • Calls to Special:EntityData (used by Data Bridge to load the entity being edited) should be virtually unaffected – we do not expect Bridge requests to make up a significant volume of these, compared to the many requests that are already being made.
  • Calls to wbgetentities (used by Data Bridge to load the label of the property being edited, in content language) should likewise be virtually unaffected.
  • Wikidata edits are also not expected to change significantly, seeing as most of the edits come from various automated and semi-automated systems (including bulk editing tools on behalf of editors).
  • We offer a shy CTA for the user to edit on the local wiki under some circumstances (T235753) but do not expect this to have measurable impact on the number of edits performed

Event Timeline

hey @aaron, @Gilles,

could you, please, give us an update on this task? could you also tell us something we could tackle proactively that may help?

We are currently implementing the last stories and that information would help us enormously to shape our next steps.

Thanks a lot in advance!

Sorry, I was reading through the documentation before a one-week vacation, and then got distracted by debugging T249069 and coordinating on issues related to T236414 and T250205. Getting back to this is my next priority after fixing T249069.

Understood. Thanks a lot!

I've been looking at this from time to time, and haven't found any real problems yet. Some of the things I'm looking out for are:

  • Pageview critical path effects:
    • Bytes (JS)
    • Bytes (CSS+images)
    • Page load delay
    • First input delay
    • Content reflows
  • Post-load pageview interaction effects:
    • Hover delays
    • Input delays
  • Backend effects
    • DB I/O usage
    • DB contention issues
    • Search index store usage
    • Key/value store usage
    • Cache usage

I'll be looking at the frontend some more this weekend, but I expect to sign off as "LGTM" on Monday.

From the perspective of popular/major articles, which are likely to have infoboxes, the extra 42.1 KB for loading the "app" JS doesn't seem crazy. I've looked through the code several times and it seems reasonable. Testing with fast/slow 3G doesn't reveal obnoxious reflows or delays either. Having the edit link go directly to a Q<X> page when the JS hasn't fully loaded felt somewhat jarring, though I don't imagine that happening often. I don't expect much editing at all given how discreet the icon is (a good thing).

The bootstrapping "init" JS is also pretty tiny. It does have a fair number of module references in the using() call for pages with editable elements. OTOH, those seem to be loaded anyway, with the "app" being the only new thing triggered. The DOM search for editable entity links is just a simple CSS selector call with reasonable metadata extraction. I don't see (nor did I perceive) any CPU use or long task issue there.

The client <=> api.php layer looks reasonable and well abstracted. Given the low-key nature of the GUI, I don't foresee any obvious edit rate, DB overhead, or contention issues.

I don't see any reason to block the wikidata-bridge deployment and consider this task resolved from my end.

That sounds awesome! thanks a lot for the assessment :)