Page MenuHomePhabricator

Add user journey performance tests for Vector's Legacy and Vue.js search
Closed, ResolvedPublic5 Estimated Story Points

Description

How do we want to compare the current search performance to the new one? This task is to make sure we have the proper user journeys defined, implementing them in the Performance test repo, and setting up a dashboard.

This task does not cover any behavioral regression tests such as "the new UI is confusing."

User journeys

Legacy Vector:

For anon cached and uncached, and logged-in uncached scenarios:

  1. Visit the Barack Obama page.
  2. Tap search.
  3. Type "b".
  4. Wait for a response.
  5. Type "ananb" one character at a time.
  6. Wait for a response.
  7. Type "<backspace>a" one character at a time.
  8. Wait for a response.
  9. Tap the first result.
  10. Wait for the page to load.

Latest Vector:

For anon cached and uncached, and logged-in uncached scenarios:

  1. Visit the Barack Obama page.
  2. Tap search.
  3. Wait for the form to load.
  4. Type "b".
  5. Wait for a response.
  6. Type "ananb" one character at a time.
  7. Wait for a response.
  8. Type "<backspace>a" one character at a time.
  9. Wait for a response.
  10. Tap the first result.
  11. Wait for the page to load.

Questions

  • Confirm with @ovasileva and @alexhollender that these journeys make sense.
  • Is the Barack Obama page a good test for the tests wikis? The Performance team has also used the Sweden and Facebook pages. Is there a universal page that works on the test wikis (French Wikipedia and Wiktionary, Hebrew Wikipedia, Portuguese Wikiversity, Basque Wikipedia, and Persian Wikipedia)? Answer: there is no universal page. The Obama page is usually a good example of a large page. See T251544#6103760.
  • Do we want to also test Special:BlankPage? Answer: no, but we may want to test an empty page in the user namespace. See T251544#6103760.
  • Are parameters possible or do these break caching (see T215088#4993170)? E.g., safemode=1, useskin=vector&useskinversion=2, and banner=null. If not, how do we set the skin version and enable features?
  • Do we need to test loading the largest dependencies (e.g., Vue) without using them (see T250336#6075053)?

Dashboards

For Legacy and Latest modes, mimic at least the metrics described in the reference dashboard:

  • TTFB
  • FirstVisualChange
  • Heading
  • LargestImage
  • LastVisualChange
  • SpeedIndex
  • PerceptualSpeedIndex
  • Fully Loaded
  • ...

Both reports should be comparable on the same dashboard.

Acceptance criteria

  • Legacy and latest reports are comparable on the test wikis.
  • Chores are updated to include the new dashboards
  • The wiki (either a link off of the Vue.js page or one of the Performance pages) is updated with recommendations for future Vue.js projects
  • Performance team signs off on tests and dashboards

QA steps

Measurement definitions:

mwVector{Legacy/Vue}SearchLoadStartToLoadEnd: Intended to measure the time it takes to load and execute the search module. Note, I've noticed this measurement can be the most skewed of the three measurements because the load end time was implemented using the "mw.loader.using" callback which places that code in a queue that can be influenced by other modules that are loading (e.g. ULS module). Although not desirable, this problem affects both Vue and legacy search.

mwVector{Legacy/Vue}SearchLoadStartToFirstRender: Intended to measure the time it takes from the start of lazy loading the search module to when the search results are visible to the user.

mwVector{Legacy/Vue}SearchQueryToRender: Intended to measure the time it takes from the start of the ajax request to fetch search results to when the search results are visible to the user.

Because of its technical depth, QA here is probably best done by a dev/someone familiar with the performance timeline. Here is how I'd suggest QAing it:

Legacy Search
  1. Clear localstorage/all cache and visit https://fr.wikipedia.org/wiki/Barack_Obama?useskinversion=1 with Chrome
  2. Go to the performance tab and hit the record button
  3. Focus the search input and type the letter "a" in the input
  4. Wait for the search results to be shown
  5. Stop profiling
  6. Using the "Network", "Timings", and "Main" sections in the performance timeline, ensure the bars in the "Timings" section make sense according to the measurement definitions above. I've attached a screenshot of a timeline I did for Vue search which is annotated and might be useful as a reference

vue-timeline.png (1×3 px, 683 KB)

Vue search

Repeat steps 1-6 but now measure Vue search using https://fr.wikipedia.org/wiki/Barack_Obama?useskinversion=2

Dashboard at: https://grafana.wikimedia.org/d/GivPpdsGk/vue-vs-legacy-search?orgId=1

QA Results - Beta

ACStatusDetails
1T251544#6914967

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

What's the expectation/next step for @Peter and the perf team? Reviewing the patches linked here that aren't [WIP] nor [search]? Do you need anything else?

@Gilles, these are patches I was hoping the performance team could review:

Add client-side performance metrics for legacy search
Add client-side performance metrics for searchLoader
Revise legacyObama search (Note: Peter already +2d this, I just had a small revision)
WIP: Add Vue search script (Note: This is currently not ready for review as it is blocked on our release of the Vue search component, but it will eventually need review)

If your team reviews those, I think our team will be in good shape to review the rest of the patches.

Change 624824 merged by jenkins-bot:
[mediawiki/core@master] Add client-side performance metrics for legacy search

https://gerrit.wikimedia.org/r/624824

Change 624888 merged by jenkins-bot:
[mediawiki/skins/Vector@master] Add client-side performance metrics for searchLoader

https://gerrit.wikimedia.org/r/624888

\o/ All the patches that were merge candidates have now been merged (thank you Performance team!). The remaining work on this is blocked on the Vue search component work:

  1. Emit fetch-start and fetch-end events (from TypeaheadSearch component)
  2. Add metrics in App.vue that integrates with events from step 1 (pending)
  3. WIP: Add Vue search script (small patch in the synthetic performance repo)

Change 624825 merged by jenkins-bot:
[performance/synthetic-monitoring-tests@master] Revise legacyObama search

https://gerrit.wikimedia.org/r/624825

Now that the typeahead search component has been merged, the next patch up for review (to be reviewed by the Web team) is:

https://gerrit.wikimedia.org/r/c/wvui/+/628240

Change 628240 merged by jenkins-bot:
[wvui@master] [search] Emit fetch-start and fetch-end events

https://gerrit.wikimedia.org/r/628240

  1. WIP: Add Vue search script (small patch in the synthetic performance repo)

AIUI this is the only patch that's left. I'm being bold and moving this back to Doing.

Final piece remaining on this depends on the integration bit (T264355) so I have moved this to blocked

Change 644631 had a related patch set uploaded (by Nray; owner: Nray):
[mediawiki/skins/Vector@master] [search] Add Vue search performance instrumentation

https://gerrit.wikimedia.org/r/644631

The above patch is dependent on the integration patch and part of @phuedx 's patch "Instrument Vue.js-based search widget" . It's not ready for merge yet, but if anyone from the web team has bandwidth feel free to review :)

Change 644631 merged by jenkins-bot:
[mediawiki/skins/Vector@master] [search] Add Vue search performance instrumentation

https://gerrit.wikimedia.org/r/644631

nray added a subscriber: Edtadros.

Now that the Vector/WVUI performance metric recording code has been merged, the next (and hopefully last) patch is the one in the synthetic performance repo that adds a test to collect these Vue search metrics:

https://gerrit.wikimedia.org/r/c/performance/synthetic-monitoring-tests/+/624826

However, it needs a url to run the test against. Since we have not launched the new search anywhere yet, this is moved back to the blocked column.

Change 624826 merged by jenkins-bot:
[performance/synthetic-monitoring-tests@master] Add Wvui search script and point legacy/Wvui search tests to beta

https://gerrit.wikimedia.org/r/624826

Today I met with @Peter to talk about the steps remaining for this ticket. Peter was very helpful and now we have a really cool dashboard thanks to him!

https://grafana.wikimedia.org/d/GivPpdsGk/vue-vs-legacy-search?orgId=1

Note that this dashboard is currently collecting metrics for legacy and Wvui search on the beta cluster. I will update it to point to production when we have deployed there. It is also worth noting that these are synthetic performance tests [1] meaning that they are running automated tests rather than collecting metrics from real users (although the same code could in theory be used to test real users if desired)

[1] https://wikitech.wikimedia.org/wiki/Performance/Synthetic_testing

  • Chores are updated to include the new dashboards

A chore has been added https://www.mediawiki.org/wiki/Reading/Web/Chores

  • The wiki (either a link off of the Vue.js page or one of the Performance pages) is updated with recommendations for future Vue.js projects

Not exactly sure what was desired here, but I documented our search performance setup at https://www.mediawiki.org/wiki/Vue.js#Projects_using_Vue

nray removed Edtadros as the assignee of this task.
nray moved this task from Doing to QA on the Web-Team-Backlog (Kanbanana-FY-2020-21) board.
This comment was removed by nray.

Change 667938 had a related patch set uploaded (by Nray; owner: Nray):
[performance/synthetic-monitoring-tests@master] Point Vue/Legacy search tests to frwiki and update query string

https://gerrit.wikimedia.org/r/667938

^^ New patch points tests to frwiki since it is a production wiki with wvui search enabled and because beta has (oddly) stopped showing search suggestions unless an exact match has been typed

Change 667938 merged by jenkins-bot:
[performance/synthetic-monitoring-tests@master] Point Vue/Legacy search tests to frwiki and update query string

https://gerrit.wikimedia.org/r/667938

This is ready for qa, I just need to write qa steps

nray removed Edtadros as the assignee of this task.
nray updated the task description. (Show Details)
nray moved this task from Doing to QA on the Web-Team-Backlog (Kanbanana-FY-2020-21) board.

Moving to QA but as mentioned in the QA steps, this one is probably best done by a dev

Test Result - Beta

Status: ✅ PASS
Environment: beta
OS: macOS Big Sur
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

Measurement definitions:

mwVector{Legacy/Vue}SearchLoadStartToLoadEnd: Intended to measure the time it takes to load and execute the search module. Note, I've noticed this measurement can be the most skewed of the three measurements because the load end time was implemented using the "mw.loader.using" callback which places that code in a queue that can be influenced by other modules that are loading (e.g. ULS module). Although not desirable, this problem affects both Vue and legacy search.

mwVector{Legacy/Vue}SearchLoadStartToFirstRender: Intended to measure the time it takes from the start of lazy loading the search module to when the search results are visible to the user.

mwVector{Legacy/Vue}SearchQueryToRender: Intended to measure the time it takes from the start of the ajax request to fetch search results to when the search results are visible to the user.

Because of its technical depth, QA here is probably best done by a dev/someone familiar with the performance timeline. Here is how I'd suggest QAing it:

Legacy Search
  1. Clear localstorage/all cache and visit https://fr.wikipedia.org/wiki/Barack_Obama?useskinversion=1 with Chrome
  2. Go to the performance tab and hit the record button
  3. Focus the search input and type the letter "a" in the input
  4. Wait for the search results to be shown
  5. Stop profiling
  6. Using the "Network", "Timings", and "Main" sections in the performance timeline, ensure the bars in the "Timings" section make sense according to the measurement definitions above. I've attached a screenshot of a timeline I did for Vue search which is annotated and might be useful as a reference
Timeline
Screen Shot 2021-03-14 at 7.53.38 PM.png (798×3 px, 163 KB)
Load Profile
mwVectorLegacySearchLoadStartToLoadEnd
Screen Shot 2021-03-14 at 7.56.15 PM.png (63×394 px, 15 KB)
mwVectorLegacySearchLoadStartToFirstRender
Screen Shot 2021-03-14 at 7.57.44 PM.png (50×347 px, 13 KB)
mwVectorLegacySearchQueryToRender
Screen Shot 2021-03-14 at 7.59.04 PM.png (48×308 px, 9 KB)
Vue search

Repeat steps 1-6 but now measure Vue search using https://fr.wikipedia.org/wiki/Barack_Obama?useskinversion=2

Timeline
Screen Shot 2021-03-14 at 7.37.44 PM.png (766×2 px, 153 KB)
Load Profile
mwVectorVueSearchLoadStartToLoadEnd
Screen Shot 2021-03-14 at 8.01.22 PM.png (53×337 px, 11 KB)
mwVectorVueSearchLoadStartToFirstRender
Screen Shot 2021-03-14 at 8.01.48 PM.png (46×325 px, 10 KB)
mwVectorVueSearchQueryToRender
Screen Shot 2021-03-14 at 8.02.11 PM.png (58×286 px, 7 KB)

Thank you for looking at this @Edtadros . I've had a good time looking at your provided JSON profiles, and it looks like the discrepancies between your results (where most of the Vue metrics are faster than legacy) and the dashboard (where Vue metrics are slower than legacy) could be explained by network fluctuations and the timing of the "a" keypress:

  • Your profiles show the "a" key was pressed ~470ms after Vue search load "ended" and the "a" key was pressed ~713ms after Legacy seach load "ended" . This difference in timing (although no fault of your own) affects the "SearchLoadStartToFirstRender" metric and, combined with the second point below, helps explain why your results suggest Vue being ~400ms faster for this metric. The dashboard tests focus and immediately type in the input with no delay.

vue-keypress.png (1×2 px, 233 KB)

legacy-keypress.png (958×2 px, 234 KB)

  • Your profiles show a really fast ajax request to fetch the search results for Vue (21 ms and likely a cached response from the edge) vs. a relatively slow ajax request for legacy (248ms). That network request affects both the "SearchLoadStartToFirstRender" and "SearchQueryToRender" metrics, and I'd expect that number to fluctuate based on network conditions. The "SearchQueryToRender" metric on the dashboard suggests that over time the legacy ajax request is usually faster than Vue given that the paint times are relatively much smaller in magnitude (~5 ms). Legacy and Vue use different APIs which might explain the difference.

vue-ajax.png (1×2 px, 194 KB)

legacy-ajax.png (914×1 px, 131 KB)

I didn't see any significant red flags in the QA results so I'm moving this to sign off. To whoever signs this off, the following comment might help recap the work that was done on this ticket:

https://phabricator.wikimedia.org/T251544#6875594