[Spike 2.5h] Did the new mobile header treatment break the search experience?
Open, Normal · Public

Description

Recently, @chelsyx asked:

Also, the dashboard shows there was a drop in the number of events at the end of March. Is there any change on the mobile web side that you know of that may have caused this drop?

The date of the sharp downtick is Wednesday, 29th March. On Thursday, 30th March there's a sharp uptick. Readers Web deployed the new mobile header treatment (including the search box) on that Wednesday. However, the uptick isn't as large as the downtick (circa 14 k events/day vs circa 20 k events/day).
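
To make the size of the gap concrete, here is a small arithmetic sketch. It uses only the two approximate figures quoted above; the variable names are illustrative.

```python
# Approximate figures from the task description (events/day).
downtick = 20_000  # drop on Wednesday, 29th March
uptick = 14_000    # recovery on Thursday, 30th March

# Events/day that did not come back after the recovery.
net_shortfall = downtick - uptick
# Fraction of the drop that the recovery made up for.
recovered_share = uptick / downtick

print(f"Net shortfall: ~{net_shortfall} events/day")
print(f"Share of the drop recovered: ~{recovered_share:.0%}")
```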

IIRC there was an issue around migrating between the old and new versions of the header. This may explain the downtick on the day of the deploy and then the uptick the day after (at which point, the majority of the text cache had been invalidated).


Separately, @Nirzar noted that the number of Special:Search pageviews increased at around the same time. This uptick happened on Friday, 24th February.

Possible Investigations

  1. Per T176464#3636190, see if there was a change in the top N browsers visiting Special:Search around Friday, 24th February and Wednesday, 29th March.
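
A minimal sketch of the "top N browsers" comparison, assuming the pageview data has already been exported as `(date, browser, views)` rows. That row shape, the function name `top_browsers`, and the sample values are all hypothetical and are not the real pageview schema or real data.

```python
from collections import Counter
from datetime import date, timedelta


def top_browsers(records, center, days=3, n=5):
    """Return the n browsers with the most views within +/- `days` of `center`.

    `records` is an iterable of (date, browser, views) tuples. This shape is
    an assumption made for illustration only.
    """
    lo = center - timedelta(days=days)
    hi = center + timedelta(days=days)
    counts = Counter()
    for day, browser, views in records:
        if lo <= day <= hi:
            counts[browser] += views
    return counts.most_common(n)


# Made-up rows, purely for illustration.
sample = [
    (date(2017, 2, 23), "Chrome Mobile", 120),
    (date(2017, 2, 24), "Opera Mini", 300),
    (date(2017, 2, 25), "Chrome Mobile", 130),
    (date(2017, 3, 29), "Mobile Safari", 50),
]

around_feb_24 = top_browsers(sample, date(2017, 2, 24), n=2)
around_mar_29 = top_browsers(sample, date(2017, 3, 29), n=2)
print(around_feb_24)
print(around_mar_29)
```

Comparing the two lists side by side would show whether the browser mix shifted between the two dates.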

Outcomes

Actionable by Software Engineers

  • The timeline for deploying the new mobile header treatment ("the treatment") is well understood and documented.
  • We test the hypothesis that implementing the treatment broke the instrumentation.
  • See T176464#3657672.
  • Tasks are created for any bugs found in the treatment and/or the MobileWebSearch instrumentation.

Actionable by Research/PO

  • The reason for the increase of Special:Search pageviews is well understood and documented.

phuedx created this task. Sep 22 2017, 10:00 AM
Restricted Application added a subscriber: Aklapper. Sep 22 2017, 10:00 AM
ovasileva triaged this task as High priority. Sep 22 2017, 12:52 PM
ovasileva moved this task from To Triage to Upcoming on the Readers-Web-Backlog board.

Marking as high priority based on timeline. @phuedx, @chelsyx - feel free to go down to normal if you think it's more appropriate.

Jdlrobson added a subscriber: Jdlrobson. Edited · Sep 26 2017, 5:06 PM

It would help to show these statistics per browser (https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=mobile-web&agent=user&start=2016-09-01&end=2017-08-31&pages=Special:Search only shows all page views)

The new header on mobile would have increased traffic to Special:Search for any browsers which run in grade C mode, as searching now requires 2 page views. Navigate to any page on Opera Mini, for example, and you will see a search icon which, when clicked, takes you to Special:Search. Searching then requires typing in the box on that page and visiting Special:Search again. Can we look at these results for pages with a blank search query, e.g. https://en.m.wikipedia.org/w/index.php?search= ?
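
One way to separate the "blank query" hits from real searches is to parse the `search` parameter out of each requested URL. A minimal sketch, assuming the URLs are available as plain strings; the helper name `is_blank_search` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def is_blank_search(url):
    """True if the URL has a `search` parameter whose value is empty."""
    # keep_blank_values=True is needed, otherwise `search=` is dropped.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query.get("search")
    return values is not None and all(v.strip() == "" for v in values)


blank = is_blank_search("https://en.m.wikipedia.org/w/index.php?search=")
typed = is_blank_search("https://en.m.wikipedia.org/w/index.php?search=Hagszfz")
no_param = is_blank_search("https://en.m.wikipedia.org/w/index.php?title=Foo")
print(blank, typed, no_param)
```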

It's worth mentioning that IE9 became a grade C browser around this time. Because Internet Explorer 9 was no longer supported for any JS-driven searches, this would have driven a large amount of traffic to this page, and this spike can probably be attributed to it.

phuedx updated the task description. Sep 27 2017, 8:52 AM

Thanks for your insight, @Jdlrobson.

@phuedx, is the idea that Reading Web will do this spike in a future sprint, or should this be in tracking?

phuedx added a comment. Oct 3 2017, 9:07 AM

@Jdlrobson: Yes. We authored the instrumentation and the old/new versions of the header.

I'm not sure if the stats consider hits to https://en.m.wikipedia.org/w/index.php?search=Hagszfz ? The mobile site uses that URL, not Special:Search, so it shouldn't impact these graphs?

Thus, I'm fairly confident it's due to the drop in IE9 support. We can prove that by running 2 Hive queries, one before and one after (Pivot doesn't go back that far).

phuedx added a comment. Oct 3 2017, 4:07 PM

I'm not sure if the stats consider hits to https://en.m.wikipedia.org/w/index.php?search=Hagszfz ? The mobile site uses that URL, not Special:Search, so it shouldn't impact these graphs?

Thus, I'm fairly confident it's due to the drop in IE9 support. We can prove that by running 2 Hive queries, one before and one after (Pivot doesn't go back that far).

To be clear, are you suggesting that dropping support for IE9 also caused the drop in MobileWebSearch events?

Yes. MobileWebSearch requires JS. Thus dropping JS support for IE9 (and other related browsers that do not support ES5 JavaScript) will have significantly dented what we log by removing a large chunk of browsers (http://caniuse.com/#search=es5).

phuedx added a comment. Oct 3 2017, 5:30 PM

This is a compelling hypothesis.

However, rMW09fcee611061: startup: Drop JavaScript support for ES3-only browsers was merged on Thursday, 2nd March and would've ridden the train to the group2 wikis on Thursday, 9th March, which is two weeks before the dip happened. Can this be explained?

You're right (although dates are wrong). https://gerrit.wikimedia.org/r/340893 was merged on Tuesday, 4th April. The spike started on 30th March and desktop traffic stayed the same. I think it's safe to say the ES3 hypothesis is probably not the correct one.

There are a lot of alternative hypotheses I can think of to test with regards to an increase in Special:Search traffic and a decrease in JS events - there was a spike in traffic from Opera Mini; the search became more/less discoverable; people stopped accidentally clicking the header and invoking search due to the smaller more obvious tap area (notice how the click through rate is pretty stable); seasonal (notice the increase in October, maybe due to kids returning back to school); maybe a change was made relating to ES3 the week before which temporarily broke a bunch of browsers; maybe there were some issues with EventLogging; the sampling rate changed (either on purpose or by accident); how we sample changed etc...

I think debugging this is going to take a lot longer than our usual 8hrs here (consider the page previews EventLogging duplicate issue). Personally to get to the bottom of this and answer all the questions I'd want to spend an entire week looking into this. It feels like a very deep rabbit hole.

phuedx added a comment. Oct 4 2017, 6:08 AM

You're right (although dates are wrong).

My bad. I read the commit date and not the merge date.

phuedx updated the task description. Oct 4 2017, 10:22 AM

@Jdlrobson: I take your point that there are plenty of theories that explain a drop in the number of search sessions at around the time that the new mobile header was deployed. I trust that Research/PO generate similar theories and hypotheses (preferably) but also apply Occam's Razor where necessary 🙂

I've updated the task to clarify that from an engineering standpoint, we're only interested in whether we introduced a regression in the instrumentation when implementing the treatment. Does my change clarify the task?

We have not touched the instrumentation, so I have no reason to believe that's the cause, and we only changed HTML, not JS. Maybe EventLogging had some problems, but I assume that's out of scope.

chelsyx lowered the priority of this task from High to Normal. Edited · Oct 4 2017, 8:09 PM

Sorry I'm late for the party.

To be clear, MobileWebSearch only tracks prefix search, which is done through API calls and won't create any search result page views (full-text search) unless users click "search within pages". Currently we don't have any EventLogging schema tracking users' full-text search behavior on mobile web: users' actions after they click "search within pages", and their actions on the Special:Search page, are not tracked by EventLogging (unless they open the search overlay and type, in which case prefix search is invoked).

Thank you very much for your help! I understand it's very hard to investigate a problem that happened a while ago, since a lot of the data may have been purged already.

Update: From the dashboard, we noticed that MobileWebSearch events increased drastically on Sep 29, back to the same level as before March 29. We will keep watching.

phuedx added a comment. Oct 9 2017, 1:57 PM

@chelsyx: On Thursday, September 28th, the fix for T175918: EventLogging subscriber module in ready state but not sending tracked events was deployed. That bug definitely affected the MobileWebSearch instrumentation. I can't see any changes to the EventLogging or MobileFrontend extensions or the MinervaNeue skin that would also explain that uptick so maybe that bug is the culprit here.

The bug itself caused events to be dropped after the first pageview in a logged out user's session, which explains why we were still receiving data but in a diminished capacity.
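
To illustrate why such a bug reduces the event count without zeroing it, here is a toy model. It assumes one loggable event per pageview, which is a simplification and not how MobileWebSearch actually fires events; the function name and session sizes are hypothetical.

```python
def logged_events(pageviews_per_session, bug_active):
    """Count logged events, assuming one event per pageview.

    With the bug active, only the first pageview of each session logs;
    events from every later pageview in that session are dropped.
    """
    if bug_active:
        return sum(1 for n in pageviews_per_session if n > 0)
    return sum(pageviews_per_session)


# Hypothetical pageviews per logged-out session.
sessions = [1, 3, 2, 5, 1]

healthy = logged_events(sessions, bug_active=False)
buggy = logged_events(sessions, bug_active=True)
print(f"without bug: {healthy}, with bug: {buggy}")
```

Under this model, single-pageview sessions are unaffected, so some data always arrives; the loss grows with the number of pageviews per session.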

I don't believe this explains what happened on Wednesday, 29th March.

Niedzielski renamed this task from [Spike] Did the new mobile header treatment break the search experience? to [Spike 2.5h] Did the new mobile header treatment break the search experience?. Oct 10 2017, 4:53 PM
phuedx updated the task description. Oct 11 2017, 4:17 AM
phuedx updated the task description. Oct 11 2017, 2:39 PM
Niedzielski added a subscriber: Niedzielski.

We've moved this to sign off. @chelsyx, does this look ok to you? Thanks!

@Niedzielski Looks ok to me. Thank you all very much for the help! :D

phuedx closed this task as Resolved. Oct 16 2017, 10:42 AM

🎉🎉🎉

Separately, @Nirzar noted that the number of Special:Search pageviews increased at around the same time. This uptick happened on Friday, 24th February.

Outcomes

Actionable by Research/PO

  • The reason for the increase of Special:Search pageviews is well understood and documented.

@ovasileva: This could be broken out into its own task if necessary. What do you think?

Tbayer added a subscriber: Tbayer. Edited · Sep 8 2018, 5:29 AM

I just read through this task and am trying to understand the outcome. Is it correct to summarize it as follows?

  • We confirmed that this huge drop in mobile search events coincided exactly with the rollout of the new header (T148514: [EPIC] Improve site branding on Mobile website).
  • We investigated and ruled out a number of possible reasons other than the intended change in user experience itself (e.g. bugs in the instrumentation, bugs in the feature, a change in JS supported browsers, etc.)
  • Thus it appears now vastly more likely that the new header indeed influenced user behavior and caused readers to use the search feature much less often - a drop of over 40% in searches started.

I see that T176464#3656152 listed a couple of other hypotheses that weren't tested, but they seem highly unlikely to me:

spike in traffic from Opera Mini

How would a temporary increase in Opera Mini traffic cause a permanent decrease in search actions? (Also, Opera Mini accounted for only between 1% and 2% of our mobile web pageviews at that time, so it's pretty much impossible that it could have caused such a large overall effect.)

seasonal (notice the increase in October, maybe due to kids returning back to school)

This can be ruled out: seasonal changes don't look abrupt like this, and in any case there was no corresponding change in pageviews around March 29.

So the hypotheses about impact on user behavior seem much more likely:

the search became more/less discoverable

How would the search becoming more discoverable explain a decrease in search actions? The second option is much more plausible.

people stopped accidentally clicking the header and invoking search due to the smaller more obvious tap area (notice how the click through rate is pretty stable)

Actually, the number of clickthroughs dropped too, from 882 to 680 (switching the dashboard to weekly medians, and taking the numbers from the week before and after the rollout). It's true, though, that this drop is somewhat smaller than the drop in search start events, meaning that the clickthrough ratio (sometimes also referred to as the clickthrough rate) increased. So that could be a partial explanation, but it would still leave us with the conclusion that, due to the new design, readers now find 22% fewer pages via mobile search.
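
The arithmetic behind these figures can be sketched as follows. The clickthrough counts are the weekly medians quoted above; the 40% drop in search starts is an assumption taken from the "over 40%" figure in the summary, so the ratio change is only a rough lower bound.

```python
# Weekly median clickthroughs before and after the rollout (from the comment).
before, after = 882, 680

# Relative drop in clickthroughs; this is the figure rounded down to 22%.
clickthrough_drop = (before - after) / before
print(f"Clickthrough drop: {clickthrough_drop:.1%}")

# Clickthrough ratio = clickthroughs / search starts.
# Assumption: search starts fell by 40% (the summary says "over 40%").
starts_drop = 0.40
ratio_change = (after / before) / (1 - starts_drop)
print(f"Clickthrough ratio changed by a factor of about {ratio_change:.2f}")
```

A factor above 1 means the ratio rose even though both counts fell, because search starts fell faster than clickthroughs.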

Restricted Application added a project: Product-Analytics. Sep 8 2018, 5:29 AM
Jdlrobson reopened this task as Open. Oct 5 2018, 5:03 PM

@chelsyx can you answer Tilman?

chelsyx moved this task from Triage to Backlog on the Product-Analytics board. Oct 18 2018, 8:40 PM