
Spike: Analyse impact of moving hamburger code back to the top of page
Closed, ResolvedPublic

Description

The change should go live on 28th January.

  • Did clicks to hamburger via JS go up significantly in the past week according to MobileUIWebClickTracking?

  • Consult first paint graphs and see if it seems to impact first paint in a negative way. Report number of bytes increase in JS and increase in first paint time on 2G.

At first glance, nothing majorly unusual in the graphs but we should check again in a week to check whether cached pages could be masking a problem.

Outcome: Comment on task with result and write an email if there is some decision we need to make
Duration: 1hrs

Event Timeline

Jdlrobson assigned this task to Tbayer.
Jdlrobson raised the priority of this task from to Needs Triage.
Jdlrobson updated the task description. (Show Details)
Jdlrobson set Security to None.
Jdlrobson updated the task description. (Show Details)
Jdlrobson updated the task description. (Show Details)
Jdlrobson renamed this task from Analyse impact of moving hamburger code back to the top of page to Spike: Analyse impact of moving hamburger code back to the top of page.Feb 1 2016, 5:32 PM
Jdlrobson updated the task description. (Show Details)

At first glance, nothing majorly unusual in the graphs but we should check again in a week to check whether cached pages could be masking a problem.

Agreed.

Did clicks to hamburger via JS go up significantly in the past week according to MobileUIWebClickTracking.

The number of hamburger clicks has gone down significantly since Sunday, Jan 31st. Yesterday (Tuesday, 2nd February) shows a little over 30% of the clicks than Sunday. That being said, this appears to follow a general trend in the data and isn't any more alarming than the other relative decreases.

Is this an artefact of how we're collecting this data? I'm not sure what I meant by this question.

phuedx removed phuedx as the assignee of this task.Feb 3 2016, 2:47 PM
phuedx moved this task from Doing to Code Review on the Reading-Web-Sprint-65-Game of Phones board.

I'd like someone else to go over this too.

Are you guys looking at the page-ui-daily graph at http://mobile-reportcard.wmflabs.org/? Or somewhere else?

This data is a mess and I don't think we can rely on it.

It seems the fix to T108723 would have dropped clicks to hamburger from 50% to 10%. It got merged December 28th so in theory went out on the 14th January and would mess with these results (but then we had lots of deployment issues recently).

Looking at main-menu-daily graph on http://mobile-reportcard.wmflabs.org/

  • Clicks to the main menu are sampled by the defaulting Schema sample rate of 0.5
  • 17th-24th there seems to have been an outage to logging to this table.
  • 28th-30th January clicks to the home link dropped from 50k to 22k - halved without explanation.

Looking at page-ui-daily graph on http://mobile-reportcard.wmflabs.org/

  • On January 10th 124,183 clicks to hamburger
  • On January 18th 35,732 clicks to hamburger. Seems to be consistent(ish) with the sampling drop.
  • On Feb 3rd 36,110 clicks to hamburger so it hasn't decreased and it hasn't increased.

My analysis using this unreliable data is that moving to top has had no negative impact on clicks to the hamburger.
Given the downwards trend on main-menu-daily it's possible we actually broke stuff in the process.
My feeling is that something else led to the drop in the clicks to the hamburger as reverting the code to the top doesn't seem to have done much.
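The "consistent(ish) with the sampling drop" remark can be sanity-checked with a little arithmetic. The sketch below assumes that the "50% to 10%" figure attributed to T108723 is a change in the logging sample rate (0.5 to 0.1), which this thread suggests but does not confirm. The function name is made up for illustration.

```python
def expected_logged_clicks(logged_before, rate_before, rate_after):
    """Rescale a logged event count from one sample rate to another,
    assuming the underlying real traffic stayed the same."""
    for rate in (rate_before, rate_after):
        if not 0 < rate <= 1:
            raise ValueError("sample rates must be in (0, 1]")
    return logged_before * rate_after / rate_before


# Counts quoted above from the page-ui-daily graph.
clicks_jan_10 = 124_183
clicks_jan_18 = 35_732

# ASSUMPTION: the sample rate went from 0.5 to 0.1 between the two dates.
expected_jan_18 = expected_logged_clicks(clicks_jan_10, 0.5, 0.1)
observed_vs_expected = clicks_jan_18 / expected_jan_18

print(f"expected ~{expected_jan_18:.0f}, observed {clicks_jan_18}, "
      f"ratio {observed_vs_expected:.2f}")
```

If the assumption holds, the 18 January count sits somewhat above what the rate change alone would predict, so the rate change can explain most of the drop but probably not all of it.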

@bmansurov to look at Special:MobileMenu PVs.

This is the command I'm running in hue:

SELECT
  -- Select the day as well, so that each row of the result is labelled
  CONCAT(year, "-", LPAD(month, 2, "0"), "-", LPAD(day, 2, "0")) AS view_date,
  SUM(view_count) AS view_count
FROM wmf.pageview_hourly
WHERE
  CONCAT(year, "-", LPAD(month, 2, "0"), "-", LPAD(day, 2, "0")) BETWEEN "2016-01-20" AND "2016-02-03"
  AND agent_type = "user"
  AND access_method = "mobile web"
  AND page_title = "Special:MobileMenu"
GROUP BY CONCAT(year, "-", LPAD(month, 2, "0"), "-", LPAD(day, 2, "0"))
ORDER BY view_date;

Here is some data:

1/20/2016  93141
1/21/2016  91494
1/22/2016  89930
1/23/2016  91373
1/24/2016  97880
1/25/2016  90401
1/26/2016  94736
1/27/2016  90416
1/28/2016  87676
1/29/2016  87217
1/30/2016  93227
1/31/2016  86292
2/1/2016   86003
2/2/2016   89032
2/3/2016   87110

The pageview API returned the following data (which is surprisingly different from the above). I think we should consider the pageview API results more seriously than the above SQL result:

2016010100  71083
2016010200  78221
2016010300  81032
2016010400  76537
2016010500  77073
2016010600  76383
2016010700  76616
2016010800  73165
2016010900  76475
2016011000  82121
2016011100  78088
2016011200  78404
2016011300  74516
2016011400  77893
2016011500  77283
2016011600  78635
2016011700  84057
2016011800  79734
2016011900  83828
2016012000  85446
2016012100  83945
2016012200  82640
2016012300  83787
2016012400  90233
2016012500  82931
2016012600  86488
2016012700  82819
2016012800  79581
2016012900  78842
2016013000  80419
2016013100  85648
2016020100  79084
2016020200  81346

A quick look at the data tells us that the pageviews of Special:MobileMenu haven't changed much. We hoped they would go down with the mobile menu patch. I have a theory that since the mobile menu is being loaded via ResourceLoader, there is still a noticeable delay which is preventing users from seeing the menu drawer. I think we should consider reviving https://gerrit.wikimedia.org/r/#/c/260746/ now.
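To put a rough number on "haven't changed much", one option is to compare the mean daily views before and after the go-live date. This is only a sketch: it uses the last two weeks of the API figures posted above, takes 28 January from the task description as the change date, and ignores weekly seasonality and caching, so the windows are short and not strictly comparable.

```python
from statistics import mean

# Daily views of Special:MobileMenu, copied from the API output above.
views = {
    "2016-01-20": 85446, "2016-01-21": 83945, "2016-01-22": 82640,
    "2016-01-23": 83787, "2016-01-24": 90233, "2016-01-25": 82931,
    "2016-01-26": 86488, "2016-01-27": 82819, "2016-01-28": 79581,
    "2016-01-29": 78842, "2016-01-30": 80419, "2016-01-31": 85648,
    "2016-02-01": 79084, "2016-02-02": 81346,
}

CHANGE_DATE = "2016-01-28"  # go-live date from the task description

# ISO dates sort lexicographically, so plain string comparison is enough.
before = [v for day, v in views.items() if day < CHANGE_DATE]
after = [v for day, v in views.items() if day >= CHANGE_DATE]

mean_before = mean(before)
mean_after = mean(after)
relative_change = (mean_after - mean_before) / mean_before

print(f"before: {mean_before:.0f}/day, after: {mean_after:.0f}/day, "
      f"change: {relative_change:+.1%}")
```

A difference of a few percent over windows this short is well within the day-to-day swings visible in the series, which fits the "not much change" reading.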

Thanks for running this data @bmansurov.
On the plus side we've not made anything worse.

With async JavaScript the fact is we're never going to be able to load the menu JavaScript code earlier. We might want to hide the menu icon for JavaScript users until the code has finished loading, but this could degrade the experience for those users: it might lead to them not clicking on the menu at all and, at worst, slow them down from accessing something in it.

My opinion is that by pushing the HTML size down we'll get JavaScript loading quicker and see less traffic to the Special:MobileMenu page (we should keep watching that).

I suggest we send a mail out (mainly for the benefit of Jon Katz who was asking for this change) and get some input from @Nirzar around what we want to do here.

Any other thoughts?

The pageview API returned the following data (which is surprisingly different from the above).

That may be because the linked API result is restricted to the English Wikipedia, whereas the above query covered all projects.

I think we should consider the pageview API results more seriously than the above SQL result:

Why?

The pageview API returned the following data (which is surprisingly different from the above).

That may be because the linked API result is restricted to the English Wikipedia, whereas the above query covered all projects.

Interesting. I thought only enwiki had a page named 'Special:MobileMenu'.

I think we should consider the pageview API results more seriously than the above SQL result:

Why?

The API is more tested. The query may have a flaw in it maybe?

In other news, an email about this task has been sent to reading-wmf.

The pageview API returned the following data (which is surprisingly different from the above).

That may be because the linked API result is restricted to the English Wikipedia, whereas the above query covered all projects.

Interesting. I thought only enwiki had a page named 'Special:MobileMenu'.

No, such special pages usually exist under the same (English, original) name on all projects, although they may redirect to translated names (e.g. https://de.m.wikipedia.org/wiki/Special:MobileMenu --> https://de.m.wikipedia.org/wiki/Spezial:Mobiles_Men%C3%BC ).

I think we should consider the pageview API results more seriously than the above SQL result:

Why?

The API is more tested. The query may have a flaw in it maybe?

OK, good point, but on the other hand Hive is the data source here, and your query above (in the last edited version) seems solid to me. I checked by restricting it to enwiki (and rewriting the GROUP BY just a little bit, Hive/HQL is a bit peculiar about enforcing the use of partition fields there). Fortunately the resulting numbers are indeed exactly the same as in the API result you posted above - the universe is in balance again ;)

SELECT year, month, day, CONCAT(year, "-", LPAD(month, 2, "0"), "-", LPAD(day, 2, "0")) AS date, sum(view_count) as view_count
FROM wmf.pageview_hourly 
WHERE
  CONCAT(year, "-", LPAD(month, 2, "0"), "-", LPAD(day, 2, "0")) BETWEEN "2016-01-20" AND "2016-02-03"
  AND agent_type = "user"
  AND access_method = "mobile web"
  AND project = 'en.wikipedia'
  AND page_title = "Special:MobileMenu"
GROUP BY year, month, day ORDER BY year, month, day LIMIT 10000;

Result (dummy columns removed):

date        view_count
2016-01-20  85446
2016-01-21  83945
2016-01-22  82640
2016-01-23  83787
2016-01-24  90233
2016-01-25  82931
2016-01-26  86488
2016-01-27  82819
2016-01-28  79581
2016-01-29  78842
2016-01-30  80419
2016-01-31  85648
2016-02-01  79084
2016-02-02  81346
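The "exactly the same" check can also be done mechanically instead of by eye. A minimal sketch, assuming the API's daily timestamps have the form YYYYMMDDHH (as in the output posted earlier) and using only the first three days as sample data; the helper name is hypothetical.

```python
def api_timestamp_to_date(timestamp):
    """Turn a pageview API daily timestamp (YYYYMMDDHH) into YYYY-MM-DD."""
    if len(timestamp) != 10 or not timestamp.isdigit():
        raise ValueError(f"unexpected timestamp format: {timestamp!r}")
    return f"{timestamp[:4]}-{timestamp[4:6]}-{timestamp[6:8]}"


# First three days of each result set, copied from this thread.
api_rows = {"2016012000": 85446, "2016012100": 83945, "2016012200": 82640}
hive_rows = {"2016-01-20": 85446, "2016-01-21": 83945, "2016-01-22": 82640}

api_by_date = {api_timestamp_to_date(ts): n for ts, n in api_rows.items()}

# Collect every day that is missing from one side or has different counts.
mismatches = {
    day: (api_by_date.get(day), hive_rows.get(day))
    for day in sorted(set(api_by_date) | set(hive_rows))
    if api_by_date.get(day) != hive_rows.get(day)
}

print("mismatching days:", mismatches or "none")
```

Keying both result sets by the same date format means a missing day shows up as a mismatch rather than being silently skipped.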
This comment was removed by Tbayer.

Thanks, @Tbayer. (btw, you may want to remove the double post above)

Thanks, @Tbayer. (btw, you may want to remove the double post above)

Done, not sure what happened there (looks a bit like T955, although I don't recall hitting submit twice; I've recently seen the same issue here).

With async JavaScript the fact is we're never going to be able to load the menu JavaScript code earlier. We might want to hide the menu icon for JavaScript users until the code has finished loading, but this could degrade the experience for those users: it might lead to them not clicking on the menu at all and, at worst, slow them down from accessing something in it.

I think that showing a user a button that they expect will open a menu but, in fact, might cause their UA to navigate away from the page that they're on is a worse user experience than hiding the button until it will function as expected /cc @Nirzar.

I think that showing a user a button that they expect will open a menu but, in fact, might cause their UA to navigate away from the page that they're on is a worse user experience than hiding the button until it will function as expected.

We could also inline JavaScript. We shouldn't force users to load the whole page just to navigate away from it.

We could also inline JavaScript. We shouldn't force users to load the whole page just to navigate away from it.

I agree in general. However, in this case, the user would be navigating to the Special:MobileMenu page.

So that we're not comparing apples to oranges... our change went out on the 15th Oct 2015 to English Wikipedia to move the menu to the bottom.

@Tbayer 's analysis was based on data collected by event logging and showed a decrease in clicks to the hamburger menu. If this was true I'd expect to see an increase in visits to the Special:MobileMenu page as well....

So... I ran some queries (https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/mobile-web/user/Special:MobileMenu/daily/20151012/20151013):
20151011  52915
20151012  51007
20151013  49496
20151014  49984
20151015  49340  *[moved code to bottom]
20151016  47562
20151017  50907
20151018  53627
20151019  50992
20151020  49996
20151021  46904
20151022  51646
20151023  68944
20151024  68859
20151025  71979
20151026  67849
20151027  69849
20151028  69449
20151029  68757
20151030  70464
20151031  71429

If I look at before and after the async JavaScript changes (5th August 2015) there is a much more obvious spike as the changes went into effect.

20150801  34871
20150802  35071
20150803  32477
20150804  33282
20150805  33571  *[async js]
20150806  37988
20150807  176636
20150808  176636
20150809  130390
20150810  88110
20150811  72869
20150812  67968
20150813  64578
20150814  60873
20150815  59811
20150816  59911
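For the August series it may help to separate the short-lived peak from the level the numbers settled at. The sketch below takes the days before 5 August as the baseline and the last four listed days as the "settled" window. Both windows are arbitrary choices made for illustration, not something agreed in this thread.

```python
from statistics import mean

# Daily views of Special:MobileMenu around the async JS change, as listed above.
august = [
    ("20150801", 34871), ("20150802", 35071), ("20150803", 32477),
    ("20150804", 33282), ("20150805", 33571), ("20150806", 37988),
    ("20150807", 176636), ("20150808", 176636), ("20150809", 130390),
    ("20150810", 88110), ("20150811", 72869), ("20150812", 67968),
    ("20150813", 64578), ("20150814", 60873), ("20150815", 59811),
    ("20150816", 59911),
]

CHANGE_DAY = "20150805"   # async JS change, per the annotation above
SETTLED_DAYS = 4          # ASSUMPTION: last four days count as "settled"

baseline = mean(v for day, v in august if day < CHANGE_DAY)
peak = max(v for _, v in august)
settled = mean(v for _, v in august[-SETTLED_DAYS:])

print(f"baseline {baseline:.0f}/day, "
      f"peak {peak / baseline:.1f}x baseline, "
      f"settled {settled / baseline:.1f}x baseline")
```

Reporting the two ratios separately keeps the 7 to 9 August spike from being read as the lasting effect of the change.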

One way to look at this is maybe an increase in click throughs to the menu is a good thing as it could mean slower connections are now interacting with our site.

It's also worth noting that due to cache, we'd need to wait 30 days before being able to draw a complete conclusion here about whether it's helped (so far we only have 6 days).

"I think that showing a user a button that they expect will open a menu but, in fact, might cause their UA to navigate away from the page that they're on is a worse user experience than hiding the button until it will function as expected"

Not sure I can fully agree. How do we know? If a user wants to click random maybe this allows them to do so quicker than if we blocked the action until JS has loaded. Surely getting them to their destination is more important and I don't think we can safely say that (on a 2G connection that menu may become completely inaccessible to them if we hide the link).

My personal feeling is we should chew on this problem some more, focus on improving performance of the site so time to first interactive gets so low this is not a problem. My preference would be to revisit this known problem later in the quarter.

Not sure I can fully agree. How do we know? If a user wants to click random maybe this allows them to do so quicker than if we blocked the action until JS has loaded.

To repeat your question: "How do we know?". You're assuming that the cost of navigating to the Special:MobileMenu page is trivial compared to loading a JS asset. I'm not sure I agree.

I do agree that we should revisit this at the end of Sprint I.

phuedx changed the task status from Open to Stalled.Feb 8 2016, 2:28 PM

I'm moving this to Done (for now) but not marking it as resolved. There's a lot of important data/conversation here that I'd like to be immediately available when we look again at the end of February.

So that we're not comparing apples to oranges... our change went out on the 15th Oct 2015 to English Wikipedia to move the menu to the bottom.

@Tbayer 's analysis was based on data collected by event logging and showed a decrease in clicks to the hamburger menu. If this was true I'd expect to see an increase in visits to the Special:MobileMenu page as well....

Just to clarify the history and give credit where it's due, my linked analysis was focused on finding out whether that decrease in October was concentrated on particular wikis or user agents (answer: no and no). The fact that a decrease had happened had already been determined at that point, e.g. by yourself based on the "main-menu-daily" graph at http://mobile-reportcard.wmflabs.org/ .

Looking at that graph again now, I'm noticing that around Nov 10 (shortly after we had examined it in that ticket), it bounced back to almost the levels before the mid-October drop. Do we know the reason for that?

Change 276106 had a related patch set uploaded (by Giuseppe Lavagetto):
jobqueue_redis: remove temporarily rdb1003 for reimaging

https://gerrit.wikimedia.org/r/276106

Change 276106 merged by Giuseppe Lavagetto:
jobqueue_redis: remove temporarily rdb1003 for reimaging

https://gerrit.wikimedia.org/r/276106

bmansurov claimed this task.