Page MenuHomePhabricator

Bug: HTMLFormatter References drawer behaviour broken in stable and beta (closes instantly for some references)
Closed, ResolvedPublic3 Estimated Story Points

Description

There is a bug with lazy loaded references that leads to references not showing up and the drawer being closed.

Steps to reproduce

  1. Opt into beta
  2. Visit https://en.m.wikipedia.org/wiki/Barack_Obama
  3. Search for "[11]"
  4. Tap "[11]"

Expected results

The reference drawer is shown and stays open until dismissed manually

Actual results

Sometimes nothing happens, other times the drawer opens and immediately slams shut!
note: this is happening on stable as well

https://drive.google.com/file/d/1qEDv9fU89pAyl7StrHpu7uwoasB9bEis/view?usp=sharing

Environments observed

Browser Version:

  • Chromium v64.0.3282.167 (Official Build) Built on Ubuntu , running on Ubuntu 17.10 (64-bit)
  • Mobile web BETA mode

OS Version:

  • Ubuntu v17.10
  • Android v8.1.0

Device Model:

  • Desktop
  • Pixel XL

Device Language:

  • English

Developer notes

The bug is inside ReferencesHtmlScraperGateway.getReferenceFromContainer
I started by investigating beta. Inside ReferencesDrawer.showReference
gateway is an instance of ReferencesMobileViewGateway
The call to getReference throws an error
ReferencesGateway.ERROR_NOT_EXIST

This bug, as reported by @Mhurd in T183049, is caused because of unexpected encoding occurring on the HTML.

Mobile:

<sup id="cite_ref-Obama_1995,_2004,_pp._9–10_11-0" class="reference"><a href="#cite_note-Obama_1995,_2004,_pp._9%E2%80%9310-11">[11]</a></sup>

Desktop:

<sup id="cite_ref-Obama_1995,_2004,_pp._9–10_11-0" class="reference"><a href="#cite_note-Obama_1995,_2004,_pp._9–10-11">[11]</a></sup>

The problem lies somewhere inside the HTMLFormatter. After 20 mins debugging it seems that the problem relies inside MobileFormatter::filterContent or HtmlFormatter::getDoc

A client side fix can be made but it feels a little hacky. It would look like this:

  • If the reference cannot be found, attempt to call decodeURIComponent on the href and try again.

A server side fix will be trickier as it's likely to be related to the loadHTML method of DOMDocument. We do not directly modify the href attributes of the link.

Event Timeline

Jdlrobson renamed this task from [Bug] References drawer closes instantly for some references to [Bug/Beta] References drawer closes instantly for some references.Feb 28 2018, 11:05 PM
Jdlrobson updated the task description. (Show Details)
ovasileva triaged this task as Medium priority.Mar 5 2018, 5:13 PM

There are 2 issues here - in stable clicking the reference jumps to the bottom of the page rather than opening the drawer and in beta it opens and closes the drawer.

ovasileva raised the priority of this task from Medium to High.Mar 6 2018, 5:26 PM
ovasileva updated the task description. (Show Details)
ovasileva set the point value for this task to 3.Mar 6 2018, 5:28 PM
ovasileva subscribed.

Currently consensus is around 3, but might be higher pending analysis

I was a 5, needs re-estimation once we know more. Jon has signed up to look into it a bit.

Jdlrobson renamed this task from [Bug/Beta] References drawer closes instantly for some references to Bug: References drawer behaviour broken in stable and beta (closes instantly for some references).Mar 6 2018, 5:28 PM
Jdlrobson renamed this task from Bug: References drawer behaviour broken in stable and beta (closes instantly for some references) to Bug: HTMLFormatter References drawer behaviour broken in stable and beta (closes instantly for some references).Mar 6 2018, 10:43 PM
Jdlrobson removed a project: User-Jdlrobson.
Jdlrobson updated the task description. (Show Details)
Jdlrobson added a subscriber: Mhurd.
Jdlrobson removed the point value for this task.

Given my analysis I suggest there is enough significant information to re-estimate this...

Sounds like a good chance to add a failing test and then get it to pass 😄

ovasileva set the point value for this task to 3.Mar 7 2018, 6:32 PM

I don't think it's easily fixable by PHP as this is default behaviour of libxml2 library. The node attributes are escaped on HTML save.

Flow:

a) HtmlFormatter:getText calls saveHTML
b) PHP DomDocument on saveHTML (internal method dom_document_save_html) calls htmlNodeDump from libxml2 library
c) htmlNodeDump in libxml2 library iterates over children and for each child calls htmlAttrDumpOutput which returns escaped content

Probably changing the JS behaviour is the only correct way as this is default libxml2 behavior.

Change 420750 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/skins/MinervaNeue@master] References should decode the href attribute before references lookup

https://gerrit.wikimedia.org/r/420750

pmiazga moved this task from Doing to Needs More Work on the Readers-Web-Kanbanana-Board-Old board.
pmiazga subscribed.

Change 420812 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/MobileFrontend@master] ReferenceGateway can handle encoded components

https://gerrit.wikimedia.org/r/420812

https://gerrit.wikimedia.org/r/#/c/420812/ fixes the problem

The issue was , and . characters as well as encoding...

Change 420750 abandoned by Jdlrobson:
References should decode the href attribute before references lookup

Reason:
We've decided to do this in MobileFrontend to keep all the processing together.

https://gerrit.wikimedia.org/r/420750

Change 420812 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] ReferenceGateway can handle encoded components

https://gerrit.wikimedia.org/r/420812

I don't think this needs design sign off or technical sign off.

Confirmed on beta for the Barack Obama article as well. Resolving