Page MenuHomePhabricator

Lead introduction edge case in MobileFrontend where paragraph precedes list element
Closed, ResolvedPublic3 Estimated Story Points

Description

Background
When opted into beta and viewing the lead paragraph...

https://en.m.wikipedia.org/wiki/Planet is a rather interesting specimen. The lead paragraph is the text "A planet is an astronomical object orbiting a star or stellar remnant that". It is followed by a list item.

Screen Shot 2016-11-29 at 9.44.55 AM.png (526×382 px, 128 KB)

In the apps (see T111958 for historic purposes) they work around this issue by combining any trailing elements after the first paragraph (but there is also some logic to check paragraph length)

I don't think this is the right behaviour.
Using the 2nd paragraph and 3rd child element (after p,ul), changes the order and flow of the content that our editors have agreed upon.~
[edit this doesn't relate to this particular problem my bad!]

Acceptance Criteria
Move any elements between the lead paragraph and the next paragraph element that are not themselves paragraphs.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Jdlrobson renamed this task from Lead paragraph edge case to Lead paragraph edge case where paragraph precedes list element.Nov 2 2016, 9:43 PM

To clarify the transform, the current behavior in this case is to construct a lead span consisting of the first good paragraph element plus any "trailing nodes" we find until we hit a new paragraph.[1] So if you look at the apps you'll see that the first paragraph is "A planet is an astronomical object..." plus the bulleted list.

That said, the transform also includes a rule (not triggered in this case) to skip paragraphs below a certain minimum length (currently 40 characters) and that might not be a desirable solution.

[1] https://phabricator.wikimedia.org/diffusion/GMOA/browse/master/lib/transformations/relocateFirstParagraph.js

Consider

These are fruit:
* Apple
* Banana

I tend to agree that the arbitrary minimum length should go. Maybe we should skip only para elements where the textContent or innerText is actually empty? That's the case I think the paragraph-skipping check of this transform was really meant to address, looking at the comments.

Jdlrobson renamed this task from Lead paragraph edge case where paragraph precedes list element to Lead paragraph edge case in MobileFrontend where paragraph precedes list element.Nov 8 2016, 10:00 AM
Jdlrobson updated the task description. (Show Details)

This is tracking for MCS. The bug is in MobileFrontend.

i dont see any clipping of text in apps. as far as the behavior goes. the iOS app logic seems to work. when we say paragraph it's not the semantic <p> tag. we should really have better naming around it. lead section perhaps.

We've opted for "introduction" in the new MCS endpoint. This is defined as the opening paragraph and any text that follows that itself is not inside a paragraph.

ovasileva renamed this task from Lead paragraph edge case in MobileFrontend where paragraph precedes list element to Lead introduction edge case in MobileFrontend where paragraph precedes list element.Nov 9 2016, 5:17 PM
ovasileva triaged this task as High priority.
ovasileva updated the task description. (Show Details)
ovasileva moved this task from Incoming to Triaged but Future on the Web-Team-Backlog board.
ovasileva set the point value for this task to 3.Nov 29 2016, 5:52 PM

Can we create a factory and use

  • MediaWiki API as a default provider (current behavior)
  • implement new paragraph provider that loads introduction from RestBase - and set up this as a default provider on Wikipedia pages

Then we could use what MCS/Restbase provides us. We already did sth similar with Popups (loading image data via restbase)

I'm not against it. A clean upgrade path is a must have, though: if we choose to leverage MCS to provide a "better" experience, then we should also choose to remove the transforms from the base experience – the one which leverages MobileFormatter.

Is consuming rest services from the backend a good idea? Will the latency make it have worse performance? What happens with third party wikis?

Mimicking findFirstGoodParagraphIn(nodes) in MobileFormatter right now may be the best technical option with our current constraints.

@Jhernandez - I think we can consume MCS on the backend - it shouldn't be a problem as RestBase is supposed-to-be super fast. This is the idea of Microservices -> you have many small standalone services and you request things across the network. We're doing a network call anyway to retrieve article data -> it may be memcache, it may be a database, it might be RestBase, also one extra call to services in MWCore shouldn't noticeable.

My initial idea was to have a factory on the client and use some provider/decorator to fetch data from MCS.

Well @pmiazga cache or DB servers are localized close to the application servers to minimize network latency, I'm not sure the same is true when calling to the rest API. I don't think all application servers are localized with the rest API servers.

What happens with third party wikis? Is the approach to maintain two implementations? One that uses the service, and a php fallback for 3rd parties?

I'm pro services, but with our current constraints and stance on who/where the software needs to run, I strongly believe that maintaining 2 loading strategies (service + php fallback for 3rd parties) is a lot more costly and complex than duplicating a function into the php formatter.

Is consuming rest services from the backend a good idea? Will the latency make it have worse performance? What happens with third party wikis?

Free knowledge first. Third party wikis second. Our CTO has said this.
Third party wikis can setup their own RESTBase server.

I'm pro services, but with our current constraints and stance on who/where the software needs to run, I strongly believe that maintaining 2 loading strategies (service + php fallback for 3rd parties) is a lot more costly and complex than duplicating a function into the php formatter.

I think these constraints are not real. MobileFormatter only exists as a workaround for not having MCS and is probably not useful for most 3rd parties anyway (section support is probably the only useful thing there).

I'm not against it. A clean upgrade path is a must have, though: if we choose to leverage MCS to provide a "better" experience, then we should also choose to remove the transforms from the base experience – the one which leverages MobileFormatter.

Agreed. We should aim to sunset MobileFormatter and mobileview and get <section> support in core.
MobileFrontend's Minerva skin would then simply be a skin just like Vector which renders exactly the same content. The PHP fallback would just be normal mediawiki and would serve most 3rd parties fine (especially once TemplateStyles is available). Optimisations such as lazy loaded images would move to MCS.

Does this task need more analysis/meeting to discuss?

I think we're conflating two different tasks. This is a bug about Edge case in MobileFrontend where paragraph precedes list element

Fixing it is a few unit test and a function on the current mobile formatter.

If we want to sunset MobileFormatter and MobileView in favor of the mobile rest API, I'm fine with that, but it should be in another Epic.

Let's organize work properly and prioritize accordingly. If my house has damp spots, I'm sure I could fix it by moving to a different house, but that's another topic from if I should fix these damp spots in the current house or when.

Fair points, I think the original argument was that rather than finding all damp spots in your house and fixing them, and moving to another house after that, why not move right away? But given the urgency of the task, it makes sense to fix the problem at hand and then think about an alternative approach.

  1. Fix the bug. Ship the lead paragraph moving as is
  2. Remove all the code. Move to MCS (T101046 and T145219)

As @bmansurov points out the urgency leads us to do #1 and delivers user value in short time. Let's move any relevant discussions about MCS to the bugs in #2

We re-estimated this today and it seems we are still confident in the estimation and this is ready to go for next sprint.

Change 350329 abandoned by Pmiazga:
Move paragraph and immediate list element before infobox

Reason:
incorrect commit

https://gerrit.wikimedia.org/r/350329

Change 350330 had a related patch set uploaded (by Pmiazga):
[mediawiki/extensions/MobileFrontend@master] Move paragraph and immediate list element before infobox

https://gerrit.wikimedia.org/r/350330

pmiazga moved this task from Needs Code Review to Needs More Work on the Reading-Web-Sprint-96 board.

I just noticed different acceptance criteria. I have to amend the patch as current solution moves only ol|ul elements.

After investigating/playing with different test articles I recommend moving only lead paragraph with ol or ul elements for now as it solves all bad articles we're familiar of.

Moving lead paragraph and all elements before the second paragraph (if any) might cause to some strange article modifications, as for example:

<table class="infobox"></table><p>Lead paragraph</p><table class="infobox"></table><p>second paragraph</p>

and in this scenario we want to move only Lead paragraph (without second infobox).
Additionally, we would have to skip div elements as they usually contain more complex structures or even infoboxes. The only thing that comes to my mind that we could also move along with lead paragraph is math but I'm not familiar with any example for now.

Let's leave current solution as it is, and then when we find more broken articles we can improve existing solution.

Change 350330 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Move paragraph and immediate list element before infobox

https://gerrit.wikimedia.org/r/350330

We need a person who can sign off this task - @bmansurov, @phuedx, @Jhernandez - who wants to be T149852 champion?

This can be tested using the magic patch and navigating to the Planet article in beta.

👍 The Planet page LGTM. @Jdlrobson: You mentioned during code review that you'd tested the change on 20 or so pages. Any others – the edge-casier the better!!1 – we can test against?