Page MenuHomePhabricator

Page previews can consume new summary-HTML endpoint
Closed, ResolvedPublic3 Story Points

Description

T165017 introduces a new endpoint that returns HTML. It should be possible for page previews to consume this new API.

AC

QA Notes

@ABorbaWMF (or whoever) make sure you refer to T165018#3354869 before submitting any new bug reports as they may have already been written up. This task is about bug discovery and not fixing them all at once, which'll require larger architectural changes.

Post-sign-off tasks

Dev Notes

  • This'll require a new gateway as we'll want to test this in isolation. The wgPopupsAPIUseRESTBase configuration variable controls whether or not to consume the RESTBase Page Summary API. Perhaps this configuration variable needs revisiting, e.g. it could be superseded by a variable that can have the values "mediawiki", "restbase", and "restbase-experimental"?
    • Alternatively the endpoint can be configurable e.g. wgPopupsApiRESTBaseEndpoint and the new endpoint can simply be used with wgPopupsAPIUseRESTBase
  • Currently the renderer does additional processing on the plain text extract that it gets from the preview model (see ext.popups.renderer#renderExtract), this could be extracted to the preview model, i.e. ext.popups.PreviewModel#getHtml. The new gateway could return a specialised instance of ext.popups.PreviewModel.

Related Objects

StatusAssignedTask
Resolvedovasileva
Resolved Jdlrobson
DuplicateNone
DuplicateNone
Resolvedovasileva
Resolvedovasileva
Resolvedovasileva
Resolvedphuedx
Resolvedphuedx
DuplicateNone
Resolved Jdlrobson
Resolved Jdlrobson
DuplicateNone
Duplicateovasileva
Resolvedovasileva
DuplicateNone
DeclinedNone
Duplicate Jdlrobson
ResolvedMhurd
DeclinedJMinor
Resolvedphuedx
ResolvedPchelolo
Resolved Jdlrobson
DeclinedPchelolo
DeclinedNone
OpenNone
Resolvedphuedx
Declined Jdlrobson
DuplicateNone
ResolvedFjalapeno
Resolvedphuedx
Declinedpmiazga
DeclinedNone
Resolvedphuedx
DeclinedNone
ResolvedPchelolo
ResolvedbearND
ResolvedMholloway
ResolvedMSantos
ResolvedMholloway
InvalidNone
Resolved Jdlrobson
InvalidNone
DuplicateNone
Resolved Jdlrobson
Resolved Jdlrobson
Resolved Jdlrobson
Resolved Jdlrobson
Resolvedphuedx
ResolvedbearND
ResolvedMholloway
DuplicateNone
Resolved Jdlrobson
Resolved Jdlrobson
Resolvedphuedx
Resolved Jdlrobson
Resolved Jdlrobson
ResolvedbearND
Resolved Jdlrobson
ResolvedMholloway
ResolvedMholloway
Resolved Jdlrobson
Resolved Jdlrobson
ResolvedbearND

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 358415 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Remove unused wgPopupsAPIUseRESTBase config variable

https://gerrit.wikimedia.org/r/358415

Quoting myself from my review of rEPOP8d2b7212c03c: Handle Restbase HTML response:

I'd like to chew over the pattern of exposing the extract parser as part of the gateway object level rather than exposing it at the factory level. I'm not sure that I can capture my thinking in Gerrit though. I'll comment on the associated Phab task.

The gateway factories should return objects whose shapes follow the *Gateway interfaces.

Prior to your change, we seem to have exposed the internals of the gateway modules as part of the objects returned by the top-level factory function, e.g.

module.exports = factory( fetch ) {
  return {
    interfaceFn: function () {
      return fetch( 'Foo' )
        .then( helper );
    },
    helper: helper
  };
};

function helper() {
  return 'bar';
}

We then exercise helper directly rather than via the interfaceFn method in our unit tests.

As part of rEPOP8d2b7212c03c: Handle Restbase HTML response, we've continued this pattern for the MediaWiki API gateway but tidied up the RESTBase gateways. We should apply the same refactoring to all of them.

This is outside of the scope of your change but it might be worth a @TODO.

ovasileva updated the task description. (Show Details)Jun 13 2017, 4:04 PM

Change 357808 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Implement html/rest.js gateway which handles HTML Restbase responses

https://gerrit.wikimedia.org/r/357808

My favourite screenshot from testing @pmiazga's change locally – in a "wow, that's nice!" sorta way:

/cc @ovasileva @Nirzar

@ABorbaWMF I'll assign this to you when the Beta Cluster and the staging server have been configured for you to test this change on.

pmiazga updated the task description. (Show Details)Jun 14 2017, 7:05 PM

Change 359123 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[operations/mediawiki-config@master] pagePreviews: Consume HTML from RESTBase endpoint

https://gerrit.wikimedia.org/r/359123

My favourite screenshot from testing @pmiazga's change locally – in a "wow, that's nice!" sorta way:

/cc @ovasileva @Nirzar

:D

Change 359123 merged by jenkins-bot:
[operations/mediawiki-config@master] pagePreviews: Consume HTML from RESTBase endpoint

https://gerrit.wikimedia.org/r/359123

phuedx updated the task description. (Show Details)Jun 15 2017, 12:23 PM

^ I can confirm that PP is rendering the HTML part of the RESTBase response on the Beta Cluster.

phuedx added a comment.EditedJun 15 2017, 12:33 PM

On reading web staging when I hover over a link I see a text extract in HTML as provided by the new [summary] endpoint.

@ovasileva: Since there's an instance of RESTBase running on the Beta Cluster, I've opted to do this there. Let me know if this is the wrong way round.

phuedx updated the task description. (Show Details)Jun 15 2017, 12:34 PM
phuedx reassigned this task from pmiazga to ABorbaWMF.Jun 15 2017, 12:45 PM
phuedx added a subscriber: pmiazga.

@ABorbaWMF: This is now available to test on the Beta Cluster. This is both a general change to the layout of the codebase as well as a change to where the content of the previews come from, you should check that there aren't any regressions. I guess @ovasileva and @Nirzar are going to be looking at the test articles in the description.

@phuedx - that works, now on to testing!

Change 359203 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Capture jQuery at construction

https://gerrit.wikimedia.org/r/359203

Change 359204 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Move createGateway to gateway/index.js

https://gerrit.wikimedia.org/r/359204

Change 359205 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Fix the npm script test:node

https://gerrit.wikimedia.org/r/359205

Change 359206 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Simplify gateways

https://gerrit.wikimedia.org/r/359206

Change 359207 had a related patch set uploaded (by Jhernandez; owner: Jhernandez):
[mediawiki/extensions/Popups@master] Hygiene: Rename builder vars on require preview/model

https://gerrit.wikimedia.org/r/359207

I've submitted some hygiene changes related to this patch. No change in functionality should occur.

Testing on the Beta cluster with the Dog article. I noticed problems with some of the previews. I'm not sure what is common among these preview types, but they are appearing as blank windows. Check out the video:

Change 359203 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Capture jQuery at construction

https://gerrit.wikimedia.org/r/359203

@ABorbaWMF thank you for your finding. The problem is that the TextExtracts return

<p>(Blank.)</p>

(Blank.) gets stripped and empty <p> stays. The content is not empty as there is an empty <p> tag. I'll add a logic to show a generic preview when preview contains only empty elements (like p, b), no text nodes.

This comment was removed by ovasileva.

Nice spot @ABorbaWMF! This is a genuine bug.

Devs: the HTML returned in a response from RESTBase is wrapped in a p. All of the articles that produce empty previews in @ABorbaWMF's video have the same content: "(Blank.)", meaning that the extract_html property of the RESTBase response is '<p>(Blank.)</p>' and '<p></p>' after processing, i.e. it ain't empty.

I think this should be fixed by stripping the wrapping p wrapper element from the extract_html property as it causes this issue and also messes with the padding/margins of the preview.

We have math! And it's looking pretty nice

@Nirzar - the last line is a bit cut off still. Thoughts?

@Nirzar - the last line is a bit cut off still. Thoughts?

@ovasileva: This might be a result of not stripping the wrapper p that I mentioned in T165018#3354254.

Change 359204 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Move createGateway to gateway/index.js

https://gerrit.wikimedia.org/r/359204

I think this should be fixed by stripping the wrapping p wrapper element from the extract_html property as it causes this issue and also messes with the padding/margins of the preview.

We also have to handle the extract containing more than one p, when the first one possibly being empty.

list looking nice:

giant comment with all current issues I've found. Let's discuss first before we open separate bugs

Multiple bold elements in preview

Steps to reproduce

  1. go to: https://en.wikipedia.beta.wmflabs.org/wiki/Dog
  2. Hover over “Canidae”

Expected behavior: Canidae should be bolded
Observed behavior: Canidae and canid are bolded

NOTE: this behavior is acceptable, @ovasileva to add to requirements

Certain templates are not stripped from previews

Steps to reproduce

  1. go to: https://en.wikipedia.beta.wmflabs.org/wiki/Dog
  2. Hover over “Dog anatomy”

Expected behavior: preview reads “Dog anatomy….”
Observed behavior: expand language template appears

Similar behavior when hovering over “great dane”

Parentheticals stripped improperly

Steps to reproduce

  1. go to: https://en.wikipedia.beta.wmflabs.org/wiki/Planet
  2. Hover over “India”

Observed behavior: script error is stripped, but the parentheses remain

Extract cut off/too long

NOTE: I think this is the same as T165018#3354714 Steps to reproduce
  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Birdman_(film)
  2. Hover over Alejandro G. Iñárritu

Observed behavior:

Expected behavior: last line of text is not cut off/does not appear

Horizontal gradient appearing incorrectly

Steps to reproduce

  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Category:Living_people
  2. Hover over “Andrew Miller”

Expected behavior: Horizontal gradient does not appear
Observed behavior: top of horizontal gradient overlaps text


Extracts for lists displaying partial previews

Steps to reproduce

  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Marko_Saaresto
  2. Hover over “Finnish”

Expected: generic preview
Observed: preview ends with the first paragraph

NOTE: this isn't technically a bug, but might as well fix it now. I think the best way would be to display the generic preview for all extracts that end in a colon

Previews do not concatenate multiple paragraphs

Steps to reproduce

  1. Page: https://en.wikipedia.beta.wmflabs.org/wiki/Category:Articles_with_%27species%27_microformats
  2. Hover over “european mink”

Observed behavior: multiple paragraphs (also broken length and gradient)


Expected behavior: paragraphs should be concatenated

Welcome to the land of HTML and scope increase!

Extract cut off/too long
Previews do not concatenate multiple paragraphs

These are the same problem.
@ovasileva paragraph concatentation is going to be very difficult (i'm not sure possible?) HTML elements themselves have different margins/line heights so the approach we have right now to fix it with fixed heights will not work

Extracts for lists displaying partial previews

This specific problem is an issue with disambiguation pages and tracked in T130671.
If you take a look at https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland they are ordered by headings. There is no list and we can't show every single page.

Horizontal gradient appearing incorrectly

looks like we need to reopen T165974

Certain templates are not stripped from previews

This is an editor problem and we already have a mechanism to easily fix this: https://www.mediawiki.org/wiki/Extension:Popups#How_can_I_remove_content_from_a_page_preview.3F

Multiple bold elements in preview

Welcome to HTML :) Note HTML can obviously contain bold items. On top of this we try to bold the title of the page we are linking to. IMO it's dangerous to unbold things as they might be bolded for a good reason, but we could update the TextExtracts to strip bold elements.. although we should think about that very very carefully. TextExtracts API is used for many purposes.

Parentheticals stripped improperly

This is because we are doing something very hacky to remove them. All this is captured in T91344.

Multiple bold elements in preview

There is no logic for bolding title elements. If in orignal text extract Title is not bolded the HTML preview won't show bold titles. If we want to make Title bold also in HTML previews we have to implement that. Now the only processing applied to the extract is strip parenthesis and removing trailing ellipsis. Nothing else.

Certain templates are not stripped from previews

This shouldn't happen on production as beta cluster doesn't have all templates. Those extra texts are visible only because template parser failed.

Extract cut off/too long

This one is tricky as HTML can hold styling and add some margins to text. Probably the best solution would be to add a gradient on the bottom so preview always smoothly transition to transparent.

Horizontal gradient appearing incorrectly

look comment above

Extracts for lists displaying partial previews

This is a general error (happens also in plain text extract). This is the content we retrieve from TextExtracts API, The following is a list of bands from Finland: is everything we got. Please file a separate ticket for this one.

Previews do not concatenate multiple paragraphs

This one is probably most tricky. We can concatenate paragraphs in Frontend but in HTML world we every element can hold some styling. This is a standalone task.

This one is tricky as HTML can hold styling and add some margins to text. Probably the best solution would be to add a gradient on the bottom so preview always smoothly transition to transparent.

We need to think hard about fixing this. Can we strip out style elements? I want to avoid vertical gradient for this.

@Nirzar We can strip all style/cass attributes and we additionally can define a default styles for every tag (ul/ol/p/img) inside page preview.

By default we strip those elements:

			"table",
			"div",
			"style",
			"ul.gallery",
			".mw-editsection",
			"sup.reference",
			"ol.references",
			".error",
			".nomobile",
			".noprint",
			".noexcerpt",
			".sortkey"
Nirzar added a comment.EditedJun 16 2017, 5:54 PM

@pmiazga what are the elements we allow? only ul/ol/p

what about b, strong?

TextExtracts allows everything else other than the list plus wikimedia-config adds couple more: .metadata and 'pan.coordinates, span.geo-multi-punct, span.geo-nondefault' #coordinates
Everything other stays - so <b>, <strong>, <i>, <inderline>, <span>, <ol>, <ul> even <img> and others are allowed.

Change 359206 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Simplify gateways

https://gerrit.wikimedia.org/r/359206

Change 359207 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Hygiene: Rename builder vars on require preview/model

https://gerrit.wikimedia.org/r/359207

@ovasileva - I have an idea for multiple paragraphs, much easier way would be stripping everything from the second paragraph. We would show only the first paragraph and proceeding elements that are not paragraphs (for example Planet - we would show first paragraph and a list).

pmiazga updated the task description. (Show Details)Jun 19 2017, 3:21 PM
pmiazga updated the task description. (Show Details)

Change 359954 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] Do not show empty HTML previews

https://gerrit.wikimedia.org/r/359954

@pmiazga - re: paragraphs - I think that makes sense

@phuedx - for empty html previews - for now, I'm not removing <p> element as we might do some more advanced crunching later. I decided to use $().text() to look for text existence. With that approach I can remove both simple structures like <p></p> and bit more complex like <p><ol><li></li></ol></p>.

Now I'm wondering, are there any text extracts that contain an image only? eg <img src="..." />. My solution will render a generic preview instead of an image.

phuedx reassigned this task from ABorbaWMF to pmiazga.Jun 19 2017, 5:05 PM

@pmiazga's working on this.

just documenting, we decided to use the generic preview for image-only previews during standup.

I've -1ed https://gerrit.wikimedia.org/r/#/c/359954/2 on the basis that we should be very disciplined about removing the client side hackery and move this logic into the service. I'd like a few more thoughts about whether I'm being unreasonable.

I don't think this blocks completing this task however. The sign off criteria has been met...

I've -1ed https://gerrit.wikimedia.org/r/#/c/359954/2 on the basis that we should be very disciplined about removing the client side hackery and move this logic into the service. I'd like a few more thoughts about whether I'm being unreasonable.

I think this is as good a place as any to draw a line in the sand. This brief test has unearthed a bunch of issues that need careful consideration – least of all, us writing down a definition of what constitutes "a preview" – and any feeling that you're being unreasonable is likely misdirected frustration that we haven't given ourselves much time to plan this work much further than this.

@ovasileva: Were we expecting @ABorbaWMF to do more QA on this? If so, then I'll move this task back to Needs QA.

phuedx updated the task description. (Show Details)Jun 22 2017, 10:27 AM
phuedx reassigned this task from pmiazga to ovasileva.
ovasileva reassigned this task from ovasileva to ABorbaWMF.

@phuedx - yup, I think @ABorbaWMF will do some more exploratory qa (if any bugs are found, they'll be filed separately however)

Looks mostly fixed. I have been testing on beta using the pages listed above by @ovasileva and some other pages as well. I tried on Chrome, Firefox, Safari, Edge, and Opera and on Mac and Windows.

Results:
Lists cut-off at the bottom (new bug?)


Multiple bold Items looks fine

Templates (still happens)

Parenthesis (still happens)

Long Extract looks fine

Horizontal Gradient/Andrew Miller looks fine


Extracts for lists displaying partial previews (may be broken? list items are missing)

Concatenate multiple paragraphs (not exactly like expected, but looks good)

phuedx reassigned this task from ABorbaWMF to ovasileva.Jun 23 2017, 10:24 AM

Moving this to Ready for Signoff as @ABorbaWMF's hasn't discovered any issues that we didn't already know about – this is a good thing, right?

@ABorbaWMF - did you get a chance to test through multiple browsers?

Formulas don't seem to be appearing in Chrome:

@ABorbaWMF - final check, could you go over these in across a few different browsers?

  • Article: Lie group, hover over lie algebra, formulas should appear in preview
  • Article: Earth, hover over planet, list should appear
  • Article: Berlin, hover over Germany

@ovasileva, yes I tried multiple browsers. What article are you using for the math page previews?

@ABorbaWMF - lie group, hovering over lie algebra. @Nirzar first noticed this, I could replicate as well

Sorry I meant to hit submit earlier on this one. I am also seeing the issue with math formulas, but only on some browsers/systems.

Looks good on:
Mac - Safari
Win or Mac - Firefox

Looks bad on:
Mac or Win - Chrome
Mac or Win - Opera
Win - IE
Win - Edge

Krinkle removed a subscriber: Krinkle.Jun 27 2017, 2:54 AM
ovasileva closed this task as Resolved.Jun 27 2017, 11:55 AM

Change 359954 abandoned by Pmiazga:
Do not show empty HTML previews

Reason:
Instead of moving more logic into frontend we decided to create a service in the backend to crunch page previews data. With that approach this change is not necessary.

https://gerrit.wikimedia.org/r/359954

Change 358415 abandoned by Pmiazga:
Remove unused wgPopupsAPIUseRESTBase config variable

Reason:
Abandoned as change is already merged into config

https://gerrit.wikimedia.org/r/358415

Restricted Application added a subscriber: jeblad. · View Herald TranscriptAug 25 2017, 2:49 PM