WWT: Original article HTML not necessarily preserved (was: Images have their links changed from Commons to local) [medium]
Open, Needs Triage · Public · BUG REPORT

Description

What is the problem?

The WWT extension does not necessarily restore the article's HTML to its original state, even after WWT is closed.

I think there are two parts to this bug:

  1. The HTML returned by WhoColor is not exactly the same as the HTML in the original article.
  2. At the point when the WWT extension stores the original HTML of the article, not all the HTML has been loaded. When it attempts to restore the original HTML, some things might be missing. (I tested this by using the browser debugger to stop loading of the page at the point the WWT ActivationSingleton is being initialised.)

The first might be tricky, because WhoColor relies on the MediaWiki wikitext parsing API, which presumably ignores any HTML added to the article dynamically (i.e. by JavaScript).

For the second, is it possible to make the WWT extension the very last thing to be loaded, so the article has been fully generated?

Example 1

When using the WhoWroteThat tool on an article, any images which linked to Commons have their links changed to the local counterpart.

For example, https://commons.wikimedia.org/wiki/File:Nasa_brazil_fires_20190820.jpg gets changed to https://es.wikipedia.org/wiki/Archivo:Nasa_brazil_fires_20190820.jpg.

From experimenting, the <a> element has its href changed. The <img> element appears untouched.

Therefore, I don't think the images as they appear on the page will be changed. Just where you go when you click on them.

Steps to reproduce problem

Install the extension using the instructions.

  1. Go to https://es.wikipedia.org/wiki/Incendios_de_la_selva_amaz%C3%B3nica_de_Brasil_de_2019
  2. Click on any image on the page and see that it goes to https://commons.wikimedia.org
  3. In the left sidebar, click "Who Wrote That?"
  4. If WWT reports an error, close the infobar at the top
  5. Click on the same image you clicked on in step 2
  6. Close WWT
  7. Click the same image from step 2

Expected behavior: Takes you to https://commons.wikimedia.org
Observed behavior: Takes you to https://es.wikipedia.org

Background

Some wikis can display images from foreign repositories, such as Commons. This appears quite common on https://es.wikipedia.org.

Example 2

The {{Coord}} template has its WikiMiniAtlas removed.

Steps to reproduce problem
  1. Go to https://en.wikipedia.org/wiki/Flight_19#PBM-5_(Bureau_Number_59225)
  2. If you hover over "29°N 79°W", you will see a little dropdown "WikiMiniAtlas"
  3. If you click on "WikiMiniAtlas", an atlas will appear
  4. Click "Who Wrote That?"
  5. Once it has loaded, close WWT
  6. The dropdown no longer appears; the atlas no longer works

Expected behavior: Atlas will work again
Observed behavior: Dropdown does not appear, atlas does not work

More examples
Environment

Browser: Chromium 73, Firefox 60
Wiki(s): https://es.wikipedia.org

Event Timeline

Restricted Application added a subscriber: Aklapper. · Aug 28 2019, 9:41 AM
dom_walden updated the task description. · Aug 28 2019, 1:32 PM
dom_walden renamed this task from [BUG] Images have their links changed from Commons to local to Original article HTML not necessarily preserved (was: Images have their links changed from Commons to local). · Sep 10 2019, 10:05 AM
dom_walden updated the task description.
Restricted Application added a subscriber: Liuxinyu970226. · Sep 10 2019, 10:05 AM
dom_walden updated the task description. · Sep 10 2019, 10:20 AM
dom_walden updated the task description. · Sep 10 2019, 11:21 AM
ifried renamed this task from Original article HTML not necessarily preserved (was: Images have their links changed from Commons to local) to Original article HTML not necessarily preserved (was: Images have their links changed from Commons to local) [medium]. · Sep 12 2019, 5:53 PM
ifried moved this task from To Be Estimated/Discussed to Estimated on the Community-Tech board.
ifried renamed this task from Original article HTML not necessarily preserved (was: Images have their links changed from Commons to local) [medium] to WWT: Original article HTML not necessarily preserved (was: Images have their links changed from Commons to local) [medium]. · Sep 18 2019, 12:03 AM
Samwilson added a subscriber: Samwilson.

The image links on eswiki are modified by the a-commons-directo.js gadget, which uses mw.hook( 'wikipage.content' ). It looks like if we clone the original content later (i.e. when WWT is opened, rather than initialised) this error is avoided. PR for this: https://github.com/wikimedia/WhoWroteThat/pull/53
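To illustrate the timing issue, here is a minimal sketch of taking the snapshot when the tool is opened rather than when it is initialised. The class name `LazySnapshot` and the `readContent` callback are hypothetical and not taken from the WhoWroteThat code; the page is simulated with a plain string so the example runs without a browser.

```javascript
// Hypothetical sketch; names are illustrative, not from the WhoWroteThat source.
class LazySnapshot {
	constructor( readContent ) {
		// Function that returns the article HTML as it is right now.
		this.readContent = readContent;
		this.original = null;
	}

	// Called when the tool is opened, not when it is initialised,
	// so gadgets have already had a chance to modify the page.
	open() {
		if ( this.original === null ) {
			this.original = this.readContent();
		}
	}

	// Called when the tool is closed; returns the HTML to restore
	// and forgets it, so the next open() takes a fresh snapshot.
	close() {
		const html = this.original;
		this.original = null;
		return html;
	}
}

// Simulated page: the tool is initialised first, then a gadget rewrites the link.
let pageHtml = '<a href="/wiki/Archivo:Example.jpg">';
const snapshot = new LazySnapshot( () => pageHtml );
pageHtml = '<a href="https://commons.wikimedia.org/wiki/File:Example.jpg">';

snapshot.open();
const restored = snapshot.close();
```

Because the snapshot is only read inside `open()`, `restored` holds the link as rewritten by the simulated gadget, not the link that was present at initialisation.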

The Miniatlas thing is a bit more confusing. It seems to be taking it outside of the containing <p>, so there's a newline appearing when WWT is closed.

PR merged, not sure if we are still investigating the issue with Miniatlas.

Yes, I've been trying to figure out the miniatlas thing. It seems to boil down to the fact that it creates HTML like this:

<span style="position: relative; white-space: nowrap;">
	<div style="background-color: white; padding: 0.2em; border: 1px solid black; position: absolute; top: 1em; left: 0em; z-index: 15; display: none;">
		<span style="cursor: pointer;">
			<img src="..." srcset="..." class="wmamapbutton noprint" title="Show location on an interactive map" alt="" style="padding: 0px 3px 0px 0px; cursor: pointer;">
			&nbsp;WikiMiniAtlas
		</span>
	</div>
	<a class="external text" href="//tools.wmflabs.org/geohack/geohack.php?pagename=Flight_19&amp;params=29_N_79_W_">
		...
	</a>
</span>

i.e. with a div inside the span. This seems to go awry when we insert the old HTML back into the DOM. This construct behaves differently when it's created as elements to when it's created from an HTML string (e.g. this codepen). I think this is also why the mouse events aren't being preserved on these elements, because they're ending up in the wrong places. I've opened an issue on dschwen/wikiminiatlas.
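A small sketch of the difference between the two construction paths, for anyone who wants to try it in a browser console. The function name `compareConstruction` is hypothetical. It assumes the relevant factor is the containing `<p>`: when parsing an HTML string, a `<div>` start tag implicitly closes an open `<p>`, whereas `appendChild` nests elements exactly as asked.

```javascript
// Hypothetical helper; pass it the page's `document`.
function compareConstruction( doc ) {
	// Built as elements: the div ends up inside the span inside the p.
	const hostA = doc.createElement( 'div' );
	const p = doc.createElement( 'p' );
	const span = doc.createElement( 'span' );
	span.appendChild( doc.createElement( 'div' ) );
	p.appendChild( span );
	hostA.appendChild( p );

	// Built from a string: the parser closes the <p> when it meets <div>,
	// so the div becomes a sibling of the paragraph instead of a descendant.
	const hostB = doc.createElement( 'div' );
	hostB.innerHTML = '<p><span><div></div></span></p>';

	return {
		asElements: hostA.querySelector( 'p div' ) !== null,
		fromString: hostB.querySelector( 'p div' ) !== null
	};
}

// Only runs where a DOM is available (e.g. a browser console).
if ( typeof document !== 'undefined' ) {
	console.log( compareConstruction( document ) );
}
```

If the assumption holds, the two flags should differ, which would also explain the extra line break seen after WWT is closed.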

aezell moved this task from Backlog to In progress on the Who-Wrote-That board. · Oct 22 2019, 11:05 AM
Restricted Application edited projects, added Community-Tech; removed Community-Tech (Kanban-Q2-2019-20). · Oct 22 2019, 11:05 AM

It looks like keeping the original content attached to the DOM does avoid the issues with re-attaching a deep clone of it. At least, the wikiminiatlas behaves itself, with its mis-nested elements. :)

PR: https://github.com/wikimedia/WhoWroteThat/pull/136
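For readers who have not looked at the PR, the general idea can be sketched roughly as follows. This is not the code from the PR; `showAnnotated` and `restoreOriginal` are hypothetical helpers that assume the original content element is hidden while the annotated copy is shown beside it. Since the original nodes never leave the document, nothing has to be re-parsed and any event handlers attached to them are kept.

```javascript
// Hypothetical helpers; not the code from the PR above.
// `original` and `annotated` are expected to be DOM elements.
function showAnnotated( original, annotated ) {
	// Hide the original content but leave it attached to the document.
	original.hidden = true;
	// Insert the annotated copy directly after it.
	original.after( annotated );
}

function restoreOriginal( original, annotated ) {
	// Discard the annotated copy and reveal the untouched original.
	annotated.remove();
	original.hidden = false;
}
```

The trade-off is that two copies of the article content exist in the DOM while the tool is open, so duplicate `id` attributes in the annotated copy may need handling.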

All the issues here have been fixed now. Ready for QA from the current master branch.

dom_walden updated the task description. · Wed, Jan 22, 10:51 AM

I can no longer reproduce the bugs in the 3 examples.

I also used a script which compares the HTML of an article before WWT is turned on and after it is turned off, to see if they are equivalent (i.e. that the original HTML is being preserved properly).

I did this for ~150 random articles for each of the 5 supported wikis.

No differences were detected.
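The comparison script itself is not attached here. As a rough sketch of the kind of check described, assuming the article HTML is serialised to a string before WWT is turned on and again after it is turned off, and that whitespace-only differences should be ignored (both are assumptions; `normaliseHtml` and `htmlUnchanged` are hypothetical names):

```javascript
// Hypothetical sketch; not the script used for the QA run above.

// Collapse runs of whitespace so that formatting-only changes are ignored.
function normaliseHtml( html ) {
	return html.replace( /\s+/g, ' ' ).trim();
}

// True if the two serialised snapshots are equivalent after normalisation.
function htmlUnchanged( before, after ) {
	return normaliseHtml( before ) === normaliseHtml( after );
}

// Example with two snapshots that differ only in whitespace,
// and one where a link target has changed.
const before = '<p>\n\t<a href="https://commons.wikimedia.org/wiki/File:Example.jpg">x</a>\n</p>';
const sameAfter = '<p> <a href="https://commons.wikimedia.org/wiki/File:Example.jpg">x</a> </p>';
const changedAfter = '<p> <a href="/wiki/Archivo:Example.jpg">x</a> </p>';

const unchangedResult = htmlUnchanged( before, sameAfter );
const changedResult = htmlUnchanged( before, changedAfter );
```

A string comparison like this would catch the changed link from Example 1, but not lost event handlers such as those in Example 2, so that case still needs a manual check.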