Page MenuHomePhabricator

VisualEditor losing Media: links
Closed, ResolvedPublic

Description

I just finished upgrading to MediaWiki 1.31 and updated VisualEditor and parsoid to match it.

Parsoid is at git revision: bbc56abd8fcc245876fc340723ed3431565b61d1
VisualEditor is on branch REL1_31

I have some wiki pages that were linking to files like this (this bypasses going through the File: info page for the file directly to the media, which has been useful with PDF to allow the browser to render it natively:

[[Media:Test.pdf]]

Upon loeading the page with VisualEditor, the content changes to a link like this:

[[Images/1/16/Test.pdf]]

This link is not valid and becomes broken on saving the page. If I replace the link with [[Special:FilePath/Test.pdf]] MediaWiki doesn't lose the link on the VisualEditor edit.

I can't tell which part is broken here, but I will do more digging if someone can suggest any tests to run. I have many extensions loaded but I tried this test with all of them disabled except for VisualEditor and could still reproduce this problem on an otherwise blank page.

See also:

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 29 2018, 7:46 PM
Kmstange updated the task description. (Show Details)Jun 29 2018, 7:47 PM

On my local testing installation, it is not quite so broken – a link like [[Media:Logo.png]] becomes [//localhost:3080/w/images/c/c9/Logo.png Media:Logo.png], which is still functional. Perhaps the paths on your wiki are configured in a funny way (are you using $wgScriptPath = ""? that's not recommended). Regardless, this is still a wikitext corruption bug.

My $wgScriptPath is "/wiki" (the default), so I don't think that's related to the discrepancy between our results. Not sure what else it might be.

Starting with [[Media:Logo.png]], copy(ve.init.target.doc.body.outerHTML) gives me,

<body data-parsoid="{&quot;dsr&quot;:[0,18,0,0]}" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><p data-parsoid="{&quot;dsr&quot;:[0,18,0,0]}"><a rel="mw:MediaLink" href="//upload.wikimedia.org/wikipedia/commons/c/c9/Logo.png" title="Logo.png" data-parsoid="{&quot;dsr&quot;:[0,18,null,null]}">Media:Logo.png</a></p></body>

which matches the spec. After an insignificant edit, copy(ve.init.target.docToSave.body.outerHTML) yields,

<body data-parsoid="{&quot;dsr&quot;:[0,18,0,0]}" lang="en" class="mw-content-ltr sitedir-ltr ltr mw-body-content parsoid-body mediawiki mw-parser-output" dir="ltr"><p data-parsoid="{&quot;dsr&quot;:[0,18,0,0]}"><a href="//upload.wikimedia.org/wikipedia/commons/c/c9/Logo.png" rel="mw:ExtLink" title="Logo.png" data-parsoid="{&quot;dsr&quot;:[0,18,null,null]}">Media:Logo.png</a> asdf</p></body>

which Parsoid serializes as [//upload.wikimedia.org/wikipedia/commons/c/c9/Logo.png Media:Logo.png] asdf

Looks like VE is doing s/mw:MediaLink/mw:ExtLink/?

Deskana triaged this task as High priority.Jul 31 2018, 6:36 PM
Deskana added a subscriber: Deskana.

Fairly bad.

DLynch claimed this task.Aug 6 2018, 4:38 PM
DLynch moved this task from Incoming to In progress on the VisualEditor (Current work) board.
DLynch added a subscriber: DLynch.

We have no explicit handling of mw:MediaLink, so I think what's happening here is just that it's treating it like any other link and falling back on that generic behavior.

DLynch added a comment.Aug 6 2018, 4:41 PM

If I'm speculating wildly, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/436544 did change a bunch of our parsoid-and-links interactions recently.

DLynch added a comment.Aug 6 2018, 4:45 PM

Alas, testing indicates the behavior existed prior to that. So I'll just fix it rather than digging.

DLynch added a subscriber: ssastry.

Okay, so. I've gone and read wt2html/tt/LinkHandler.js and html2wt/LinkHandler.js in Parsoid, and I think that this never worked in VE, since Parsoid added this MediaLink stuff back in May of 2017.

Parsoid takes [[Media:Logo.png]], generates a mw:MediaLink that contains the URL of the image we're linking to, and doesn't give us that original Media: string. Instead it stores it internally as shadowinfo which it uses to reconstruct that later. We could mostly guess this by taking the filename from the href and sticking the local media namespace prefix on it, but I'm not sure if there are edge cases this wouldn't catch.

@ssastry My reading of html2wt/LinkHandler.js is that Parsoid expects us to not notice these and send back Media:Logo.png as the href, insofar as I think rtData.target = DU.getShadowInfo(node, 'fileName', rtData.href.replace(/.*\//, '')); wouldn't cope with it. Since I think we'd like to let end users who are linking to these see the Media: URL, do you want to adjust Parsoid to cope with that better and give it to us as a parsoid resource like other namespace links? Or something else? What's your vision for how this should work?

cscott added a subscriber: cscott.Aug 6 2018, 7:28 PM

From a UX standpoint, I'd expect that "link to file page on commons" vs "link directly to media download" is a checkbox in the link options somehow. I think we'd need some insight from a designer on how to best express this w/in the current VE framework. The magical checkbox would determine whether VE gave us back the mw:MediaLink or mw:ExtLink type.

I expect selective serialization is in play here as well: the original poster and @matmarex were likely testing on a setup w/o RESTBase, which means selective serialization might not have been enabled. That would lead to dirty diffs when even null edits were made. On WMF wikis, I don't expect we'd change/corrupt Media: links unless you actually edited the link itself.

cscott added a comment.Aug 6 2018, 8:33 PM

To clarify, I'm suggesting this is primarily (solely?) a VE bug.

  1. On the Parsoid side, we should round-trip unedited HTML fine (via selser or data-parsoid), and if you send back typeof='mw:MediaLink' in newly-authored HTML we should serialize it as a [[Media:...]] link. (If we don't do this, it's a bug we should fix in Parsoid.)
  1. On the VE side, there is no way to distinguish a Media from a File link. I think the correct UX is probably *not* to surface the specific "Media" vs "File" link name, since that's hugely obscure and "Media" doesn't really describe well what's happening anyway. I'm suggesting it be thrown over to design for some thoughts. I suspect a checkbox would appear/be enabled when the target of the link was a "File:" link, and there would be descriptive text such as "[x] Link directly to media file" or "[x] Link to file description page" (depending on the sense of the checkbox). But other options are possible -- it could be handled like we handle named-vs-unnamed links, with a "convert this to a direct media link" button at the top, and "media link" and "direct media link" distinguished in the UX as two different link types. Anyway, VE would need to surface the distinction somehow, and then...
  1. VE should send back mw:MediaLink iff it wants a direct link, and mw:ExtLink iff it wants a link to the file description page, and Parsoid should honor the typeof it is given (as appears to be the case currently)
  1. (optional) We could add a "smart clean up" pass that recognizes pasted URLs directly to wiki resources (ExtLinks) and converts them into appropriate MediaLinks. This seems fragile to me, though: I'm not sure how our users would be getting their direct link to the resource, and to be robust we'd probably need to handle a pasted link to any of the myriad thumbnail URLs... but then the user would get a link to the full size image which may---or may not!---be actually what they were after.

I expect selective serialization is in play here as well: the original poster and @matmarex were likely testing on a setup w/o RESTBase, which means selective serialization might not have been enabled. That would lead to dirty diffs when even null edits were made. On WMF wikis, I don't expect we'd change/corrupt Media: links unless you actually edited the link itself.

Sorry to derail the discussion a little bit, but is selective serialization something that I should have on now? It defaults to false in the configuration file.

cscott added a comment.Aug 6 2018, 9:18 PM

I expect selective serialization is in play here as well: the original poster and @matmarex were likely testing on a setup w/o RESTBase, which means selective serialization might not have been enabled. That would lead to dirty diffs when even null edits were made. On WMF wikis, I don't expect we'd change/corrupt Media: links unless you actually edited the link itself.

Sorry to derail the discussion a little bit, but is selective serialization something that I should have on now? It defaults to false in the configuration file.

Yes. We should fix the docs and the default config. Nothing will break (well, except for Media links apparently) but you'll get lots more dirty diffs from VE w/o selective serialization, and we've been simplifying our code recently by assuming it is on in order to handle the more unusual round-tripping edge cases.

@cscott

From a UX standpoint, I'd expect that "link to file page on commons" vs "link directly to media download" is a checkbox in the link options somehow. I think we'd need some insight from a designer on how to best express this w/in the current VE framework. The magical checkbox would determine whether VE gave us back the mw:MediaLink or mw:ExtLink type.

I'm open to talking with Design (we just added a designer to the VE team again) for the best way to surface this distinction... but I think a good interim step would be a quick fix to make Media just as easily-edited as File.

To clarify, I'm suggesting this is primarily (solely?) a VE bug.

I disagree, just because I think Parsoid isn't behaving helpfully in this case. It's not giving the VE information in a useful form, in two senses:

  1. It takes wikitext and turns it into something which it's complicated for VE to make into a thing the end-user can edit sensibly.
  2. I don't think VE needs the direct media URL it returns, though I can't speak for any other consumers of this API?

I can make VE turn what Parsoid is giving me into something useful... but that seems more fragile than just getting Parsoid to give me the useful-thing-it-already-knows directly.

On the VE side, there is no way to distinguish a Media from a File link.

This is quite wrong! The return from Parsoid is really different. It turns [[:File:Logo.png]] into <p id="mwAQ"><a rel="mw:WikiLink" href="./File:Logo.png" title="File:Logo.png" id="mwAg">File:Logo.png</a></p>`.

This we handle well in VE, because we're used to detailing with those Parsoid resource links. It gets us this helpful edit experience:

Instead of this very unhelpful one:

I.e. my suggested change is: give me the Media:Logo.png somehow, and handle it when people send it back as part of a mw:MediaLink. I don't care if you still include the expanded Media URL, just so long as you don't expect me to know what the new one should be when a user edits a link.

re: "on VE side no way to distinguish" -- I mean from the UX perspective. Parsoid does (and should, IMO) return different HTML for the two cases, of course. And in fact the href and label are tightly constrained by the functional aspects, so there's no much room for Parsoid to return different HTML. VE should/must handle these better, with improved UX. It can/should provide the UX from your first example even when given the HTML for your second example (which again, we can't really change much because of the functional requirements on the href).

Absent Parsoid output/input changes, I will make those editing experiences equivalent by:

  • Looking for mw:MediaLink, throwing away the href, assuming that the original wikitext was [[<local media namespace>:<contents of title attribute>]] and displaying that as the link's href.
  • Serializing any links with Media:Filename.png to a mw:MediaLink with href of ./Filename.png and trusting my reading of getLinkRoundTripData that this will produce the right result inside Parsoid.

I worry that this is more fragile than it has to be. That said, so long as Parsoid doesn't update to require that the href returned actually be the one the final link would point to, it should probably all work.

You should look at the resource attribute of the <a> tag, I believe, instead of the title attribute. And you shouldn't be looking at Media:.., you should be looking at VE's link type or a boolean probably.

@cscott: Nope, this is all we get from Parsoid: [[Media:Logo.png]] => <p id="mwAQ"><a rel="mw:MediaLink" href="//upload.wikimedia.org/wikipedia/commons/c/c9/Logo.png" title="Logo.png" id="mwAg">Media:Logo.png</a></p> (with the link text obviously not being reliable, because people can change that).

We'd need to look for people linking to Media:... and treat it differently, because Parsoid treats them differently, no? (We already special-case internal links to File: and Category: because they need to have that : prepended so they're not turned into actual inclusions of those resources. This seems equivalent.)

cscott added a comment.Aug 7 2018, 2:33 AM

Looks like Media handling was added back in T151277; we probably should have ensured VE handled these correctly at the same time, but it looks like our focus was on read-view functionality.

@Arlolra, @ssastry: I think we should be adding a resource attribute here, like we do for <img> tags, and for the same reason: so that the client doesn't have to try to infer the resource from the thumbnail url (or from the title attribute, which is subject to language conversion & etc). Do you agree?

@DLynch: I don't agree that a user of VisualEditor should be expected to know the magic behavior of Media:. If they happen to manually type it in, it should work, but there should be actual human-oriented and reasonably-intuitive UX to obtain this link behavior. I'd defer to design, but note that typeahead search on link insertion doesn't work at all currently with Media: (try: Media:common.jpg on enwiki), and while it sort of works for File: (try: File:common.jpg) the UX is non-ideal. We have an "insert > media" option with a nice visual browser; ideally both "File:" and "Media:" links could be inserted with something closer to this UX (maybe "Insert Media Link", or else a "insert link instead of image" checkbox in the media options).

@cscott: Oh, I totally agree that this UX isn't optimal, and that we should add it to @iamjessklein's big pile of VE design thoughts. I'm just happy to fix the power-user case right now while we wait on that. (And we'll presumably need whatever functionality we add via resource for the nicely-designed option anyway, so that shouldn't hurt.)

Change 451001 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Use resource attribute for [[Media:....]] links

https://gerrit.wikimedia.org/r/451001

I am not that deep in mediawiki code and development. So now I am asking if solution has been found?
Can I apply change 451001 to my productive mw1.31.0 installation as we get defective media links while users are editing pages?
And yes I read the whole thread.

@rakekniven: Conceptually a solution is found, but it's incomplete. Once 451001 is merged into Parsoid, and a new Parsoid release happens, we'll need another patch to VE to actually use this new resource attribute.

@DLynch Ok, so I have to wait. Rollback to my previous mv install is not possible due to the page edits we had in the past.
Are you aware of any time schedule to get this VE patch?

@rakekniven: No strict schedule. It's not very urgent to work on for us until the Parsoid patch lands, and I have no idea when that'll be. Sometime in the next few weeks is plausible.

@rakekniven: No strict schedule. It's not very urgent to work on for us until the Parsoid patch lands, and I have no idea when that'll be. Sometime in the next few weeks is plausible.

This is probably a week or two since we are piling all HTML version changes into a single version bump, and the other patches in this pile are held up on some other dependent patches being deployed. If it looks like it will get delayed further, we might deploy this media-related patch independently.

@ssastry Thanks for the update!

Change 454860 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/VisualEditor@master] [WIP] Handle mw:MediaLink

https://gerrit.wikimedia.org/r/454860

(That's very provisional.)

Hello and good morning,

any status update on this one?

Best regards

Change 451001 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use resource attribute for [[Media:....]] links

https://gerrit.wikimedia.org/r/451001

rakekniven added a comment.EditedOct 8 2018, 3:04 PM

@ssastry Any update regarding release of Parsoid? Our wiki is still losing working links.
In the meantime I upgrade our MW instance to v1.13.1 and Parsoid is still 0.9.0

Can somebody tell me the manual steps to get it working again. Maybe @DLynch ?
I really do not like to wait until Parsoid has its pieces together.

Any help is really appreciated. Thank you.

@ssastry Any update regarding release of Parsoid? Our wiki is still losing working links.
In the meantime I upgrade our MW instance to v1.13.1 and Parsoid is still 0.9.0
Can somebody tell me the manual steps to get it working again. Maybe @DLynch ?
I really do not like to wait until Parsoid has its pieces together.
Any help is really appreciated. Thank you.

If you have the option of checking out a specific revision, I recommend using the ff6ffb5 from the deploy-2018-09-27 branch so you aren't waiting for us to do another release.
See https://github.com/wikimedia/parsoid/commits/deploy-2018-09-27

The earliest I've found at enwiki was on Saturday, 14 April 2018:

I also found an edit on Wednesday, 11 April at enwiki that was not affected by this, so something in the deployment train during the week of April 9th is likely the trigger.

@Magioladitis and @NicoV: This bug is causing ugly wikitext at all the wikis. There are probably a few hundred (but not thousands) of these problems at enwiki alone. Do you think think that the clean-up work could be automated?

The earliest I've found at enwiki was on Saturday, 14 April 2018:

I also found an edit on Wednesday, 11 April at enwiki that was not affected by this, so something in the deployment train during the week of April 9th is likely the trigger.
@Magioladitis and @NicoV: This bug is causing ugly wikitext at all the wikis. There are probably a few hundred (but not thousands) of these problems at enwiki alone. Do you think think that the clean-up work could be automated?

Did you mean to post on this ticket? I couldn't immediately tell what was wrong with enwiki:Nightwish edit up there.

The Nightwish link changed the functional ([[Media:Nightwish-highhopes.ogg|sample]]) to the broken ([//upload.wikimedia.org/wikipedia/en/6/69/Nightwish-highhopes.ogg sample]) (and about a thousand other little things, none of which matter).

Change 454860 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Handle mw:MediaLink

https://gerrit.wikimedia.org/r/454860

@DLynch Do you think this patch solves the issue? It's hard for me since the patch doesn't have details.

DLynch added a comment.EditedOct 18 2018, 5:17 PM

@Deskana: It should fix any future mangling of Media: links. (Given: that the new parsoid release is deployed, which I believe it is.) Basically, paste [[Media:Test.pdf]] into a VE document. If the link context item you get shows it being a Media: link after you do that, it's fixed.

However, it doesn't do any repair on links that have already been mangled. That's what @Whatamidoing-WMF was discussing above -- if we'd want to try some sort of bot-repair of them. You could certainly argue that's not in the scope of this ticket, and make a new one for that if it's desired, though.

rakekniven added a comment.EditedOct 19 2018, 5:56 AM

@DLynch New Parsoid release is not out. See https://github.com/wikimedia/parsoid/releases
Who to contact regarding a release? @ssastry Can you help?

@rakekniven Sorry, I was talking about deployment-to-wikipedia more than about proper parsoid releases.

ssastry's earlier advice about using one of the deployment branches rather than waiting for a release seems valid for now, if that's an option for you.

JTannerWMF moved this task from Inbox to High Priority on the Editing QA board.Nov 8 2018, 6:48 PM
marcella closed this task as Resolved.Jan 8 2019, 9:42 PM
rakekniven added a comment.EditedJan 10 2019, 1:25 PM

@ssastry My ubuntu told me there is a new parsoid release v0.10
Installed it and using it together with MW v1.31.1

I still have the issue of broken links (localhost...).
Do I need to use another MW version as well?
Or an specific VE version? Currently we use 0.1.0 (13a585a).

Thank you for any help.

Update:
Just upgraded my installation to:
MW 1.32.0
VE 0.1.0 (21d40ce)
Nodejs 0.8.10

I still have the issue of broken links (localhost...).
As soon as I start the VE on a page the media links are broken, no need to save page.

Any help is appreciated

Change 497354 had a related patch set uploaded (by Jforrester; owner: DLynch):
[mediawiki/extensions/VisualEditor@REL1_32] Handle mw:MediaLink

https://gerrit.wikimedia.org/r/497354

Change 497354 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@REL1_32] Handle mw:MediaLink

https://gerrit.wikimedia.org/r/497354

Change 534274 had a related patch set uploaded (by Robert Vogel; owner: DLynch):
[mediawiki/extensions/VisualEditor@REL1_31] Handle mw:MediaLink

https://gerrit.wikimedia.org/r/534274