
Visual Editor adds link= parameter to image when only caption is edited
Closed, Resolved · Public · BUG REPORT

Assigned To
Authored By
Jonesey95
Jul 9 2020, 2:35 PM
Referenced Files
F32363687: enwiki.sql
Sep 25 2020, 5:56 PM
F32363684: frwiki.csv
Sep 25 2020, 5:56 PM
F32363686: enwiki.csv
Sep 25 2020, 5:56 PM
F32363685: frwiki.sql
Sep 25 2020, 5:56 PM
Tokens
"Burninate" token, awarded by Sunpriat2.

Description

Steps to Reproduce: Here's a link to what happened:
https://en.wikipedia.org/w/index.php?title=COVID-19_pandemic_in_New_York_City&type=revision&diff=966591435&oldid=966591280

The editor used the Visual Editor to change a typo in an image caption. Presumably, a developer could go to that previous version of the article and try to edit the image caption using VE to reproduce this bug.

Actual Results: The Visual Editor added a link= parameter with a modified version of the image file name, and it added an empty alt= parameter.

As a side effect, because the file name was HTML-ized and there appears to be a bug in image syntax processing, the link= parameter and value were displayed as the caption. See T216003 for that bug report.

Expected Results: VE should not add a link= or alt= parameter unless the human editor requests them.

Event Timeline

The bug was first mentioned at https://en.wikipedia.org/wiki//Wikipedia:Village_pump_(technical)#Image_caption_showing_up_as_link?
Tests indicate it happens when the file name has quotation marks. In the example, VisualEditor changed

[[File:NYC Bus with "masks required" sign.jpg|thumb|New York City transit bus. 177th Street near Devoe Ave, Bronx, NY. Route sight reads, "Masks Required"]]

to

[[File:NYC Bus with "masks required" sign.jpg|thumb|New York City transit bus. 177th Street near Devoe Ave, Bronx, NY. Route sign reads, "Masks Required"|link=File:NYC_Bus_with_%22masks_required%22_sign.jpg|alt=]]

The user only changed "sight" to "sign" in the caption.
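To make the unwanted change concrete, here is a minimal Python sketch that recognizes and removes the two options added in the example above. It is not how VisualEditor, Parsoid, or any cleanup bot actually handles this; the function names `normalize_title` and `strip_redundant_options` are hypothetical, and the parsing is deliberately naive (it assumes a single `[[File:...]]` link with no nested links or templates in the caption).

```python
from urllib.parse import unquote


def normalize_title(title: str) -> str:
    """Percent-decode a page title and treat underscores as spaces."""
    return unquote(title).replace("_", " ").strip()


def strip_redundant_options(image_wikitext: str) -> str:
    """Drop a link= option that points back at the file itself, and an empty alt=.

    Assumption: image_wikitext is one complete [[File:...]] link without
    nested [[...]] or {{...}} inside it.
    """
    inner = image_wikitext[2:-2]            # strip the surrounding [[ and ]]
    parts = inner.split("|")                # naive split on option separators
    file_title = normalize_title(parts[0])
    kept = [parts[0]]
    for option in parts[1:]:
        name, sep, value = option.partition("=")
        if sep and name.strip() == "link" and normalize_title(value) == file_title:
            continue  # link= merely repeats the file's own title, percent-encoded
        if sep and name.strip() == "alt" and not value.strip():
            continue  # empty alt= that the editor never asked for
        kept.append(option)
    return "[[" + "|".join(kept) + "]]"
```

Applied to the second wikitext snippet above, this sketch should return the image link with only the corrected caption; a `link=` pointing to a different page, or a non-empty `alt=`, is left untouched.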

(This is similar to T193253, which we tried fixing previously, but that patch caused other dirty diffs.)

It should add |alt= parameters, and everyone should (almost) always fill in the blank.

@Jonesey95, how often is the |link= problem turning up?

I don't know how often the link= problem is turning up. The devs could probably do some sort of database search to look for VE edits that added link=, which people should almost never be adding.

As for the blank alt= option (it is confusing to refer to it as a parameter, since the image options do not work like template parameters), VE should not add it if the human editor did not ask for it to be added. I don't use VE, so I don't know if the editor is presented with some way to provide a value for the alt= option while modifying the image caption.

The same problem occurs in ruWP. https://ru.wikipedia.org/w/index.php?title=Надписи_из_деревьев&diff=next&oldid=108444834 Our bot periodically cleans it up.
See the fourth picture - https://ru.wikipedia.org/w/index.php?title=Надписи_из_деревьев&oldid=108499780 The link parameter text is added with percent-encoding, which is why the link is visible in the article instead of the file description!

I also saw similar problems when reviewing the corrupted diffs caused by T259855, where VisualEditor wasn't involved. That made it look like there's an underlying issue in Parsoid, and I found the task about it: T108504.

Let's wait for that to be fixed, and see if it also fixes the bug in VisualEditor.

Esanders subscribed.

Looks like Parsoid has put in a fix so hopefully there is nothing more for us to do here.

May I ask which bot and how often it runs? Checking its contributions would be a good way to confirm whether this issue has been fixed by the Parsoid changes.

FWIW, I just copied the File wikitext from the original bug report to my sandbox on en.WP, then edited it with VE, changed one word in the caption, and saved it. I did not get a link= or alt= parameter, which means I am no longer able to reproduce the bug. Here's the diff:

https://en.wikipedia.org/w/index.php?title=User%3AJonesey95%2Fsandbox&type=revision&diff=979666625&oldid=979666566


@matmarex
I don't know about ruWP, but my bot (WikiCleanerBot) fixes this on both frWP and enWP. The same summary is used for all "internal links written as external links", so it's not restricted to the link= problem in images.

  • On frWP, look for edits with "Lien interne écrit comme un lien externe" in the edit summary, like here or here.
  • On enWP, look for edits with "Internal link written as an external link" in the edit summary, like here.

I mostly run those fixes as part of my dump analysis process, so it's usually twice a month (a few days after the 1st and the 20th).

You can also look at problems found in my dump analysis, both on enWP and frWP (but they contain only problems not reported by CheckWiki or that my bot wasn't able to fix).

Thanks @NicoV!

I wrote two queries to check your bot's edits on enwiki and frwiki, and it has actually made noticeably fewer edits with these summaries in September than before. The fix was only deployed on September 14, so I'm not sure if that's really related or just a random occurrence (also, I'm assuming you already did the second run this month?). I guess we'll see next month whether the numbers go down further.

Month    frwiki count    enwiki count
2019-0150
2019-0250
2019-0340
2019-0430
2019-0550
2019-061920
2019-07180
2019-082811969
2019-098262
2019-109199
2019-1138246
2019-12117263
2020-01462192
2020-02117154
2020-0356329
2020-04139461
2020-05236458
2020-0646353
2020-0793200
2020-0853276
2020-092986
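
As a rough illustration of the kind of per-month tally shown in the table above, the following Python sketch counts edits whose summary contains a given phrase. It is not the query used in this task; the function name `monthly_counts` and the input shape (pairs of timestamp and edit summary) are assumptions made for the example. Timestamps are assumed to be in MediaWiki's `YYYYMMDDHHMMSS` format.

```python
from collections import Counter


def monthly_counts(edits, summary_fragment):
    """Count edits per YYYY-MM whose summary contains summary_fragment.

    `edits` is an iterable of (timestamp, summary) pairs; this input shape
    is hypothetical and chosen only for the illustration.
    """
    counts = Counter()
    for timestamp, summary in edits:
        if summary_fragment in summary:
            month = f"{timestamp[:4]}-{timestamp[4:6]}"  # YYYYMMDD... -> YYYY-MM
            counts[month] += 1
    return dict(sorted(counts.items()))
```

For enWP the fragment would be "Internal link written as an external link", and for frWP "Lien interne écrit comme un lien externe". As noted in the discussion, that summary covers more than the link= problem, so such counts are only an upper bound on the edits related to this bug.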

Thanks @matmarex

I already ran my bot twice this month, but one thing that can also partly explain the difference is that in August, I ran it very late (like 30th of August instead of 22nd or 23rd), so it fixed some that would usually be fixed in the beginning of the next month.

Thanks @NicoV!

I wrote two queries to check your bot's edits on enwiki and frwiki...

Nice, Bartosz.

Thanks @matmarex

I already ran my bot twice this month, but one thing that can also partly explain the difference is that in August, I ran it very late (like 30th of August instead of 22nd or 23rd), so it fixed some that would usually be fixed in the beginning of the next month.

@NicoV: considering what you shared in the quoted comment above, I was thinking we'll leave this task open and re-run the queries @matmarex wrote after 31-Oct to confirm the issue is indeed fixed. How does that sound to you?

@ppelberg and @matmarex
Is the Parsoid fix also supposed to handle this kind of edit? It was made on September 30th, two weeks after the Parsoid fix was deployed, and there are still many "link=" parameters being added by VE (but to new images). My bot fixed it during the pass on the October 1st dump.

@NicoV Thanks, that is helpful.

I think all of the recent issues you point out are actually caused by T193253 – that is, the erroneous markup is added when copy-pasting the image from another page, rather than when editing the caption of an image (like in this bug). I looked at them more closely:

Edits by the same user in related articles, the affected images appear multiple times, so they were likely copy-pasted.

https://en.wikipedia.org/w/index.php?title=Ministry_of_Mining_and_Energy_(Serbia)&diff=982383906&oldid=953995112
https://en.wikipedia.org/w/index.php?title=Ministry_of_Health_(Serbia)&diff=982383040&oldid=964534644
https://en.wikipedia.org/w/index.php?title=Ministry_of_Education,_Science_and_Technological_Development_(Serbia)&diff=982381908&oldid=953989723
https://en.wikipedia.org/w/index.php?title=Ministry_of_Agriculture,_Forestry_and_Water_Economy_(Serbia)&diff=982380534&oldid=961787908
https://en.wikipedia.org/w/index.php?title=Minister_without_portfolio_(Serbia)&diff=982652098&oldid=949141558

Edits by the same user in related articles, the affected image appears multiple times here as well, so it was likely copy-pasted.

Is the Parsoid fix also supposed to handle this kind of edit?

At first I thought this might be a different problem, since it did not obviously look like a copy-paste – but I checked the user's other contributions, and in their next edit, they deleted an identical table from their sandbox, so the first edit was likely a copy-paste of the content from there.

I re-ran the queries from T257581#6494846:

Month    frwiki count    enwiki count
2020-0646353
2020-0793197
2020-0853267
2020-094084
2020-1021350

So, that no longer looks like a drop…

(note that some of the numbers for previous months dropped – this is not a mistake, and it must be a result of articles with these edits being deleted)

I'm still fairly confident that we fixed this particular issue (per my own testing and T257581#6482369), but it seems that other problems like T193253 are responsible for more of the broken wikitext than I was hoping.

I think we should close this task, and I'm going to continue working on T193253.


Sounds good.