
Flow wikitext API doesn’t include image that’s present in a topic
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue

What happens?
There is no image present in the wikitext API response, despite there being an image present at https://mediawiki.org/wiki/Topic:Uecl7etmvwe4tg5x

What should have happened instead?
The image should have been included in the wikitext API response, given that it was part of the topic.
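For anyone wanting to repeat the check, here is a minimal sketch of how the wikitext request URL could be built. The endpoint and the `action=flow` / `submodule=view-topic` / `vtformat=wikitext` parameters are my assumption about the Flow API and are not taken from this report; `flow_wikitext_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def flow_wikitext_url(topic_title, api="https://www.mediawiki.org/w/api.php"):
    """Build a URL asking the Flow API for a topic rendered as wikitext.

    Assumption: the parameter names below match the Flow API's
    view-topic submodule; they are not stated in the bug report.
    """
    params = {
        "action": "flow",
        "submodule": "view-topic",
        "page": topic_title,
        "vtformat": "wikitext",  # ask for wikitext rather than HTML
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


url = flow_wikitext_url("Topic:Uecl7etmvwe4tg5x")
```

Fetching that URL and searching the returned content for `[[File:` would be one way to confirm whether the image survived the conversion.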

Software version
WMF production - Special:Version gives me MW 1.44.0-wmf.20 & Parsoid 0.21.0-a20

Other information/notes

Event Timeline


The HTML of the image has typeof="mw:Image/Thumb", while normal Parsoid output should have typeof="mw:File/Thumb" instead. Presumably this is something that changed in the past 7 years, and modern Parsoid can no longer convert that HTML to wikitext?
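A rough sketch of how stored HTML could be checked for the legacy marker described above. It assumes the only signal needed is a `typeof` value of `mw:Image` or `mw:Image/...` (matched case-insensitively); `LegacyMediaFinder` and `find_legacy_media` are hypothetical names, not part of Flow or Parsoid.

```python
from html.parser import HTMLParser


class LegacyMediaFinder(HTMLParser):
    """Collects (tag, typeof) pairs whose typeof uses the old mw:Image prefix."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        typeof = dict(attrs).get("typeof") or ""
        # typeof can carry several space-separated RDFa types
        for value in typeof.split():
            lowered = value.lower()
            if lowered == "mw:image" or lowered.startswith("mw:image/"):
                self.hits.append((tag, value))


def find_legacy_media(html):
    """Return every legacy media wrapper found in an HTML fragment."""
    finder = LegacyMediaFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


old = '<figure typeof="mw:Image/Thumb"><a href="./File:X.png"><img src="x.png"/></a></figure>'
new = '<figure typeof="mw:File/Thumb"><a href="./File:X.png"><img src="x.png"/></a></figure>'
```

Here `find_legacy_media(old)` reports the `<figure>` wrapper, while `find_legacy_media(new)` reports nothing.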

I was aware of that patch, but assumed that the number of Flow posts with images was insignificant enough that I didn't have to care, and what I expected it to produce was some sort of malformed markup, not silently dropping the image.

Stephonjeffries19 changed the task status from Open to Stalled. (Mar 22 2025, 4:04 PM)
Stephonjeffries19 set the point value for this task to 364955616.
Stephonjeffries19 set Final Story Points to 52648.
Pppery changed the task status from Stalled to Open. (Mar 22 2025, 4:04 PM)
Pppery removed the point value 364955616 for this task.
Pppery removed Final Story Points.
Stephonjeffries19 changed the task status from Open to Stalled. (Mar 22 2025, 4:05 PM)
Pppery changed the task status from Stalled to Open. (Mar 22 2025, 4:07 PM)

I was aware of that patch, but assumed that the number of Flow posts with images was insignificant enough that I didn't have to care, and what I expected it to produce was some sort of malformed markup, not silently dropping the image.

Do you want me to restore some of that backwards compatible code?

No, it's too late now - I ran the export bot weeks ago.

That's really up to A smart kitten, not me. I guess I could re-export the specific sampling of topics that lost images if I could find them.

To make that easier, it would be useful to know when the relevant Parsoid change was ...

(Sorry, my comment above was kind of terse - restoring the backward compatibility now wouldn't fix anything by itself - it might be part of a broader scheme if I decide to pursue that, but the time for that isn't now).


My personal opinion is that we should accept Flow->wikitext conversion is lossy (which isn't all Parsoid's fault, some of it is unavoidable due to data-model or wikitext syntax mismatches), and do T389680 so edge-cases like this don't get lost, rather than trying to track down every single one and fix the wikitext.

@A_smart_kitten What do you think?

So we can decline this?

I guess a few things come to my mind immediately (disclaimer: probably not an exhaustive list):

  • Would this not be an issue for when the WMF runs a script to convert Flow boards to wikitext on other (non-MW.org) wikis? (I feel like I’ve read somewhere that this will happen, correct me if I’m wrong though)
  • Viewing this through the lens of being a StructuredDiscussions bug (i.e., rather than necessarily a Parsoid bug): might there be installations of StructuredDiscussions on any third party wikis that may have posts affected by this? If so, then this bug affects them as well. They would expect any Flow-extension-generated wikitext exports to also include any images that are present in a topic, and currently it seems like this wouldn't always be the case. How to resolve this specific issue is (presumably?) a decision for the stewards of this extension (Growth-Team, I believe?). To throw out a couple of ideas that immediately come to my mind (also not an exhaustive list): maybe defining a singular/specific version of Parsoid that the extension is marked as being compatible with, that includes all the necessary backwards-compatibility for correctly exporting old Flow posts? Maybe writing & bundling a StructuredDiscussions maintenance script to make the necessary changes to stored Flow posts (e.g., like those recommended in this commit message) such that they export properly to wikitext under all listed supported versions of Parsoid? (There are probably also other possible ideas that aren’t immediately occurring to me.)

Reply to @Pppery to come when my brain is working a bit more - feel free to ping me if I’ve forgotten for like a week or something. May also have more thoughts on what I’ve written above.

To make that easier, it would be useful to know when the relevant Parsoid change was ...

See T273505

Would this not be an issue for when the WMF runs a script to convert Flow boards to wikitext on other (non-MW.org) wikis? (I feel like I’ve read somewhere that this will happen, correct me if I’m wrong though)

Yes. There's only one way to convert boards to wikitext, which is to send the HTML through Parsoid, so anything will be affected by this in the same way.

might there be installations of StructuredDiscussions on any third party wikis that may have posts affected by this?

Presumably any StructuredDiscussions post with an image that was made before 2022 (on MediaWiki.org this is 8000 topics) is affected by it. Or maybe only some kinds of image are affected, since anecdotally not every such post loses its image.

maybe defining a singular/specific version of Parsoid that the extension is marked as being compatible with, that includes all the necessary backwards-compatibility for correctly exporting old Flow posts

The version of Parsoid is tightly coupled to the version of MediaWiki core - each version of MediaWiki depends on a specific version of Parsoid, so that can't really happen.

Maybe writing & bundling a StructuredDiscussions maintenance script to make the necessary changes to stored Flow posts (e.g., like those recommended in this commit message) such that they export properly to wikitext under all listed supported versions of Parsoid

That would be T209120. Good idea, but I don't think the development resources to do this exist.

But I think @A_smart_kitten has a point. Even if MediaWiki.org has already been done (and I don't have the energy to delete and re-import thousands of Flow boards there even if Parsoid is fixed), other wikis with Flow boards to export exist, so maybe it would be a good idea to restore the image back-compat. I guess I was so deeply engrossed in MediaWiki.org's bot run that I kind of assumed it was all there is.

Also, not all back-compat was removed: https://www.mediawiki.org/wiki/Topic:Qau7gi3y1ohkf20j, for example, does export the image correctly (to https://www.mediawiki.org/wiki/User_talk:Wargo/Flow_export#Gwiazda_WikiLove_(przeniesiona_do_nowego_typu_dyskusji) ), despite the Parsoid HTML containing "mw:image". Is it only mw:image/thumb that was broken?

No, that's probably not right. Looking at the exported wikitext it includes useless <span> tags around the image. So Parsoid must have failed to find the wrapper, but then found the raw image tag and constructed a wikitext image from it as if serializing an img tag copied from elsewhere. Why didn't that work here?

Apparently because it's wrapped inside a <figure> tag, which Parsoid ignores if it can't recognize it. That sounds like another Parsoid bug. <figure-inline> tags were handled differently: Parsoid treated them as an unknown HTML tag, wrote the literal text <figure-inline> to the wikitext, then recursed into the tag, found the <img>, and serialized it to a wikitext image.

... now that I have something to look for other than "all Flow boards with images", the task looks much more approachable. The lists below come from scanning the 20241220 dump, since I wanted the state from before I started shuffling Flow boards around wildly.

List of Flow boards probably missing images: P74368

List of Flow topics probably missing images: P74367

(I excluded a bunch of pages where the only missing image was some trivial icon, or part of a massmessage, as I frankly don't care)

There are few enough that I think it is entirely practical to delete the exports for just those 200 or so pages and re-import. So, after a much closer look, I would like mw:Image (and mw:image/whatever too) backcompat restored. This isn't urgent - it can wait until .23, but should be done sometime soon.

I also checked mw:Audio and mw:Video. There were very few of them, and the only missing video was on https://www.mediawiki.org/wiki/Project:Support_desk/Flow/2022/04, which I just added manually.

Or perhaps just fix the bug that Parsoid silently deletes figure tags it doesn't recognize, rather than emitting the literal wikitext <figure> and recursing into their content. If it did that, that would turn the impact of the lack of BC here from "content is silently lost" to "content is wrapped in a useless set of HTML tags" - I'm fine with the latter as part of the known Flow export noise, and am much more concerned by the former.
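To make the difference between the two outcomes concrete, here is a toy model. It is not Parsoid code and does not reflect how Parsoid is structured; `Node`, `serialize`, the `resource` attribute, and the `drop_unknown_figure` flag are all hypothetical, and the wikitext produced is deliberately simplified.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def serialize(node, drop_unknown_figure):
    """Toy HTML-to-wikitext pass illustrating the two policies discussed here."""
    if node.tag == "img":
        # Toy rule: a bare <img> becomes a wikitext file link.
        return f"[[File:{node.attrs['resource']}]]"
    inner = "".join(serialize(child, drop_unknown_figure) for child in node.children)
    if node.tag == "figure":
        if node.attrs.get("typeof", "").startswith("mw:File"):
            return inner  # recognized wrapper: clean output
        if drop_unknown_figure:
            return ""  # behaviour reported in this task: content is lost
        return f"<figure>{inner}</figure>"  # suggested fallback: noisy but lossless
    return inner


legacy = Node("figure", {"typeof": "mw:Image/Thumb"},
              [Node("img", {"resource": "Example.png"})])
lost = serialize(legacy, drop_unknown_figure=True)
kept = serialize(legacy, drop_unknown_figure=False)
```

Under the first policy the legacy figure serializes to an empty string; under the second it comes out as `<figure>[[File:Example.png]]</figure>`, which is the "useless set of HTML tags" outcome described above.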

Thanks everyone for prodding me to look at this more closely, and awakening me from the "flow export is inevitably messy" stupor I had fallen into.

Change #1131812 had a related patch set uploaded (by Pppery; author: Pppery):

[mediawiki/services/parsoid@master] Do not delete old Flow board images when unserializing

https://gerrit.wikimedia.org/r/1131812

Change #1131812 abandoned by Pppery:

[mediawiki/services/parsoid@master] Do not delete old Flow board images when converting to wikitext

https://gerrit.wikimedia.org/r/1131812

Change #1131812 restored by Arlolra:

[mediawiki/services/parsoid@master] Do not delete old Flow board images when converting to wikitext

https://gerrit.wikimedia.org/r/1131812

Change #1131812 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Do not delete old Flow board images when converting to wikitext

https://gerrit.wikimedia.org/r/1131812

Change #1132724 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a23

https://gerrit.wikimedia.org/r/1132724

Change #1132724 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a23

https://gerrit.wikimedia.org/r/1132724

Pppery claimed this task.

Confirmed that the wikitext call above now shows the image.