Page MenuHomePhabricator

VE removes all new lines when editing a template
Closed, ResolvedPublic

Description

Reported on Persian Wikipedia Example edit. Users are quite unhappy about this.

Event Timeline

I had also noticed over the last few days and mentioned here on EN WP - Examples: here, this and this.

CC @Krinkle in case this is related to recent TemplateData changes?

CC @Krinkle in case this is related to recent TemplateData changes?

I've re-reviewed the paramOrder change that landed week but don't see a connection for the formatting.

The aforementioned fa.wikipedia diff (link) is about a template "Infobox Officeholder". The TemplateData query for that (link) yield the following:

"format": "block",

So it seems the backend is describing the template correctly. That leaves Parsoid and VisualEditor as potential suspects. As far as I know, the serialisation based on TemplateData happens in Parsoid, not VE, so perhaps something regressed there and it no longer honouring the format?

Strange. There have been no deployments since Jan 8 and even that doesn't touch any serialization code. Anyway, we test html2wt locally for this.

CC @Krinkle in case this is related to recent TemplateData changes?

I've re-reviewed the paramOrder change that landed week but don't see a connection for the formatting.

The aforementioned fa.wikipedia diff (link) is about a template "Infobox Officeholder". The TemplateData query for that (link) yield the following:

"format": "block",

So it seems the backend is describing the template correctly. That leaves Parsoid and VisualEditor as potential suspects. As far as I know, the serialisation based on TemplateData happens in Parsoid, not VE, so perhaps something regressed there and it no longer honouring the format?

So I patched html2wt with as follows (since this is triggered only on edited HTML and I wanted to test it with regular HTML),

diff --git a/lib/html2wt/WikitextSerializer.js b/lib/html2wt/WikitextSerializer.js
index d052a47e0..9470070fa 100644
--- a/lib/html2wt/WikitextSerializer.js
+++ b/lib/html2wt/WikitextSerializer.js
@@ -782,11 +782,12 @@ WikitextSerializer.prototype.serializeFromParts = Promise.async(function *(state
                var fetched = false;
                try {
                        var apiResp = null;
-                       if (isTpl && useTplData) {
+                       if (isTpl && true) {
                                var href = tplHref.replace(/^\.\//, '');
                                apiResp = yield TemplateDataRequest.promise(env, href, Util.makeHash(["templatedata", href]));
                        }
                        tplData = apiResp && apiResp[Object.keys(apiResp)[0]];
+                       console.log("TPLDATA: " + JSON.stringify(tplData));

Now, when I rut it with the HTML of fawiki's infobox, see the output:

[subbu@earth:~/work/wmf/parsoid] parse.js --html2wt --prefix fawiki  --trace apirequest< /tmp/fa.html 
....
[ApiRequest]   | #3 Starting HTTP request:  {"method":"GET","qs":{"format":"json","action":"templatedata","includeMissingTitles":"1","titles":"الگو:Infobox_Officeholder"},"followRedirect":true,"uri":"https://fa.wikipedia.org/w/api.php","timeout":30000,"headers":{"X-Request-ID":null,"User-Agent":"Parsoid/0.10.0+git","Connection":"close"},"strictSSL":true}
....
TPLDATA: {"title":"الگو:Infobox Officeholder","notemplatedata":"","params":[]}

@Krinkle: do you know what is going on here?

So, here is the thing. Even with this broken API response, Parsoid serializes the HTML correctly as long as the data-parsoid block is preserved. But, if I edit fa.html and remove the data-parsoid attribute, it gets serialized inline with all newlines gone. Now, data-parsoid comes from RESTBase and requires that the id attribute match up. So, in addition to the broken TemplateData request, is VE accidentally clobbering the id?

@Esanders, can you test this hypothesis and check if there is some reason why Parsoid might not be getting the right data-parsoid blob?

(This issue was first reported yesterday on another task: T179259#4881378)

@ssastry Note that the Parsoid query is for "Infobox Officeholder" (with uppercase "O"), which is a redirect to "Infobox officeholder" (lowercase "o"). Only the latter page has templatedata. Parsoid should probably ask the MediaWiki API to follow redirects (&redirects=1). But that seems like a separate issue.

@ssastry Note that the Parsoid query is for "Infobox Officeholder" (with uppercase "O"), which is a redirect to "Infobox officeholder" (lowercase "o"). Only the latter page has templatedata. Parsoid should probably ask the MediaWiki API to follow redirects (&redirects=1). But that seems like a separate issue.

In my snippet I pasted above with the API trace, see that we already have "followRedirect":true. Is that the wrong param?

In any case, could you also investigate the second part of my comment there about VE losing the id attribute in some scenarios?

In any case, could you also investigate the second part of my comment there about VE losing the id attribute in some scenarios?

I managed to reproduce the issue locally (although I'm using a setup without RESTBase). The data-parsoid attribute disappears on edited templates in the HTML sent back to the API (you can confirm this by putting a breakpoint inside ve.init.mw.ArticleTarget.prototype.deflateHtml and looking at html).

I found that it is caused by rEVED48db45df7602: Don't preserveHtmlAttributes on transclusion nodes. I am not sure how it causes the problem (I didn't investigate much since I assume @Esanders will know it off the top of his head once we know the faulty patch), but reverting that patch fixes it. So this is not a TemplateData or Parsoid issue – sorry for alarming you.

The problem mostly affects templates that have no templatedata, or do not have format set in templatedata (or templates transcluded via redirects, due to the unrelated Parsoid issue). If format is set, then Parsoid applies it, masking the VE issue.

matmarex triaged this task as Unbreak Now! priority.Jan 17 2019, 1:41 AM
matmarex edited projects, added VisualEditor (Current work); removed VisualEditor.

(Wikitext corruption → UBN)

Change 484820 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@master] Revert "Don't preserveHtmlAttributes on transclusion nodes"

https://gerrit.wikimedia.org/r/484820

@ssastry Note that the Parsoid query is for "Infobox Officeholder" (with uppercase "O"), which is a redirect to "Infobox officeholder" (lowercase "o"). Only the latter page has templatedata. Parsoid should probably ask the MediaWiki API to follow redirects (&redirects=1). But that seems like a separate issue.

In my snippet I pasted above with the API trace, see that we already have "followRedirect":true. Is that the wrong param?

I'm not familiar with that library, but it looks like this is the library's parameter for following HTTP redirects, rather than a query parameter for following MediaWiki redirects.

Change 484820 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Revert "Don't preserveHtmlAttributes on transclusion nodes"

https://gerrit.wikimedia.org/r/484820

Change 485048 had a related patch set uploaded (by Jforrester; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@wmf/1.33.0-wmf.13] Revert "Don't preserveHtmlAttributes on transclusion nodes"

https://gerrit.wikimedia.org/r/485048

Change 485049 had a related patch set uploaded (by Jforrester; owner: Bartosz Dziewoński):
[mediawiki/extensions/VisualEditor@wmf/1.33.0-wmf.12] Revert "Don't preserveHtmlAttributes on transclusion nodes"

https://gerrit.wikimedia.org/r/485049

Change 485049 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.33.0-wmf.12] Revert "Don't preserveHtmlAttributes on transclusion nodes"

https://gerrit.wikimedia.org/r/485049

Change 485048 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@wmf/1.33.0-wmf.13] Revert "Don't preserveHtmlAttributes on transclusion nodes"

https://gerrit.wikimedia.org/r/485048

Mentioned in SAL (#wikimedia-operations) [2019-01-17T16:56:01Z] <jforrester@deploy1001> Synchronized php-1.33.0-wmf.13/extensions/VisualEditor/modules/ve-mw/: T213922: Revert 48db45df7602 for wmf.13 (duration: 00m 51s)

Mentioned in SAL (#wikimedia-operations) [2019-01-17T16:57:12Z] <jforrester@deploy1001> Synchronized php-1.33.0-wmf.12/extensions/VisualEditor/modules/ve-mw/: T213922: Revert 48db45df7602 for wmf.12 (duration: 00m 52s)

matmarex added a subscriber: Jdforrester-WMF.

@Jdforrester-WMF deployed the patches. The issue should now be fixed on all wikis.

The issue was introduced in wmf.12, so it affected edits made between 8 January and today, 17 January. I updated the note in Tech News with this.

Is this bug completely solved, and can be closed?

ppelberg claimed this task.