Content Translation Publishing fails with docserver-http: HTTP 400
Closed, Resolved · Public · Bug Report

Description

Recently, we have been seeing a spike in one particular error when publishing translations.

{"error":{"code":"docserver","info":"Error converting HTML to wikitext: docserver-http: HTTP 400: {\"type\":\"https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error\",\"method\":\"post\",\"uri\":\"/it.wikipedia.org/v1/transform/html/to/wikitext/Meira_Kumar\"}","*":"See https://it.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking

Were there any Parsoid-related changes that could trigger this? On the CX side, we have not made any code changes in recent months.

Here is a spreadsheet showing a list of such errors in recent days

Event Timeline

The issue seems quite random. Publishing usually works fine, so the error does not happen every time.

Maybe this is T249742: Call to a member function getAttribute() on null? That will roll out on the next train, so we can see what happens then.

Alternatively, search Logstash for one of the recently failing titles and check whether there is a log entry corresponding to T249742.

As far as I can see, these errors produce no events in Logstash. At least nothing I can find by searching with the article name.

Hello, I have several complete translations pending to be published but I get this same error in all of them. No matter how hard I try again, the error persists. Do you already know how to fix it? Thanks in advance, excuse my English.

More instances of this issue were reported recently, so the fix suggested in T268872#6653151 does not seem to have resolved it. We may want to investigate the possible causes further.

Change 668353 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] CX2: Remove pathological transclusions before publishing

https://gerrit.wikimedia.org/r/668353

A minimal piece of content that causes this error: <span typeof="mw:Transclusion" data-mw="{}" data-cx="[{&quot;adapted&quot;:false}]" id="mwCH0">

Can be tried by pasting { "html": "<span typeof=\"mw:Transclusion\" data-mw=\"{}\" id=\"mwCH0\">" } as the request content at https://en.wikipedia.org/api/rest_v1/#/Transforms/post_transform_html_to_wikitext.

We cannot say this is the cause of all the errors, but it is at least one pattern that can cause a Parsoid error.
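
For anyone who prefers a script to the API sandbox, here is a rough sketch of the same reproduction request in Python. The endpoint URL is an assumption inferred from the docs link and the `uri` in the error above, and `build_transform_request` is a hypothetical helper name. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumed public URL of the transform endpoint (inferred from the thread, not verified).
TRANSFORM_URL = "https://en.wikipedia.org/api/rest_v1/transform/html/to/wikitext"

# The minimal problematic HTML quoted in this task.
MINIMAL_HTML = '<span typeof="mw:Transclusion" data-mw="{}" id="mwCH0">'


def build_transform_request(html, url=TRANSFORM_URL):
    """Build (but do not send) a POST request carrying {"html": ...} as JSON."""
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_transform_request(MINIMAL_HTML)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it, pass the request to urllib.request.urlopen().
    # An HTTP 400 response, as reported here, surfaces as urllib.error.HTTPError.
```

Keeping the build step separate from the send step makes it easy to inspect the exact JSON body before hitting the live service.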

Change 668353 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] CX2: Remove pathological transclusions before publishing

https://gerrit.wikimedia.org/r/668353

Can be tried by pasting { "html": "<span typeof=\"mw:Transclusion\" data-mw=\"{}\" id=\"mwCH0\">" } as the request content at https://en.wikipedia.org/api/rest_v1/#/Transforms/post_transform_html_to_wikitext.

Trying this method, I get an error:

{
  "type": "https://mediawiki.org/wiki/HyperSwitch/errors/unknown_error",
  "method": "post",
  "uri": "/en.wikipedia.org/v1/transform/html/to/wikitext"
}

Is this expected?

Yes, that is the minimal sample of problematic content. The patch checks for this kind of content and removes it.
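
To illustrate the kind of check involved, here is a minimal sketch in Python. It is not the code from change 668353. It assumes that "pathological" means an `mw:Transclusion` element whose `data-mw` does not parse to JSON with a non-empty `parts` list, and all names (`TransclusionChecker`, `has_parts`, `find_pathological_transclusions`) are hypothetical. It only detects such elements; removal is left out.

```python
import json
from html.parser import HTMLParser


def has_parts(data_mw):
    """Return True if data-mw is JSON holding a non-empty "parts" list."""
    try:
        parsed = json.loads(data_mw or "")
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and bool(parsed.get("parts"))


class TransclusionChecker(HTMLParser):
    """Collect ids of mw:Transclusion elements whose data-mw has no usable parts."""

    def __init__(self):
        super().__init__()
        self.pathological_ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # typeof may hold several space-separated values.
        types = (attributes.get("typeof") or "").split()
        if "mw:Transclusion" not in types:
            return
        if not has_parts(attributes.get("data-mw")):
            self.pathological_ids.append(attributes.get("id"))


def find_pathological_transclusions(html):
    """Return the ids of transclusion elements that match the assumed pattern."""
    checker = TransclusionChecker()
    checker.feed(html)
    checker.close()
    return checker.pathological_ids


if __name__ == "__main__":
    sample = '<span typeof="mw:Transclusion" data-mw="{}" id="mwCH0">'
    print(find_pathological_transclusions(sample))  # prints ['mwCH0']
```

Under this assumption, the minimal sample from this task is flagged, while a transclusion whose `data-mw` carries a populated `parts` list is not.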