
Section Translation removes categories after publishing
Closed, Resolved | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • When publishing a new article during translation, a category at the end of the article is sometimes removed

What happens?:
A category at the end of the article is removed

What should have happened instead?:
Categories and templates should not be removed

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:
Google Chrome

Event Timeline

Pginer-WMF raised the priority of this task from Medium to High. Jun 30 2022, 11:58 AM

After some investigation, I noticed that this issue happens only when exactly one appendix section exists in the target article. The cause is that, in the PHP backend, the contents of the appendix section also include the categories. Specifically, I tested the "Brunei" article for the en/gu language pair. I dumped the contents of the first section of the Brunei article in gu, and the result was:

$referencesContent = $content->getSection(1); // where $content is the content of the "બ્રુનેઈ" (Brunei) article in gu language
wfDebugLog( "debug", "Content of section: {$referencesContent->getText()}" );
--------------
Result:
2022-07-01 14:49:55 wmf2763 wii_wiki: Content of section: == આ પણ જુઓ ==
* [[wikt:બ્રુનેઈ|બ્રુનેઈ]] (વિક્ષનરી)

{{stub}}
{{-}}
{{એશિયા}}

[[શ્રેણી:દેશ]]
[[શ્રેણી:એશિયા]]

As one can easily check, the contents of the references section (which is an appendix section) also contain two category links (the last two lines of the dump). If we publish a section for this article, it is positioned just before the "References" section: the current contents of the "References" section are replaced with the concatenation of the new section contents and the contents of the "References" section. However, the contents of the "References" section in the frontend application do not contain the category links, so the concatenated text does not contain them either. As a result, the category links that belong to the section as computed by the backend are lost after the replacement.
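The loss described above can be reproduced in a few lines. The following is a minimal, language-neutral sketch in Python, not the actual frontend or backend code: `publish_before_appendix` and the sample page are hypothetical, and the section text is abbreviated from the dump above.

```python
# Backend view of the appendix section: it includes the category links.
backend_section = (
    "== આ પણ જુઓ ==\n"
    "* [[wikt:બ્રુનેઈ|બ્રુનેઈ]] (વિક્ષનરી)\n"
    "\n"
    "{{stub}}\n"
    "\n"
    "[[શ્રેણી:દેશ]]\n"
    "[[શ્રેણી:એશિયા]]\n"
)

# Frontend view of the same section: the category links are missing.
frontend_section = (
    "== આ પણ જુઓ ==\n"
    "* [[wikt:બ્રુનેઈ|બ્રુનેઈ]] (વિક્ષનરી)\n"
    "\n"
    "{{stub}}\n"
)


def publish_before_appendix(page, backend_view, new_section, frontend_view):
    """Replace the appendix section (as the backend sees it) with the new
    section followed by the appendix section (as the frontend sees it)."""
    return page.replace(backend_view, new_section + "\n\n" + frontend_view)


# Hypothetical page and translated section, for illustration only.
page = "Lead text.\n\n" + backend_section
new_section = "== History ==\nTranslated text."

published = publish_before_appendix(
    page, backend_section, new_section, frontend_section
)

# The category links were part of the replaced span but not of the replacement.
categories_lost = "[[શ્રેણી:દેશ]]" not in published
```

Because the replaced span and the replacement come from two different views of the same section, anything present only in the backend view disappears.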

Notes:

To work around this issue, I suggest moving the concatenation of the new section contents and the first appendix section contents to the backend, instead of doing it in the frontend application. That would make publishing more stable and predictable, since both the section handling and the publishing would happen in the same environment.
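A rough sketch of what a backend-side insertion could look like, assuming the backend has the full page wikitext and knows the title of the first appendix section. This is Python for illustration only; `insert_before_heading` is a hypothetical helper, not an existing MediaWiki or ContentTranslation function.

```python
import re


def insert_before_heading(wikitext, heading_title, new_section):
    """Insert new_section immediately before the level-2 heading named
    heading_title, leaving everything from that heading onward untouched."""
    pattern = re.compile(
        r"^==[ \t]*" + re.escape(heading_title) + r"[ \t]*==[ \t]*$",
        re.MULTILINE,
    )
    match = pattern.search(wikitext)
    if match is None:
        # Assumption: the caller decides what to do when no such heading exists.
        raise ValueError("heading not found: " + heading_title)
    start = match.start()
    return wikitext[:start] + new_section + "\n\n" + wikitext[start:]


# Hypothetical page: an appendix section followed by a category link.
page = "Lead.\n\n== See also ==\n* link\n\n[[Category:A]]\n"
result = insert_before_heading(page, "See also", "== History ==\nText.")
```

Since the text after the heading is copied from the same source it was located in, trailing templates and category links survive without any special handling.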

Thanks for the detailed analysis. CXServer's section segmentation (parsing and converting the whole content to sections) extracts the categories from the page content and provides them as a structured array of category objects, so the segmented content does not contain categories at all. Category editing is different from section editing. That is the reason for this behavior.
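To illustrate why segmented content carries no categories, here is a small sketch of separating category links from section text. It is an illustration in Python under stated assumptions, not CXServer's actual implementation: `split_categories`, the regular expression, and the shape of the returned objects are all hypothetical.

```python
import re


def split_categories(wikitext, namespaces=("Category",)):
    """Return (content_without_categories, categories), where categories is
    a list of {"title": ...} dicts for links that sit on their own line."""
    names = "|".join(re.escape(name) for name in namespaces)
    pattern = re.compile(
        r"^\[\[(?:" + names + r"):([^\]|]+)(?:\|[^\]]*)?\]\][ \t]*\n?",
        re.MULTILINE,
    )
    categories = [
        {"title": m.group(1).strip()} for m in pattern.finditer(wikitext)
    ]
    content = pattern.sub("", wikitext).rstrip("\n") + "\n"
    return content, categories


# Section text abbreviated from the dump in this task.
section = (
    "== આ પણ જુઓ ==\n"
    "* [[wikt:બ્રુનેઈ|બ્રુનેઈ]] (વિક્ષનરી)\n"
    "\n"
    "[[શ્રેણી:દેશ]]\n"
    "[[શ્રેણી:એશિયા]]\n"
)
content, categories = split_categories(section, namespaces=("શ્રેણી",))
```

A consumer that only receives `content` has no way to restore the category links unless it also merges `categories` back in.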

So you are right: to get consistent and accurate publishing, we cannot use the section content from CXServer. Instead, the publishing API backend should do this construction to insert the section. Agreed.

This ticket is expected to be fixed by the work submitted in T314392.

test status: QA PASS (unable to reproduce the issue)