
CX2: <span class="Z3988"...
Open, Medium, Public

Description

Sometimes, CX2 creates completely useless span tags with many useless parameters but with just "&nbsp;" as their text. Could you fix this?

Example with Franz Ludwig Güssefeld on frwiki:

<span class="Z3988" style="display:none" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rfr_id=info:sid/de.wikipedia.org:Franz+Ludwig+G%C3%BCssefeld&rft.atitle=Biographische+Notiz+von+Franz+Ludwig+G%C3%BCssefeld&rft.au=Franz+Ludwig+G%C3%BCssefeld&rft.btitle=Allgemeine+geographische+Ephemeriden&rft.date=1808&rft.genre=book&rft.place=Weimar&rft.pub=Landes-Industrie-Comptoir&rft.volume=Band+26">&nbsp;</span>
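The `title` attribute above is a COinS (ContextObjects in Spans) payload: an OpenURL 1.0 (Z39.88-2004) key/encoded-value string describing the cited book, intended for reference-manager tools rather than for readers. The minimal Python sketch below decodes it with the standard library's `urllib.parse.parse_qs`, only to show which citation fields it carries. `decode_coins` is a hypothetical helper name, not part of any wiki tool.

```python
from urllib.parse import parse_qs

# The title attribute copied from the example span above (a COinS payload).
TITLE = (
    "ctx_ver=Z39.88-2004"
    "&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook"
    "&rfr_id=info:sid/de.wikipedia.org:Franz+Ludwig+G%C3%BCssefeld"
    "&rft.atitle=Biographische+Notiz+von+Franz+Ludwig+G%C3%BCssefeld"
    "&rft.au=Franz+Ludwig+G%C3%BCssefeld"
    "&rft.btitle=Allgemeine+geographische+Ephemeriden"
    "&rft.date=1808"
    "&rft.genre=book"
    "&rft.place=Weimar"
    "&rft.pub=Landes-Industrie-Comptoir"
    "&rft.volume=Band+26"
)


def decode_coins(title: str) -> dict[str, str]:
    """Decode an OpenURL key/encoded-value string into a flat dict.

    parse_qs returns a list of values per key; each key appears only once
    in this payload, so the first value is kept.
    """
    return {key: values[0] for key, values in parse_qs(title).items()}


fields = decode_coins(TITLE)
for key, value in fields.items():
    print(f"{key:12} {value}")
```

Every decoded field is bibliographic metadata, not readable article text. The `rfr_id` field also names de.wikipedia.org, the wiki the citation was translated from.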

In addition to this huge span tag, there are other completely useless span tags:

<span>(NDB).</span> <span>Band</span>&nbsp;<span>7, Duncker & Humblot, Berlin 1966,</span> ISBN 3-428-00188-5<span>, S.</span>&nbsp;<span>289</span> <span>(</span><span class="plainlinks-print">[http://daten.digitale-sammlungen.de/0001/bsb00016325/images/index.html?seite=303 Digitalisat]</span><span>).</span>
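The bare spans are equally inert: a `<span>` with no class or style does not change how its text renders. The rough Python sketch below reproduces the manual cleanup. It assumes the bare spans are not nested and contain no further markup (a regex is not a general HTML parser), and `unwrap_bare_spans` is a hypothetical helper, not part of Content Translation or any wiki tool.

```python
import re

# Hypothetical cleanup pattern: a <span> with no attributes whose content
# holds no further markup. Spans with attributes (e.g. class="plainlinks-print")
# do not match because ">" must directly follow "<span".
BARE_SPAN = re.compile(r"<span>([^<]*)</span>")


def unwrap_bare_spans(wikitext: str) -> str:
    """Replace each attribute-less <span>...</span> with its inner text."""
    return BARE_SPAN.sub(r"\1", wikitext)


# The example from this task.
sample = (
    '<span>(NDB).</span> <span>Band</span>&nbsp;<span>7, Duncker & Humblot, '
    'Berlin 1966,</span> ISBN 3-428-00188-5<span>, S.</span>&nbsp;<span>289</span> '
    '<span>(</span><span class="plainlinks-print">[http://daten.digitale-sammlungen.de'
    '/0001/bsb00016325/images/index.html?seite=303 Digitalisat]</span><span>).</span>'
)
cleaned = unwrap_bare_spans(sample)
print(cleaned)
```

Only the span carrying the `plainlinks-print` class survives, since that class may actually affect rendering.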

Event Timeline

NicoV created this task. Mar 15 2019, 6:49 PM
Restricted Application added a subscriber: Aklapper. Mar 15 2019, 6:49 PM
Arrbee moved this task from Check & Move to Bugs on the ContentTranslation board. May 13 2019, 2:50 PM

This is still happening, requiring cleanup by en.WP gnomes. Please make it stop. Here's an example from a few days ago:

https://en.wikipedia.org/w/index.php?title=History_of_the_far-right_in_Spain&oldid=929127143

NicoV added a comment. Dec 9 2019, 4:52 PM

Yes, please fix it: when will CX stop creating articles that require gnomes to check and fix each one? This situation has been going on for years.

The history of the Pedro Laín Entralgo article you linked (https://en.wikipedia.org/w/index.php?title=Pedro_La%C3%ADn_Entralgo&action=history) shows that it was created in 2007. Why do you think that ContentTranslation created that article?

> This is still happening, requiring cleanup by en.WP gnomes. Please make it stop. Here's an example from a few days ago:

> https://en.wikipedia.org/w/index.php?title=History_of_the_far-right_in_Spain&oldid=929127143

As far as I can see, this article was created from scratch using VisualEditor, not through translation. Articles created using ContentTranslation carry the edit tag "contenttranslation".
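A revision's change tags, such as "contenttranslation", can be read through the MediaWiki Action API (`action=query&prop=revisions&rvprop=tags`). The minimal sketch below only builds the request URL and inspects an already-parsed JSON response; it makes no network call, and the helper names are hypothetical. It does not establish whether revision 929127143 (the oldid linked above) is the edit that introduced the span, and, as noted later in this thread, text pasted in from elsewhere would not carry the tag.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def revision_tags_url(revid: int) -> str:
    """Build an Action API query returning the change tags of one revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "revids": revid,
        "rvprop": "ids|timestamp|user|tags",
        "format": "json",
        "formatversion": "2",  # pages are returned as a list, not a dict
    }
    return f"{API}?{urlencode(params)}"


def has_tag(response: dict, tag: str) -> bool:
    """Return True if any revision in a formatversion=2 response carries the tag."""
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            if tag in rev.get("tags", []):
                return True
    return False


print(revision_tags_url(929127143))
```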

Please let us know how these two issues are related to ContentTranslation. Thanks

Really? Look at the span tags. They say

span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fes.wikipedia.org%3APedro+La%C3%ADn+Entralgo

and

span class="Z3988" title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fes.wikipedia.org%3AHistoria+de+la+extrema+derecha+en+Espa%C3%B1a

It is my impression that "CTX" is the content translation tool, but I could be wrong. The rest of this post proceeds on that assumption.

For the former article, the edit I linked to was made on 1 July 2019, as shown in the diff I linked; it does not matter when the article was created, it matters when the edit in question was made.

Edits made with the content translation tool are not necessarily tagged with "contenttranslation": the tag will be missing if the text was copy-pasted from another location, such as the editor's user space in en.WP or another language's WP.

I don't know how this is happening, but a search for the relevant span tag should help someone track down the source of this bug. I hope that someone is listening, and that I am not just typing into the void. On the slim hope that someone out there is willing to work on fixing this problem and not just posting objections like the frustrating one above without doing any research at all, here's some volunteer research for you:

A search for the relevant portion of the span tag (currently 72 hits in en.WP article space, plenty of hits to begin to figure out how this problem is created):
https://en.wikipedia.org/w/index.php?sort=relevance&search=insource%3A%2Ftitle%5C%3D%5C%22ctx_ver%2F&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1

(Hint: look in the history of each article to determine when and how the span tags were added.)
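Decoded from the URL, the linked search uses the regex `insource:/title\=\"ctx_ver/`. The sketch below applies the same idea locally to a page's wikitext. The pattern is an assumption: it matches an empty COinS span whose only content is a non-breaking space written as either `&nbsp;` or `&#160;`, covering both variants shown in this task. It expects `class="Z3988"` to come first, as in both examples, and no `>` inside attribute values. The function names are hypothetical.

```python
import re

# Hypothetical pattern for an empty COinS span (class="Z3988") whose only
# content is a non-breaking space, written as &nbsp; or &#160;.
EMPTY_COINS = re.compile(r'<span class="Z3988"[^>]*>(?:&nbsp;|&#160;)</span>')


def find_empty_coins(wikitext: str) -> list[str]:
    """Return every empty COinS span found in the wikitext."""
    return EMPTY_COINS.findall(wikitext)


def strip_empty_coins(wikitext: str) -> str:
    """Remove empty COinS spans, leaving the surrounding text untouched."""
    return EMPTY_COINS.sub("", wikitext)


# Abbreviated sample modeled on the spans in this task (title shortened).
sample = (
    'Weimar 1808.<span class="Z3988" style="display:none" '
    'title="ctx_ver=Z39.88-2004&rft.date=1808">&nbsp;</span>'
)
print(find_empty_coins(sample))
print(strip_empty_coins(sample))
```

A cleanup like this only treats the symptom; the report asks for the spans not to be carried into translations in the first place.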

A page created by the content translation tool in March 2019, with the tags ContentTranslation, ContentTranslation2, and PHP7:
https://en.wikipedia.org/w/index.php?title=Bombing_of_Nuremberg_in_World_War_II&action=history

In the above page, translation of the German Template:Stadtlexikon Nürnberg resulted in the offending span tag. When you go to Special:ExpandTemplates in de.WP and expand that template, you get:

[[Michael Diefenbacher]], [[Rudolf Endres]] (Hrsg.): <cite style="font-style:italic">[[Stadtlexikon Nürnberg]]</cite>. 2.,&nbsp;verbesserte Auflage. W.&nbsp;Tümmels Verlag, Nürnberg 2000, ISBN 3-921590-69-8 ([https://www.nuernberg.de/internet/stadtarchiv/publikationen_einzeln_stadtlexikon.html online]).<span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rfr_id=info:sid/de.wikipedia.org:Spezial%3AVorlagen+expandieren&rft.btitle=Stadtlexikon+N%C3%BCrnberg&rft.date=2000&rft.edition=2.%2C+verbesserte&rft.genre=book&rft.isbn=3921590698&rft.place=N%C3%BCrnberg&rft.pub=W.+T%C3%BCmmels+Verlag" style='display:none'>&#160;</span>

As you can see, there is one of the offending span tags in that code. I can't explain it, and I don't see exactly how or if it relates to the content translation tool, but maybe this research will help one of the perceptive analysts or coders at the WMF to figure out which project, if not content translation, this bug should be assigned to. That useless span tag should not be brought into en.WP, from any language, when an article is translated to en.WP using the content translation tool.

> It is my impression that "CTX" is the content translation tool, but I could be wrong. The rest of this post proceeds on that assumption.

No. CTX is not Content Translation (https://www.mediawiki.org/wiki/Content_translation). Also, English Wikipedia has restricted use of this tool: https://en.wikipedia.org/wiki/Wikipedia:Content_translation_tool

NicoV added a comment. Edited Dec 10 2019, 7:37 AM

Is it possible to stop posting objections and start looking into this bug?

This problem is not restricted to enwiki. I mostly work on frwiki, where this problem has been seen repeatedly for months or years, and nothing has been done about it.
A quick search on frwiki: https://en.wikipedia.org/w/index.php?sort=relevance&search=insource%3A%2Ftitle%5C%3D%5C%22ctx_ver%2F&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1 yields more than 50 articles with this tag, even though I have already fixed hundreds of them.

> Is it possible to stop posting objections and start looking into this bug?

Please keep in mind that we share a common goal here: improving our tools to help editors. I'd recommend not generalizing, and instead focusing each report on the specific issue. That always helps prevent side conversations and keeps people focused and motivated to solve the issue.

Templates are independent on each wiki, and they can produce many different problems. Even if those problems seem similar on the surface, each needs specific investigation and resolution. Fortunately, once a specific problem is resolved, a test case is added to prevent future regressions. Your comments above may give the impression that problems persist without any progress, but I don't think that's the case: of the 32 bugs you reported about this tool, 15 have already been closed. We appreciate users reporting bugs because that helps the tool improve, but we have many requests, and we need to organize our efforts.

One thing that greatly reduces investigation time is a specific example of the source content that produces the issue. For this ticket, I extracted the problematic case into this page, and it can now be tested with this quick link.
Translating the contents with Content translation (using Google Translate) resulted in the problematic tags when published:

  • {{Ouvrage|volume=Band 26}} <span>Dans:</span> [[Friedrich Justin Bertuch]] <span>(éd.</span> <span>):</span> <cite style="font-style:italic">Éphémérides géographiques générales</cite> <span>.</span> <span>Fabriqué par une société d'universitaires.</span> <span style="white-space:nowrap">bande <span style="display:inline-block;width:.2em">&nbsp;</span> 26</span> <span>.</span> <span>Landes-Industrie-Comptoir, Weimar 1808 (</span> [https://books.google.de/books?id=YN0BAAAAYAAJ texte intégral] &#x20; <span>dans Google Recherche de Livres).</span> <span class="Z3988" style="display:none" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rfr_id=info:sid/de.wikipedia.org:Benutzer%3ACXTests%2FT218420&rft.atitle=Biographische+Notiz+von+Franz+Ludwig+G%C3%BCssefeld&rft.au=Franz+Ludwig+G%C3%BCssefeld&rft.btitle=Allgemeine+geographische+Ephemeriden&rft.date=1808&rft.genre=book&rft.place=Weimar&rft.pub=Landes-Industrie-Comptoir&rft.volume=Band+26">&nbsp;</span>

In Content translation, the template no longer seems to appear as a template; instead, it is decomposed into its HTML parts in the source (possibly similar to T216812). This causes tags to be generated when the source elements are transferred into the translation: