Search for "badtitletext" in this (very messy) diff: https://fr.wikipedia.org/w/index.php?title=Zach_Galifianakis&diff=prev&oldid=112810599
Based on the location, this is probably replacing internal links (maybe redlinks?).
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Catrope | T93045 [[Mediawiki:Badtitletext]] being added to articles
Resolved | | Catrope | T94498 Trim whitespace in LinkTargetInputWidget
Resolved | | ssastry | T94599 Parsoid should not output [[MediaWiki:Badtitletext|...]] ever
Were you able to create an edit of your own like this? I suspect the huge number of links being added is the problem here, with MediaWiki:Badtitletext showing up only as a symptom of the editor attempting to link everything (including something invalid).
I have not tried to reproduce it. However, I point out the remarkable
similarity between the new content and the English Wikipedia article on the
same subject. It is possible that a massive copy-paste operation happened
here.
Logs from Kibana:
I can add more if required (21 found in the last 24 hours). Throw anything Parsoid-specific over to us.
Scratch all that: there are false positives. I just confirmed this by trying to serialize some of these reports; they serialize just fine, but the log output is generated anyway. So our logging is not precise enough. I will fix that, and we can revisit new reports once the fix is in place. In the meantime, I can go through the 36 instances we have in Kibana and sift out the false positives.
https://gerrit.wikimedia.org/r/#/c/199800/ is the patch to remove false positives from the logs. Subbu says it'll be deployed on Monday.
Okay, I slurped the relevant log entries from Kibana via curl, extracted the HTML snippets, ran them through Parsoid's html2wt, and found a few valid instances of bad title text. I'm going to post one relevant entry (full log):
{"host":"wtp1018","level":3,"version":"1.0","@version":"1","@timestamp":"2015-03-25T10:08:48.539Z","source_host":"10.64.32.91","pid":14556,"logType":"error","wiki":"frwiki","title":"Journal_de_la_psychanalyse_de_l'enfant","oldId":113242101,"longMsg":"Bad title text\n<a rel=\"mw:WikiLink\" href=\"%09http%3A%2F%2Fbsf.spp.asso.fr%2Findex.php%3Flvl%3Dnotice_display%26id%3D290\" data-parsoid-diff=\"{"id":8472291,"diff":["modified","inserted"]}\">Indexation complète des articles parus à la Bibliothèque Sigmund Freud</a>","type":"parsoid","tags":["es","gelf","normalized_message_trimmed"],"message":"Bad title text <a rel=\"mw:WikiLink\" href=\"%09http%3A%2F%2Fbsf.spp.asso.fr%2Findex.php%3Flvl%3Dnotice_display%26id%3D290\" data-parsoid-diff=\"{"id":8472291,"diff":["modified","inserted"]}\">Indexation complète des articles parus à la Bibliothèque Sigmund Freud</a>","normalized_message":"Bad title text <a rel=\"mw:WikiLink\" href=\"%09http%3A%2F%2Fbsf.spp.asso.fr%2Findex.php%3Flvl%3Dnotice_display%26id%3D290\" data-parsoid-diff=\"{"id":8472291,"diff":["modified","inserted"]}\">Indexation complète des arti"}
So, something happened in the editor. In case it matters, note that the link in the old version is a mw:ExtLink, but in the log entry above Parsoid received the same link typed as mw:WikiLink.
After the Parsoid deploy, I found one instance of this error in Kibana:
Bad title text <a href="https://ru.wikipedia.org/wiki/%D3%ED%E8%E2%E5%F0._%CD%EE%E2%E0%FF_%EE%E1%F9%E0%E3%E0"; rel="mw:ExtLink" data-parsoid-diff="{"id":1662279,"diff":["inserted"]}">Универ. Новая общага</a>
which corresponds to the following diff: https://ru.wikipedia.org/w/index.php?title=%D0%9C%D0%BE%D0%BB%D0%BE%D1%85%D0%BE%D0%B2%D1%81%D0%BA%D0%B0%D1%8F,_%D0%95%D0%BA%D0%B0%D1%82%D0%B5%D1%80%D0%B8%D0%BD%D0%B0_%D0%92%D0%B8%D0%BA%D1%82%D0%BE%D1%80%D0%BE%D0%B2%D0%BD%D0%B0&diff=next&oldid=69680636
It looks like the link href was percent-encoded using windows-1251 rather than UTF-8. Percent-encoded titles are expected to decode as UTF-8, so URL-decoding fails.
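To illustrate the encoding mismatch, here is a minimal Python sketch (not Parsoid's actual code, which is not shown in this task). It percent-decodes the title portion of the href from the ruwiki log entry, tries strict UTF-8 first, and falls back to windows-1251. The helper name `decode_percent_encoded` and the choice of fallback are assumptions made for illustration.

```python
from urllib.parse import unquote_to_bytes


def decode_percent_encoded(text, fallback="cp1251"):
    """Percent-decode `text`; try strict UTF-8, then a legacy fallback.

    Hypothetical helper: returns (decoded_text, encoding_used).
    """
    raw = unquote_to_bytes(text)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, which is the failure described above.
        return raw.decode(fallback), fallback


# Title portion of the href from the ruwiki log entry.
title = "%D3%ED%E8%E2%E5%F0._%CD%EE%E2%E0%FF_%EE%E1%F9%E0%E3%E0"
decoded, used = decode_percent_encoded(title)
print(decoded, used)
```

Under these assumptions the strict UTF-8 attempt raises `UnicodeDecodeError`, and the fallback recovers the same text as the link label in the log entry (with underscores in place of spaces).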
The frwiki instance (Journal_de_la_psychanalyse_de_l'enfant) appears to be due to a tab character having been added before the URL; note the leading %09 in the href. You can't do this with the tab key, but you can paste a tab character (or a URL preceded by a tab) into the link interface.
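One plausible reading of how a leading tab could turn an external link into a mw:WikiLink is sketched below in Python. This is not VisualEditor's or Parsoid's real logic: `classify_link_target`, its `trim` flag, and the scheme pattern are all hypothetical, and the sketch only assumes that a target is treated as external when it begins with a URL scheme.

```python
import re
from urllib.parse import unquote

# Assumed rule: a target is external if it starts with a known scheme or "//".
SCHEME_RE = re.compile(r"^(?:https?|ftp):|^//", re.IGNORECASE)


def classify_link_target(href, trim=True):
    """Hypothetical classifier: returns (rel_type, decoded_target)."""
    target = unquote(href)
    if trim:
        # Stripping surrounding whitespace removes the pasted tab.
        target = target.strip()
    kind = "mw:ExtLink" if SCHEME_RE.match(target) else "mw:WikiLink"
    return kind, target


# href from the frwiki log entry; %09 is a percent-encoded tab.
href = ("%09http%3A%2F%2Fbsf.spp.asso.fr%2Findex.php"
        "%3Flvl%3Dnotice_display%26id%3D290")

print(classify_link_target(href, trim=False)[0])  # tab hides the scheme
print(classify_link_target(href)[0])              # trimmed target
```

In this sketch the untrimmed target no longer starts with a scheme, so it falls through to the wiki-link branch, while trimming first restores the external-link classification. That is in line with the whitespace trimming tracked in T94498.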