
2012 cleanUpLinks fix not in core
Closed, Declined (Public)

Description

https://mediawiki.org/wiki/Special:Code/pywikipedia/10251 added a fix to cleanUpLinks in compat on 2012-05-24, and it appears that this fix was never made in core.

In May 2015, cleanUpLinks was disabled in compat with c81ecba8, but it is still enabled in core.

Event Timeline

jayvdb assigned this task to Ladsgroup.
jayvdb raised the priority of this task to High.
jayvdb updated the task description.
jayvdb added subscribers: jayvdb, Yamaha5, Dalba.

The linking Manual of Style (MoS) of Persian Wikipedia specifically says that it is better to write links like ‏[[گوسفند]]ها instead of ‏[[گوسفند|گوسفندها]]‏, and this has been the case since May 2009. That code actually does the reverse.

I don't know why it was suggested; maybe there was a technical difficulty at the time, or a misunderstanding of the MoS? Anyway, I am comfortable with the MoS approach and don't see why we should change it.

Maybe this is an RTL issue?


No, I don't think so :)

@Ladsgroup, siteinfo has two relevant settings: one for linkprefix and another for linktrail (see https://fa.wikipedia.org/w/api.php?action=query&meta=siteinfo). I'm not 100% sure what linkprefix does, but from the MediaWiki code it appears to be similar to linktrail; see https://doc.wikimedia.org/mediawiki-core/master/php/MessagesEn_8php.html#a51c4bee1bf441947578c290040297367 and lines ~165-170 of https://doc.wikimedia.org/mediawiki-core/master/php/ApiQuerySiteinfo_8php_source.html
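To illustrate what a linktrail value looks like and how it could be applied from Python, here is a minimal sketch that compiles a PHP-style `/pattern/flags` regex with the `re` module. It assumes the English default from MessagesEn.php, `/^([a-z]+)(.*)$/sD`; the helper name `php_regex_to_python` is hypothetical, and this is not Pywikibot's own linktrail handling.

```python
import re

# PCRE modifiers that have a direct equivalent in Python's re module.
_FLAG_MAP = {"s": re.DOTALL, "i": re.IGNORECASE, "m": re.MULTILINE, "x": re.VERBOSE}


def php_regex_to_python(php_pattern: str) -> "re.Pattern[str]":
    """Compile a PHP-style '/pattern/flags' regex (e.g. a linktrail).

    Hypothetical, simplified sketch: assumes '/' is the delimiter.
    'D' (dollar matches only at the very end) and 'u' (UTF-8 mode) have
    no re flag; Python str patterns are already Unicode-aware, and 'D'
    is simply ignored here.
    """
    if len(php_pattern) < 2 or php_pattern[0] != "/":
        raise ValueError("expected a '/pattern/flags' string")
    end = php_pattern.rfind("/")
    if end == 0:
        raise ValueError("missing closing delimiter")
    body, flags = php_pattern[1:end], php_pattern[end + 1:]
    re_flags = 0
    for flag in flags:
        re_flags |= _FLAG_MAP.get(flag, 0)
    return re.compile(body, re_flags)


# Assumed English default linktrail; group 1 is the trail, group 2 the rest.
linktrail = php_regex_to_python("/^([a-z]+)(.*)$/sD")
m = linktrail.match("s of wool")  # the text that follows ']]'
print(m.group(1), repr(m.group(2)))
```

Group 1 is the part that MediaWiki would render as part of the link label; group 2 is left as ordinary text.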

Are you sure that the linktrail occurs on the right-hand side of the 'word' in an RTL language?

In RTL languages the linktrail appears on the left side of a word, but it is still the end of the word in those languages, so left versus right does not matter.

Our link replace routines use the re module, which processes text left to right.
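To make the logical-order point concrete: a regular expression finds the linktrail immediately after `]]` in the stored string, regardless of which side it is drawn on screen. The pattern below is a hypothetical simplification (Arabic-script letters U+0600 to U+06FF as the trail), not the actual fa.wikipedia linktrail.

```python
import re

# Hypothetical pattern: an unpiped internal link followed by a run of
# Arabic-script letters treated as the linktrail.
LINK_WITH_TRAIL = re.compile(r"\[\[(?P<title>[^\[\]|]+)\]\](?P<trail>[\u0600-\u06FF]*)")

text = "[[گوسفند]]ها"
m = LINK_WITH_TRAIL.match(text)
print(m.group("title"), m.group("trail"))
# The trail starts right after ']]' in storage order, even though an
# RTL display draws it to the left of the link.
print(m.start("trail"))
```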

Computers store and process everything from left to right, no matter what the language is. Only when displaying text does it differ between RTL and LTR languages.
e.g. this text is in an RTL language:
"עברית רי"
As you can see, it is written from left to right. More importantly, this is how it is stored in the computer:
[0] = ע
[1] = ב
[2] = ר
[3] = י
[4] = ת
[5] = ' '
[6] = ר
[7] = י
But for display, the Unicode Bidirectional Algorithm starts at the right and places each subsequent character to the left.
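A quick way to see the stored (logical) order listed above is to enumerate the string in Python; `unicodedata.bidirectional` also reports each character's bidi class, which is the input the display algorithm works from. The helper name `logical_order` is only for illustration.

```python
import unicodedata
from typing import List, Tuple


def logical_order(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, character, bidi class) in the order the string is stored."""
    return [(i, ch, unicodedata.bidirectional(ch)) for i, ch in enumerate(text)]


for index, char, bidi in logical_order("עברית רי"):
    # Hebrew letters report class 'R' (right-to-left); the space reports 'WS'.
    print(index, repr(char), bidi)
```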

But if ‏[[گوسفند]]ها is being converted to ‏[[گوسفند|گوسفندها]]‏, as Dalba suggests above, then something is wrong: that is not what cleanUpLinks (and the new link replacement tools) are supposed to do, and in my experience they do not do this in LTR text.
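For comparison, the intended direction is roughly the following: a piped link whose label is the title plus a valid trail collapses to `[[title]]trail`, never the reverse. This is a simplified sketch with a hypothetical trail character class and a hypothetical function name; the real cleanUpLinks handles considerably more cases. Requiring the whole trail to match also means a trail that begins with a ZWNJ is left alone.

```python
import re

# Hypothetical linktrail class for illustration: Arabic-script letters only.
TRAIL_CHARS = re.compile(r"[\u0600-\u06FF]+")
PIPED_LINK = re.compile(r"\[\[(?P<title>[^\[\]|]+)\|(?P<label>[^\[\]|]+)\]\]")


def collapse_piped_links(text: str) -> str:
    """Rewrite [[foo|foobar]] as [[foo]]bar when 'bar' is a valid trail."""
    def repl(m: "re.Match[str]") -> str:
        title, label = m.group("title"), m.group("label")
        if label == title:
            return f"[[{title}]]"
        if label.startswith(title):
            trail = label[len(title):]
            # Only collapse when every trail character is a trail character.
            if TRAIL_CHARS.fullmatch(trail):
                return f"[[{title}]]{trail}"
        return m.group(0)  # anything else is left untouched

    return PIPED_LINK.sub(repl, text)


print(collapse_piped_links("[[گوسفند|گوسفندها]]"))
```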

Anyway, rather than theorize, let's find out what the actual problem is. Unfortunately, I can't reproduce that behaviour.

$ python2.7 pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> import pywikibot
>>> s = pywikibot.Site('fa', 'wikipedia')
>>> p = pywikibot.Page(s, 'User:John_Vandenberg/test')  # م[[ختلف]]ی
>>> from pywikibot.cosmetic_changes import CosmeticChangesToolkit
>>> cc = CosmeticChangesToolkit.from_page(p, False, False)
>>> cc.cleanUpLinks(p.text)
u'\u0645[[\u062e\u062a\u0644\u0641]]\u06cc'
>>> cc.cleanUpLinks(u'[[گوسفند]]ها')
u'[[\u06af\u0648\u0633\u0641\u0646\u062f]]\u0647\u0627'

It would be good if someone could show a current, reproducible example of cleanUpLinks splitting a link inappropriately.

It was intentional. I remember we had a discussion on Persian Wikipedia, and then I was asked to add functionality to change [[foo]]bar to [[foo|foobar]]. I don't know why people say it's wrong now.
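To show concretely what the requested transformation would do, here is a sketch of the expansion direction. It is hypothetical: the function name and the trail character class (ASCII letters plus Arabic-script letters) are assumptions for illustration, not the fa-specific code under discussion.

```python
import re

# Hypothetical trail class: ASCII letters and Arabic-script letters.
LINK_THEN_TRAIL = re.compile(
    r"\[\[(?P<title>[^\[\]|]+)\]\](?P<trail>[A-Za-z\u0600-\u06FF]+)"
)


def expand_link_trails(text: str) -> str:
    """Rewrite [[foo]]bar as [[foo|foobar]]; piped links are not matched."""
    return LINK_THEN_TRAIL.sub(
        lambda m: f"[[{m['title']}|{m['title']}{m['trail']}]]", text
    )


print(expand_link_trails("[[foo]]bar"))
```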

Anyway I don't think we need this function anymore.

@Ladsgroup, can you show where the "functionality to change [[foo]]bar to [[foo|foobar]]" for Persian Wikipedia is? I can't see any fa-specific code in core cosmetic changes that achieves that.

It is better to leave fa.wiki's links as they are!
Here, cosmetic changes turned بوم‌منطقه‌های into بوم‌منطقههای, which is not accepted on fa.wikipedia: the code removed the ZWNJ. Now the code is OK.
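The two spellings above differ only by an invisible ZERO WIDTH NON-JOINER (U+200C), which is easy to miss in a diff. A minimal sketch that makes the difference visible; the strings are built with an explicit escape so the ZWNJ cannot be lost in copy and paste, and the helper name `zwnj_positions` is hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
expected = "بوم" + ZWNJ + "منطقه" + ZWNJ + "های"  # accepted spelling
damaged = "بوم" + ZWNJ + "منطقه" + "های"          # second ZWNJ dropped


def zwnj_positions(s: str) -> list:
    """Indices (in storage order) of every ZWNJ in the string."""
    return [i for i, ch in enumerate(s) if ch == ZWNJ]


print(unicodedata.name(ZWNJ))
print(zwnj_positions(expected), zwnj_positions(damaged))
```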

Ladsgroup removed a project: User-Ladsgroup.
Ladsgroup subscribed.
Xqt subscribed.

Closed because it is too old and it is not intended to merge old code from compat into core. Pywikibot has used linktrails from the MediaWiki API since 7.3.0, and any changes to the corresponding behaviour should be addressed to MediaWiki first. Please feel free to open a new task with new specifications if there are remaining problems.