Arabic and Waw letter و becomes part of link suffix but it should not (and it seems to not be part of $linkTrail)
Open, Stalled, Needs Triage, Public

Description

Event Timeline

Aklapper changed the task status from Open to Stalled. Dec 31 2020, 10:02 PM

@alaa: Errm, looking at https://phabricator.wikimedia.org/source/mediawiki/browse/master/languages/messages/MessagesAr.php$470 , و is not part of $linkPrefixCharset for Arabic (or I misunderstand something here)? Could you clarify, please?

Hello @Aklapper, per T263266#6518983, you can see و in orange (row U+064x, column 8). و (Waw) is the Arabic word for "and", but it is written with no space between it and the following word. See below:

Present

Expected

Thanks. The task summary talks about removing و from the linkPrefixCharset, and I wrote that it looks like it is not part of the linkPrefixCharset.
So I don't see how you can remove something when it is not there?

Aklapper renamed this task from Removing Arabic (and) Waw و from the list of $linkPrefixCharset to Arabic and Waw letter و becomes part of link suffix but it should not (and it seems to not be part of $linkTrail). Jan 11 2021, 12:11 PM

I'll repeat the questions posted on the original task. If you are ok with all these, then we can make the change:

We can remove Waw from the list of characters, but I would want to check a few things first:

  • Doing so would mean that this character never gets pulled into the link, regardless of context. I don't know if this character can ever be used inside another word, or have a different meaning, so I don't know if this is a problem.
  • This would apply to any wiki that has a content language of 'ar'. I don't know if there are any wikis where they write in a dialect but use this language code, but it's worth making sure that the meaning of this character is reasonably global.
  • Waw was included in the link trail set prior to this task, so this would be a change to that behaviour.
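
To make the first point concrete, here is a minimal Python sketch of how a character-class based link trail behaves. The two character sets and the `split_trail` helper are hypothetical and are not copied from MessagesAr.php; they only illustrate the trade-off, assuming the trail is matched by a regex of the form `^([charset]+)(.*)$`.

```python
import re

# Hypothetical character sets for illustration only (not the real MessagesAr.php values).
WITH_WAW = "a-z\u0621-\u064A"                    # Arabic letters, including Waw (U+0648)
WITHOUT_WAW = "a-z\u0621-\u0647\u0649-\u064A"    # same range with U+0648 carved out


def split_trail(text_after_link, charset):
    """Split the text that follows a link into (trail, rest).

    The trail is the longest leading run of characters from `charset`;
    it is the part that would be absorbed into the link.
    """
    match = re.match(rf"^([{charset}]+)(.*)$", text_after_link, flags=re.DOTALL)
    if match is None:
        return "", text_after_link
    return match.group(1), match.group(2)


# Waw as the conjunction "and", attached to the next word: with Waw in the
# set, the whole following word is pulled into the link.
and_book = "\u0648\u0643\u062A\u0627\u0628"      # وكتاب ("and a book")
print(split_trail(and_book, WITH_WAW))
print(split_trail(and_book, WITHOUT_WAW))

# Waw as part of a suffix, e.g. the plural ending ون after a link to معلم:
# with Waw removed, the suffix is no longer pulled into the link either.
plural_suffix = "\u0648\u0646"                   # ون
print(split_trail(plural_suffix, WITH_WAW))
print(split_trail(plural_suffix, WITHOUT_WAW))
```

Under these assumptions, removing Waw from the set keeps the conjunction out of the link, but it also stops any trail that starts with Waw, regardless of what the letter means in that position.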

As discussed during the team's 17-February meeting, we're going to create a prototype for an approach to resolving this issue. When that is ready, we'll ping @Dyolf77_WMF and @alaa to provide feedback.