[minor] Truncate/filter Tag filter descriptions
Closed, Resolved (Public)

Description

In the New Filters tag menu, some Tag filter descriptions

  • are long

Screen Shot 2017-11-02 at 3.34.24 PM.png (498×714 px, 134 KB)

  • contain links that are not displayed as links, so the link text is not useful information (in the screenshot below: hist and log)

Screen Shot 2017-11-02 at 3.34.49 PM.png (248×704 px, 61 KB)

Actions to take

  • Please truncate the descriptions at 120 characters (that should keep them to 2 lines). It doesn't matter if it's in the middle of a word or not.
  • Append four dots at the end, to indicate that this is an excerpt.
  • Is there anything we can do about the links (which don't link)? I'm not sure what I'd prefer: Either omit the link and substitute three dots, or omit and substitute [link]? Any other ideas?

Event Timeline

Change 395768 had a related patch set uploaded (by Petar.petkovic; owner: Petar.petkovic):
[mediawiki/core@master] Truncate tag filter descriptions

https://gerrit.wikimedia.org/r/395768

These requirements need to take non-English languages and wikis into account:

  • A character is not always represented by the same number of bytes as a Latin character. We have a method to truncate by number of bytes, and I have published a patch to truncate the description at 120 bytes.
  • The way to indicate an excerpt differs between languages, and we have a long-standing way to support that, but it uses three dots for English, not four.
  • The links to abuse filters that add tags are wiki-specific. enwiki uses a template to create messages like "Tagged by filter 217 (hist · log)". The French and Catalan wikis don't use templates, but do link to the abuse filter(s) that add the tag. Some wikis don't have such descriptions for tags. Whatever method we decide to apply to descriptions needs to work across all wikis.
  • I did not know we couldn't do character counts for truncation. I thought we could... Hm. So, what is the relationship between K and characters? Doesn't that vary a lot (I have a vague recollection that Asian characters require more...). Anyway, what does 120k look like? The goal was to stop these from running on more than a couple of lines.
  • Three dots instead of four is fine.
  • I don't understand your comments about links and templates. Here's the deal: some of these descriptions include long-form URLs (e.g., see the en.wiki tag dashboard.wikiedu.org [2.0]). I was just hoping we could skip over actual URLs like this. Links like "hist" are also useless but less objectionable, I suppose.
  • It is true that Asian characters require more bytes per character. I was using the standard infrastructure we have for truncating long texts. That means Latin script would have a 120-character limit, and Cyrillic script would be cut at around 65 characters. You don't have to worry about that; this is more technical, and I will continue the discussion with developers about allowing more characters for all non-Latin scripts.
  • Yes, it's better to follow the standard that has been used for years (or a decade).
  • I was talking about where the descriptions in the screenshot (in the task description) come from. They include links like "hist", which you've described as useless. Removing those would require too much specificity, as they are defined that way on English wiki by a template. URLs like the one for dashboard.wikiedu.org [2.0] are more general and easier to remove. EDIT: We need to come up with a way to make sense of the whole description. Just removing the URL leaves the rest of the sentence hanging.
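
To make the byte-versus-character point above concrete, here is a minimal Python sketch of byte-based truncation over UTF-8. It is not the MediaWiki code: the function name `truncate_by_bytes` is hypothetical, and counting the ellipsis against the limit is an assumption made for the illustration. It shows why a 120-byte limit yields very different visible lengths per script: pure Cyrillic text gets roughly half the characters of Latin text (a little more once single-byte spaces and punctuation are mixed in), and CJK text about a third.

```python
def truncate_by_bytes(text: str, max_bytes: int, ellipsis: str = "...") -> str:
    """Cut text so the result, ellipsis included, fits in max_bytes of UTF-8.

    Hypothetical helper for illustration; counting the ellipsis against
    the limit is an assumption of this sketch.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    budget = max(max_bytes - len(ellipsis.encode("utf-8")), 0)
    # errors="ignore" drops a multi-byte character that was cut in half.
    head = encoded[:budget].decode("utf-8", errors="ignore")
    return head + ellipsis


# Same 120-byte limit, different visible lengths per script.
print(len(truncate_by_bytes("a" * 200, 120)))   # 120 (1 byte per character)
print(len(truncate_by_bytes("я" * 200, 120)))   # 61  (2 bytes per character)
print(len(truncate_by_bytes("中" * 200, 120)))  # 42  (3 bytes per character)
```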

In T179626#3816770, @Petar.petkovic wrote:

... We need to come up with a way to make sense of the whole description. Just removing the URL leaves the rest of the sentence hanging.

Do you want to try substituting [link], and see what that looks like? Or we could just do the truncation and let the chips fall where they may.

I will try substituting [link], but I can see some problems beforehand. Replacing "For more details, see URL" with "For more details, see ..." is weird, and even weirder if there is any text after that.

We have examples like this one, where text stands in for a link, and that is something we can work around. We can extract the text, end up with something meaningful, and keep long URLs from adding to the length of the text when truncating. But when you have bare URLs, without substitute text, there will most definitely be problems:

  • With sentence context and meaning, if we remove/replace the URL
  • With truncation (because URLs are lengthy and can add to description length), if we don't remove/replace the URL.

I will work more on this, but we need to pick our poison.
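
The trade-off described above can be sketched in Python as follows. This is only an illustration of the idea discussed in this task, not code from the patch: `replace_urls`, the regular expression, and the example.org URL are all hypothetical. Replacing a bare URL with [link] keeps the URL from eating into the length budget, but the surrounding sentence still points at something the reader cannot follow.

```python
import re

# Hypothetical pattern: anything that starts with http:// or https://
# and runs until the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")
TRAILING_PUNCTUATION = ".,;:!?)"


def replace_urls(text: str, placeholder: str = "[link]") -> str:
    """Replace bare URLs with a placeholder, keeping trailing punctuation."""
    def _substitute(match: re.Match) -> str:
        url = match.group(0)
        stripped = url.rstrip(TRAILING_PUNCTUATION)
        # Re-attach punctuation that belonged to the sentence, not the URL.
        return placeholder + url[len(stripped):]

    return URL_PATTERN.sub(_substitute, text)


description = "For more details, see https://example.org/some/long/path."
shortened = replace_urls(description)
print(shortened)                         # For more details, see [link].
print(len(description), len(shortened))  # the placeholder frees up length
```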

Change 395768 merged by jenkins-bot:
[mediawiki/core@master] Truncate tag filter descriptions

https://gerrit.wikimedia.org/r/395768

So what was the decision about how to handle inline links?

This patch turned out to be the one that introduced support for truncating based on the number of characters. Before that, we only had the option to truncate based on the number of bytes. See our previous comments on this ticket for more. That feature stole the show on the patch and was discussed more than what is actually described in this ticket's description.

Everything will stay the same as it is now in production, but truncated to 120 characters. Or in other words:

We could just do the truncation and let the chips fall where they may.
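
For comparison with byte-based truncation, a character-based cut at 120 might look like the Python sketch below. It is an assumption-laden illustration, not the merged MediaWiki change: `truncate_by_chars` is a hypothetical name, the ellipsis is passed in as a parameter because the thread notes it is localized, and counting it against the limit is a choice made for this sketch. Python's `len` counts code points, so text that uses combining marks could still be cut inside a visible character.

```python
def truncate_by_chars(text: str, max_chars: int = 120, ellipsis: str = "...") -> str:
    """Cut text to max_chars code points, ellipsis included (hypothetical helper)."""
    if len(text) <= max_chars:
        return text
    # Reserve room for the ellipsis so the result stays within max_chars.
    keep = max(max_chars - len(ellipsis), 0)
    return text[:keep] + ellipsis


# Every script now gets the same visible length.
for sample in ("a" * 200, "я" * 200, "中" * 200):
    print(len(truncate_by_chars(sample)))  # 120 each time
```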

Checked in betalabs. The screenshots below show truncation of long tag descriptions (English and Chinese):

Screen Shot 2018-02-21 at 12.36.32 PM.png (283×642 px, 49 KB)

Screen Shot 2018-02-21 at 12.39.19 PM.png (258×644 px, 68 KB)

QA Recommendation: Resolve