Magic word on English WP to override display of Wikidata short description
Closed, Resolved, Public

Description

Create a magic word to be used on English Wikipedia as an override for the display of the Wikidata short description in all of the places where the description is paired with English Wikipedia content, including on the apps, in search, and in VisualEditor's link module.

The model for this is the defaultsort magic word -- {{DEFAULTSORT:Big Sleep, The}} -- which is used in the wikitext on the article page to determine where the page title appears on category pages.

The new magic word should look like this: {{SHORTDESC:1946 film by Howard Hawks}}

When displaying the short description, the display logic should check whether the magic word is filled in (not blank) in the English Wikipedia article. If there is a description on English Wikipedia, then that description should be used.

If the magic word isn't used on the page, or if the description is blank, then it should show the description from Wikidata.

"Blank" means:

  • not having any description in the field {{SHORTDESC:}}
  • just blank spaces {{SHORTDESC: }}
  • just punctuation {{SHORTDESC:.}}
  • just a non-breaking space {{SHORTDESC:&nbsp;}}

In those cases, it should pull the description from Wikidata.
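A minimal Python sketch of the override-and-fallback rule described above, for illustration only (it is not the Wikibase code). The function names are hypothetical, and treating any mix of whitespace and punctuation as blank, after decoding HTML entities such as &nbsp;, is an assumption generalised from the four listed cases.

```python
import html
import unicodedata
from typing import Optional


def is_blank(value: str) -> bool:
    """True if a SHORTDESC value has no real content.

    Covers the listed cases: empty, spaces only, punctuation only,
    or a non-breaking space (raw U+00A0 or written as &nbsp;).
    """
    text = html.unescape(value)  # "&nbsp;" becomes "\u00a0"
    for ch in text:
        category = unicodedata.category(ch)
        # Z* = separators (includes U+00A0), P* = punctuation.
        if ch.isspace() or category.startswith(("Z", "P")):
            continue
        return False
    return True


def resolve_description(local: Optional[str], central: Optional[str]) -> Optional[str]:
    """Prefer the local {{SHORTDESC:}} value; otherwise fall back to Wikidata."""
    if local is not None and not is_blank(local):
        return local.strip()
    return central
```

With these definitions, a page containing {{SHORTDESC:.}} still shows its Wikidata description, while a non-blank local value replaces it.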

Once this override is live, Wikipedia editors will populate the magic word on pages where they want to override the description. If/when they write enough descriptions that it's roughly comparable to the number of existing Wikidata descriptions, we will change the system again, to only pull from the Wikipedia descriptions, and not use the descriptions on Wikidata as the default/fallback. At that point, on pages that don't have the magic word (or on pages where the description is left blank), there will be no description to display.

We're still talking about when that switch will happen -- the current plan is to switch when there are non-blank descriptions on 2 million article pages. That might happen quickly, or it might take a long time, depending on how the community chooses to use the magic word.
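The planned second stage changes only the fallback. In this hypothetical sketch the switch is modelled as a simple count check using the 2 million figure from the task, although the text above makes clear it is a planning decision rather than an automatic trigger; the blank test is also simplified to "contains no letter or digit".

```python
import html
from typing import Optional

# Figure from the task text; modelled here as a threshold for illustration.
STAGE_TWO_THRESHOLD = 2_000_000


def has_content(value: Optional[str]) -> bool:
    """Simplified blank test: content means at least one letter or digit."""
    return value is not None and any(ch.isalnum() for ch in html.unescape(value))


def displayed_description(
    local: Optional[str], central: Optional[str], local_description_count: int
) -> Optional[str]:
    """Stage 1 falls back to Wikidata; stage 2 uses enwiki descriptions only."""
    stage_two = local_description_count >= STAGE_TWO_THRESHOLD
    if has_content(local):
        return local.strip()
    if stage_two:
        return None  # no Wikidata fallback: the page shows no description
    return central
```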

Event Timeline

Deployed on enwiki (ping @LGoto). Not used anywhere yet (I imagine {{short description}}, used on ~5000 pages, will soon be updated to use it); API help is at https://en.wikipedia.org/wiki/Special:ApiHelp/query+description
Note that the API is enabled on all wikis but overrides only work on enwiki (and testwiki / enwiki beta).
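For illustration, a Python sketch that builds a request against the prop=description module linked above. The descprefersource parameter and its local/central values are taken to be as listed on that help page (treat them as an assumption and check there); description_query_url is a hypothetical helper.

```python
from typing import List
from urllib.parse import urlencode

ENWIKI_API = "https://en.wikipedia.org/w/api.php"


def description_query_url(titles: List[str], prefer: str = "local") -> str:
    """Build a prop=description query for one or more page titles."""
    params = {
        "action": "query",
        "prop": "description",
        "titles": "|".join(titles),
        # Assumed parameter: which source wins when both a local (enwiki)
        # and a central (Wikidata) description exist.
        "descprefersource": prefer,
        "format": "json",
        "formatversion": "2",
    }
    return ENWIKI_API + "?" + urlencode(params)
```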

Follow-ups needed:

  • add short description to page info so that editors can check the current value without having to call the API
  • add auto-tracking of Wikidata description to pages where the override is not used
  • use the new API in MobileFrontend, MCS and mobile apps (and maybe VisualEditor?)

I'm testing it out on enwiki:

https://en.wikipedia.org/wiki/Wagon_Train (Wikidata desc: Television program / enwiki desc: Western television series aired 1957-1965)

https://en.wikipedia.org/wiki/Guy_Madison (Wikidata: Actor / enwiki: American film and television actor)

I don't see the overrides showing up in the iOS app, but it's only been a couple minutes and it's probably cached. Is there an estimate for how long it'll take to see the override description?

@DannyH see follow ups - I think we need to update all clients now. @Tgr do you plan to do one for mobile web or do you need me to organise that?

Thanks, Jon. So this is the most important follow-up:

> • use the new API in MobileFrontend, MCS and mobile apps (and maybe VisualEditor?)

VisualEditor should be included in this; the short description should use the override everywhere that the description is used on English WP.

> @Tgr do you plan to do one for mobile web or do you need me to organise that?

I'd leave that to Reading Web if that's OK with you.

Will it be possible to access/display the descriptions in the text of other articles (e.g., list articles, or ideally even on Commons/other projects), or will this be write-only from the perspective of editing?

Asking this again in light of the recent activity.

> VisualEditor should be included in this; the short description should use the override everywhere that the description is used on English WP.

From a quick grep, the affected MediaWiki extensions seem to be CirrusSearch, ContentTranslation, MobileFrontend, RelatedArticles and VisualEditor.
(Plus the mobile apps, plus any services - MCS at least.)

I've now enabled the magic word on some 4600 articles on English Wikipedia:

https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:Short_description

That should give some scope for testing where you want the output of {{SHORTDESC:}} to go.

Change 423244 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/Wikibase@master] Refactor business logic in description API into helper class

https://gerrit.wikimedia.org/r/423244

Change 423245 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/Wikibase@master] Add local and central description to page info

https://gerrit.wikimedia.org/r/423245

> Will it be possible to access/display the descriptions in the text of other articles (e.g., list articles, or ideally even on Commons/other projects), or will this be write-only from the perspective of editing?
>
> Asking this again in light of the recent activity.

The local description can be retrieved from the API using prop=description, so anything that can query the API can use the description.
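As a sketch of what "anything that can query the API" involves, the Python below extracts descriptions from a prop=description response. It assumes formatversion=2 output, in which query.pages is a list, and assumes a descriptionsource field marking whether each value is local or central; both shapes should be verified against a live response. The helper name is hypothetical.

```python
import json
from typing import Dict


def descriptions_from_response(body: str) -> Dict[str, Dict[str, str]]:
    """Map page title -> {"description": ..., "source": ...}.

    Assumes a formatversion=2 response (query.pages is a list).
    Pages without any description are skipped.
    """
    data = json.loads(body)
    result: Dict[str, Dict[str, str]] = {}
    for page in data.get("query", {}).get("pages", []):
        if "description" in page:
            result[page["title"]] = {
                "description": page["description"],
                # Assumed field: "local" = enwiki override, "central" = Wikidata.
                "source": page.get("descriptionsource", "unknown"),
            }
    return result
```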

Since people don't want to wait for proper development, nor write their own scripts (why do we even bother with gadgets etc. any longer?): https://en.wikipedia.org/wiki/MediaWiki:Gadget-Page_descriptions.js

The short description stored is the one from the last instance of the magic word SHORTDESC found on the page. This is problematic, and it would be preferable to store the first instance found on the page. This is a bug. Please fix.

Short descriptions are automatically produced by several templates. Some produce a generic short description suitable for a large number of pages, such as disambiguation pages. Others assemble a short description from component parameters of an infobox, which are sufficient in most cases but seldom ideal.

In cases where a local customised short description is preferred, it is added to the top of the article for reasons discussed on Wikipedia, which come down to this: the top of the article is where people will look for it, as it is an annotation to the title.

Currently the last instance of the magic word defines the stored short description. This should be changed so that the first instance is stored.
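The difference between the current and the requested behaviour can be sketched with a regular expression. This is a simplification (a real parser expands templates first, and the pattern ignores nesting and comments); stored_shortdesc is a hypothetical name.

```python
import re
from typing import Optional

# Simplified: does not handle nested templates or comments inside the call.
SHORTDESC_CALL = re.compile(r"\{\{\s*SHORTDESC\s*:(.*?)\}\}", re.DOTALL)


def stored_shortdesc(wikitext: str, first_wins: bool = False) -> Optional[str]:
    """Return the SHORTDESC value that would be stored for the page.

    first_wins=False models the deployed last-one-wins behaviour;
    first_wins=True models the change requested in this comment.
    """
    values = [m.group(1).strip() for m in SHORTDESC_CALL.finditer(wikitext)]
    if not values:
        return None
    return values[0] if first_wins else values[-1]
```

For a page with a hand-written {{SHORTDESC:...}} at the top and an infobox that emits a generic one further down, the default call returns the infobox text, which is the problem described here.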

Every magic word with a side effect works in a last-one-wins manner; it would be pretty confusing to change that just for this one, IMO. Anyway, probably better to discuss in a new task than an already-closed one.

a) What side effect?
b) Confusing to whom? As it stands it is confusing to the people who have to use it, and may require complicated work-arounds to get it to work usefully. These will be a huge time-sink for the people who are writing the encyclopaedia.
c) This is the task for creating this magic word, and the task is not finished until it works effectively for its purpose.
d) How do you suggest we make this thing work?

@Tgr That ticket would basically be T193857: Support 'noreplace' keyword in {{SHORTDESC}}
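One way to model the noreplace keyword proposed in T193857, under the assumption that a call marked noreplace is ignored whenever a description has already been set. This keeps last-one-wins for ordinary calls while letting template-generated descriptions yield to a manual one placed earlier. The function and its (text, noreplace) input format are hypothetical.

```python
from typing import List, Optional, Tuple


def resolve_shortdesc_calls(calls: List[Tuple[str, bool]]) -> Optional[str]:
    """Apply SHORTDESC calls in page order.

    Each call is (text, noreplace). A plain call overwrites (last one wins);
    a noreplace call only takes effect if nothing was set before it.
    """
    current: Optional[str] = None
    for text, noreplace in calls:
        if noreplace and current is not None:
            continue  # keep the earlier, deliberately chosen description
        current = text
    return current
```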

@Pbsouthwood
a) The side effect is taking something from a revision and using it for the page (different level of the datamodel).
b) This has been the behaviour of all magic words for 15 years. For instance, this is also the behaviour of {{DISPLAYTITLE}}, {{DEFAULTSORT}}, etc. Consistency is important, even if it might be inconvenient for a particular use.
c) Wikitext parsing has many peculiarities that people need to know about in order to use it correctly.

Change 423244 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Refactor business logic in description API into helper class

https://gerrit.wikimedia.org/r/423244

Change 423245 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add local and central description to page info

https://gerrit.wikimedia.org/r/423245

Just a "heads-up" that a search on enwiki articles for hastemplate:"short description" yields 2,083,124 results.

May we assume that we will be moving on to Stage 2 in the near future?
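For anyone wanting to track this figure, a sketch of the search API request behind such a count. The parameters used (list=search, srsearch, srnamespace, srinfo=totalhits) are assumed to be the standard search-module options; the helper is hypothetical, and the reported total is approximate and changes as pages are edited.

```python
from urllib.parse import urlencode

ENWIKI_API = "https://en.wikipedia.org/w/api.php"


def template_usage_count_url(template: str = "short description") -> str:
    """Search-API URL whose response reports how many articles use a template."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'hastemplate:"{template}"',
        "srnamespace": "0",     # articles only
        "srinfo": "totalhits",  # the count appears in query.searchinfo.totalhits
        "srlimit": "1",         # only the count matters, not the hits themselves
        "format": "json",
    }
    return ENWIKI_API + "?" + urlencode(params)
```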

> Just a "heads-up" that a search on enwiki articles for hastemplate:"short description" yields 2,083,124 results.
>
> May we assume that we will be moving on to Stage 2 in the near future?

I really hope this never happens, and that the locally-defined descriptions are merged back to Wikidata. There are 2.8+ million infoboxes on Commons that are using the English descriptions from Wikidata, and there is currently no way that they can access the enwp descriptions.

> I really hope this never happens, and that the locally-defined descriptions are merged back to Wikidata. There are 2.8+ million infoboxes on Commons that are using the English descriptions from Wikidata, and there is currently no way that they can access the enwp descriptions.

What about a bot importing the local descriptions from enwiki to wikidata? That way enwiki still has control over the content, but wikidata still has accurate and updated descriptions (that are then provided to commons, etc)

> What about a bot importing the local descriptions from enwiki to wikidata? That way enwiki still has control over the content, but wikidata still has accurate and updated descriptions (that are then provided to commons, etc)

I can write a bot that will do that. It would still suck, as it's duplicating descriptions rather than having a single place that the descriptions are updated at, and I expect that it would be opposed for that reason. But if you would support it, then I can write it up and propose it.

> I can write a bot that will do that. It would still suck, as it's duplicating descriptions rather than having a single place that the descriptions are updated at, and I expect that it would be opposed for that reason. But if you would support it, then I can write it up and propose it.

It would suck, but it's better than two different and diverging sources. We would need a way to tell the bot to skip certain items, though...
Is there a suitable API for fetching the descriptions of multiple items at once?
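On the Wikidata side, wbgetentities accepts several item IDs separated by "|" and can be limited to descriptions in one language. The Python sketch below splits a list of IDs into requests of at most 50, which is assumed here to be the limit for ordinary clients; the helper name is hypothetical. On the enwiki side, prop=description similarly takes several titles in one titles= parameter.

```python
from typing import List
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
MAX_IDS_PER_REQUEST = 50  # assumed limit for ordinary (non-bot) clients


def description_batch_urls(item_ids: List[str], language: str = "en") -> List[str]:
    """Split item IDs into wbgetentities requests of at most 50 IDs each."""
    urls = []
    for start in range(0, len(item_ids), MAX_IDS_PER_REQUEST):
        chunk = item_ids[start:start + MAX_IDS_PER_REQUEST]
        params = {
            "action": "wbgetentities",
            "ids": "|".join(chunk),
            "props": "descriptions",  # skip labels, claims and sitelinks
            "languages": language,
            "format": "json",
        }
        urls.append(WIKIDATA_API + "?" + urlencode(params))
    return urls
```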

> It would suck, but it's better than two different and diverging sources. We would need a way to tell the bot to skip certain items, though...

It still means that we maintain two separate databases containing what should be the same information. Telling a bot to skip certain items is a key problem, but not the only one: we still have to figure out how to automatically resolve arguments that arise between an editor on enwp and an editor on Wikidata, or have some sort of mediation page.
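A hypothetical sketch of the decision step for such a bot, assuming both sets of descriptions have already been fetched and keyed by Wikidata item ID. The skip set handles the easy part of the problem; the sketch deliberately does nothing about disputes between editors on the two projects, which is the harder issue raised above.

```python
from typing import Dict, Set


def plan_description_sync(
    enwiki: Dict[str, str],
    wikidata: Dict[str, str],
    skip: Set[str],
) -> Dict[str, str]:
    """Return {item_id: new_description} for items the bot would edit.

    enwiki maps Wikidata item IDs to local short descriptions,
    wikidata maps the same IDs to current English descriptions,
    and skip holds items editors have asked the bot to leave alone.
    """
    updates: Dict[str, str] = {}
    for item_id, local in enwiki.items():
        if item_id in skip:
            continue
        if wikidata.get(item_id) != local:
            updates[item_id] = local
    return updates
```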

The thing is, Mike, that the descriptions on Wikidata are monolingual. There is a separate description for every language, so it's not a matter of a central repository that can be used across all projects. The Wikidata description field is potentially useful for the corresponding language Wikipedia and the handful of multilingual wikis, so it really doesn't find as many uses as statements do. Also, unlike statements, it cannot be linked to a reliable source, so there's no inherent mechanism to promote reliability.

I agree with Danny: we'd be far better off bot-importing the short description from the language Wikipedia into Wikidata's description. Or, even better, dynamically importing the "local description" field, generated by the language Wikipedia, as a read-only replacement for the Wikidata description.

I think we are forgetting that both systems have divergent copyright licenses, with Wikidata having the more liberal one. As such, a bot automatically copy-pasting descriptions would likely violate copyright licensing.

> I think we are forgetting that both systems have divergent copyright licenses, with Wikidata having the more liberal one. As such, a bot automatically copy-pasting descriptions would likely violate copyright licensing.

That does cause a problem. @RexxS any thoughts? From my perspective, just maintaining them on Wikidata still seems to be the simplest option...

Well it didn't seem to cause a problem for all of the statements imported from the various Wikipedias.

It also doesn't seem to cause a problem for anybody using the enwiki 'short description helper', which silently and invisibly adds a Wikidata description whenever the enwiki short description is added. I'm not seeing any difference in licensing issues between that and the proposed updating bot.

Wikidata remains unsuitable as a repository for article descriptions because of (1) its tardy response to vandalism of the description field; (2) the inability to validate the description field; (3) the lack of a mature and nuanced BLP policy that would prevent subtle but serious breaches of enwiki policies on living persons.

Descriptions are designed to be short enough to be uncopyrightable. Any long description (one that is eligible for copyright) should be rewritten.

Previously I imported (as a one-off task) ~110,000 short descriptions to Wikidata, and I found that the short descriptions have these problems: 1. inconsistent capitalization of the first character; 2. descriptions in title case (such as "Largest City in United States", where "city" should be lower case); 3. wiki markup in descriptions. In addition, a few short descriptions generated by templates do not make sense.

The advice given to editors adding short descriptions on enwiki is to use the same case as would be used in article titles, e.g. "American politician", "Largest city in England", "eBay Founder". Those are the correct case to use in any sub-heading, such as the search on mobile and the sub-heading on the Wikipedia App. It is unfortunate that Wikidata chose to use prose fragment case, as that never gets used in any application that I'm aware of.

Editors will of course make mistakes, on either project, so differences in capitalisation won't be an argument in favour of using either project as the central repository.

Because enwiki editors have found an extra use for the short description, as an annotation for links (in See also sections, for example), there will be valid examples of short descriptions that contain wikimarkup, usually a link. It's fairly straightforward to strip wikimarkup from any text, so that shouldn't be a barrier to importing short descriptions into Wikidata, where the markup would serve no purpose.
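A minimal sketch of that stripping step in Python, covering only the markup most likely in a short description (wikilinks and bold/italic quote runs). It is not a wikitext parser, and the regular expressions and helper name are assumptions about the input rather than any existing tool.

```python
import re

# [[Target|label]] -> label, [[Target]] -> Target
WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")
# '' (italic) and ''' (bold) quote runs; a single apostrophe is left alone
QUOTE_MARKUP = re.compile(r"'{2,}")


def strip_wikimarkup(text: str) -> str:
    """Remove the simple markup seen in short descriptions (not a full parser)."""
    text = WIKILINK.sub(r"\1", text)
    text = QUOTE_MARKUP.sub("", text)
    return text.strip()
```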

In Wikidata it is expected that the description can be added after the label (usually the same as the title) in parentheses, e.g. machine translation (sub-field of computational linguistics).

A Wikidata description is also designed to disambiguate items with the same or similar labels, so that no two items have the same label and the same description (and labels should not include disambiguation information). This is not the case on Wikipedia, as no two articles have the same title and disambiguation is handled by the article name. For example, in Georgia (country), "country" serves the disambiguation purpose, similar to a Wikidata description. Note that a Wikidata description should also be understandable on its own, so "mathematics" is not a good description for "Set (mathematics)".
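The uniqueness expectation can be checked mechanically. This hypothetical Python sketch groups items by their (label, description) pair and reports any pair shared by more than one item; loading the items is left out, and exact string comparison is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def duplicate_label_description_pairs(
    items: Dict[str, Tuple[str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group item IDs that share the same (label, description) pair.

    Any group with more than one ID breaks the expectation that the
    pair identifies a single item, so its descriptions need work.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for item_id, pair in items.items():
        groups[pair].append(item_id)
    return {pair: ids for pair, ids in groups.items() if len(ids) > 1}
```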

Wikidata is not an application, so I'm not sure of the value of its expectations. I'm unaware of any application that adds the description after the label in the way you describe. Perhaps you can name some examples?

Otherwise the principal use of Wikidata descriptions is on the mobile platforms for the 6 million+ articles on the English Wikipedia, plus the millions of other articles in other language Wikipedias. The use of those descriptions is for sub-headings, which naturally begin with a capital, with the exception of stylisations like "eBay" and scientific terms like "nCov-19".

When used to help searches on mobile, short descriptions are designed to distinguish items with similar titles as quickly as possible. That allows users to see a list of matching articles and pick the one they want without having to type the entire search term. For example, you should only need to type the first few characters of "Benzopyrene" before you see:

(+)-Benzo(a)pyrene-7,8-dihydrodiol-9,10-epoxide
Cancer-causing agent derived from tobacco smoke

which allows you to pick the particular carcinogenic chemical derived from smoke, rather than any other "Benzo-" chemical.
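For illustration, a request that pairs title suggestions with short descriptions in one call, combining a prefixsearch generator with prop=description. Whether the apps or mobile web issue exactly this request is an assumption, as are the gps-prefixed parameter names; the helper is hypothetical.

```python
from urllib.parse import urlencode

ENWIKI_API = "https://en.wikipedia.org/w/api.php"


def title_search_url(prefix: str, limit: int = 10) -> str:
    """Titles matching a typed prefix, each paired with its short description."""
    params = {
        "action": "query",
        "generator": "prefixsearch",  # title suggestions for what has been typed
        "gpssearch": prefix,
        "gpsnamespace": "0",
        "gpslimit": str(limit),
        "prop": "description",        # attach the local or Wikidata description
        "format": "json",
        "formatversion": "2",
    }
    return ENWIKI_API + "?" + urlencode(params)
```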

You don't need a short description to distinguish Georgia (country) from Georgia (U.S. state) or any of the other 30+ similar articles when searching, so neither the local short description nor the Wikidata description is needed in that case.

The Wikidata short description for "set" (Q36161) is currently "fundamental mathematical concept related to the notions of belonging or inclusion". That is unnecessarily wordy and long for a search using a mobile phone screen, so it's fortunate that the article is titled "Set (mathematics)" and that is easily enough to distinguish the article from any of the other 50+ articles listed on the disambiguation page. On the Wikipedia App, you see:

Set (mathematics)
Fundamental mathematical concept related to the notions of belonging or inclusion

Can anybody give an example of a user who would find that lengthy sub-title useful? Particularly when you look at the opening sentence of the lead.

An insistence that the Wikidata description has to be a stand-alone description of the entity is another reason why Wikidata descriptions are often unsuitable to do the job of short descriptions on Wikipedias.
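The length argument can be turned into a simple lint check. The 40-character soft limit below is an assumed target, not a figure from this thread, and should be adjusted to current enwiki guidance; the helper name is hypothetical.

```python
SOFT_LIMIT = 40  # assumed target length for a mobile sub-heading


def too_long_for_subheading(description: str, limit: int = SOFT_LIMIT) -> bool:
    """Flag descriptions likely to wrap or be truncated as a mobile sub-heading."""
    return len(description.strip()) > limit
```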

Wikidata description is to be used with labels, so users in Wikidata will see:

set
fundamental mathematical concept related to the notions of belonging or inclusion

Georgia
state of the United States of America

In addition, not all items have an English label, so adding a description will let users know what they are.

As far as I can see no-one has answered RexxS's question. Is it likely that anyone here can actually do that?

> Wikidata description is to be used with labels, so users in Wikidata will see:
>
> set
> fundamental mathematical concept related to the notions of belonging or inclusion
>
> Georgia
> state of the United States of America
>
> In addition, not all items have an English label, so adding a description will let users know what they are.

Nobody uses the Wikidata description with labels, and you're confusing "users of Wikidata" with people reading a page on Wikidata. That's not how Wikidata is used, nor how it was meant to be used.

The first example is twice as long as it needs to be to perform a useful purpose. The second one could use "USA state" without losing any meaning at all. These are being used as short descriptions, not essays.

Descriptions are made in exactly the same languages as labels are. I've been working with Wikidata since it began, and in all that time, I've never seen an entry with a description in English, but no label in English. Can you point to a single example of that happening?

"Nobody uses the Wikidata description with labels" - Wikidata description with first letter converted to uppercase is used in Siri, see https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/08#User_instructions_in_descriptions_are_harmful (though it mentions an obvious issue to fix).

> It is unfortunate that Wikidata chose to use prose fragment case, as that never gets used in any application that I'm aware of.

I don’t understand why this would be unfortunate. One can trivially convert prose fragment case to the case you desire (by upper-casing the first letter), while the reverse is non-trivial (one cannot just lower-case the first letter, because it might be a proper noun). Or am I missing something else?

Compared to the 14,986,105 entries with an English label but no English description. How does the bot-added English description "Wikimedia list article" help anyone create a label for Q1831702? Or how does the description "scientific article (publication date: June 2006)" help anyone create a label for Q28242786?

I remember that discussion titled "User instructions in descriptions are harmful" but I don't see how anybody could think that Siri (a speech-based virtual assistant) is an example of an application that shows

set
fundamental mathematical concept related to the notions of belonging or inclusion

After all, the Wikidata description with the first letter converted to uppercase is used in enwiki mobile searches and the Wikipedia App unless we have created a local short description, as I explained earlier, but those don't use the Wikidata label.

> I don’t understand why this would be unfortunate. One can trivially convert prose fragment case to the case you desire (by upper-casing the first letter), while the reverse is non-trivial (one cannot just lower-case the first letter, because it might be a proper noun). Or am I missing something else?

Yes, you're missing the third example I gave, "eBay", which I expanded upon in a subsequent post:

> The use of those descriptions is for sub-headings, which naturally begin with a capital, with the exception of stylisations like "eBay" and scientific terms like "nCov-19".

There is a non-trivial and unbounded set of words that always begin with a lower-case letter in English. So if you try to "trivially convert prose fragment case to the case you desire (by upper-casing the first letter)" you come unstuck in an undefinable number of cases. Why not store the description in title case as is recommended for enwiki short descriptions?
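The capitalisation problem can be seen in a short sketch: upper-casing the first letter needs an exception list, and that list can never be complete. The entries in KEEP_LOWERCASE_START are illustrative only, and the helper name is hypothetical.

```python
# Illustrative and necessarily incomplete: the set of words that must keep
# a lower-case first letter is open-ended.
KEEP_LOWERCASE_START = {"eBay", "iPhone", "nCov-19"}


def to_subheading_case(description: str) -> str:
    """Upper-case the first letter unless the first word is a known exception."""
    if not description:
        return description
    first_word = description.split(" ", 1)[0]
    if first_word in KEEP_LOWERCASE_START:
        return description
    return description[0].upper() + description[1:]
```

An unlisted word shows the failure mode: to_subheading_case("pH indicator") returns "PH indicator".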

> I can write a bot that will do that. It would still suck, as it's duplicating descriptions rather than having a single place that the descriptions are updated at, and I expect that it would be opposed for that reason. But if you would support it, then I can write it up and propose it.

Following up on this, please see the bot proposal at https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Pi_bot_14 (and the links therein)