
Provide another way to surface usage instructions besides tacking them onto descriptions
Open, High, Public

Assigned To
None
Authored By
kaldari
Apr 29 2015, 7:20 PM
Referenced Files
F24824003: Angel.jpg
Aug 11 2018, 10:13 PM
F191956: Screen Shot 2015-07-13 at 21.01.59.png
Jul 14 2015, 4:05 AM
F158661: Screen_Shot_2015-04-29_at_12.23.01_PM.png
Apr 29 2015, 7:24 PM
F158666: Screen_Shot_2015-04-29_at_12.24.10_PM.png
Apr 29 2015, 7:24 PM
F158667: Screen_Shot_2015-04-29_at_12.24.22_PM.png
Apr 29 2015, 7:24 PM
F158660: Screen_Shot_2015-04-29_at_12.23.20_PM.png
Apr 29 2015, 7:24 PM
F158662: Screen_Shot_2015-04-29_at_12.22.09_PM.png
Apr 29 2015, 7:24 PM
Tokens
"Like" token, awarded by Moebeus. "Burninate" token, awarded by DarTar. "Pterodactyl" token, awarded by Ejegg. "Like" token, awarded by Daniel_Mietchen.

Description

It's becoming more and more common on Wikidata for the description field to be overloaded with instructions about how to use a Wikidata item within claims. The instructions are not appropriate for the description field and end up causing confusion when the descriptions are displayed outside the context of Wikidata (for example, in Wikipedia App search results).

Here are some examples (I can provide lots more):
banana - the fruit (for the best-known species, use Q10757112; for the genus, use Q8666090)
male - person who is male (use with Property:P21 sex or gender). For groups of males use with subclass of (P279).
fictional character - fictional person in a narrative work of arts (for human fictional character use Q15632617)

There should be a separate field for entering instructions related to the use of a Wikidata item. This field doesn't necessarily need to be localized like the description does.

Current state: there is a Property to store the usage instructions but it is not used yet in the entity suggester.

Event Timeline


There are several ways to solve this problem that are all equally valid at this point and require much less engineering by the dev team and maintenance by the community.

@Lydia_Pintscher What are the other possible solutions? Is there a previous discussion about this somewhere?

There are small discussions all over the place started by the people in this ticket.

Other potential solutions:

  • Agree that usage information should not go into the descriptions. Existing descriptions should be rewritten.
  • Agree that usage descriptions should always go in brackets and that they can be discarded by clients who don't need them.

My feeling is that we are talking about a very small number of items here. (Happy to be proven wrong.) I am not going to spend a lot of development time and add additional complexity, maintenance effort and so on for a very marginal number of items when the issue can be solved much more easily and with less complexity for everyone.

  • Agree that usage information should not go into the descriptions. Existing descriptions should be rewritten.

This solution would remove a use case which is currently supported in Wikidata, albeit badly, so this does not seem like a good solution to me.

  • Agree that usage descriptions should always go in brackets and that they can be discarded by clients who don't need them.

This would necessitate duplicative code in all API clients, so also does not seem like a good solution. That said, an alternative to this would be for Wikidata to also emit a sanitised description via the API, which performs this stripping on the server, thus reducing code duplication whilst essentially performing the same function. Exposing such functionality in the PageTerms API module would likely not be hard.

My feeling is that we are talking about a very small number of items here. (Happy to be proven wrong.)

I think you're right about that, although I do not have numbers to back up my statement either. :-)

I am not going to spend a lot of development time and add additional complexity, maintenance effort and so on for a very marginal number of items when the issue can be solved much more easily and with less complexity for everyone.

If a team at the WMF badly needs this, they can consider assigning their own engineers to tackle the problem. Ah, the beauty of open source! They would undoubtedly appreciate your guidance and domain knowledge as to what you believe the correct solution to be before they can do that. :-)

If the Readership Team chose to tackle this themselves, which of the proposed solutions would you prefer they implement?

It is not just about the time of my team. It is about the effort we're putting on the community as well to maintain this information on all items when it is only needed on a very small number of them.

If we're adapting Wikidata's API to also provide a sanitized description or leave it to the client, either way we need to have a discussion in the community to agree that "anything in brackets can be stripped" is ok and can be agreed upon as an editing standard. I'm happy to do that.

Which of those two to choose depends, for me, on whether we believe most clients will need the stripped version or whether they don't care. I don't know, to be honest. The biggest use case of descriptions for 3rd parties is tagging, and there you'd want the full descriptions.

It is not just about the time of my team. It is about the effort we're putting on the community as well to maintain this information on all items when it is only needed on a very small number of them.

I'm not sure I understand this. If an item doesn't need this information, then it can simply not be added. No extra maintenance needed. If an item does need this information, then it's already been added to the description, so there is also no extra maintenance needed.

If we're adapting Wikidata's API to also provide a sanitized description or leave it to the client, either way we need to have a discussion in the community to agree that "anything in brackets can be stripped" is ok and can be agreed upon as an editing standard. I'm happy to do that.

Right. This is why I suggested that the API provide both descriptions, so that the consumer can choose which they like. The PageTerms module already provides some support for this. For example,

https://en.wikipedia.org/w/api.php?action=query&titles=Jellyfish&prop=pageterms&wbptterms=description

This query provides only the description, unsanitised. It could continue to do so, for backwards compatibility. You could add support for, say, wbptterms=sanitisedescription, which provides a sanitised description.
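For illustration only, the two requests could be built as in the sketch below. The first mirrors the query quoted above; the second uses the `sanitisedescription` value, which is only the name floated in this comment and is not known to exist in the API. The helper name `pageterms_url` and the added `format=json` parameter are assumptions made for the example.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def pageterms_url(title: str, term: str = "description") -> str:
    """Build a PageTerms query URL for one page title and one term type."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "pageterms",
        "wbptterms": term,
        "format": "json",  # assumption: ask for JSON output
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Existing behaviour: the full, unsanitised description.
existing = pageterms_url("Jellyfish")

# Hypothetical: "sanitisedescription" is only the value proposed in this thread.
proposed = pageterms_url("Jellyfish", term="sanitisedescription")

print(existing)
print(proposed)
```

Because the default stays `description`, existing callers would see no change, which is the backwards-compatibility argument made here.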

Which of those two to choose depends, for me, on whether we believe most clients will need the stripped version or whether they don't care. I don't know, to be honest. The biggest use case of descriptions for 3rd parties is tagging, and there you'd want the full descriptions.

Right. I don't think any of us know that right now. The advantage of the above solution is that it's totally backwards compatible.

It is not just about the time of my team. It is about the effort we're putting on the community as well to maintain this information on all items when it is only needed on a very small number of them.

I'm not sure I understand this. If an item doesn't need this information, then it can simply not be added. No extra maintenance needed. If an item does need this information, then it's already been added to the description, so there is also no extra maintenance needed.

An additional empty field on nearly every item means:

  • more mental attention required to figure out what it is and does
  • more clutter on item pages drawing away attention from the things we want people to care about
  • more pieces that need patrolling for vandalism and spam

If we're adapting Wikidata's API to also provide a sanitized description or leave it to the client, either way we need to have a discussion in the community to agree that "anything in brackets can be stripped" is ok and can be agreed upon as an editing standard. I'm happy to do that.

Right. This is why I suggested that the API provide both descriptions, so that the consumer can choose which they like. The PageTerms module already provides some support for this. For example,

https://en.wikipedia.org/w/api.php?action=query&titles=Jellyfish&prop=pageterms&wbptterms=description

This query provides only the description, unsanitised. It could continue to do so, for backwards compatibility. You could add support for, say, wbptterms=sanitisedescription, which provides a sanitised description.

Which of those two to choose depends, for me, on whether we believe most clients will need the stripped version or whether they don't care. I don't know, to be honest. The biggest use case of descriptions for 3rd parties is tagging, and there you'd want the full descriptions.

Right. I don't think any of us know that right now. The advantage of the above solution is that it's totally backwards compatible.

I've just talked this through with Daniel and he made a good point. Wikibase as a software shouldn't know about this convention because other wikis using the software are free to use completely different conventions. So it should go into the client that needs it from our side.

either way we need to have a discussion in the community to agree that "anything in brackets can be stripped" is ok

It certainly isn't; I've already identified problems caused by the Wikipedia app doing so in another context.

Consider, for example, the use case of a description that includes a string representing a chemical formula, say:

(NH4)2SO4

or a description of an album by, say, the band called:

Empire! Empire! (I was a Lonely Estate)

It's really not a good idea to use parentheses to split what are conceptually two different data fields.
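A minimal sketch of the failure mode described above, assuming a client that naively drops every parenthesised group. The regular expression and the two sample descriptions built around the formula and the band name are made up for illustration; only the banana description comes from this task.

```python
import re

# Naive rule under discussion: drop every "(...)" group from a description.
PARENTHETICAL = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(description: str) -> str:
    """Remove all parenthesised groups, as a naive client might."""
    return PARENTHETICAL.sub("", description).strip()


samples = [
    # Real example from the task description: stripping works as intended.
    "the fruit (for the best-known species, use Q10757112; for the genus, use Q8666090)",
    # Hypothetical descriptions: the parentheses are part of the content.
    "salt with the formula (NH4)2SO4",
    "album by Empire! Empire! (I was a Lonely Estate)",
]

for sample in samples:
    print(repr(strip_parentheticals(sample)))
# 'the fruit'
# 'salt with the formula2SO4'
# 'album by Empire! Empire!'
```

The first result is what the stripping rule is meant to produce; the other two silently lose real content, and the pattern alone cannot tell the cases apart.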

@Pigsonthewing: Would [] or even {} be better?

That would probably give fewer errors, but errors nonetheless.

We need to remove instructions from the description; if we don't want to change the underlying software, they could go into a property (datatype=text; qualified by language) or into the documentation template. A user script or gadget could then display them.

I suspect, though, that once we deliberately accommodate instructions, in whatever way, more of them would be written; and that might make editing easier for newbies.

Conceptually, descriptions and usage notes are different things. The link between them is that users need to see both in the drop-down lists of the user interface to help them decide which item to pick.

As they are different things, in an ideal world they should be different fields.

This isn't an ideal world so we need to think about what we do with the world we have.

Short term options:

  1. mobile.WP ignores the descriptions and automatically generates descriptions based on the statements.
  2. Usage notes in descriptions are denoted by (brackets) or <!--HTML comment tags--> or whatever and m.WP learns to ignore them and use the rest of the description.

Longer term options:

  3. Have a separate (multilingual) field for usage notes and have the user interface display both the description and the usage notes in the drop-down options.
  4. Where Qids or Pids are included in descriptions, translate these into text links using the appropriate language label for that property or item. This should allow all usage notes to be incorporated into descriptions in a way which doesn't look strange when these are reused by m.WP.
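The last option above (replacing Qids and Pids with labels) might look roughly like the following sketch. The label table is a hypothetical stand-in for a real label lookup in the reader's language, and the label chosen for Q15632617 is a placeholder, not verified data; IDs without a known label are left untouched.

```python
import re

# Matches entity IDs such as Q15632617 or P279 as whole words.
ENTITY_ID = re.compile(r"\b[QP][1-9]\d*\b")

# Hypothetical label table; a real client would look labels up in Wikidata
# for the reader's language. The Q15632617 label is a placeholder.
LABELS_EN = {
    "Q15632617": "fictional human",
    "P21": "sex or gender",
}


def expand_ids(description: str, labels: dict) -> str:
    """Replace each known Qid/Pid with its label; keep unknown IDs as-is."""
    return ENTITY_ID.sub(lambda m: labels.get(m.group(0), m.group(0)), description)


print(expand_ids(
    "fictional person in a narrative work of arts "
    "(for human fictional character use Q15632617)",
    LABELS_EN,
))
print(expand_ids("the fruit (for the genus, use Q8666090)", LABELS_EN))
```

The second call shows the fallback: an ID with no label in the table stays visible, so nothing is silently dropped.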

Of the options presented by Filceolaire, #2 (for short term) and #3 (for long term) seem the best. @Lydia_Pintscher, do you think supporting some type of comment syntax for the description field would be doable? I would recommend that the API strip out these comments by default, and only show them if specifically requested (for the Wikidata suggestion UI for example).

FWIW, I tried to drum up support for just banning usage instructions from descriptions at https://www.wikidata.org/wiki/Help_talk:Description, but the community was opposed to it since there was nowhere else useful to put them.
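As a rough sketch of the "strip by default, show on request" behaviour recommended above, assuming usage notes were wrapped in HTML comment tags (one of the delimiters floated earlier in this thread). The function name, the `include_notes` flag and the sample strings are hypothetical.

```python
import re

# Assumed convention: usage notes are wrapped in <!-- ... --> inside the description.
USAGE_NOTE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def render_description(raw: str, include_notes: bool = False) -> str:
    """Return the description without usage notes unless they are requested."""
    notes = [note.strip() for note in USAGE_NOTE.findall(raw)]
    # Remove the notes, then collapse any whitespace left behind.
    text = " ".join(USAGE_NOTE.sub(" ", raw).split())
    if include_notes and notes:
        joined = "; ".join(notes)
        return f"{text} ({joined})"
    return text


raw = "person who is male <!-- use with Property:P21 sex or gender -->"
print(render_description(raw))                      # default: notes hidden
print(render_description(raw, include_notes=True))  # e.g. for the suggestion UI
print(render_description("salt with the formula (NH4)2SO4"))
```

Ordinary parentheses in the description are left alone, because only the comment delimiters are treated as markup.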

I don't know that this is the right way of thinking about these. The little bits of descriptions seem like slightly-more-complicated constraints notes, that have to give specific information to the editor. Perhaps this data could be added as statements to properties, with notes there regarding use?

@Lydia_Pintscher We're going to start showing Wikidata descriptions to all mobile users of Wikipedia soon (not just in the apps), first in search results and then in article headers. If this isn't resolved by then, I predict edit wars on Wikidata as Wikipedians try to "fix" the descriptions. Any thoughts on the most recent suggestions above?

I tried to add John Katz to the ticket but couldn't find his username. Can someone who knows please do?

IMHO the following things need to happen before a large-scale roll-out:

  • A discussion needs to happen with the Wikidata community. I will handle this now.
  • Description editing needs to be tested on a small scale to gauge impact. (Ideally with a kill-switch in case vandalism is too bad.) Since it is already developed for Android I suggest communicating this to the Wikidata community and rolling it out. Evaluate the impact.
  • If that works well roll it out further on other platforms.

Other things to keep in mind:

  • Someone needs to figure out the interaction when editing a description that has "hidden" parts. What happens when it goes over the length limit, for example?
  • I advise against rolling out Wikidata descriptions further without them being editable. Wikidata already took quite a reputation hit because of this.
  • As I said earlier, the stripping should IMHO happen on the consumer side, as it is specific to the use case of Wikipedia and based on a convention in Wikidata. Other Wikibase installations could have different conventions, so hardcoding that in Wikibase seems bad.

@Lydia_Pintscher "it is already developed for Android" (you mean iOS?) Android description editing is coming also.

Agree that it makes sense to see how it goes on the apps and then roll out further.

Would [] or even {} be better?

I strongly suggest to use the same format as in edit summaries: "Description /* comment */". It has a meaning that can't be confused and pretty much all users (all software developers and all users that ever edited an article section) are used to this.

After discussion and reconsideration I'm changing my suggestion to "Description (comment)" with round brackets. This is what's done for article titles. Many bots and tools already have code for stripping this.

Some of the problems with adding a delimiter and instructions to the description field are detailed here: https://phabricator.wikimedia.org/T90765#1357678

tldr: I highly recommend adding an instruction field for instructions rather than using delimiters to overload the description field.

This is still blocking T101719, although I imagine the team may eventually move ahead with that feature regardless. Is there any hope for a resolution to this bug? It seems the delimiter stop-gap solution has stalled.

Personally, I agree with the others that the best solution is to somehow store the usage instructions separately. I understand that you don't want to clutter the interface with an extra field that is only going to be used on a small number of items, so why not make it only editable via the API? The few items that need usage instructions are almost never going to need those instructions edited. They just need to be set once and stored. Then if an item does have usage instructions, those instructions should be displayed below the description on both the item page and in pop-up hints (and as separate data in API results). 3rd party Wikibase users could then just completely ignore the concept of usage instructions if they want to (or there could be a feature flag for it). What do you think?

The Discovery Department is currently planning on making it a Q2 2015-16 goal for Search to, amongst other things, expose Wikidata descriptions in search to all users of all Wikimedia wikis. If this issue remains unaddressed, it will significantly detract from that experience. Some resolution to this issue would be appreciated.

Proof-of-concept draft mockup:

[Image: Screen Shot 2015-07-13 at 21.01.59.png]

The problem with these use notes is also that they are usually not localized - i.e. Q503 has this useful note in English, but if you're editing in Russian, nope. For Q6581097 (male) it is a gradual degradation - most information in English, a little less in Spanish, even less in Russian or Hebrew, and none (of the additional info) in Ukrainian.

Noticing that, I wonder: should we give up and recognize that useful usage instructions exist pretty much only in English (I know I'm exaggerating here, since I can examine maybe half a dozen languages out of hundreds, but that is definitely the impression)?

If we need to store instructions only in one language, there are more options (e.g. crazy idea: we already have language qqq for i18n texts - use it here too?). I'm not sure if it's a viable solution, but I notice that non-English usability of instructions is now very low.

@Smalyshev: According to the query in T103836, the majority of usage instructions are in English (about 280); French has 35; German has 24, and the numbers decline from there. Thus even if we only created a non-localizable field for instructions, it would be adequate for most items (and a big improvement over the way we are doing it now).

@Lydia_Pintscher: Where would you suggest that we go from here? The community discussion just went in circles with no movement towards consensus. We now know how many items have usage instructions (a few hundred), so that's no longer a blocker. FWIW, I think there are thousands of items that would benefit from usage instructions, but don't currently have them because there is nowhere to put them (other than the description field). Also, judging by the previous community discussions, it seems that the Wikidata community has no interest in the usefulness of the description field outside of Wikidata itself, thus they have no incentive to try to find a solution. I think WMDE is going to have to lead the way here.

@Lydia_Pintscher: Where would you suggest that we go from here? The community discussion just went in circles with no movement towards consensus. We now know how many items have usage instructions (a few hundred), so that's no longer a blocker. FWIW, I think there are thousands of items that would benefit from usage instructions, but don't currently have them because there is nowhere to put them (other than the description field). Also, judging by the previous community discussions, it seems that the Wikidata community has no interest in the usefulness of the description field outside of Wikidata itself, thus they have no incentive to try to find a solution. I think WMDE is going to have to lead the way here.

I did exactly that and I got a rough consensus for a next step - it is just one that you (plural) don't like. Sorry. For now I am going to wait and see how this develops when the descriptions are actually shown in more places and editable there.

@Lydia_Pintscher: That's great to hear. Any solution is better than what we have now. So what's the next step?

The Wikidata usage instructions property has been approved and is now being used (https://www.wikidata.org/wiki/Property:P2559). Once there is a feature to surface this property when adding new statements, we will then be able to remove the usage instructions from the descriptions.

kaldari renamed this task from "Add an instruction field for Wikidata items" to "Provide another way to surface usage instructions besides tacking them onto descriptions". Jul 12 2016, 5:01 PM

As someone using the wbsearchentities API in another application, I just want to express that I'd very much appreciate this functionality -- the internal usage descriptions on some items are distracting and make them significantly less useful for any kind of integration, as users outside the Wikidata context have no idea what Q or P numbers are. Thanks! :)

This issue is being discussed at https://www.wikidata.org/wiki/Wikidata:Project_chat#User_instructions_in_descriptions_are_harmful after it transpired that Siri had given an answer that a subject is "male - human who is male (use with Property:P21 sex or gender). For groups of males use with subclass of (P279)" (see attached screenshot, Angel.jpg).

I think the way to solve the issue is to use P2559. For this to happen, it seems to me we need:

  • A way for wbsearchentities to return P2559 (which implies returning any additional data, configurable as P2559 for Wikidata). We may want to use T189744: Add hints parameter to wbsearchentities here, or any other way to ask for it from the client side, or configure it server-side.
  • A UI accommodation to display usage instructions. Here some UX specialist help would be useful.

I think the way to solve the issue is to use P2559. For this to happen, it seems to me we need:

  • A way for wbsearchentities to return P2559 (which implies returning any additional data, configurable as P2559 for Wikidata). We may want to use T189744: Add hints parameter to wbsearchentities here, or any other way to ask for it from the client side, or configure it server-side.
  • A UI accommodation to display usage instructions. Here some UX specialist help would be useful.

Sounds about right. P2559 statements could then be surfaced in search UIs that are relevant for contributors (e.g. value autocompletion in Wikidata) but omitted in search interfaces relevant to data consumers.
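Until the search API can return P2559 itself, a contributor-facing client could in principle read the statements separately, for example through `action=wbgetclaims` with `property=P2559`. The sketch below only covers picking an instruction in the user's language from such a response. The sample response is made up for illustration (its text is borrowed from the description quoted in this task) and is assumed to follow the usual claims JSON shape for monolingual text values; the helper name and the English fallback are assumptions too.

```python
from typing import Optional

# Illustrative fragment, assumed to follow the wbgetclaims JSON shape.
# Not real data: the text is borrowed from the description quoted in this task.
SAMPLE_RESPONSE = {
    "claims": {
        "P2559": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P2559",
                    "datavalue": {
                        "type": "monolingualtext",
                        "value": {
                            "language": "en",
                            "text": "use with Property:P21 sex or gender",
                        },
                    },
                },
                "type": "statement",
                "rank": "normal",
            }
        ]
    }
}


def usage_instructions(response: dict, lang: str, fallback: str = "en") -> Optional[str]:
    """Pick the P2559 text in `lang`, else in `fallback`, else None."""
    by_language = {}
    for claim in response.get("claims", {}).get("P2559", []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "novalue" / "somevalue" snaks
        value = snak["datavalue"]["value"]
        by_language.setdefault(value["language"], value["text"])
    return by_language.get(lang) or by_language.get(fallback)


print(usage_instructions(SAMPLE_RESPONSE, "en"))
print(usage_instructions(SAMPLE_RESPONSE, "uk"))  # no Ukrainian text: falls back to English
print(usage_instructions({"claims": {}}, "en"))   # no instructions at all: None
```

A data-consumer interface would simply never call this, which matches the split between contributor and consumer search UIs described above.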

What's happening to finish this task? P2559 has about 1790 uses, which is not an insignificant number, but P2559 is insignificant because it isn't prominently shown.

@Pfps - I think the main blocker at this point is T140131, which is unassigned and hasn't had any comments since 2017.

Couldn't we just use qdw (query data wiki) like how Translatewiki.net uses qqq? It seems like the pretty obvious solution to this to me. It'll also provide an end-run around T140131.

¯\_(ツ)_/¯

In T97566#6514271, @MJL wrote:

Couldn't we just use qdw (query data wiki) like how Translatewiki.net uses qqq? It seems like the pretty obvious solution to this to me. It'll also provide an end-run around T140131.

These descriptions are intended to be shown to users, so should be in the user's language whenever possible. That means we need something which allows for multiple translations using the correct language codes.

More and more I run into incorrect information in Wikidata that I think would occur less often if there were a good way of presenting usage instructions to users. The most recent example was for music (Q638), where Dmitry had to go in and clean out multiple incorrect subclasses. There is already a property, Wikidata usage instructions (P2559), that can be used to hold these instructions, so the only missing part is showing usage instructions more prominently. Couldn't this property simply be put at the top of the displayed list of properties? That's not a great solution, but it would be a good start. Even better would be to display more information when the item is used as a value, but that's a more complicated change to the Wikidata user interface.