Don't capitalize Wikidata descriptions when viewing article
Open, Medium, Public

Description

I've noticed, when using the beta version, that the Wikidata description at the top of the page starts with an upper-case letter, despite the one on Wikidata starting with a lower-case one. This is rather ugly in some languages; can this be un-forced?

Summary of problems

  1. editors expect how descriptions display to match case when they edit them and this can cause edit confusion
  2. descriptions are inconsistent despite guidelines; wikipedia clients want to be consistent with how they display them
  3. from a design perspective it makes sense to render sentence case in this context as otherwise it will look like an incomplete sentence.

Event Timeline

The community standard is very clearly defined: https://www.wikidata.org/wiki/Help:Description

The standard for entry of the descriptions is quite well defined indeed. That said, rigorously applying something that's quite literally a guideline outside its original context (i.e. display of the descriptions within Wikidata) seems unwise to me.

@Lydia_Pintscher sorry for my ignorance of Wikidata norms. I knew there were standards for good descriptions, but took from this thread that capitalization was not covered.

Lest I get back on my hobby horse of auto-generated descriptions...
It seems like what we have here is a conflict of use cases. According to the official guidelines, the purpose of the description is to "disambiguate items with the same or similar labels," which is subtly but crucially different from our use case of "a one-line summary of the subject." I fear that if we're not aligned on these use cases at these early stages, then we'll have deeper issues than capitalization later on.

Given the rationale:

Fair point, so allow me to explain the rationale. This change was made because having a lower case description created a capitalisation inconsistency with the article layout; the article title and first character of the article were capitalised, but the description was not. It looked odd, and created an inconsistent scan line.

I have a question for/need help from language folks (ping @Amire80). Is there any language we can think of where keeping the capitalization will break reading comprehension? Here's an example where you can edit the html with other languages and test it with native speakers http://jsbin.com/xexeqix/edit?html,output

I'm trying to understand if this is really a blocker for showing the descriptions on stable.

I'd also like the perspective/rationale from Design (ping @Nirzar). Design direction and consistency are very important for providing a useful reading experience, provided we don't break other important things like language comprehension.

the first letter being lowercase gives a sense of incompleteness in a sentence < like this.

It's difficult to quantify or rationalize this, but sentence case has a better sense of human intervention. In branding, by contrast, companies sometimes use all lowercase to suggest the "casualness" of the company; that's why Facebook's F is lowercase. As you can see, sentence case is used as a standard in English prose, and our communication also follows it throughout the product.

thoughts on consistently just keeping the user-provided casing?

I strongly believe we should use "Sentence case" descriptions. But the bigger problem is that CSS doesn't have sentence case as an option. It has capitalize, which makes the first letter of every word capitalised; that's just title case.

Overall this is an obvious choice. The Communications department within WMF also uses sentence case.

As far as I know, case doesn't exist in other scripts like Devanagari (for Hindi, Marathi) and, if I am not wrong, Hebrew, according to Wikipedia.

Case exists only in Latin, Cyrillic, Greek and Armenian. Also, it is used differently in different Latin-based languages, although sentence case is pretty universal.

@Nirzar Just a note that the current CSS implementation actually sentence-cases the description by leveraging the :first-letter pseudo-element, so it is doing proper sentence casing and not title casing.
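To make the distinction in the two comments above concrete, here is a minimal sketch in Python rather than CSS. The function names `title_case` and `sentence_case` are made up for illustration; this is not the Minerva implementation, only an analogue of "capitalize every word" versus "uppercase the first letter only".

```python
def title_case(text: str) -> str:
    """Uppercase the first letter of every space-separated word
    (roughly what a 'capitalize every word' transform does)."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def sentence_case(text: str) -> str:
    """Uppercase only the first character and leave the rest untouched
    (roughly what styling just the first letter does)."""
    return text[:1].upper() + text[1:]


description = "village in Kafkanistan"
print(title_case(description))     # Village In Kafkanistan
print(sentence_case(description))  # Village in Kafkanistan
```

Note that neither helper lowercases anything, so a proper noun in the middle of the description keeps its capital letter.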

@Jhernandez ooo that's excellent. my css knowledge is fading :(

CONVERT TO ALL CAPS .. REALLY THE BEST ;)

I wonder if this leads people to do descriptions starting with caps when doing mobile edits: https://www.wikidata.org/w/index.php?title=Special:RecentChanges&tagfilter=mobile+edit

@Esc3300 I believe mobile edit and mobile app edit are edits from the native Android/iOS apps, and this task is about mobile web, which has no Wikidata description editing functionality and isn't even rolled out; it is only on beta.

If this is the case, mobile phones have autocorrect enabled by default, so people typing edits on their phones will be submitting sentence-cased descriptions because the operating systems are correcting the text to be that way. So that would probably be the reason why most descriptions from mobile apps are capitalized.

hi @Jhernandez @Esc3300 - actually the keyboard has been made to default to lowercase when adding/editing Wikidata descriptions specifically to reduce the incidence of incorrect capitalization. In addition we have included a point in the help text explaining not to capitalize unless the first word is a proper noun.

Currently the ability to edit Wikidata descriptions is in the Android app only, and has now been slowly rolled out to all languages except English, with edits being monitored for quality.

mobile edit also applies if wikidata.org mobile domain is being used. And yes @Esc3300 autocapitalisation does happen on a mobile device. I've experienced it first hand. E.g. https://m.wikidata.org/wiki/Special:SetLabelDescriptionAliases/Q31887667/en anyway this is getting a little off topic.... :)

It feels like this is a won't fix in that I don't see any way to resolve this such that all descriptions are consistent and uppercase without enforcing capitalisation in a wikidata validation layer.

It seems a bit odd that people view a description "Village in Kafkanistan" and then are expected to type "village in Kafkanistan" for the item of the neighboring village

or get reverted when they change "village in Kafkanistan" to "Village in Kafkanistan".

To summarise the problems I'm hearing here from all sides:

  1. editors expect how descriptions display to match case when they edit them and this can cause edit confusion
  2. descriptions are inconsistent despite guidelines; wikipedia clients want to be consistent with how they display them
  3. from a design perspective it makes sense to render sentence case in this context as otherwise it will look like an incomplete sentence.

I still see this as an editing problem. When editing a Wikidata description, the interface should guide me not to use sentence case, if that is indeed a policy, or it should remove any leading uppercase letter. That solves 1 and 2. I liken this problem to code linting. Some developers like to use tabs and some like spaces. The only way you can make consistency happen is by flagging input as invalid when the rules are broken, and enforcing them.
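The linting analogy above could look something like the following. This is only a sketch under stated assumptions: `lint_description` is a hypothetical helper, not an existing Wikidata or MediaWiki API, and it warns instead of rewriting because a leading capital can be legitimate when the first word is a proper noun.

```python
from typing import Optional


def lint_description(description: str) -> Optional[str]:
    """Return a warning if the description starts with an uppercase letter.

    Hypothetical edit-time check: it only flags the input and leaves the
    decision to the editor, since proper nouns may need the capital.
    """
    first = description[:1]
    if first.isupper():
        return (
            "Description starts with an uppercase letter; "
            "keep it only if the first word is a proper noun."
        )
    return None


print(lint_description("Village in Kafkanistan"))  # prints the warning text
print(lint_description("village in Kafkanistan"))  # None
```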

Wikidata is a data store. Just as we wouldn't expect clients to have to render dates mm/dd/yy we shouldn't expect them to have to use case. We should be caring about the content not how it's used. I think #3 is up to the client. Rather than say it's wrong it would be helpful to point out examples where it doesn't work. Right now these seem to be hypothetical and/or rare.

I'm not sure about (2.): obviously there are descriptions with caps, possibly due to Android auto-completion, but the bulk of descriptions at Wikidata are bot generated and are unlikely to have incorrect caps.

Thanks for all the info @RHo!

My guesses definitely don't apply to the Android app, they do apply to edits via mobile web on wikidata.org.

Example of user that incorrectly uses capitalization using the mobile app, probably due to this behaviour: https://www.wikidata.org/wiki/Special:Contributions/Gfk (see older contributions, as they are informed now)

Sigh.

Georgian is a particularly troublesome example. There is a long controversy about the usage of capital letters in its alphabet, and about their technical implementation. Until this is cleared up, let's not mess with user input.

As suggested a few times above, automatic capitalization just shouldn't be applied anywhere. It may make English and Dutch look better, but it isn't necessary. And in some languages it is just harmful. This should be completely removed.

In Georgian, we don't use capitalization. In Georgian grammar there is no such understanding at all.

Change 486102 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Do not capitalize wikidata descriptions

https://gerrit.wikimedia.org/r/486102

^ If we were to make this change across the project, this is all that would be needed on the Minerva side. Alternatives would limit this styling to Latin-based languages with a more complicated CSS selector or translatable LESS variable.
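As a rough illustration of the alternative mentioned above (limiting the styling to certain languages), the logic might be sketched as follows. This is written in Python for readability, not as the actual CSS/LESS change; the allowlist `CAPITALIZED_LANGUAGES` and the function `display_description` are assumptions made up for this example.

```python
# Hypothetical allowlist of language codes where the first letter would be
# uppercased for display. The real set would need a per-language decision.
CAPITALIZED_LANGUAGES = frozenset({"de", "en", "fr"})


def display_description(description: str, language_code: str) -> str:
    """Uppercase the first letter only for allowlisted languages.

    Descriptions in any other language (for example Georgian, "ka") are
    returned exactly as stored.
    """
    if language_code in CAPITALIZED_LANGUAGES:
        return description[:1].upper() + description[1:]
    return description


print(display_description("village in Kafkanistan", "en"))  # Village in Kafkanistan
print(display_description("სოფელი", "ka"))  # სოფელი (unchanged)
```

The cost of this approach is the one raised in the thread: the allowlist is extra state to maintain, whereas removing the capitalization rule entirely needs no per-language data.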

Change 486102 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Do not capitalize wikidata descriptions

https://gerrit.wikimedia.org/r/486102

matej_suchanek changed the task status from Stalled to Open. Jan 26 2019, 1:35 PM
Jdlrobson raised the priority of this task from Low to Medium. Feb 5 2019, 6:55 PM
Jdlrobson removed a project: Patch-For-Review.

The question remains: do we want to keep capitalisation on languages where it is useful, e.g. Deutsch, English, français?
I'd argue no as it creates tech debt and confusion, but let's make a decision promptly.

I agree.

No capitalization anywhere.

What remains to be decided? How would such a decision be executed?

I agree entirely with your take here:
https://phabricator.wikimedia.org/T131013#3544541

Editing clients shouldn't allow/encourage capitalization (and should in documentation make it clear not to capitalize in languages where that is possible). The normal form should be lowercase for entry and storage. Display is context, language and even user preference dependent, just as with dates. I don't see the need for any further work or decision to be made...

Exposing the real, in some cases “wrong”, case would make complete sense if there's a connection to where to edit it.
Let's not forget we're automatically uppercasing every article in article namespace, which has provided some kind of order with the drawback of enforcing non-language correct casing.

Pros for uppercasing first letter in Deutsch, English, français:

  • We would be in line with article namespace handling
  • It provides a more orderly interface in some cases

Cons:

  • There are cases where CSS uppercasing could result in incorrect overwrites, for example in names
  • Hiding from editors leaves the descriptions “incorrect” in source

How are we dealing with uppercase/lowercase mIXinG in entries? Do we leave all of those to editors and not attempt to correct them in software?

I'm a bit confused because I see people expressing agreement with each other, however it is not clear to me what we're agreeing on. So here is another attempt to clarify things:

Clear

  1. Capitalization does not make sense in certain languages (e.g. Georgian). We should never force capitalization of Wikidata descriptions on the front-end in such languages.
    1. This requires a change to the current functionality (which incidentally, and accidentally, slipped through recently but is not live yet).
    2. Any remaining, or new, code that modifies the capitalization of Wikidata descriptions should be language specific (i.e. no more global rules).

Confusing

  1. We want consistency in how we display Wikidata descriptions. Specifically, we want Wikidata descriptions to be displayed with the first letter capitalized in Latin languages.
    1. The guidelines for Wikidata descriptions in English specify that Wikidata descriptions should start with a lowercase letter — "Descriptions begin with a lowercase letter except when uppercase would normally be required or expected". So in other words, what Wikipedia wants (in Latin languages) is inconsistent with what Wikidata recommends.
    2. Even if the Wikidata guideline was changed, it seems fair to assume that there would never be perfect consistency among Wikidata descriptions (which is probably okay, but worth noting).

Open questions

  1. Assuming we fix the issue with languages that are being incorrectly capitalized, does anyone have an issue with capitalizing Wikidata descriptions in Latin languages on the front-end of Wikipedia?
  2. Is anyone familiar with Wikidata's general policy in terms of opinionated vs. unopinionated data? Or, said another way, how do we resolve the tension between:

@Jdlrobson T131013#3544541
editors expect how descriptions display to match case when they edit them and this can cause edit confusion

and

@Jdlrobson T131013#3544541
Wikidata is a data store. Just as we wouldn't expect clients to have to render dates mm/dd/yy we shouldn't expect them to have to use case. We should be caring about the content not how it's used.

I am either misunderstanding this, or there is a tension/contradiction. If an editor expects their Wikidata input to match the output/display on Wikipedia, then the "data store" would indeed be opinionated, specifically towards what Wikipedia wants. This seems like a larger conversation. It also seems like one that is probably ongoing somewhere.

In conclusion
It seems like we then have two options (I'm assuming we fix the obvious issue discussed above with languages like Georgian):

  1. Continue formatting Wikidata descriptions on the front-end of Wikipedia for Latin languages. The benefit of this option is that we retain the level of consistency we have now. The drawback is that Wikipedia does not mirror what is on Wikidata (although it's unclear that this is even desirable).
  2. Start a conversation with Wikidata to see if they are willing to update their guidelines for item descriptions. If they are willing, we can drop the code that forces capitalization in Latin languages. The benefit of this option is that Wikipedia mirrors exactly what is in Wikidata (again, unclear that this is actually desirable). The drawback is that for an indeterminate amount of time we'd have inconsistency in how Wikidata descriptions look on Wikipedia; however, theoretically this would eventually sort itself out.

So in other words, what Wikipedia wants (in Latin languages) is inconsistent with what Wikidata recommends.

Sorry if it was already mentioned and I missed it, but where is it written that Wikipedias in the Latin alphabet want this?

@Amire80

So in other words, what Wikipedia wants (in Latin languages) is inconsistent with what Wikidata recommends.

Sorry if it was already mentioned and I missed it, but where is it written that Wikipedias in the Latin alphabet want this?

My apologies for not explaining this statement. The design recommendation is for consistency, specifically capitalization (as stated in the comments). Have there been any concerns raised by people using Wikipedias in the Latin alphabet? I had not seen such concerns (aside from ones that are conflated with the issue of languages that don't have capitalization) so was assuming that what we do currently is agreeable to most folks (with the understanding that there are edge cases).

I am either misunderstanding this, or there is a tension/contradiction. If an editor expects their Wikidata input to match the output/display on Wikipedia, then the "data store" would indeed be opinionated, specifically towards what Wikipedia wants. This seems like a larger conversation. It also seems like one that is probably ongoing somewhere.

Yes, this is the heart of the remaining disagreement. There is a false assumption that the user entering the data will have presentation control across all contexts (Wikipedia, apps, APIs, etc.). This isn't an accurate intuition and will grow increasingly inaccurate. Wikidata being a central data repository does not imply that it is also the determiner of presentation.

In T131013#4929585, @alexhollender wrote:

@Amire80

So in other words, what Wikipedia wants (in Latin languages) is inconsistent with what Wikidata recommends.

Sorry if it was already mentioned and I missed it, but where is it written that Wikipedias in the Latin alphabet want this?

My apologies for not explaining this statement. The design recommendation is for consistency, specifically capitalization (as stated in the comments).

Whose design recommendation?

Have there been any concerns raised by people using Wikipedias in the Latin alphabet? I had not seen such concerns (aside from ones that are conflated with the issue of languages that don't have capitalization) so was assuming that what we do currently is agreeable to most folks (with the understanding that there are edge cases).

It's a rather anecdotal statement, but I'll make it anyway: In the larger languages written in the Latin alphabet, there's an overlap between people who complain about bugs and people who strongly prefer to use the desktop site (often even on their smartphones). My guess is that in these larger languages written in the Latin alphabet the experienced editors don't care very much if it's lowercase or uppercase.

I'm quite annoyed by people who use lowercase and uppercase letters incorrectly in sentences and personal names, but in this case, I see no problem with using a small letter. The description is not really a sentence.

In T131013#4929436, @alexhollender wrote:

Open questions

  1. Assuming we fix the issue with languages that are being incorrectly capitalized, does anyone have an issue with capitalizing Wikidata descriptions in Latin languages on the front-end of Wikipedia?

Yes because as things are right now edits made to the descriptions from Wikipedia are wrong because of this. People are under the mistaken assumption (because of how it is displayed before editing) that they should always capitalize a description.

  2. Is anyone familiar with Wikidata's general policy in terms of opinionated vs. unopinionated data? Or, said another way, how do we resolve the tension between:

@Jdlrobson T131013#3544541
editors expect how descriptions display to match case when they edit them and this can cause edit confusion

and

@Jdlrobson T131013#3544541
Wikidata is a data store. Just as we wouldn't expect clients to have to render dates mm/dd/yy we shouldn't expect them to have to use case. We should be caring about the content not how it's used.

I am either misunderstanding this, or there is a tension/contradiction. If an editor expects their Wikidata input to match the output/display on Wikipedia, then the "data store" would indeed be opinionated, specifically towards what Wikipedia wants. This seems like a larger conversation. It also seems like one that is probably ongoing somewhere.

There is a difference between storage and display here. Wikidata generally doesn't care how you display the data it provides you. You can do calendar model conversion and more. The issue starts when editing. The current transformation only works in one direction.
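The point that the transformation only works in one direction can be illustrated with a small sketch. The helper `display_form` is made up for this example and simply mimics display-time capitalization; it is not code from any Wikimedia project.

```python
def display_form(stored: str) -> str:
    """Mimic display-time capitalization of the first letter."""
    return stored[:1].upper() + stored[1:]


# Two different stored values, one following the lowercase guideline
# and one that was entered with a capital letter.
stored_lower = "village in Kafkanistan"
stored_upper = "Village in Kafkanistan"

# Both render identically, so the displayed text alone cannot tell an
# editor (or software) which form is actually stored in Wikidata.
print(display_form(stored_lower) == display_form(stored_upper))  # True
```

Because two distinct inputs collapse to the same output, there is no reliable inverse: an editing interface that starts from the displayed text has lost the information about the stored case.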

In conclusion
It seems like we then have two options (I'm assuming we fix the obvious issue discussed above with languages like Georgian):

  1. Continue formatting Wikidata descriptions on the front-end of Wikipedia for Latin languages. The benefit of this option is that we retain the level of consistency we have now. The drawback is that Wikipedia does not mirror what is on Wikidata (although it's unclear that this is even desirable).

If it was only display we wouldn't care. If we could make users not associate the way it is displayed with the way they put it in we wouldn't care.

  2. Start a conversation with Wikidata to see if they are willing to update their guidelines for item descriptions. If they are willing, we can drop the code that forces capitalization in Latin languages. The benefit of this option is that Wikipedia mirrors exactly what is in Wikidata (again, unclear that this is actually desirable). The drawback is that for an indeterminate amount of time we'd have inconsistency in how Wikidata descriptions look on Wikipedia; however, theoretically this would eventually sort itself out.

The policy change is extremely unlikely.

In general: If you're just worried about some mistakes in the data then please please don't hide them. Show them. Expose them to people. Otherwise they'll not get fixed and everyone who doesn't implement your workaround is still exposing their users to the mistakes. We need to make the data quality better for everyone.

As one of the people behind capitalising the descriptions in the first place, I (reluctantly) agree that they shouldn't be capitalised any more. I stand by the original decision, and think it was the correct decision at the time, but circumstances have changed.

The statement of the problem is that capitalising descriptions encourages new editors to write descriptions that are capitalised, but Wikidata policy says that descriptions shouldn't be capitalised (with limited exceptions for descriptions beginning with proper nouns, etc.)

Back when descriptions were first added, description editing in the apps wasn't even something that had seriously crossed our minds, so capitalising them kept scan lines consistent, normalised the display, and so on; see T131013#2289870 for @Nirzar's comprehensive explanation of the benefits. I stand by that decision. But, now, description editing is a serious proposition. If we're going to take description editing seriously, then having all the descriptions capitalised is going to push people in the direction of capitalising any descriptions they write. As a movement we already struggle with onboarding new editors due to overly complex policies and unrealistically high standards (there's an entire programme in the annual plan dedicated to new editors for a reason), and we're only going to make it worse by intentionally setting up people to fail by subtly suggesting they should do things one way when really the policies say the other. It's also important for adoption of description editing more generally, as having serious acceptance of client-side editing of descriptions by Wikidatans is unlikely to happen if app users keep (unintentionally) violating site policy when writing descriptions.

The way I see it, there are a few ways to solve this problem:

  1. Change Wikidata's policy so that descriptions should be capitalised.
    1. Attempts to make even less drastic changes to the description policy have failed in the past, so this is unlikely to happen.
  2. Keep the descriptions capitalised, and change the editing experience to tell people that they shouldn't capitalise descriptions.
    1. This gets confusing really fast, and people aren't likely to really understand. ("They're capitalised, but don't capitalise it yourself, we'll capitalise it for you, until you try editing it again in which case we won't, but we'll do it again afterwards...")
  3. Display descriptions as-is.
    1. Creates inconsistent scan lines, leads to a sense of incompleteness when reading descriptions, etc.
  4. Stop encouraging people to edit descriptions.
    1. Violates the "anyone can edit" product principle.

From my perspective, the third solution is really the only viable one.

P.S. The mobile apps were the first product to use Wikidata descriptions below article titles and in search results, and I was the product owner of the apps at the time, so I guess you could say that all the blame for this rests on my shoulders. ;-)

Yes because as things are right now edits made to the descriptions from Wikipedia are wrong because of this. People are under the mistaken assumption (because of how it is displayed before editing) that they should always capitalize a description

Is there actual evidence of this? In the past there has been a strong bias against mobile based editors and claims of "bad" edits seem to be anecdotal.

Again, I don't agree that data entry and data display must align in order for editors to "do the right thing". There are other solutions than making the UX for readers and expectations of how subtitles work in a language/culture a global demand based on an anecdotal sense of data purity.

so I guess you could say that all the blame for this rests on my shoulders.

Not even remotely, Dan. Use of descriptions as subtitles is widespread and subtitles in Latin languages are generally capitalized. I don't think a single designer or product manager (or non-Wikidatan) has ever suggested these be lower case in EVERY display context globally across all products. Again, there is no need for such an arbitrary policy. So this is on the shoulders of more than a dozen professional user experience designers and product managers.

Yesterday, offline, I mentioned to @Jdlrobson that I think we are discussing too many different things in this task simultaneously, and we should break the conversations out into separate tasks. His response was, ironically, that originally these were separate discussions and we combined them into one. However at this point I wonder if it's worth considering separate tasks again?

Conversation 1) Don't capitalize because some languages have no notion of capitalization (@Amire80 & others)
-thankfully this is somewhat easy to fix

Conversation 2) Don't capitalize because descriptions/subtitles aren't sentences (@Amire80, @Sjoerddebruin)
-this is up for debate. I think of the Wikidata description as a subtitle, and even though subtitles aren't sentences, they are typically capitalized. @Amire80 the opinion to capitalize seems to be supported by some web design and product members: T131013#2244140, T131013#2289870, T131013#4932344, (and myself).

Conversation 3) Don't capitalize because there are some edge-cases for which capitalization might cause awkwardness (@Jdlrobson, @Volker_E )
-this seems rare enough that maybe we can defer?

Conversation 4) Don't capitalize because Wikipedia editors will get confused and think (incorrectly) that when editing Wikidata (via the Wikipedia apps) they should capitalize the descriptions (and in general will be confused about the relationship between the two) (@Lydia_Pintscher, @JMinor)
-It seems there is consensus that data storage (Wikidata) and presentation of content (Wikipedia) should be de-coupled. However while there continue to be interfaces that allow for editing Wikidata off-Wikidata there needs to be additional work to clarify this relationship. @Deskana, @JMinor and @Jdlrobson have proposed recommendations here.

This comment relates to Conversation 4 (see T131013#4933393 for a key)

@Deskana I wonder if there's a variation of the option 2 you proposed:

  1. Keep the descriptions capitalised, and change the editing experience to tell people that they shouldn't capitalise descriptions.
    1. This gets confusing really fast, and people aren't likely to really understand. ("They're capitalised, but don't capitalise it yourself, we'll capitalise it for you, until you try editing it again in which case we won't, but we'll do it again afterwards...")

What if we allowed people to edit descriptions, but instead of telling them not to capitalize we just automatically forced the first letter to be lowercase when we stored it in Wikidata? Of course this would be obscuring things from editors, however maybe it is the best compromise. The end result would be what's desired for both services: descriptions wouldn't be capitalized in Wikidata, and they would be in Wikipedia.

In T131013#4933409, @alexhollender wrote:

What if we allowed people to edit descriptions, but instead of telling them not to capitalize we just automatically forced the first letter to be lowercase when we stored it in Wikidata? Of course this would be obscuring things from editors, however maybe it is the best compromise. The end result would be what's desired for both services: descriptions wouldn't be capitalized in Wikidata, and they would be in Wikipedia.

The problem with this is that there's a sizeable number of descriptions that are capitalised at the start, such as almost any description about a person or collection of people:

So, lower casing by default would be putting the shoe on the other foot, because it'd make some better but make others worse.
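A small sketch of why forcing lowercase at save time cuts both ways. The helper `lowercase_first` is hypothetical, and the second example string ("American novelist") is an invented illustration of a description that begins with a proper adjective, not one taken from Wikidata.

```python
def lowercase_first(description: str) -> str:
    """Naively force the first character to lowercase before storing."""
    return description[:1].lower() + description[1:]


# Improves a description that was capitalized only by habit or autocorrect.
print(lowercase_first("Village in Kafkanistan"))  # village in Kafkanistan

# Damages a description whose first word legitimately needs the capital.
print(lowercase_first("American novelist"))  # american novelist
```

Telling these two cases apart would require knowing whether the first word is a proper noun or proper adjective, which a simple string transform cannot determine.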

I personally don't think this is a huge issue in principle, as it's the wiki way that people make good faith contributions with errors in and that others fix them, but others disagree...