
Provide further options than only binary gender
Closed, Declined · Public

Description

While I'm aware that changing language is not an easy thing to do, saying that we can do nothing because it is too hard, or waiting for someone else to change first, is insufficient in my mind. MediaWiki currently allows gender selection so that the software can address users, or refer to their actions when talking to other users, in a grammatically correct (gendered) way.

An example would be "She edits her talk page" or "Jared asked you to help with his article draft".

Since some languages already have three grammatical genders, this should be a relatively small change for those languages. For languages where the neuter form is the same as the masculine, the third option may end up referring to the user with that form even where it is incorrect, and we'll have to wait for the language to catch up. For English I would propose we use (They/Them/Their)(She/Her/Hers)(He/Him/His) as the pronouns when referring to users' actions, and Other/Female/Male as system classification only (not shown to the user).

Further reading
https://en.wikipedia.org/wiki/T-V_distinction
https://translatewiki.net/wiki/Gender
https://en.wikipedia.org/wiki/Gender-specific_and_gender-neutral_pronouns
https://www.wikidata.org/wiki/Q1189745


Version: 1.23.0
Severity: enhancement
See Also:
T55834: Pronoun selection has gender bias
T35343: Binary Gender
T29744: Add the neuter gender

Details

Reference
bz59643

Event Timeline


@Jackmcbarn

That's basically it.

When that option is chosen we'd use a gender-neutral pronoun to talk about the user's actions on site, where possible depending on the UI language.

Change 106179 had a related patch set uploaded by Jackmcbarn:
Use gender-neutral wording in Special:Preferences

https://gerrit.wikimedia.org/r/106179

«When selecting gender in Special:Preferences, use "They edit wiki pages"
instead of "I prefer not to say", to be more in line with the "He edits wiki
pages" and "She edits wiki pages" options.»
It's not "more in line", it doesn't make any sense because it has no relevance to the question. "They" is not a description of *me*.

I fail to see how this construction works in any language other than English (at least in the 3 other languages that I know), and I'm not convinced that adding a "Translators: Come up with something that works in your language" comment helps avoid really weird constructions in other languages.
Do certain languages have an option to disable this new option by default when it's untranslatable, or would they have to suffer from working around a linguistic problem in one language (which is the default language in MediaWiki)?

How do other languages handle users who pick the gender-neutral preference currently?

If you take a look at https://en.wikipedia.org/wiki/Gender-specific_and_gender-neutral_pronouns there are many languages that seem to have the concept of a gender-neutral pronoun; not all, of course, but many.

While some would argue that "they" is not grammatically correct, it is becoming so for people who need a singular gender-neutral pronoun.

You can read more here:
http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Singular_they.html
http://articles.latimes.com/2007/feb/19/opinion/oe-yagoda19
https://en.wikipedia.org/wiki/Singular_they

Swedish Wikipedia should have no issues:
http://www.care2.com/causes/sweden-adopts-a-gender-neutral-pronoun.html

Implementing a 3rd or 4th gender (robot?), would actually be fairly trivial (with the exception of one issue).

All it requires is:

  1. Adding extra conditions to Language::gender(), or if you want to do it for English only, LanguageEn::gender().
  2. Adding extra options to $defaultPreferences['gender'] in Preferences::profilePreferences(). If you wanted the options to be language dependent, you could add a hook there and a hook handler to the language subclasses.

The only tricky part is figuring out what the proper order of the values should be in the i18n messages. Should 'unknown' remain the 3rd option or should it always be last? Does this need to be consistent across languages? It would be good to get the opinions of the i18n engineers on this, especially the ones that work closely with the translatewiki community.

(In reply to Ryan Kaldari from comment #14)

The only tricky part is figuring out what the proper order of the values
should be in the i18n messages.

Which first requires showing linguistic evidence of what grammatical consequences the option/category in question would have, in order to define what {{GENDER}} would need to show, then at what conditions, then with what syntax, and only finally with what code...

Which first requires showing linguistic evidence of what grammatical
consequences the option/category in question would have, in order to
define what {{GENDER}} would need to show, then at what conditions, then
with what syntax, and only finally with what code...

None of the genders that are made available by Language::gender are required to be used. They are just options that are available to {{GENDER}} if needed. If we left the order of the existing values intact, we could launch new genders without changing any i18n messages.

The current behavior of Language::gender is rather awkward though (as the default changes based on the number of parameters). If 1 or 2 parameters are specified, the first parameter is the default. If 3 parameters are specified, the 3rd parameter becomes the default. It would be much more intuitive if the first parameter was always the default:
{{GENDER|unknown/generic masculine|specified feminine|specified masculine|etc.}}
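The current fallback behaviour described in this comment can be modelled roughly as below. This is only a Python sketch of the behaviour as described here, not MediaWiki's PHP implementation; the function name `select_gender_form` and the value strings `"male"`, `"female"` and `"unknown"` are assumptions made for illustration.

```python
def select_gender_form(gender, forms):
    """Pick one of the forms passed to {{GENDER:}} (illustrative model only).

    forms[0] is the masculine form, forms[1] the feminine form and
    forms[2], when present, the form for an unspecified gender.
    """
    if not forms:
        return ""
    if gender == "female" and len(forms) >= 2:
        return forms[1]
    if gender == "male":
        return forms[0]
    # Unspecified gender (or a missing feminine form):
    # use the third form if it was given, otherwise the first one.
    return forms[2] if len(forms) >= 3 else forms[0]


print(select_gender_form("unknown", ["his", "her"]))           # first form is the default
print(select_gender_form("unknown", ["his", "her", "their"]))  # third form is the default
```

With one or two forms the first form doubles as the default, while with three forms the third one takes over, which is the shift in meaning the comment calls awkward.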

I'm just curious if the i18n engineers would be open to fixing this in the process.

(In reply to Ryan Kaldari from comment #16)

I'm just curious if the i18n engineers would be open to fixing this in the
process.

That's not a bug, it's a feature. It's certainly not going to change, especially as it's shared by PLURAL. https://translatewiki.net/wiki/Plural#Plural_syntax_in_MediaWiki

Ah, I see what you mean. I still think it makes a lot more sense to have the default be the first parameter for gender, regardless of how it works for PLURAL (since gender is most commonly unknown). As it works now, the meaning of the first parameter actually changes based on the number of parameters (According to https://translatewiki.net/wiki/Gender), which seems rather confusing. It also doesn't actually match the behavior of PLURAL, as the last parameter is not always the default in GENDER.

Regardless, it probably doesn't make sense to discuss that here since it's a more complicated issue and should probably be discussed in a separate bug.

The only thing we really need to figure out for this bug is whether 'unknown' should be the 3rd parameter or the last parameter.

Show me evidence that this is not just a gender choice, but an actual software necessity (e.g. grammatical distinction in the User namespace) and it might be more easily resolved. Otherwise it would just add additional burden to the translation teams without a neuter form, so calling this WONTFIX again.

(In reply to Jared Zimmerman (WMF) from comment #6)

@Jackmcbarn If I ask you if you want pizza, a burger, or salad, and give you
the options
A. Prefer not to say
B. Salad
C. Pizza
Does A map to a burger? Many people WOULD prefer to say, but we don't give
them an option to do so.

That seems to be a separate issue that should be split off to another bug ticket. My proposed wording would simply be to say "None of the (above|below)", which is impartial enough for our purposes and avoids offense.

Change 182994 had a related patch set uploaded (by Kaldari):
Add capability for a neutral gender option in preferences

https://gerrit.wikimedia.org/r/182994

Patch-For-Review

Change 182994 had a related patch set uploaded (by Kaldari):
Add capability for a neutral gender option in preferences

  1. This is half baked: a revolution in meaning of GENDER can't fit the current formulation of the preference. Also, when GENDER is used, we're *always* talking of a specific person, not of an indeterminate antecedent. The supporters of the patch *seem* to miss this distinction, which is the cornerstone of the grammatical issue here.
  2. Show me a source which calls the singular they "neuter" please. Neuter existed in Old English, e.g. "land" was neuter: «sē cyning ("king"–masculine); sēo lufu ("love"–feminine); þæt land ("land"–neuter)» (Oxford English Grammar, 12th edition, 4.9). Is this what you're talking about? If you call it neuter, this causes severe i18n issues because in other languages this grammatical category is used for inanimate objects and/or animals. The addition of neuter has never been asked for on this bug; it was asked for at T29744 (to correctly talk of bot accounts).
  3. Such a change can't be merged without checking all the usages of GENDER in core and extensions, at least in the English locale, to verify they are consistent with the proposed grammatical framework.
  4. The use case for this feature is not well defined. I'm not able to judge the Swedish case (cc Lokal_Profil), but I can comment on the other one: «specialized 3rd party wikis such as Gender Wiki (http://gender.wikia.com) where a strictly binary choice may not be appropriate». For this case, I propose a configuration setting to disable GENDER entirely: the preference would be removed from Special:Preferences and the system would always output "unknown/undefined" as gender for the person. This would remove (binary) (grammatical; I don't know about natural/lexical) gender entirely from the wiki, without causing grammatical issues as the wiki in question is monolingual.

  1. "a revolution in meaning of GENDER can't fit the current formulation of the preference." I'm not sure what you mean here. The preference is simply to choose your preferred grammatical gender. Adding a neutral option seems completely appropriate here (and matches what other sites are doing). Whether the antecedent is determinate or indeterminate makes no difference. The grammar, in English, is identical in both cases. For other languages, translators will use whatever grammar is appropriate for their language. Can you explain why you believe this is an issue?
  2. Neuter just means classified as neither male nor female;[1][2] it has nothing to do with animate vs. inanimate or animal vs. human. It is just a more technical term than 'neutral': "Third person pronouns have genders (masculine and feminine) and a neuter category. The gender pronouns are clear; the neuter pronouns are 'they,' 'them' and 'their.'"[2] The QQQ message specifically says to use a pronoun appropriate for a human. That's also why the example uses 'They' instead of 'it'.
  3. I already checked them with a regex. All existing messages (in English at least) are compatible.
  4. The use case is simple: To allow people to set their grammatical gender to a neutral form. I know several people personally who prefer to go by "them" instead of "he" or "she". It's not that uncommon, especially online.[4][5]

  [1] http://www.merriam-webster.com/dictionary/neuter
  [2] http://www.thefreedictionary.com/neuter
  [3] http://www.write.com/writing-guides/general-writing/grammar/third-person-pronouns/
  [4] http://theyismypronoun.tumblr.com/
  [5] http://wapo.st/1pQR76t

I don't think a fourth gender option should be added. I prefer Withoutaname's solution of combining "other/none/prefer not to say" all into the same option (since they all require gender-neutral pronouns anyway).

Can we please focus the discussion here on the technical issue at hand rather than any personal issues that people may have with the concept of allowing users to choose gender-neutral pronouns? According to what @kaldari has posted, this change will not pose any technical challenges, and the translation messages are clear.

As kaldari mentioned, this is not groundbreaking in any way; see these two examples from other common sites and their use of they/them/their.

@Jackmcbarn: In most gendered languages, neutral is not the default. Typically, masculine gender is the 'natural gender'. Thus 'unknown' and 'neutral' will not always map to the same grammar.

@Jackmcbarn: In most gendered languages, neutral is not the default. Typically, masculine gender is the 'natural gender'. Thus 'unknown' and 'neutral' will not always map to the same grammar.

In those languages neutral is not the default, because there is no such option available in the language. It's just a matter of whether third form is provided or not. Please show me the case where the translation for "unknown" would be different from "neutral".

As kaldari mentioned, this is not groundbreaking in any way; see these two examples from other common sites and their use of they/them/their.

As far as I can see from the screenshots, they only have three options, where the name of one option differs from ours. How is this an argument for us to add four options, two of which are pretty much identical?

  1. The use case for this feature is not well defined. I'm not able to judge the Swedish case (cc Lokal_Profil),...

The Swedish case is equivalent to "they" although it makes it more obvious that not specifying gender was an active choice (since it has no alternative meaning). It's the same as the form which should be used if gender is unspecified.

@Lokal_Profil: Would the choice in Swedish be 'hen' for both unknown and gender-neutral? Just want to make sure we are talking about the same case.

Yes, in most languages the unknown and gender-neutral forms would be the same. I think one of the main points is to allow users to actively express a preference: while the output may be the same, the intention behind the two choices is different.

@Jaredzimmerman-WMF: It is certainly possible to have any number of options available and map them to the same grammars. I just want to make sure there are no cases where unknown and gender-neutral would require separate translations. I'll see if I can do a bit more research in it.

@kaldari: Yes you would use 'hen' for both cases.

@Nikerabbit: I discussed the issue with Gabriel Wicke who is a native German speaker. German, as you may know, is a strongly gendered language and typically does not offer neuter word forms (especially in relation to people). It is also a language in which the masculine gender is always the default. If you look at de.json, 100% of the 'unknown' translations are masculine. I asked Gabriel how you would handle grammar in German for someone who explicitly identifies as gender neutral. He said that there is no elegant solution, but you would probably either use slashes (like 'eine/n Informatiker/in') or rewrite the sentence. According to https://en.wikipedia.org/wiki/Gender_neutrality_in_languages_with_grammatical_gender#German and http://www.theguardian.com/world/2014/mar/24/germans-get-tongues-around-gender-neutral-language, there are other strategies as well. Leaving them with the default masculine form doesn't seem like a great solution.

@kaldari: We don't need a patch for that. German translators can already provide the third form for the gender magic word with an approach of their choosing.

Perhaps I'm misunderstanding these, but it appears that there would be a non-masculine neutral form according to those sources. Then again, I'm not a native speaker… @kaldari did you ask Gabriel about those specifically?

@Nikerabbit: The third form (unknown) in German is likely always going to be masculine as that is the default gender grammar in German. Using a masculine gender, however, is not very appropriate for someone who identifies as gender-neutral. Here's an actual example from de.json:
"login-userblocked": "{{GENDER:$1|Dieser Benutzer|Diese Benutzerin|Dieser Benutzer}} ist gesperrt..."
If we had a 4th option, the i18n message could be:
"login-userblocked": "{{GENDER:$1|Dieser Benutzer|Diese Benutzerin|Dieser Benutzer|Diese/r Benutzer/in}} ist gesperrt..."

Perhaps this is a rare case, but it does seem to be a valid example. And since the fourth form would always be optional, I don't really see how it would make the translators' jobs any more difficult. There would rarely be a need for them to use it.
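A minimal sketch of how an optional fourth form could behave, using the German message above. The `"neutral"` preference value, the function name `select_form_with_neutral`, and the choice to fall back to the third ('unknown') form when no fourth form is supplied are all assumptions for illustration; this is Python rather than MediaWiki's PHP and is not taken from the abandoned patch.

```python
def select_form_with_neutral(gender, forms):
    """Select a {{GENDER:}} form, allowing an optional fourth 'neutral' form."""
    if not forms:
        return ""
    if gender == "male":
        return forms[0]
    if gender == "female":
        return forms[1] if len(forms) >= 2 else forms[0]
    if gender == "neutral" and len(forms) >= 4:
        # Assumed behaviour: only used when the translator supplied a fourth form.
        return forms[3]
    # 'unknown', or 'neutral' without a fourth form: existing fallback.
    return forms[2] if len(forms) >= 3 else forms[0]


three_forms = ["Dieser Benutzer", "Diese Benutzerin", "Dieser Benutzer"]
four_forms = three_forms + ["Diese/r Benutzer/in"]

print(select_form_with_neutral("neutral", three_forms))  # Dieser Benutzer
print(select_form_with_neutral("neutral", four_forms))   # Diese/r Benutzer/in
```

Under these assumptions, messages that only define three forms keep working unchanged, which is the sense in which the fourth form would be optional for translators.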

If you still strongly oppose having a 4th gender grammar, what is your opinion of having a 4th preference option (for some wikis) that maps to the 3rd ('unknown') gender grammar? It sounds like that would work for Swedish at least (if not German). Even if we were to combine the grammars, I agree with Jared that it makes sense to have separate pref options as they convey very different information about the user and could potentially have different implications (outside of grammar). For example, if someone ever wanted to analyze the gender-breakdown of editors who edit trans-related articles, 60% unknown is very different than 60% gender-neutral.

Re-purposing the 'prefer not to say' pref option to be a catch-all would be my last choice, personally.

"login-userblocked": "{{GENDER:$1|Dieser Benutzer|Diese Benutzerin|Dieser Benutzer}} ist gesperrt..."

This is not how GENDER was meant to be used. If the third form is the same as the first, it should be left out. Or in this case the German translators could replace it with Diese/r Benutzer/in if they want to.

I agree with Jared that it makes sense to have separate pref options as they convey very different information about the user and could potentially have different implications (outside of grammar). For example, if someone ever wanted to analyze the gender-breakdown of editors who edit trans-related articles, 60% unknown is very different than 60% gender-neutral.

I do not like the mixing of grammatical gender with gender identity. If you want anywhere near sensible gender numbers there should be separate preference for that.

As for adding a fourth option in the current preference that maps to the third form, I would probably give -1 or not vote, since I don't feel I have the same authority over user preferences as I do over i18n.

Here is a summary of the arguments I have noted and my opinions on those:

  • Use gender neutral forms when possible
    • We already have that, this would just be "please try harder using gender neutral language for me"
  • Gender statistics and other implications outside GENDER in translations
    • Use a dedicated preference or survey
  • Give users the feeling they have more choice/control
    • I don't see how "asking users if they like pizza, and then giving pizza anyway" is beneficial

I was negative towards this, but I think Kaldari convinced me that his solution is at least not worse than the status quo :). (By the way, Polish grammar works very similarly to German in this case.)

@Nikerabbit: What would be your suggested course of action for addressing the problem that some users don't want to be referred to with gendered (masculine or feminine) grammar? The current system has two problems: it isn't clear to the user that 'prefer not to say' means gender-neutral, and it isn't clear to the translators that 'unknown' should be gender-neutral.

It isn't clear to the translators that 'unknown' should be gender-neutral.

'unknown' isn't even really gender-neutral, more like "masculine or feminine, but not specified which" – and that's in fact good, because while English has the extremely convenient singular 'they' pronoun, other languages might not. In Polish, you can say "His article" or "Her article" or "Her/His article", but you can't say "Their article" (when this is about just one person) – the only equivalent would be "The person's article", which is awkward and definitely shouldn't be used when talking about people who just haven't specified how they want to be referred to.

@matmarex: I pretty much agree with that. Pushing gender-neutral grammar as the default is going to be awkward in some languages. Allowing translators to set separate 'unknown' and 'neutral' grammars when needed seems like the most sensible solution to me. I'm interested to know if Nikerabbit has any other ideas or suggestions though.

I proposed earlier amending "(I prefer not to say)" with text like "When addressing you, the software will use gender neutral words whenever possible." We can revise our documentation for translators to see how to make it clearer to them.

Change 182994 abandoned by Kaldari:
Add capability for a neutral gender option in preferences

Reason:
no consensus

https://gerrit.wikimedia.org/r/182994

Change 106179 had a related patch set uploaded (by Kaldari):
Clarify that gender-unknown option is gender neutral

https://gerrit.wikimedia.org/r/106179

I rebooted Jack's implementation (that doesn't introduce any new gender parameters), but using Nikerabbit's suggested wording. I'm not sure it provides a real resolution to this bug, but it at least clarifies that gender-unknown should be gender neutral.

Change 106179 merged by jenkins-bot:
Clarify that gender-unknown option is gender neutral

https://gerrit.wikimedia.org/r/106179

The current system, with a choice between masculine, feminine, and unknown, is inadequate for most languages. It shows a confusion between natural gender and grammatical gender. As I understand it, Wikimedia projects have no interest in their users' natural gender; that is an interest commercial data miners would have. We only care about grammatical gender.

Since grammatical gender highly depends on the individual language, there can be no one-size-fits-all solution for all languages. I think the only way to go is having a variable number of options depending on the language.

There is a wide variety of ways in which languages treat grammatical gender. The category is somewhat fuzzy because it blends over into other categories such as honorifics. The following list shows different treatments of grammatical gender in various languages. I am afraid that this list has a European bias due to my own knowledge. Some less biased information can be found online in the WALS (e.g. Number of Genders or Chapter Sex-based and Non-sex-based Gender Systems). The WALS information is, however, mainly about grammatical gender as a property of nouns, and not about grammatical gender in referring to persons/users (but see Chapter Politeness Distinctions in Pronouns).

  • There are languages that have no grammatical gender whatsoever, e.g. Turkish.
  • There are languages that have a grammatical animate-inanimate distinction, e.g. Finnish.
  • There are languages that have a two-way grammatical gender distinction, e.g. French.
  • There are languages that have a two-way grammatical gender distinction and an additional inanimate grammatical gender, e.g. English.
  • There are languages that have a three-way grammatical gender distinction, e.g. Alemannic.
  • There are languages that have even more complex grammatical systems, e.g. Japanese.
  • Among the languages that have at least two grammatical genders, there are languages where the grammatical gender used for referring to a certain person can depend on the part of speech, e.g. in Kölsch or Alemannic where there are persons whose name appears in the neuter grammatical gender while pronouns that refer to the same persons appear in the feminine grammatical gender. In practical terms, this would just be another option.

Languages like Turkish that have no grammatical gender are covered, but in the Wikipedia preferences, users are being confronted with an entirely superfluous question about their natural gender.

Languages like Finnish or English that have an inanimate grammatical gender (in English: it) are not properly covered because the inanimate option cannot be chosen. It would be the most adequate choice for referring to bots, and I am certain that there are users who would choose this option if it were provided (there are thousands of user names that start with “The” – some of these would certainly prefer not to be referred to as he, she, or they, but as it).

Languages like Alemannic with a three-way distinction are not covered. In Alemannic, some people (mostly men) are referred to in the masculine grammatical gender, some people (mostly women) in the feminine grammatical gender, and some people in the neuter grammatical gender. There are some correlations (by region, by natural gender, by age or by the form of the name), but it is ultimately an individual choice. You will easily find sources for this in the dialect grammars, e.g. in: Werner Marti (1985): Berndeutsch-Grammatik. Bern: Francke, p. 81.

More complex languages are not covered, obviously.

Nemo_bis added a comment. Edited Sep 5 2015, 7:44 AM

inadequate for most languages. [...] Some less biased information can be found online in the WALS

This source is very interesting but quite useless for our purposes: it includes a lot of obscure languages but misses e.g. Italian, Polish and Portuguese. Anyway, it doesn't agree with your statement: they have 195 languages which can fit in our binary GENDER and only 62 which have more than 2.

there are thousands of user names that start with “The” – some of these would certainly prefer not to be referred to as he, she, or they, but as it

In my experience they don't, do you know any example?

Very valid points are raised in this conversation. One important point seems to be missing: while the gender selection is for the benefit of the person, it mostly affects other users.

In other words, the language of the person selecting the preference is X. The languages of the people affected by this selection is the set of all languages.

The more multilingual the wiki is, the more irrelevant is the X. Even in the Finnish Wikipedia there are many users using the interface in some other language than Finnish.

I have been thinking about this a little while creating a PHP i18n library based on MediaWiki, but better (of course).

I think the current three way distinction is a good base for all languages. On top of that we can have language specific specializations with more distinctions. In addition we need mapping from X's distinction to the three-way distinction (for other languages) and from the three-way distinction to X's (people choosing gender using other languages).
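The two-way mapping described above could look roughly like the following sketch. Everything in it is hypothetical: the identifiers, the table names `TO_BASE` and `FROM_BASE`, and the particular options listed per language are assumptions chosen only to show where information is lost in a round trip.

```python
# Hypothetical language-specific options, each mapped to one of the three
# base values ("male", "female", "unknown").
TO_BASE = {
    "sv": {"han": "male", "hon": "female", "hen": "unknown"},
    "en": {"he": "male", "she": "female", "they": "unknown", "it": "unknown"},
}

# Hypothetical default option a language uses for each base value.
FROM_BASE = {
    "sv": {"male": "han", "female": "hon", "unknown": "hen"},
    "en": {"male": "he", "female": "she", "unknown": "they"},
}


def to_base(lang, option):
    """Reduce a language-specific option to the shared three-way value."""
    return TO_BASE.get(lang, {}).get(option, "unknown")


def from_base(lang, base):
    """Expand a three-way value to a language's option (or keep it as-is)."""
    return FROM_BASE.get(lang, {}).get(base, base)


def convert(option, source_lang, target_lang):
    """Carry a choice made in one language over to another language."""
    return from_base(target_lang, to_base(source_lang, option))


print(convert("hen", "sv", "en"))  # they
print(convert("it", "en", "sv"))   # hen (the inanimate distinction is lost)
```

The second call shows the kind of loss that shared, more fine-grained identifiers would be meant to reduce.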

Now, I wonder if someone can come up with a usable hierarchical presentation of gender distinctions for the languages of the world. This would help to use the same underlying identifiers for compatible gender options, making the mappings detailed above lose less information.

PS: In Finnish the animate-inanimate distinction manifests in only the third person singular pronoun (hän/se). In all of the MediaWiki messages I do not remember a single case where these pronouns were used in relation to users.

This source is very interesting but quite useless for our purposes

I said that the WALS data does not really match our needs. I only referred to it in order to counteract the English-bias of this discussion.

Anyway, it doesn't agree with your statement: they have 195 languages which can fit in our binary GENDER and only 62 which have more than 2.

My point was that the 145 languages without any grammatical gender do not fit our binary distinction either. In these languages, users are bugged with a pointless question about their natural gender that is worthy of a commercial data miner.

But even systems with a two-way distinction can be incompatible with the current implementation. Consider the case of the German localization. Like English, it is a language that really has a three-way grammatical gender distinction but only uses two of them in the localizations. However, unlike English, it does not use gender-unknown localizations. Instead, it defaults to the masculine grammatical gender. This means users are bugged with a pointless and highly confusing three-way choice when there are really only two options.

there are thousands of user names that start with “The” – some of these would certainly prefer not to be referred to as he, she, or they, but as it

In my experience they don't, do you know any example?

In my experience, users do not have the possibility of choosing the form it because the software does not allow for it. And don't you think it would be more adequate for bots than he or she? Anyway, the case for including a language's three-way grammatical gender distinction is not as clear for English as it is for other languages. The WALS discusses several languages where three different grammatical genders may be used when referring to people, including varieties of Polish (see Chapter Sex-based and Non-sex-based Gender Systems § Variety in sex-based systems).

I think the current three way distinction is a good base for all languages.

I disagree. The current three-way choice is only good for a language like English where the localization really provides three options. This is the typical English-bias of computing. For languages where the localization provides less than three options, for instance German, the current three-way choice is only confusing.

On top of that we can have language specific specializations with more distinctions. In addition we need mapping from X's distinction to the three-way distinction (for other languages) and from the three-way distinction to X's (people choosing gender using other languages).

This is not feasible. Please excuse my constant references to the language I know best: In Alemannic, the neuter grammatical gender can be used for men in some varieties, and for women in other varieties. This means you cannot map the neuter grammatical gender of Alemannic to either the male or the female grammatical gender of English (unless you include a library of Alemannic names).

I think the preference for grammatical gender should be set separately for each language. When switching to a new language, it should fall back to gender-unknown. It is a little bit like this on Wikipedia, where the setting appears not to be global but can be set differently for each project (though of course it persists from language to language within the same project).

PS: In Finnish the animate-inanimate distinction manifests in only the third person singular pronoun (hän/se). In all of the MediaWiki messages I do not remember a single case where these pronouns were used in relation to users.

What if the user is not a person, but a bot?

I think the current three way distinction is a good base for all languages.

I disagree. The current three-way choice is only good for a language like English where the localization really provides three options. This is the typical English-bias of computing. For languages where the localization provides less than three options, for instance German, the current three-way choice is only confusing.

You ignored my first point, so I will ignore your last point.

My justification for the current set is that it should be reasonably sensible even for speakers of languages with no grammatical distinction. We might want to grow it to four by adding inanimate.

This is not feasible. Please excuse my constant references to the language I know best: In Alemannic, the neuter grammatical gender can be used for men in some varieties, and for women in other varieties. This means you cannot map the neuter grammatical gender of Alemannic to either the male or the female grammatical gender of English (unless you include a library of Alemannic names).

This is something like an implementation detail which can likely be solved.

I think the preference for grammatical gender should be set separately for each language. When switching to a new language, it should fall back to gender-unknown. It is a little bit like this on Wikipedia, where the setting appears not to be global but can be set differently for each project (though of course it persists from language to language within the same project).

Having users define this separately for every language is definitely a no-go from a usability point of view. I do not think your suggestion is reasonable, because we have a big list of common languages where it is very easy to map he/she between languages. Your suggestion would make the feature almost useless for those languages in favor of making it perfect for a smaller set of languages.

PS: In Finnish the animate-inanimate distinction manifests in only the third person singular pronoun (hän/se). In all of the MediaWiki messages I do not remember a single case where these pronouns were used in relation to users.

What if the user is not a person, but a bot?

My point was that the pronouns hän/se do not appear in interface messages.

My justification for the current set is that it should be reasonably sensible even for speakers of languages with no grammatical distinction. We might want to grow it to four by adding inanimate.

I think it is very bad from a usability point of view if a user is offered choices that do not have any effect in the current language.

Consider, for instance, the German localization: How can you explain that the options _gender-unknown_ and _gender-male_ make no difference in the German localization, but they may make a difference in another localization, and also the information will be public, but at the same time it is optional? And to make things even worse, the message https://translatewiki.net/wiki/MediaWiki:Gender-unknown/de has suggested for many years that users could hide their gender even though the setting _gender-unknown_ had really no effect at all – a totally false promise caused by direct translation of the English message.

In an ideal implementation, I would envision that the user can choose a language, and afterwards, when the UI is updated to show the new language, the number of grammatical gender options would change according to that language. For Finnish, there would be no options, for German, there would be two options, and for English, there would be three options.

I think the preference for grammatical gender should be set separately for each language. When switching to a new language, it should fall back to gender-unknown. It is a little bit like this on Wikipedia, where the setting appears not to be global but can be set differently for each project (though of course it persists from language to language within the same project).

Having users define this separately for every language is definitely a no-go from a usability point of view. I do not think your suggestion is reasonable, because we have a big list of common languages where it is very easy to map he/she between languages. Your suggestion would make the feature almost useless for those languages in favor of making it perfect for a smaller set of languages.

You are right, we undeniably have a strong Indo-European bias, so we can easily keep basic settings between languages. Maybe _default_ (gender-unknown), _feminine_, _masculine_, and possibly _neuter_. Though _neuter_ might just belong to _other1_, _other2_, _other3_, etc. which are not mapped between languages. When you switch to a language where your selected setting is not available, you will get the default gender for that language.

For example, assume you start in the Finnish localization. It offers no choices at all. Underlyingly, you have the setting _default_, but there is no UI for changing it (maybe there is an indication that the current language has no grammatical gender settings). Then you switch to the English localization. Now the UI shows three choices, _default_ (which displays as gender-unknown), _feminine_ and _masculine_. There, you select _masculine_. Then you switch to the German localization. Now, the UI shows two choices, _default_ (which displays as masculine) and _feminine_. There, you select _feminine_. Then you select _default_. Now the UI looks the same as when you first switched to the German localization (masculine), but underlyingly, you have selected a different setting (_default_, not _masculine_). You cannot select the setting _masculine_ again unless you switch back to English.

Alternatively, the German localization would show the two choices _masculine_ and _feminine_. While users would start with the same _default_ setting as in all localizations, the UI would display as if _masculine_ were chosen due to a hierarchy in the settings. If someone from the German localization who never touched their settings would switch to English, they would end up in the _default_ setting. But if someone in the German localization selected _feminine_ and then _masculine_ and then switched to the English localization, they would end up in the _masculine_ setting.

In the end, excessive gender and language switching will always produce little incongruities like these. I don't think it matters much because we can assume that most users don't change their gender all the time unless in a playful manner, and changes in the UI language will not happen that often either.
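The fallback behaviour described in the Finnish/English/German walk-through above can be sketched roughly as follows. This is only an illustration under assumptions: the table names `SUPPORTED` and `DEFAULT_RENDERING`, the function names, and the choice to keep the stored setting untouched when switching languages are all hypothetical and not part of MediaWiki.

```python
# Hypothetical per-language option tables, following the example above.
# Finnish offers no choice, English three choices, German two.
SUPPORTED = {
    "fi": ("default",),
    "en": ("default", "feminine", "masculine"),
    "de": ("default", "feminine"),
}

# How the "default" setting is rendered in each language.
# None means the language has no gendered forms at all (assumption for Finnish).
DEFAULT_RENDERING = {"fi": None, "en": "gender-unknown", "de": "masculine"}


def effective_setting(stored, language):
    """Return the stored setting if the language offers it, else "default"."""
    return stored if stored in SUPPORTED[language] else "default"


def rendering(stored, language):
    """Return the grammatical form actually used for this user in this language."""
    setting = effective_setting(stored, language)
    if setting == "default":
        return DEFAULT_RENDERING[language]
    return setting


if __name__ == "__main__":
    stored = "default"                     # start in Finnish, nothing to choose
    print("fi:", rendering(stored, "fi"))
    stored = "masculine"                   # chosen in the English UI
    print("en:", rendering(stored, "en"))
    # German does not offer "masculine", so the effective setting is "default",
    # which German happens to render with masculine forms.
    print("de:", effective_setting(stored, "de"), "->", rendering(stored, "de"))
```

In this sketch the German UI would show _default_ as selected even though the stored value is still _masculine_, which is one possible reading of the incongruities mentioned above.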

PS: In Finnish the animate-inanimate distinction manifests in only the third person singular pronoun (hän/se). In all of the MediaWiki messages I do not remember a single case where these pronouns were used in relation to users.

What if the user is not a person, but a bot?

My point was that the pronouns hän/se do not appear in interface messages.

There seem to be very few cases. https://translatewiki.net/wiki/MediaWiki:Mobile-frontend-thanked-notice/fi seems to be a translation error where singular English _their_ was translated by plural _heidän_ instead of _hänen_, if I am not mistaken. But if people were talking about a wiki bot, would it be possible that they referred to it with _se_ instead of _hän_?

My justification for the current set is that it should be reasonably sensible even for speakers of languages with no grammatical distinction. We might want to grow it to four by adding inanimate.

I think it is very bad from a usability point of view if a user is offered choices that do not have any effect in the current language.

For example, if I use Finnish interface language in the English Wikipedia, it does make sense for me to be able to set whether I am "he" or "she".

And to make things even worse, the message https://translatewiki.net/wiki/MediaWiki:Gender-unknown/de has suggested for many years that users could hide their gender even though the setting _gender-unknown_ had really no effect at all – a totally false promise caused by direct translation of the English message.

It would be nice if you could correct the translation.

My point was that the pronouns hän/se do not appear in interface messages.

There seem to be very few cases. https://translatewiki.net/wiki/MediaWiki:Mobile-frontend-thanked-notice/fi seems to be a translation error where singular English _their_ was translated by plural _heidän_ instead of _hänen_, if I am not mistaken. But if people were talking about a wiki bot, would it be possible that they referred to it with _se_ instead of _hän_?

Yes, that was mistranslated. If inanimate gender were available, it would be {{GENDER:$2|hänen muokkauksestaan|sen muokkausesta}} I believe – although it feels weird to use käyttäjä and sen together.

I think it is very bad from a usability point of view if a user is offered choices that do not have any effect in the current language.

For example, if I use Finnish interface language in the English Wikipedia, it does make sense for me to be able to set whether I am "he" or "she".

I finally get it. I am sorry it has taken so long. I still think, though, that the danger of confusion about settings that have no effect in the current language but only in other languages (and also the fine print says something about giving away personal information on the internet) outweighs the possible benefits for advanced users. Also, if you want to use the Finnish interface on the English Wikipedia, you will have to change the language setting manually on the preferences page on the English Wikipedia, so you will encounter the information about the English grammatical gender setting at least once.

And to make things even worse, the message https://translatewiki.net/wiki/MediaWiki:Gender-unknown/de has suggested for many years that users could hide their gender even though the setting _gender-unknown_ had really no effect at all – a totally false promise caused by direct translation of the English message.

It would be nice if you could correct the translation.

That is what I am doing. I have proposed using the same wording for MediaWiki:Gender-unknown/de as for MediaWiki:Gender-male/de – a short list of examples (as in the original MediaWiki:Gender-male/en). The parentheses around MediaWiki:Gender-unknown are a disturbance. If I am correct, I will need to open a new issue here in order to get rid of them. I don't know any solution that could explain how this setting affects users who might be viewing the German Wikipedia with a different interface language.

My justification for the current set is that it should be reasonably sensible even for speakers of languages with no grammatical distinction. We might want to grow it to four by adding inanimate.

Please do not confuse "inanimate" and "neuter". While in English all inanimate objects are referred to with the "it" pronoun (except for ships, apparently), this is not the case in other languages (like Polish), where the pronoun always matches the grammatical gender – a book is a she, a key is a he, a child (regardless of its gender) is an it. If anything, it would have to be a separate checkbox.

(I'll also note that it amuses me that we're discussing in a language which does not even have a separate term for "grammatical gender".)


…my point being that the system being proposed here may be too complicated to be practical. (Although the discussion is enlightening.)

Apart from the masculine/feminine/neuter differentiation, we'd probably have separate choices for animate/inanimate, formal/informal (for this is not really about the gender, but about how the user wants to be addressed! – this point appears to have been lost), and no doubt a few more could be derived from the pages @j_mach_wust linked.

My own language has merely three grammatical genders in singular and two (different) ones in plural forms, so you may go ahead and call me biased, but personally I find the current state a good compromise between supporting a wide range of persons and languages and preserving sanity in the common case. The original proposal by @kaldari (https://gerrit.wikimedia.org/r/#/c/182994/), adding a "neuter" option to the current unknown/masculine/feminine choice, also seemed reasonable.

Please do not confuse "inanimate" and "neuter". While in English all inanimate objects are referred to with the "it" pronoun (except for ships, apparently), this is not the case in other languages (like Polish), where the pronoun always matches the grammatical gender – a book is a she, a key is a he, a child (regardless of its gender) is an it. If anything, it would have to be a separate checkbox.

Can you give examples such as MaintenanceBot deleted its user page and Bartosz deleted his user page in Polish to illustrate the above?

MaintenanceBot deleted its user page
Bartosz deleted his user page

A robot, barring other hints, is also grammatically masculine, so both would be the same:

MaintenanceBot usunął jego stronę użytkownika.
Bartosz usunął jego stronę użytkownika.

If you're looking for examples of other genders:

Dziecko usunęło jego stronę użytkownika. (=Child deleted its…, neuter; the pronoun "jego" (=theirs) is the same as for masculine gender)
Kobieta usunęła jej stronę użytkownika. (=Woman deleted her…, feminine)

As I probably mentioned earlier, there is no gender-neutral pronoun and the neuter version of the verb is not appropriate, so in cases where the gender is not known and the text's author wishes to underline this fact, constructions like this are often used (you'll find a few in MediaWiki's translation):

Osoba usunął(-ęła) swoją stronę użytkownika. (=A person deleted…)
Osoba usunął/ęła swoją stronę użytkownika.

(Aside: the English source you've given me is not clear on whether the user page being deleted belongs to the deleter. If it does, a self-referential pronoun "swoją" (=their own) could be used instead of "jej"/"jego" (=her/his). This is universal for all genders, and instead varies on the grammatical gender of the object (e.g. the page being deleted; "strona" (=page) is feminine; it would be "swój" for masc. and "swoje" for neuter and plural forms).)

(Further aside: "user page" could also vary based on the grammatical gender of the user, if it is known. I just used "strona użytkownika" above, which is masculine, but one could say "strona użytkowniczki" for feminine users. There is no neuter form.)

The binary gender + gender-neutral choice is adequate for the English language, but it may be inadequate for other languages. Different languages should allow for different numbers of choices.

It is a typical example of English bias: You take a peculiarity of English grammar (two genders + a gender-neutral way of referring to persons) and then try to apply it to other languages, whether it fits or not.

And to make things even worse, the message https://translatewiki.net/wiki/MediaWiki:Gender-unknown/de has suggested for many years that users could hide their gender even though the setting _gender-unknown_ had really no effect at all – a totally false promise caused by direct translation of the English message.

It would be nice if you could correct the translation.

The German translation of the three-way choice, which can only produce two different results in the German localization, now looks as follows (see https://de.wikipedia.org/wiki/Spezial:Einstellungen#mw-prefsection-personal-i18n):

  • („Der Benutzer“, „seine Diskussion“, „er bearbeitet“ usw.)
  • „Die Benutzerin“, „ihre Diskussion“, „sie bearbeitet“ usw. (weiblich)
  • „Der Benutzer“, „seine Diskussion“, „er bearbeitet“ usw. (männlich)

In English, this can be roughly translated thusly (like Polish, German has no gender-neutral direct translation of the word user):

  • (“The he-user”, “his discussion”, “he edits” etc.)
  • “The she-user”, “her discussion”, “she edits” etc. (female)
  • “The he-user”, “his discussion”, “he edits” etc. (male)

This hopefully makes it obvious that options 1 and 3 produce the same result. Since there are only two possible results on the German localization, a two-way choice would be much more sensible and less confusing:

  • „Der Benutzer“, „seine Diskussion“, „er bearbeitet“ usw. (männlich)
  • „Die Benutzerin“, „ihre Diskussion“, „sie bearbeitet“ usw. (weiblich)

MaintenanceBot deleted its user page
Bartosz deleted his user page

A robot, barring other hints, is also grammatically masculine, so both would be the same:
MaintenanceBot usunął jego stronę użytkownika.
Bartosz usunął jego stronę użytkownika.

I suppose that the Polish localization currently gets this right and correctly refers to bots in the correct masculine gender. On the other hand, the English localization currently gets it wrong and refers to bots in the incorrect masculine gender:

MaintenanceBot deleted his user page

At first sight, this might seem to contradict my above claim that the current situation is the result of typical English bias. Yet I still think we have a typical case of English bias. The current three-way choice (gender-unknown, feminine, masculine) perfectly fits the English language if you only think of potential animate users. I guess that people forgot to consider that there might be inanimate users such as bots.

(Further aside: "user page" could also vary based on the grammatical gender of the user, if it is known. I just used "strona użytkownika" above, which is masculine, but one could say "strona użytkowniczki" for feminine users. There is no neuter form.)

The wiki software potentially allows for such a distinction, though not yet in all cases.


@matmarex: The WALS mentions varieties of Polish where women might be referred to in the neuter grammatical gender (see Chapter Sex-based and Non-sex-based Gender Systems § Variety in sex-based systems). Is it possible that some Polish wiki users would follow that usage and prefer the neuter grammatical gender, or is this something extremely dialectal that would never be used on a Polish wiki?

@matmarex: The WALS mentions varieties of Polish where women might be referred to in the neuter grammatical gender (see Chapter Sex-based and Non-sex-based Gender Systems § Variety in sex-based systems). Is it possible that some Polish wiki users would follow that usage and prefer the neuter grammatical gender, or is this something extremely dialectal that would never be used on a Polish wiki?

I've never encountered this usage and I was not aware it exists; it's likely archaic. The referenced book appears to not be available online, but I found another reference to it at Google Books [1] which says that this is limited to "certain Silesian dialects". The form that appears there ("Zosię" derived from the name "Zosia") seems reminiscent of the words for young animals, like "źrebię" (=foal), "szczenię" (=puppy), "prosię" (=piglet), "cielę" (=calf) and others (all of which are, by the way, grammatically neuter), so I'm guessing that is the source and that it really applies only to children (I am, however, not a linguist). I doubt that it would come up in practice for wiki editors.

[1] https://books.google.pl/books?id=NKxhAAAAMAAJ&q=Osobliwa+zmiana+rodzaju+naturalnego+w+dialektach+polskich&dq=Osobliwa+zmiana+rodzaju+naturalnego+w+dialektach+polskich&hl=pl&sa=X&ved=0CBYQ6AEwAWoVChMIs4SBgZTlxwIVQrYUCh1K8wKg (not sure if the link will work for you)

The binary gender + gender-neutral choice is adequate for the English language, but it may be inadequate for other languages. Different languages should allow for different numbers of choices.
It is a typical example of English bias: You take a peculiarity of English grammar (two genders + a gender-neutral way of referring to persons) and then try to apply it to other languages, whether it fits or not.

I have had a look at the current localizations of the top Wikipedia languages in order to see how many languages would currently benefit from a reduction of the English-biased three-way choice. (This is only about the current situation, so benefits from a potential future expansion of the three-way choice are not discussed.)

Languages where the three-way choice is adequate:

  • English: 32 messages with three-way choice, 0 messages with two-way choice
  • Swedish: 13 messages with three-way choice, 2 messages with two-way choice

Languages where the three-way choice is more or less adequate. They have a significant number of three-way choice messages, even though they ignore it in a majority of the messages:

  • Dutch: 15 messages with three-way choice, 53 messages with two-way choice
  • Italian: 105 messages with three-way choice, 65 messages with two-way choice
  • Polish*: 36 messages with three-way choice, 417 messages with two-way choice
  • Portuguese*: 31 messages with three-way choice, 124 messages with two-way choice

Languages where a two-way choice would be more adequate. The number of three-way choice messages is insignificant (<3%); if a user chooses the supposedly gender-neutral option in these languages, that choice will almost always (>97%) be ignored and the user will be referred to with the male grammatical gender instead:

  • German*: 5 messages with three-way choice, 242 messages with two-way choice
  • French*: 7 messages with three-way choice, 234 messages with two-way choice
  • Russian*: 10 messages with three-way choice, 507 messages with two-way choice
  • Spanish*: 5 messages with three-way choice, 267 messages with two-way choice (including a couple of three-way choice messages where the supposedly gender-neutral choice is really male)

Languages where no choice at all would be more adequate:

  • Vietnamese: 1 message with two-way choice
  • Japanese
  • Chinese (simplified Chinese, though, has 22 messages with a three-way choice, but they all appear to have an erroneous plural translation of the English singular "they")

In languages marked with an asterisk (*), there is a two-way choice in the central message MediaWiki:Group-user-member.
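As a quick check of the thresholds used in this classification, the share of three-way messages per language can be recomputed from the counts listed above. The snippet is only a sketch of that arithmetic; the names `COUNTS` and `three_way_share` are made up for illustration, and the counts are copied from the lists above.

```python
# (three-way messages, two-way messages) per language, as counted above.
COUNTS = {
    "English": (32, 0),
    "Swedish": (13, 2),
    "Dutch": (15, 53),
    "Italian": (105, 65),
    "Polish": (36, 417),
    "Portuguese": (31, 124),
    "German": (5, 242),
    "French": (7, 234),
    "Russian": (10, 507),
    "Spanish": (5, 267),
}


def three_way_share(three_way, two_way):
    """Fraction of gendered messages that offer a distinct third form."""
    return three_way / (three_way + two_way)


if __name__ == "__main__":
    for language, (three_way, two_way) in COUNTS.items():
        share = three_way_share(three_way, two_way)
        marker = "  (<3%)" if share < 0.03 else ""
        print(f"{language:<11}{share:6.1%}{marker}")
```

With these counts, only German, French, Russian and Spanish fall below the 3% mark, which matches the third group in the classification.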

Most of these languages' localizations appear to include a few messages where the English gender-neutral singular they has been erroneously translated as if it were a plural, especially in the message MediaWiki:Ep-user-roles-message-additional. I have corrected these messages in the languages I am proficient in (de es fr it pt). I suspect that it is wrong in numerous localizations.

You might argue that the translators for languages with few three-way choice messages are working on adding them. Judging from the German case I know best, this is not true. It would require a major revolution in the Wikipedia community. There would be fierce opposition and endless debates. The few messages that already offer a three-way choice are not the beginning of a trend, but mere outliers. More importantly, the choices should be useful to the users now, and not in some distant future. And in the flexible solution I envision, the German Wikipedia community can decide in the future that they want to add a third gender-neutral choice. Then they could change the setting for number of grammatical genders from the value of 2 to the new value of 3. They could even complete the necessary translations before increasing the value, so the new system would work at once.

I found another reference to it at Google Books [1] which says that this is limited to "certain Silesian dialects".

Thanks for your information. It figures; the WALS says it occurs “in some southern Polish dialects” and that “the communities involved are small”. It appears to be less common than it is in Alemannic.

Nemo_bis added a comment. Edited Sep 10 2015, 7:45 AM

Which repositories did you check? https://translatewiki.net/wiki/Special:SearchTranslations might be used to check all of them at once.

even though they ignore it in a majority of the messages: [...]
Italian: 105 messages with three-way choice, 65 messages with two-way choice

The Italian locale does not "ignore" the unspecified option; instead, many translations were switched to words which work for any gender (for instance "utente" and other -e words which are invariable).

Which repositories did you check? https://translatewiki.net/wiki/Special:SearchTranslations might be used to check all of them at once.

I am not yet familiar with the repositories, so I only checked Special:SearchTranslations and then grepped for instances of “{{GENDER” followed by two forms, by three forms, or by three forms where the first is equal to the third. The links I used were e.g.:
https://translatewiki.net/w/i.php?title=Special:SearchTranslations&query=GENDER&language=it&limit=500
https://translatewiki.net/w/i.php?title=Special:SearchTranslations&query=GENDER&language=it&limit=500&offset=500
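The kind of counting described above (two forms, three forms, or three forms where the first equals the third) could be reproduced with a small script along these lines. This is a rough sketch, not the procedure actually used: the regular expression ignores nested templates, and the function names and category labels are invented for illustration.

```python
import re
from collections import Counter

# Matches {{GENDER:<user>|form1|form2...}} as long as the forms contain no braces.
GENDER_CALL = re.compile(r"\{\{GENDER:([^|{}]*)\|([^{}]*)\}\}")


def classify(message):
    """Classify every {{GENDER:...}} call in a message by its number of forms."""
    kinds = []
    for match in GENDER_CALL.finditer(message):
        forms = match.group(2).split("|")
        if len(forms) == 2:
            kinds.append("two-way")
        elif len(forms) == 3 and forms[0] == forms[2]:
            # Third form repeats the first one, so only two results are possible.
            kinds.append("three-way, first equals third")
        elif len(forms) == 3:
            kinds.append("three-way")
        else:
            kinds.append("other")
    return kinds


def count_kinds(messages):
    """Tally the categories over a list of message texts."""
    return Counter(kind for message in messages for kind in classify(message))


if __name__ == "__main__":
    sample = [
        "{{GENDER:$1|Er|Sie}} bearbeitet",
        "{{GENDER:$1|He|She|They}} edited",
        "{{GENDER:$1|Benutzer|Benutzerin|Benutzer}}",
    ]
    print(count_kinds(sample))
```

The sample strings are made-up inputs; real message texts would have to be exported from the search results first.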

even though they ignore it in a majority of the messages: [...]
Italian: 105 messages with three-way choice, 65 messages with two-way choice

The Italian locale does not "ignore" the unspecified option; instead, many translations were switched to words which work for any gender (for instance "utente" and other -e words which are invariable).

My bad. The Italian localization (unlike the Dutch, Polish or Portuguese localization) really honours the gender-neutral option in a majority of cases (105 messages with three-way choice), even though it still has a significant number of messages where the gender-neutral option is ignored and users are by default referred to as if they were male (65 messages with two-way choice). In the end, this does not change my above classification that counts Italian among the languages “where the three-way choice is more or less adequate”.

The use of an invariable noun such as utente is an elegant solution. Of course, though, the noun utente still takes variable forms of gender agreement, e.g. questo utente vs. questa utente vs. questo/a utente (so the elegant solution is broken by two-way choices such as https://translatewiki.net/wiki/MediaWiki:Usermerge-protectedgroup/it). The Italian localization with its invariable translation of user contrasts with other localizations where the default translation of user takes the male form, e.g. the Polish, Portuguese, German, French, Russian, or Spanish localization.

Nemo_bis closed this task as Declined. Mar 14 2017, 3:43 PM
Nemo_bis added subscribers: jeblad, zhuyifei1999.

There is still no demonstrated need for this feature. Translators are currently able to use 1, 2 or 3 grammatical genders. We don't have a significant demand for 4+ gender options.

Restricted Application removed a subscriber: Liuxinyu970226. Mar 14 2017, 3:45 PM
TheDJ added a subscriber: TheDJ. Nov 8 2018, 9:47 AM

Been thinking a bit about this. Part of this problem comes from the mixing of gender and grammatical gender in our implementation of course. It also seems that we don't particularly care about people's actual gender within WMF wikis... What if:

  1. we introduce a new preference 'grammatical-gender' with values feminine, masculine, neuter
  2. default of neuter for the languages that have neuter
  3. copy existing values from 'gender' to the grammatical-gender on wmf properties
  4. hide the existing 'gender' option for WMF only (we might want to keep it for MediaWiki in general)
  5. adapt the {{GENDER:}} wikitext keyword to source from gender first, and from grammatical-gender second
  6. adapt the gender in translation engine to source from grammatical-gender first and gender second. (or maybe we should just have a modifier on the GENDER magic word, to choose between the gender prefs ?)

Now this is a massive amount of work and additional complexity. I'm not entirely in favour of it honestly (we have bigger problems), but it seems more practical/correct to me than earlier suggestions.
It would also be interesting to have an analysis bot on the translations, showing the completeness of Neuter translations for the languages
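The lookup order in points 5 and 6 of the proposal above could be sketched like this. It is purely illustrative: the preference keys come from the proposal, but the function name, the dictionary representation of user preferences, and the "unknown" fallback are assumptions, not existing MediaWiki code.

```python
# Lookup orders from points 5 and 6 of the proposal.
WIKITEXT_ORDER = ("gender", "grammatical-gender")    # {{GENDER:}} in wikitext
INTERFACE_ORDER = ("grammatical-gender", "gender")   # interface translations


def resolve(preferences, order, fallback="unknown"):
    """Return the first preference in `order` that the user has set.

    `preferences` is a hypothetical dict of the user's stored options;
    `fallback` is an assumed value for users who set neither preference.
    """
    for key in order:
        value = preferences.get(key)
        if value:
            return value
    return fallback


if __name__ == "__main__":
    user = {"gender": "female", "grammatical-gender": "neuter"}
    print(resolve(user, WIKITEXT_ORDER))    # uses the 'gender' preference
    print(resolve(user, INTERFACE_ORDER))   # uses 'grammatical-gender'
    print(resolve({}, INTERFACE_ORDER))     # nothing set, falls back
```

Note that the two preferences use different value sets in the proposal (gender vs. grammatical gender), so a real implementation would still need a mapping between them.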

I still see no real demonstrated need for inanimate grammar forms, but they could theoretically be added to that list.

The simple example for the inanimate option is the various kinds of bot accounts, which should use it or its in English (instead of singular they).

I'd like to remind everyone that the list of choices we present should not be tied to the user's interface language, as its primary use is to display messages to other users who may use a different language.

In my opinion we should also not mix grammatical gender (in the sense that the grammatical gender of names is lexically defined) in here. That is a job for {{GRAMMAR}}.

Apap04 added a subscriber: Apap04. Nov 20 2018, 7:10 PM