
SecurePoll should use active wiki (or allow users to select a wiki)
Open, Needs Triage · Public · Feature

Description

Feature summary:
SecurePoll should not use voters' home wikis to record where they are voting from in SecurePoll voter lists.

Instead, it should either:

  • use voters' active wikis, or
  • allow users to select a wiki to "represent" when they vote.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
Currently, when one votes in a SecurePoll election on votewiki, no matter where they clicked the link to vote, the wiki logged for them is their home wiki as defined in CentralAuth / by SUL. This is not always the wiki the user is most active on, and it skews the logged data quite heavily towards large wikis like English Wikipedia and Meta-Wiki.

This is visually apparent in the statistics for the recent UCoC ratification vote.

To use an example from that recent vote, User:BP OMowe was logged in the voter list as being an English Wikipedian, but (while they are very established there) they have more presence on the Swedish Wikipedia. It is plausible that they would consider themselves a member of the Swedish Wikipedia community first.

Benefits (why should this be implemented?):

  • Voting statistics would more accurately represent where voters are actually coming from.
  • With additional changes, voting stats like those collected in 2011 for the image filter referendum could be more reliable. This would allow future global elections to split votes by project. (We'd need to evaluate the implications of this in a separate task.)

Event Timeline

Thank you for creating this ticket. The current situation leaves us with a severe lack of understanding of which communities our voters feel they represent, and it tends to homogenise participation towards the major wikis.

Another possibility, although less desirable than the one proposed, would be to take the user's first (native, N-level) language from their Babel box on Meta, if one is available. Alternatively, users could be asked in their preferences, during OAuth, or on other occasions whether their native (N) Babel language matches their most active wiki.

Technically, the home wiki is frequently (and often wrongly) set to the English Wikipedia, because new accounts created, e.g., via the Events & Programs dashboard are always created on the English Wikipedia, while the user may be active mostly on another language edition of Wikipedia. Tools should therefore create the account on the correct language edition. I have filed T307878 for this problem.

As an alternative, the polling system could use the Wikipedia where the user has the most contributions.


I think this is a fine idea, though it does mean that wikis where users can "accidentally" make lots of edits would have bias. For example my staff account has 24k+ edits there mostly due to translation tagging and importing: https://meta.wikimedia.org/wiki/Special:CentralAuth/JSutherland_(WMF)

Self-identification is prone to many unintended consequences and reality distortion. If an alternative to "first wiki" is created, then "most active recent wiki" is the most logical.

At least in the case of global votes, i.e. Board elections, a script must be run to identify the eligible voters (i.e. 300+ edits in total and 20 in the past 6 months). That script could also identify the wiki with the most edits in the past 6 months, and that would be the alternative offered. In theory, this wouldn't even require touching SecurePoll.
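The eligibility script described above could, in principle, return the most active recent wiki in the same pass. A minimal sketch in PHP, assuming the script already has per-wiki counts of total and recent (last 6 months) edits for each voter; the checkVoter() function and the total/recent array keys are hypothetical, and the 300 / 20 thresholds are read as "at least":

```php
<?php
/**
 * Hypothetical helper (not part of SecurePoll): decide eligibility from
 * per-wiki edit counts and pick the wiki with the most recent edits.
 * Assumes "at least 300 edits in total and at least 20 in the last 6 months".
 *
 * @param array<string, array{total: int, recent: int}> $editsByWiki Wiki ID => counts
 * @return array{eligible: bool, mostActiveRecent: ?string}
 */
function checkVoter( array $editsByWiki, int $minTotal = 300, int $minRecent = 20 ): array {
	$total = 0;
	$recent = 0;
	$mostActiveRecent = null;
	$bestRecent = 0;
	foreach ( $editsByWiki as $wiki => $counts ) {
		$total += $counts['total'];
		$recent += $counts['recent'];
		// Track the wiki with the highest recent edit count; the first one wins ties.
		if ( $counts['recent'] > $bestRecent ) {
			$bestRecent = $counts['recent'];
			$mostActiveRecent = $wiki;
		}
	}
	return [
		'eligible' => $total >= $minTotal && $recent >= $minRecent,
		'mostActiveRecent' => $mostActiveRecent,
	];
}

// Made-up example data.
$result = checkVoter( [
	'enwiki' => [ 'total' => 250, 'recent' => 5 ],
	'svwiki' => [ 'total' => 900, 'recent' => 40 ],
] );
echo json_encode( $result ), "\n"; // {"eligible":true,"mostActiveRecent":"svwiki"}
```

Because the output is just one wiki ID per voter, it could be produced entirely outside SecurePoll, in line with the point above.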

However, chances are that if we remove "first wiki" and assign people directly to "most active recent wiki", someone will complain as well, for the opposite reasons, saying that "first wiki" is what they identify with, etc. So maybe introduce a selection of two options in the ballot, first and most active recent?

Potential problem... Wikidata contributors will know this better, but a single session of Wikidata editing might produce more edits than a regular day of Wikipedia editing. Many voters might be surprised to be assigned to Wikidata because of a couple of things they did there in the past months. If this is indeed a problem, we could introduce... three options in the selector: first wiki, most active recent, and second most active recent. One of the three should do it. :)
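The three-option selector could be built from the user's first (home) wiki plus their two most active recent wikis, dropping duplicates. A minimal sketch, assuming per-wiki recent edit counts are already available; wikiSelectorOptions() and the example numbers are hypothetical:

```php
<?php
/**
 * Hypothetical helper: build the ballot's wiki selector options from the
 * user's first (home) wiki and their recent per-wiki edit counts.
 * Duplicates are dropped, so the list may hold fewer than three entries.
 *
 * @param string $homeWiki Wiki where the account was first registered
 * @param array<string, int> $recentEdits Wiki ID => edits in the recent window
 * @return string[]
 */
function wikiSelectorOptions( string $homeWiki, array $recentEdits ): array {
	// Sort by recent edit count, highest first (keys are preserved).
	arsort( $recentEdits );
	$topRecent = array_slice( array_keys( $recentEdits ), 0, 2 );
	// Home wiki first, then the two most active recent wikis, without repeats.
	return array_values( array_unique( array_merge( [ $homeWiki ], $topRecent ) ) );
}

// Made-up data: a burst of Wikidata edits outranks the user's French Wikipedia work.
$options = wikiSelectorOptions( 'enwiki', [
	'wikidatawiki' => 450,
	'frwiki' => 120,
	'enwiki' => 3,
] );
echo implode( ', ', $options ), "\n"; // enwiki, wikidatawiki, frwiki
```

In this example the "second most active recent" option is what lets the user pick frwiki instead of being assigned to Wikidata.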

I agree with @Qgik's comment. But that's why the "most active wiki" should not rely on recent edits, which may not reflect long-term editing practice, but should rather focus on the "all-time most active wiki". That way, in most cases, we will match the user's sense of identification over a solid timeline.

Hello!

I had a look into this topic and I would like to ask some questions and make some proposals.

The most likely options for what a voter would want their "home wiki" to be include:

  • first registration wiki
  • most edits wiki
  • most recently logged in wiki
  • based on a setting in the user preferences (CentralAuth preferences)

Every option has its flaws, as discussed here earlier.

Probably the simplest implementation, in that case, would be to let users choose their "home wiki" on the vote page via a mandatory input field (a dropdown or similar).

In that case, the question would be: what options does the user see? What would be the best compromise between good UX and selection options?

  • All Wikimedia wiki projects, with the most likely choice(s) highlighted
  • Wikimedia wiki projects where the user has edits (with or without the most likely choice(s) highlighted)
  • Wikimedia wiki projects where the user's account is attached (with or without the most likely choice(s) highlighted)
  • Only the most likely choices

Which of the most likely choices could be used as the preset default value?
Should the choice be stored somewhere, e.g.

  • in a cookie?
  • in the user's CentralAuth preferences (if such an option were introduced)?

I hope I have summarized the problem well, and I look forward to your opinions.

Sorry for the delay in getting back to you here @Driedmueller.

> The most likely options for what a voter would want their "home wiki" to be include:
>
>   • first registration wiki

This is (usually) how "homewiki" is defined now and in the centralauth database, which of course we do not want to change.

> Probably the simplest implementation, in that case, would be to let users choose their "home wiki" on the vote page via a mandatory input field (a dropdown or similar).

I think this is great if possible. The simplest solution, and the one that would usually work, would be the most-edited wiki, I think, but there are certainly wikis where one can accumulate a lot of edits without realising it (Wikidata comes to mind immediately).

> In that case, the question would be: what options does the user see? What would be the best compromise between good UX and selection options?
>
>   • All Wikimedia wiki projects, with the most likely choice(s) highlighted
>   • Wikimedia wiki projects where the user has edits (with or without the most likely choice(s) highlighted)
>   • Wikimedia wiki projects where the user's account is attached (with or without the most likely choice(s) highlighted)
>   • Only the most likely choices

I would say the best solution here is probably limiting this to wikis where the user has edited. We could even limit it to wikis where the user has more than (e.g.) 10% of their edits.

For example, if I have 10,000 edits on enwiki, 2,000 on commonswiki, 600 on dewiki, and 20 on ruwiki (and maybe 1 or 2 on a few more wikis) we might want to only show enwiki, commonswiki, and dewiki as options, since they are more likely to be considered by the user as their "most active wiki". This is especially true for users editing on much smaller wikis I think.
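The threshold filter can be sketched as below (a hypothetical helper, not existing SecurePoll code; the counts are those from the example, leaving out the few stray edits on other wikis). Note that with exactly these numbers, a 10% cutoff keeps only enwiki (about 79.2% of 12,620 edits) and commonswiki (about 15.8%), since dewiki holds only about 4.8%; showing dewiki as well would need a lower threshold, e.g. 1%:

```php
<?php
/**
 * Hypothetical helper: keep only the wikis holding more than $minPercent
 * of the user's total edits, ordered from most to least edited.
 *
 * @param array<string, int> $editCounts Wiki ID => edit count
 * @return array<string, float> Wiki ID => share of total edits, in percent
 */
function wikisAboveShare( array $editCounts, float $minPercent ): array {
	$total = array_sum( $editCounts );
	if ( $total === 0 ) {
		return [];
	}
	$shares = [];
	foreach ( $editCounts as $wiki => $count ) {
		$percent = 100 * $count / $total;
		// "More than X%", so a wiki sitting exactly at the threshold is excluded.
		if ( $percent > $minPercent ) {
			$shares[$wiki] = round( $percent, 1 );
		}
	}
	arsort( $shares );
	return $shares;
}

$edits = [ 'enwiki' => 10000, 'commonswiki' => 2000, 'dewiki' => 600, 'ruwiki' => 20 ];
echo implode( ', ', array_keys( wikisAboveShare( $edits, 10 ) ) ), "\n"; // enwiki, commonswiki
echo implode( ', ', array_keys( wikisAboveShare( $edits, 1 ) ) ), "\n"; // enwiki, commonswiki, dewiki
```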

> Which of the most likely choices could be used as the preset default value?

The most-edited wiki I would say.

> Should the choice be stored somewhere, e.g.
>
>   • in a cookie?
>   • in the user's CentralAuth preferences (if such an option were introduced)?

Ideally not a cookie, for privacy reasons. I don't think there is currently a CentralAuth setting for this, and my understanding is that creating one would take a lot of effort.

Hopefully this helps. I'm happy to clarify anything further!

Ok, cool!
So then, we would implement a "most active wiki" input field on the vote page and populate it with the wikis where the user has more than X% of their edits.
I would suggest introducing a config variable to define X, e.g. with a value of 10 to show the wikis holding more than 10% of the user's edits.
The user's choice is not stored, for now.
We can get the necessary data from the CentralAuthUser::queryAttached() method, e.g.:

$centralUser = CentralAuthUser::getInstanceByName( $user->getName() );
$attached = $centralUser->queryAttached();
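A sketch of how the queryAttached() result might be turned into the dropdown options and a preselected default (the most-edited wiki, as proposed above). It assumes each entry of the returned array is keyed by wiki ID and carries an editCount field; mostActiveWikiOptions() and its threshold parameter, which stands in for the proposed config variable, are hypothetical. In real code the input would come from $centralUser->queryAttached() as in the snippet above; here it is replaced with made-up data so the example runs on its own:

```php
<?php
/**
 * Hypothetical helper: build "most active wiki" dropdown options from
 * attached-account records shaped like CentralAuthUser::queryAttached()
 * entries (assumed: keyed by wiki ID, each with an 'editCount' key).
 * $thresholdPercent stands in for the proposed config variable.
 *
 * @param array<string, array{editCount: int}> $attached
 * @return array{options: string[], default: ?string}
 */
function mostActiveWikiOptions( array $attached, float $thresholdPercent ): array {
	// array_map() with a single array preserves the wiki ID keys.
	$counts = array_map( static fn ( array $info ): int => (int)$info['editCount'], $attached );
	$total = array_sum( $counts );
	$options = [];
	foreach ( $counts as $wiki => $count ) {
		if ( $total > 0 && 100 * $count / $total > $thresholdPercent ) {
			$options[$wiki] = $count;
		}
	}
	arsort( $options );
	$options = array_keys( $options );
	return [
		'options' => $options,
		// Preselect the most-edited wiki; null if no wiki passes the threshold.
		'default' => $options[0] ?? null,
	];
}

// Made-up stand-in for $centralUser->queryAttached().
$attached = [
	'svwiki' => [ 'wiki' => 'svwiki', 'editCount' => 5200 ],
	'enwiki' => [ 'wiki' => 'enwiki', 'editCount' => 4100 ],
	'metawiki' => [ 'wiki' => 'metawiki', 'editCount' => 300 ],
];
$choice = mostActiveWikiOptions( $attached, 10 );
echo implode( ', ', $choice['options'] ), ' (default: ', $choice['default'], ")\n"; // svwiki, enwiki (default: svwiki)
```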

One technical question:
Where does the logging of the user's "homewiki" actually occur in the code? We were not able to find it.
For example, where does the data behind the statistics for the recent UCoC ratification vote come from?
In VotePage::logVote, there is this snippet, which inserts logging data into the database:

		$dbw->insert(
			'securepoll_votes',
			[
				'vote_election' => $this->election->getId(),
				'vote_voter' => $this->voter->getId(),
				'vote_voter_name' => $this->voter->getName(),
				'vote_voter_domain' => $this->voter->getDomain(),
				'vote_record' => $encrypted,
				'vote_ip' => IPUtils::toHex( $request->getIP() ),
				'vote_xff' => $xff,
				'vote_ua' => $_SERVER['HTTP_USER_AGENT'],
				'vote_timestamp' => $now,
				'vote_current' => 1,
				'vote_token_match' => $tokenMatch ? 1 : 0,
				'vote_struck' => 0,
				'vote_cookie_dup' => 0,
			],
			__METHOD__
		);

Looking at this, I would guess that 'vote_voter_domain' => $this->voter->getDomain() contains this information. If so, the data used to create the home wiki statistics is stored in the vote_voter_domain column.
If you could point us to that, that would be really helpful :)

> Ok, cool!
> So then, we would implement a "most active wiki" input field on the vote page and populate it with the wikis where the user has more than X% of their edits.
> I would suggest introducing a config variable to define X, e.g. with a value of 10 to show the wikis holding more than 10% of the user's edits.
> The user's choice is not stored, for now.
> We can get the necessary data from the CentralAuthUser::queryAttached() method. [...]

I think that sounds great if it is possible! A config variable is also a great idea; perhaps it could be set up for "All wikis" elections in the votewiki UI.

> One technical question:
> Where does the logging of the user's "homewiki" actually occur in the code? We were not able to find it.
> [...]
> Looking at this, I would guess that 'vote_voter_domain' => $this->voter->getDomain() contains this information. If so, the data used to create the home wiki statistics is stored in the vote_voter_domain column.
> If you could point us to that, that would be really helpful :)

As we discussed on the call today, I don't know the technical answer to this :( but here's what's in the SQL table on votewiki. This is the first row in the table, so it contains no IP address information; I have also redacted the PGP block for readability and privacy:

select * from securepoll_votes limit 1;
+---------+---------------+------------+-----------------+-------------------+-------------+-------------+---------+----------+---------+----------------+--------------+------------------+-----------------+
| vote_id | vote_election | vote_voter | vote_voter_name | vote_voter_domain | vote_struck | vote_record | vote_ip | vote_xff | vote_ua | vote_timestamp | vote_current | vote_token_match | vote_cookie_dup |
+---------+---------------+------------+-----------------+-------------------+-------------+-------------+---------+----------+---------+----------------+--------------+------------------+-----------------+
|       1 |           290 |          1 | Jamesofur       | en.wikipedia.org  |           1 | <pgp code>  |         |          |         | 20130607190210 |            0 |                1 |               1 |
+---------+---------------+------------+-----------------+-------------------+-------------+-------------+---------+----------+---------+----------------+--------------+------------------+-----------------+
1 row in set (0.002 sec)
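This thread does not confirm how the published statistics are produced. Assuming they are derived from vote_voter_domain, a per-wiki breakdown could be computed by counting current, non-struck votes per domain (excluding struck and superseded votes is also an assumption). A minimal sketch over rows shaped like the one above; votesByDomain() is hypothetical and the rows are made up:

```php
<?php
/**
 * Hypothetical sketch: count votes per voter domain from securepoll_votes
 * rows. Only current (vote_current = 1), non-struck (vote_struck = 0)
 * votes are counted, which is an assumption about how stats are built.
 *
 * @param array<int, array<string, mixed>> $rows Rows with vote_voter_domain,
 *   vote_current and vote_struck columns
 * @return array<string, int> Domain => number of votes, largest first
 */
function votesByDomain( array $rows ): array {
	$counts = [];
	foreach ( $rows as $row ) {
		if ( (int)$row['vote_current'] !== 1 || (int)$row['vote_struck'] !== 0 ) {
			continue;
		}
		$domain = $row['vote_voter_domain'];
		$counts[$domain] = ( $counts[$domain] ?? 0 ) + 1;
	}
	arsort( $counts );
	return $counts;
}

// Made-up rows; the third one is a superseded vote and is skipped.
$rows = [
	[ 'vote_voter_domain' => 'en.wikipedia.org', 'vote_current' => 1, 'vote_struck' => 0 ],
	[ 'vote_voter_domain' => 'sv.wikipedia.org', 'vote_current' => 1, 'vote_struck' => 0 ],
	[ 'vote_voter_domain' => 'en.wikipedia.org', 'vote_current' => 0, 'vote_struck' => 0 ],
	[ 'vote_voter_domain' => 'en.wikipedia.org', 'vote_current' => 1, 'vote_struck' => 0 ],
];
echo json_encode( votesByDomain( $rows ) ), "\n"; // {"en.wikipedia.org":2,"sv.wikipedia.org":1}
```

Under this assumption, the domain recorded at vote time is the only input, which is why changing what getDomain() returns (or storing the voter's selected wiki instead) would directly change the resulting statistics.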

Implementation is nearly finished.
This is what it looks like currently:

[Screenshot: image.png]

@TAdeleye_WMF Is the UI and labeling ok?

MR: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/SecurePoll/+/1017465

@TAdeleye_WMF confirmed in today's meeting that the UI and labeling are OK.