Page MenuHomePhabricator

Update authentication metrics for IP masking
Closed, ResolvedPublic

Description

The authentications and authentication-metrics Grafana boards count account creations. After IP masking that won't be super useful; we should split named user creations and temp user creations.

Event Timeline

kostajh subscribed.

@Tgr does your team have plans to work on this anytime soon? Do you need it done before we deploy to pilot wikis?

pmiazga subscribed.

@kostajh it's not a blocker for IP Masking deployment. We need it to monitor account creations to catch possible misbehavior of the system. We will pick it up soon.

pmiazga changed the task status from Open to In Progress.Sep 23 2024, 10:06 AM
pmiazga claimed this task.
pmiazga moved this task from Soon to Current Sprint on the MediaWiki-Platform-Team board.

After a quick analysis of our logging system:

We do not directly track stats nor metrics in auth code. We use the authevents monolog channel, and we log specific actions, like for example Account Creation attempt:

LoggerFactory::getInstance( 'authevents' )->info( 'Account creation attempt', [
	'event' => 'accountcreation',
	'successful' => $success,
	'status' => strval( $status ),
] );

Source: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/0eace3287b57a678e4a9b9dd479548c4bf954360/includes/specials/SpecialCreateAccount.php#187

Then on Production environment, in wmf-config/logging.php we inject a custom Monlog Handler that handles the the authevents channel - WikimediaEvents\AuthManagerStatsdHandler from WikimediaEvents extension.
This handler is processing the log $record, and calls the statsFactory to increment cconters based on log context - https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikimediaEvents/+/refs/heads/master/includes/AuthManagerStatsdHandler.php#63

Proposed solution:

  • in places where we log on authevents stream and we want to distinguish stats from named and temp users - add the accountType property with one of ['named', 'temp', 'anon']
  • the AuthManagerStatsdHandler should check the accountType from 'context' - ignore when it's not present.
LoggerFactory::getInstance( 'authevents' )->info( 'Account creation attempt', [
	'event' => 'accountcreation',
        'accountType' => /** here goes to logic to pass the named|temp etc **/,
	'successful' => $success,
	'status' => strval( $status ),
] );
  • when $accountType equals temp - add temp to the $keyparts after $entrypoint:
$accountType = $this->getField('accountType', $record['context']);
[...]
// some key parts can be null and will be removed by array_filter
$keyParts = [ 'authmanager', $event, $type, $entrypoint ];
if ($accountType === 'temp']) {
   $keyParts[] = 'temp';
}

This solution will allow a smooth transition, also, all existing graphs will remain untouched and we will keep the history. After updating WikimediaEvents extension existing graphs won't include temporary accounts stats, for those, we will need to or create new graphs, or update existing graphs to include the new buckets.
For all other stats, that do not include accountType,

Open questions: not sure if we need to handle all three options - named, temp and anon. Maybe just passing temp: bool is sufficient, but most likely it doesn't hurt us to allow passing more information, which may be useful.

@Tgr @Krinkle do you have any thoughts on this ?

Thanks for these notes @pmiazga. I was also looking at this code today in context of T357763: [Epic] Create a temporary accounts initiative Grafana dashboard. Specifically, we'd like the eventual solution to support the "Rate of temporary account creation" and "Rate of account creation" stories from that task; it sounds like what you're proposing would work.

@kostajh yes, I think you can get those rates from this work. Once I get a green light I'll implement this solution. I expect to land it by the end of this week.

It's worth thinking through which graphs we want to split. For example, there is no way to create a temp account (they are always autocreated) but the performer can be a temp account (when a temp user is registering to become a proper user) - maybe we want to track that separately from normal account creations? (IMO it's not that useful for monitoring purposes, but maybe interesting as a low-effort, low-resolution metric for the IP masking project - seems useful for T375510.)

At a first glance, the affected metrics are:

  • autocreation
  • central login
  • central autologin (broken into a number of separate dashboards but it's the same metric)
  • account creation, with performer rather than the created user
  • logout (again not super useful, but maybe interesting for T&S)

Wrt metric name, I don't think there is one that's both backwards-compatible and easy to chart (I assume we want something similar to how entry points are handled on the dashboard, ie. a dropdown filter). The minimum-effort solution is to just put the user type somewhere in the metric name, and accept that old data disappears. Alternatively, you can use an alias in the charted function to rename the old metrics to the new format - that's going to lead to quite ugly function definitions but nice dashboards.

Even more alternatively, maybe this is a good time to switch to Prometheus (which is much better suited for multidimensional metrics) as that will result in the loss of old data anyway. Way more effort though.

Open questions: not sure if we need to handle all three options - named, temp and anon. Maybe just passing temp: bool is sufficient, but most likely it doesn't hurt us to allow passing more information, which may be useful.

Generally for authentication actions the subject will never be anonymous (except on failure to authenticate, but that's not an interesting distinction since we already track success status), but for the performer (ie. account creation) all three are possible and it's mildly useful to differentiate them.

Change #1075544 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/extensions/WikimediaEvents@master] Refactor AuthManagerStatsdHandler to use null coalescing operator.

https://gerrit.wikimedia.org/r/1075544

Change #1075544 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Refactor AuthManagerStatsdHandler to use null coalescing operator.

https://gerrit.wikimedia.org/r/1075544

Change #1075587 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/core@master] WIP: pass accountType to authevents stream

https://gerrit.wikimedia.org/r/1075587

Change #1075549 had a related patch set uploaded (by Gergő Tisza; author: Pmiazga):

[mediawiki/extensions/WikimediaEvents@master] AuthManagerStatsdHandler: Pass the accountType when present

https://gerrit.wikimedia.org/r/1075549

Change #1075549 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] AuthManagerStatsdHandler: Pass the accountType when present

https://gerrit.wikimedia.org/r/1075549

Change #1078408 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/extensions/CentralAuth@master] Auth: pass accountType to authevents log stream

https://gerrit.wikimedia.org/r/1078408

Change #1078408 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Auth: pass accountType to authevents log stream

https://gerrit.wikimedia.org/r/1078408

Change #1080656 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/core@master] chore: Move authevents logging into AuthManager

https://gerrit.wikimedia.org/r/1080656

Change #1075587 merged by jenkins-bot:

[mediawiki/core@master] Auth: pass accountType to authevents log stream

https://gerrit.wikimedia.org/r/1075587

Change #1082265 had a related patch set uploaded (by Gergő Tisza; author: Pmiazga):

[mediawiki/core@wmf/1.43.0-wmf.27] Auth: pass accountType to authevents log stream

https://gerrit.wikimedia.org/r/1082265

Change #1082269 had a related patch set uploaded (by Gergő Tisza; author: Pmiazga):

[mediawiki/core@wmf/1.43.0-wmf.28] Auth: pass accountType to authevents log stream

https://gerrit.wikimedia.org/r/1082269

Change #1082265 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.27] Auth: pass accountType to authevents log stream

https://gerrit.wikimedia.org/r/1082265

Change #1082269 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.28] Auth: pass accountType to authevents log stream

https://gerrit.wikimedia.org/r/1082269

Mentioned in SAL (#wikimedia-operations) [2024-10-23T14:07:50Z] <tgr@deploy2002> Started scap sync-world: Backport for [[gerrit:1082265|Auth: pass accountType to authevents log stream (T341650 T375510 T375505)]], [[gerrit:1082269|Auth: pass accountType to authevents log stream (T341650 T375510 T375505)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-23T14:10:15Z] <tgr@deploy2002> tgr: Backport for [[gerrit:1082265|Auth: pass accountType to authevents log stream (T341650 T375510 T375505)]], [[gerrit:1082269|Auth: pass accountType to authevents log stream (T341650 T375510 T375505)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-10-23T14:21:14Z] <tgr@deploy2002> Finished scap sync-world: Backport for [[gerrit:1082265|Auth: pass accountType to authevents log stream (T341650 T375510 T375505)]], [[gerrit:1082269|Auth: pass accountType to authevents log stream (T341650 T375510 T375505)]] (duration: 13m 23s)

Change #1082488 had a related patch set uploaded (by Pmiazga; author: Pmiazga):

[mediawiki/extensions/CentralAuth@master] chore: AuthManager::autoCreateUser log authevents now

https://gerrit.wikimedia.org/r/1082488

Change #1080656 merged by jenkins-bot:

[mediawiki/core@master] chore: Move authevents logging into AuthManager

https://gerrit.wikimedia.org/r/1080656

Change #1082488 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] chore: AuthManager::autoCreateUser log authevents now

https://gerrit.wikimedia.org/r/1082488

Change #1082829 had a related patch set uploaded (by Gergő Tisza; author: Pmiazga):

[mediawiki/core@wmf/1.43.0-wmf.28] chore: Move authevents logging into AuthManager

https://gerrit.wikimedia.org/r/1082829

Change #1082830 had a related patch set uploaded (by Gergő Tisza; author: Pmiazga):

[mediawiki/extensions/CentralAuth@wmf/1.43.0-wmf.28] chore: AuthManager::autoCreateUser log authevents now

https://gerrit.wikimedia.org/r/1082830

Change #1082829 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.28] chore: Move authevents logging into AuthManager

https://gerrit.wikimedia.org/r/1082829

Change #1082830 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@wmf/1.43.0-wmf.28] chore: AuthManager::autoCreateUser log authevents now

https://gerrit.wikimedia.org/r/1082830

Mentioned in SAL (#wikimedia-operations) [2024-10-24T20:44:28Z] <thcipriani@deploy2002> Started scap sync-world: Backport for [[gerrit:1082829|chore: Move authevents logging into AuthManager (T341650 T375510 T375505)]], [[gerrit:1082830|chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-24T20:46:43Z] <thcipriani@deploy2002> tgr, thcipriani: Backport for [[gerrit:1082829|chore: Move authevents logging into AuthManager (T341650 T375510 T375505)]], [[gerrit:1082830|chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-10-24T21:02:38Z] <thcipriani@deploy2002> Finished scap sync-world: Backport for [[gerrit:1082829|chore: Move authevents logging into AuthManager (T341650 T375510 T375505)]], [[gerrit:1082830|chore: AuthManager::autoCreateUser log authevents now (T341650 T375510 T375505)]] (duration: 18m 10s)

Is there anything else to do here, or can we resolve the task? Thanks again for all your work on this!

I need to update dashboards on our side, and then I'll close this ticket.

Dashboards are updated.