
IP Auto-reveal: Agree and implement metrics and instrumentation plan
Closed, Resolved (Public)

Description

Summary

Once IP Auto-reveal is deployed, we will need a way to monitor usage and have metrics in place that will measure success. This task is to agree on what we want to measure and why, and how we will do it.

What we want to measure and why

Metric, with the questions / assumptions it tests:

  • Average number of times users enable the tool per day: How frequently is the tool being used? Do users know where to enable the IP Auto-reveal tool?
  • Number of users who set the duration to 30 minutes compared with 1 hour: Do users prefer a 30-minute or 1-hour duration?
  • Average number of times users disable the tool per day: How frequently are users turning off the tool?
  • Number of users who click the extend button in the turn-off dialog: How many users need to extend the duration? Do we need to lengthen the duration?
  • Number of people enabling the tool on mobile compared with desktop: Are more people using the tool on desktop?
Nice to have (check feasibility first):

  • Number of times a user clicks the extend button in quick succession: Is the default extend duration (10 minutes) long enough?
  • Number of people trying to extend beyond 24 hours: Are our defaults useful or not?
  • Number of users using the API directly: Are our defaults useful or not?

User story

As Trust & Safety Product, we want to understand the usage and impact of IP Auto-reveal, in order to measure its success.

Acceptance criteria

Event Timeline

@KColeman-WMF Should this task block T386492: IP auto-reveal: Assign the IP auto-reveal right to user groups, which is essentially the deployment task?

Yes, I think we should agree metrics and have instrumentation in place before deployment.

I've taken a look at this and would like to apologize in advance about the number of questions and comments I have. They'll help me understand the context and what decisions we're trying to make, so that the instrumentation and measurements can help support those. They're also to make sure we've covered all the bases we're interested in.

Getting the key questions out of the way first:

  1. Is there something that would make you un-deploy this feature? From my cursory look at T358853 and T374869, it appears the answer is "no" since there's a clear use case for this tool, but I'm happy to learn that I've missed something.
  2. Are there any design-related decisions that aren't covered? One of the things I jotted down was "do users enable the reminder?" as a suggestion that it should be on by default, but there might be others?
  3. Do you need to monitor this beyond the standard 90-day data retention period?

When it comes to measurements, my reading of this is that we've got two categories: one is the amount of usage, and the other is about interface option defaults. For both of these, we're interested both in what happens when the feature is activated (the user clicks "Turn on") and in what happens along the way (is the duration extended? is it turned off explicitly?). I'll call the time period between activation and deactivation a "session", because that's what it is to me. We then get these three usage-related metrics:

  1. Number of users per day
  2. Number of sessions per day
  3. Average number of sessions per user

The table above used "average … per day", which to me is something we can derive from the data if needed.
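The derivation could be sketched like this, assuming a hypothetical event shape with `performer_id` and `dt` fields (illustrative only, not the real schema):

```python
# Sketch only: deriving the three usage metrics from session-start
# events. The event shape (performer_id, dt) is an assumption for
# illustration, not the actual instrumentation schema.
from collections import defaultdict
from datetime import date

start_events = [
    {"performer_id": 1, "dt": date(2025, 6, 1)},
    {"performer_id": 1, "dt": date(2025, 6, 1)},
    {"performer_id": 2, "dt": date(2025, 6, 1)},
]

sessions_per_day = defaultdict(int)   # metric 2
users_per_day = defaultdict(set)      # metric 1 (take len() per day)
for e in start_events:
    sessions_per_day[e["dt"]] += 1
    users_per_day[e["dt"]].add(e["performer_id"])

# metric 3: average number of sessions per user, per day
avg_sessions_per_user = {
    d: sessions_per_day[d] / len(users_per_day[d]) for d in sessions_per_day
}
```

The "average … per day" figures from the table fall out of these same aggregates.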

When it comes to interface options and such, I jotted down the following metrics:

  1. Number of sessions started with a 30-minute duration.
  2. Number of sessions started with a 60-minute duration.
  3. Number of sessions that were explicitly ended (by clicking the "Turn off" button).
  4. Number of sessions that expired.
  5. Average duration of 30-minute sessions that are extended.
  6. Average duration of 60-minute sessions that are extended.
  7. Number of sessions with the "Remind me" option turned on (maybe also connected to the 30/60-minute option).

Those would then let us know 1) if 30 or 60 minutes is the preferred duration, 2) whether users explicitly end the session early, let it expire, or extend it, and 3) if most sessions get extended, what the preferred session length is.

When it comes to the "quick succession" metric in the nice to have section above, I find that less important than understanding how long folks want the session to be. In a similar vein, I suspect the session duration averages will also give us an idea if going beyond 24 hours is something folks are looking for.

I've deliberately not talked about technical details of the instrumentation yet, because I think that can wait until we have the decisions and metrics needs mapped out first. Sorry again for the long reply!

Thanks for taking a look, @nettrom_WMF!

  1. Is there something that would make you un-deploy this feature? From my cursory look at T358853 and T374869, it appears the answer is "no" since there's a clear use case for this tool, but I'm happy to learn that I've missed something.

I don't think we would be undeploying this in response to any of these metrics. (Undeployment scenarios that come to mind are if there was some emergency technical issue, or if the users over time tell us they don't want it.)

  1. Are there any design-related decisions that aren't covered? One of the things I jotted down was "do users enable the reminder?" as a suggestion that it should be on by default, but there might be others?

We're not implementing the full designs in this first deployment, so the reminder feature won't be available. (Apologies for not making this clear.)

  1. Do you need to monitor this beyond the standard 90-day data retention period?

I don't believe so.

Thanks for answering my questions @Tchanders, it's good to get confirmations about those things! No worries that the reminder feature isn't available at launch, it's trivial to plan for it being added later.

Based on what we've been talking about, I went ahead and sketched out events for starting, ending, and extending sessions in the instrumentation spec spreadsheet. Copying them in here as well, since it's easier to have the conversation and decisions in one place:

  • User starts the session: action: session_start, action_context: { session_length: int (minutes), reminder: bool }
  • User ends the session: action: session_end, action_context: { actor: user }
  • Session expires: action: session_end, action_context: { actor: system }
  • User extends the session: action: session_extended, action_context:
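As concrete payloads, the events above might look like the following (illustrative values only; the extend event's action_context was left unspecified above, so it is omitted here too):

```python
# Illustrative payloads for the events sketched above. Only the "action"
# and "action_context" fields come from the spec sketch; values are made up.
session_start = {
    "action": "session_start",
    "action_context": {"session_length": 30, "reminder": False},
}
session_end_by_user = {"action": "session_end", "action_context": {"actor": "user"}}
session_end_by_system = {"action": "session_end", "action_context": {"actor": "system"}}
session_extended = {"action": "session_extended"}
```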

I was thinking that either the user ends the session by clicking "Turn off" or the system does something similar, and as an analyst I'd love to have specific events for those to make the calculations straightforward. But as always, I'm not a software engineer and am happy to figure out solutions!

Thanks @nettrom_WMF - looks good to me.

The expiry is the only one that doesn't line up with a specific line of code being called, at least at the moment of expiry.

However, we can capture after the fact that an auto-reveal session expired (rather than being switched off manually). There is a line of code that checks whether auto-reveal is on, and updates the database if the mode has expired - there's just no guarantee that this will be called very soon after expiry. So if we want to know at any given time how many sessions have expired, we'd have a lower limit (some might have expired but not been captured by our instrumentation yet).

(unassigning myself for now as I have a few other things I need to get to first.)

The expiry is the only one that doesn't line up with a specific line of code being called, at least at the moment of expiry.

However, we can capture after the fact that an auto-reveal session expired (rather than being switched off manually). There is a line of code that checks whether auto-reveal is on, and updates the database if the mode has expired - there's just no guarantee that this will be called very soon after expiry. So if we want to know at any given time how many sessions have expired, we'd have a lower limit (some might have expired but not been captured by our instrumentation yet).

Ah, it seems that I once again forgot that we're working on the web where everything's asynchronous requests without permanence and we have no guarantees about much, if anything :D

Jokes aside, I remembered that I based my notes on the design flow in T374869 (F58788199), where there is that little status box in the bottom right corner counting the session time down. Is that still part of the design, and if so, what happens to that box when the time expires?

If that's no longer part of the tool, then I think it's easier for us to not instrument session expiry and instead infer it based on the parameters from the session start plus any "+10m" clicks (for sessions where the user doesn't explicitly turn the feature off, of course).
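The inference could be as simple as this sketch, assuming the 10-minute extend length and hypothetical field names:

```python
# Sketch: inferring when a session would have expired from its start event
# plus any "+10m" extend clicks, assuming the user never explicitly turned
# the feature off. Timestamps and field names are illustrative assumptions.
from datetime import datetime, timedelta

def inferred_expiry(start_ts, session_length_min, n_extends, extend_min=10):
    """Expiry = start time + initial duration + 10 minutes per extend click."""
    return start_ts + timedelta(minutes=session_length_min + n_extends * extend_min)

start = datetime(2025, 6, 1, 12, 0)
expiry = inferred_expiry(start, 30, 2)  # a 30-minute session, extended twice
```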

Ah, it seems that I once again forgot that we're working on the web where everything's asynchronous requests without permanence and we have no guarantees about much, if anything :D

:)

Jokes aside, I remembered that I based my notes on the design flow in T374869 (F58788199), where there is that little status box in the bottom right corner counting the session time down. Is that still part of the design, and if so, what happens to that box when the time expires?

The bottom-right box is not part of the MVP first version, but we could instrument it if/when it's added later. It wouldn't be guaranteed to reach zero (e.g. if the user closes the page before expiry), so we'd also be working with a lower limit.

If that's no longer part of the tool, then I think it's easier for us to not instrument session expiry and instead infer it based on the parameters from the session start plus any "+10m" clicks (for sessions where the user doesn't explicitly turn the feature off, of course).

That sounds fine to me.

If that's no longer part of the tool, then I think it's easier for us to not instrument session expiry and instead infer it based on the parameters from the session start plus any "+10m" clicks (for sessions where the user doesn't explicitly turn the feature off, of course).

That sounds fine to me.

Great! I've updated the instrumentation spec to reflect that we don't track session expiry.

I made a couple of comments on this.

I had a go at filling this in, but it was the first time I've done this! @KColeman-WMF @nettrom_WMF - does that look OK?

Change #1155725 had a related patch set uploaded (by Tchanders; author: Tchanders):

[operations/mediawiki-config@master] WIP Configure event stream for IP auto-reveal instrument

https://gerrit.wikimedia.org/r/1155725

Change #1155743 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/CheckUser@master] WIP Add functionality for instrumenting IP auto-reveal

https://gerrit.wikimedia.org/r/1155743

Change #1155744 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/CheckUser@master] WIP Instrument IP auto-reveal interactions

https://gerrit.wikimedia.org/r/1155744

I'll take another look at the Measurement plan tomorrow when I have more time. In the meantime I wanted to expand on a comment I left in the Instrumentation plan where I suggested we use the funnel_name and funnel_entry_token fields in the schema.

In T387600#10818768, I mention the concept of a "session" and define it as what happens between activation and deactivation. There are (at least) two problems with this: 1) I mistakenly thought we'd have deactivation events, but we won't have that in many cases, and 2) this concept of "session" doesn't match any of the session-related contextual attributes we have (e.g. a user can have multiple IP Auto-reveal sessions within a browser session).

Having some kind of session identifier that matches the user experience in IP Auto-reveal will make calculating the various statistics we're interested in a whole lot easier, because it allows us to group together everything that happens within each session. The Metrics Platform design also prepares for this by having the concept of a funnel, and making ordering events easy by having a counter (ref interaction data).

The way I've been thinking about this is as follows:

  1. Whenever a user starts a session, it also starts a funnel. If a user is able to start a session multiple times on the same page view, each will need a unique funnel_entry_token.
  2. The events that occur within that session, e.g. if the user extends the session, are then connected to that session. This means that we'll be able to group these together using funnel_entry_token in the data.

I'm not sure if the Metrics Platform support for this is fully developed, but I know that having something like that will make analysis a whole lot easier than if I have to group things together by other values and figure out clever ways to split out the sessions.

Thanks for the pointer about funnels. I see that other instruments are using funnels, e.g. the ReportIncident extension, so I assume they're usable!

If a user is able to start a session multiple times on the same page view, each will need a unique funnel_entry_token.

(Taking "session" to mean a continuous period during which auto-reveal mode is switched on...)

A user cannot start multiple sessions - even across different wikis - since the mode is governed by a global preference. Here's what's possible:

  • If a user starts a session, they can only extend or end the session
  • If a user extends a session, they can only extend it again or end it
  • If a user ends a session (or the session ends automatically by expiring), they can only start a new session

Given this, would it be sufficient to use the performer_id and time to group activity into sessions?
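Since sessions cannot overlap, the grouping being proposed could be sketched as splitting each user's time-ordered event stream at every session_start (event shape assumed, not the real schema):

```python
# Sketch of grouping without a funnel token: sessions cannot overlap, so a
# per-user, time-ordered event stream can be split into sessions at each
# session_start event. The event shape here is assumed for illustration.
def group_into_sessions(events):
    """events: one user's events, sorted by time; returns a list of sessions."""
    sessions, current = [], None
    for e in events:
        if e["action"] == "session_start":
            current = [e]
            sessions.append(current)
        elif current is not None:
            current.append(e)
    return sessions

stream = [
    {"action": "session_start"},
    {"action": "session_extended"},
    {"action": "session_start"},  # a new start implies the previous session ended
]
```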

Ticking these off, having met with @nettrom_WMF and updated them accordingly.

Tchanders renamed this task from "IP Auto-reveal: Agree metrics and instrumentation plan" to "IP Auto-reveal: Agree and implement metrics and instrumentation plan". Jun 18 2025, 3:34 PM

For documentation purposes, here are the key topics we discussed and made decisions about:

  • Do we need a "session" concept (i.e. properly track the funnel)? We identified that user- and time-based measurements (e.g. total number of sessions started, average number of sessions started per user in a day) will provide meaningful answers. Being able to correctly measure average session length is lower priority. Decision: no need for a funnel token (or something similar), we kept/added the contextual attributes with session tokens and can use those as needed.
  • What will action_context look like for action = "session_start", do we need reminder to be present even if that feature isn't implemented? Morten has no strong opinion about the latter, but we did note that action_context is expected to be a JSON blob (and it's limited to 64 bytes).
  • Does usage of the feature span multiple wikis? Potentially yes, some functionaries might have global extended rights. Decision: added performer_name to the list of contextual attributes so that this can be measured as needed.
  • How frequently do we plan to monitor this data, and do we need to plan a decommission of the instrument? That's for TSP to decide, Morten doesn't have strong opinions.

Change #1155743 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Add functionality for instrumenting IP auto-reveal

https://gerrit.wikimedia.org/r/1155743

Change #1155725 merged by jenkins-bot:

[operations/mediawiki-config@master] Configure event stream for IP auto-reveal instrument

https://gerrit.wikimedia.org/r/1155725

Mentioned in SAL (#wikimedia-operations) [2025-06-23T21:01:54Z] <kharlan@deploy1003> Started scap sync-world: Backport for [[gerrit:1163004|Reapply "ores: Disable AbuseFilter integration by default" (T364705)]], [[gerrit:1155725|Configure event stream for IP auto-reveal instrument (T387600)]], [[gerrit:1160157|Reapply "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204)]]

Mentioned in SAL (#wikimedia-operations) [2025-06-23T21:04:28Z] <kharlan@deploy1003> kharlan, tgr, tchanders: Backport for [[gerrit:1163004|Reapply "ores: Disable AbuseFilter integration by default" (T364705)]], [[gerrit:1155725|Configure event stream for IP auto-reveal instrument (T387600)]], [[gerrit:1160157|Reapply "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-06-23T21:16:46Z] <kharlan@deploy1003> Finished scap sync-world: Backport for [[gerrit:1163004|Reapply "ores: Disable AbuseFilter integration by default" (T364705)]], [[gerrit:1155725|Configure event stream for IP auto-reveal instrument (T387600)]], [[gerrit:1160157|Reapply "Use GetSecurityLogContext hook for goodpass/badpass logging" (T395204)]] (duration: 14m 51s)

Change #1155744 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Instrument IP auto-reveal interactions

https://gerrit.wikimedia.org/r/1155744

Let's leave this open until we've confirmed data is being collected, from wmf.8 onwards.

I checked event.mediawiki_product_metrics_checkuser_ip_auto_reveal_interaction and confirmed that events are being recorded.

Change #1192533 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] EventStreamConfig: Fix user-agent exclusion config

https://gerrit.wikimedia.org/r/1192533

Change #1192533 merged by jenkins-bot:

[operations/mediawiki-config@master] EventStreamConfig: Fix user-agent exclusion config

https://gerrit.wikimedia.org/r/1192533

Mentioned in SAL (#wikimedia-operations) [2025-10-09T07:35:43Z] <kharlan@deploy2002> Started scap sync-world: Backport for [[gerrit:1192533|EventStreamConfig: Fix user-agent exclusion config (T387600)]], [[gerrit:1194733|EventStreamConfig: fix IP auto reveal stream]]

Mentioned in SAL (#wikimedia-operations) [2025-10-09T07:40:47Z] <kharlan@deploy2002> kharlan, bearloga: Backport for [[gerrit:1192533|EventStreamConfig: Fix user-agent exclusion config (T387600)]], [[gerrit:1194733|EventStreamConfig: fix IP auto reveal stream]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-10-09T07:47:36Z] <kharlan@deploy2002> Finished scap sync-world: Backport for [[gerrit:1192533|EventStreamConfig: Fix user-agent exclusion config (T387600)]], [[gerrit:1194733|EventStreamConfig: fix IP auto reveal stream]] (duration: 11m 53s)