Page MenuHomePhabricator

Get Started notification is always displayed in 48 hours on testwiki
Closed, DeclinedPublic2 Estimated Story Points

Description

Reported by Elena in the comment below:

Checked on testwiki wmf.7 -a question about user_variant: "control"- this variant has the same timing for "Get started notification"? pinging @Urbanecm_WMF

(1) user 69264 (user_registration: 20250625001743') gets variant user_variant: 'get-started-notification'`

  • Get started notification (with updated text: "Curious about...") comes in 24 h

Screenshot 2025-06-27 at 10.34.10 AM.png (654×1 px, 107 KB)

(2) user 69263 (user_registration:20250624235207) gets variant user_variant: "control"

  • Get started notification (old text: "Try a suggested edit...") also comes in 24 h

Screenshot 2025-06-27 at 12.18.23 PM.png (580×1 px, 83 KB)

Event Timeline

To look into this, I first tried reproducing this locally. I do not have delayed job support, so I modified the job sending in the following way:

urbanecm@wmf3345 GrowthExperiments % git diff
diff --git a/includes/LevelingUp/LevelingUpManager.php b/includes/LevelingUp/LevelingUpManager.php
index eeb58f9c5..eea7125d6 100644
--- a/includes/LevelingUp/LevelingUpManager.php
+++ b/includes/LevelingUp/LevelingUpManager.php
@@ -396,8 +396,8 @@ class LevelingUpManager {
                $jobQueue = $this->jobQueueGroup->get( $jobName );
                if ( !$jobQueue->delayedJobsEnabled() ) {
                        $this->logger->error(
-                               'Failed to schedule {jobName} with a delay, delayed jobs are not supported',
-                               [ 'exception' => new \RuntimeException, 'jobName' => $jobName ],
+                               'Failed to schedule {jobName} with a delay of {delay}, delayed jobs are not supported',
+                               [ 'exception' => new \RuntimeException, 'jobName' => $jobName, 'delay' => $delay ],
                        );
                        return false;
                }
urbanecm@wmf3345 GrowthExperiments %

to see the delay when I create an user. The logs were as follows:

[GrowthExperiments] Failed to schedule notificationKeepGoingJob with a delay of 172800, delayed jobs are not supported
[GrowthExperiments] Failed to schedule notificationGetStartedJob with a delay of 172800, delayed jobs are not supported
================================================================================================
[GrowthExperiments] Failed to schedule notificationKeepGoingJob with a delay of 172800, delayed jobs are not supported
[GrowthExperiments] Failed to schedule notificationGetStartedJob with a delay of 72000, delayed jobs are not supported

First two logs were for an account with the control group, the other two logs for a treatment group member. So, this seems to work fine locally.

Then, I decided to verify what the actual job event sent by MediaWiki was, which is available in Kafka. Fortunately, Kafka has a little bit of backscroll and is capable of sending historical data, so I was able to retrieve both events:

{
  "$schema": "/mediawiki/job/1.0.0",
  "meta": {
    "uri": "https://placeholder.invalid/wiki/Special:Badtitle",
    "request_id": "7a51b479-32d9-4f44-9e3c-b2ab07c57c04",
    "id": "3ba5d9ff-948f-45d8-8f82-1ad5bd70f9e7",
    "dt": "2025-06-25T00:17:44Z",
    "domain": "test.wikipedia.org",
    "stream": "mediawiki.job.notificationGetStartedJob"
  },
  "database": "testwiki",
  "type": "notificationGetStartedJob",
  "delay_until": "2025-06-25T20:17:43Z",
  "params": {
    "userId": 69264,
    "jobReleaseTimestamp": 1750882663,
    "requestId": "7a51b479-32d9-4f44-9e3c-b2ab07c57c04"
  },
  "mediawiki_signature": "redacted"
}
{
  "$schema": "/mediawiki/job/1.0.0",
  "meta": {
    "uri": "https://placeholder.invalid/wiki/Special:Badtitle",
    "request_id": "abfeec79-567c-4812-bfef-3bb3ed5e8638",
    "id": "b2db7085-13cd-44ab-b53d-6fd538a9971c",
    "dt": "2025-06-24T23:52:08Z",
    "domain": "test.wikipedia.org",
    "stream": "mediawiki.job.notificationGetStartedJob"
  },
  "database": "testwiki",
  "type": "notificationGetStartedJob",
  "delay_until": "2025-06-26T23:52:07Z",
  "params": {
    "userId": 69263,
    "jobReleaseTimestamp": 1750981927,
    "requestId": "abfeec79-567c-4812-bfef-3bb3ed5e8638"
  },
  "mediawiki_signature": "redacted"
}

The 69264 account is in the treatment group, meaning it should've received the notification in ~20 hours. The event itself was sent at 2025-06-25T00:17:44Z, with a delay_until set to 2025-06-25T20:17:43Z (exactly 20 hours after). This matches the expectation.

The 69263 account is in the control group, meaning it should've received the notification in ~48 hours. The event itself was sent at 2025-06-24T23:52:08Z, with a delay_until set to 2025-06-26T23:52:07Z (exactly 48 hours after). This matches the expectation.

Then, I went to verify when the job actually ran via JobExecutor's logs.

name=JobExecutor, 69263 logs
2025-06-26 23:52:07.028747 [abfeec79-567c-4812-bfef-3bb3ed5e8638] mw-jobrunner.eqiad.main-74f489b554-rk98t testwiki 1.45.0-wmf.7 JobExecutor DEBUG: Beginning job execution {"job":"notificationGetStartedJob Special: userId=69263 jobReleaseTimestamp=1750981927 requestId=abfeec79-567c-4812-bfef-3bb3ed5e8638","job_type":"notificationGetStartedJob"} 
2025-06-26 23:52:07.336801 [abfeec79-567c-4812-bfef-3bb3ed5e8638] mw-jobrunner.eqiad.main-74f489b554-rk98t testwiki 1.45.0-wmf.7 JobExecutor INFO: Finished job execution {"job":"notificationGetStartedJob Special: userId=69263 jobReleaseTimestamp=1750981927 requestId=abfeec79-567c-4812-bfef-3bb3ed5e8638","job_type":"notificationGetStartedJob","job_status":true,"job_duration":0.3108968734741211,"context.job_type":"notificationGetStartedJob"}

So far, I cannot locate a JobExecutor entry for the other user. That being said, the job event suggests the event should be sent at the right time.

Okay, JobExecutor route didn't give me the timestamp, but if the notification was sent at all, the time should be in the database. Let's query it:

mysql:research@dbstore1009.eqiad.wmnet [testwiki]> select event_id, event_type, notification_user, notification_timestamp, notification_read_timestamp from echo_event join echo_notification on notification_event=event_id where event_type='get-st
arted' and event_agent_id in (69264, 69263);
+----------+-------------+-------------------+------------------------+-----------------------------+
| event_id | event_type  | notification_user | notification_timestamp | notification_read_timestamp |
+----------+-------------+-------------------+------------------------+-----------------------------+
|   138038 | get-started |             69263 | 20250626235207         | NULL                        |
|   138041 | get-started |             69264 | 20250627001314         | 20250627175122              |
+----------+-------------+-------------------+------------------------+-----------------------------+
2 rows in set (0.006 sec)

mysql:research@dbstore1009.eqiad.wmnet [testwiki]>

mysql:research@dbstore1007.eqiad.wmnet [testwiki]> select user_id, user_name, user_registration from user where user_id in (69264, 69263);
+---------+--------------------+-------------------+
| user_id | user_name          | user_registration |
+---------+--------------------+-------------------+
|   69263 | Vincatest24june02  | 20250624235207    |
|   69264 | Zilanttest24june01 | 20250625001743    |
+---------+--------------------+-------------------+
2 rows in set (0.001 sec)

mysql:research@dbstore1007.eqiad.wmnet [testwiki]>

In here, the 69263 (control) account received the notification on 2025-06-26T23:52:07 (registered on 2025-06-24T23:52:07, so the notification arrived exactly 48 hours after, as expected). The 69264 (treatment) account received the notification on 2025-06-27T00:13:14, while the account registered on 2025-06-25T00:17:43 (which is...nearly two days after registration).

So, there seems to be a mismatch between the requested timestamp (2025-06-25T20:17:43Z) and when the notification actually arrived. Not great, but I do not observe anything like "notification always arrives in 24 hours", as @Etonkovidova's comment indicates.

Okay, JobExecutor route didn't give me the timestamp, but if the notification was sent at all, the time should be in the database. Let's query it:

mysql:research@dbstore1009.eqiad.wmnet [testwiki]> select event_id, event_type, notification_user, notification_timestamp, notification_read_timestamp from echo_event join echo_notification on notification_event=event_id where event_type='get-st
arted' and event_agent_id in (69264, 69263);
+----------+-------------+-------------------+------------------------+-----------------------------+
| event_id | event_type  | notification_user | notification_timestamp | notification_read_timestamp |
+----------+-------------+-------------------+------------------------+-----------------------------+
|   138038 | get-started |             69263 | 20250626235207         | NULL                        |
|   138041 | get-started |             69264 | 20250627001314         | 20250627175122              |
+----------+-------------+-------------------+------------------------+-----------------------------+
2 rows in set (0.006 sec)

mysql:research@dbstore1009.eqiad.wmnet [testwiki]>

mysql:research@dbstore1007.eqiad.wmnet [testwiki]> select user_id, user_name, user_registration from user where user_id in (69264, 69263);
+---------+--------------------+-------------------+
| user_id | user_name          | user_registration |
+---------+--------------------+-------------------+
|   69263 | Vincatest24june02  | 20250624235207    |
|   69264 | Zilanttest24june01 | 20250625001743    |
+---------+--------------------+-------------------+
2 rows in set (0.001 sec)

mysql:research@dbstore1007.eqiad.wmnet [testwiki]>

In here, the 69263 (control) account received the notification on 2025-06-26T23:52:07 (registered on 2025-06-24T23:52:07, so the notification arrived exactly 48 hours after, as expected). The 69264 (treatment) account received the notification on 2025-06-27T00:13:14, while the account registered on 2025-06-25T00:17:43 (which is...nearly two days after registration).

So, there seems to be a mismatch between the requested timestamp (2025-06-25T20:17:43Z) and when the notification actually arrived. Not great, but I do not observe anything like "notification always arrives in 24 hours", as @Etonkovidova's comment indicates.

Thank you, @Urbanecm_WMF - that's I just realized when I was re-checking. Exactly that: not 24-hours, but 48h. I don't have access to testwiki db, so I was calculating notification received time based only on the timestamp in my test user echo UI and taking into account the user's timezone (apparently, I subtracted 7hours instead of adding it or something like that) .
So, the issue is - the notifications for the treatment user group should come earlier, right?

I tried sending another notification for the treatment group user:

{
  "$schema": "/mediawiki/job/1.0.0",
  "meta": {
    "uri": "https://placeholder.invalid/wiki/Special:Badtitle",
    "request_id": "74716060566ab88f9c300b02",            
    "id": "65abaf73-91c2-4d0f-912e-25b102861748",
    "dt": "2025-07-02T19:29:37Z",
    "domain": "test.wikipedia.org",
    "stream": "mediawiki.job.notificationGetStartedJob"
  },
  "database": "testwiki",
  "type": "notificationGetStartedJob",
  "delay_until": "2025-07-03T15:29:37Z",
  "params": {
    "userId": 69264, 
    "jobReleaseTimestamp": 1751556577,
    "requestId": "74716060566ab88f9c300b02"            
  },
  "mediawiki_signature": "redacted"
}

Let's see what happens.

Created a user with "get-started-notification":

user_id: 69390
user_name:  Vincatest02july01
user_regirstration: 20250703004345
Michael renamed this task from Get Started notification is always displayed in 24 hours on testwiki to Get Started notification is always displayed in 48 hours on testwiki.Jul 4 2025, 10:26 AM

I'm curious to learn how these new attempts turned out. The data looks not good: https://grafana.wikimedia.org/d/e27a78e9-c6a2-4bf2-ad52-6c653821686d/delayed-notification-information? (Though I'm not sure how meaningful those panels are. Gauge metrics still seem a bit tricky...)

KStoller-WMF set the point value for this task to 2.Jul 8 2025, 4:44 PM
KStoller-WMF subscribed.

Timebox ~2 days, let's check in at that point if we don't have a solution.

Urbanecm_WMF renamed this task from Get Started notification is always displayed in 48 hours on testwiki to Get Started notification is always displayed in 24 hours on testwiki.Jul 9 2025, 2:06 PM
Urbanecm_WMF renamed this task from Get Started notification is always displayed in 24 hours on testwiki to Get Started notification is always displayed in 48 hours on testwiki.

@KStoller-WMF decided to proceed with copy changes instead. Kirsten will update Phab to track that work.