
Daily Reports for the Thank You 2019 Campaign
Closed, Resolved · Public

Description

Prepare

  • data acquisition, and
  • daily reporting

procedures for the Thank You 2019 campaign in line with https://phabricator.wikimedia.org/T210832.

Campaign start: 2019/01/02
Campaign end: 2019/01/16

Event Timeline

@GoranSMilovanovic Hi Goran, just a quick note on the timeline for the tracking. Kai is ill right now and can build the banner on December 27. I am on holidays, but Verena will do a test registration on December 28 to see if the tracking works, and it would be great if you could check on the evening of December 28 whether you received the test registration and all necessary data. Please let me know if this works for you! Could you also reserve some time on December 29, just in case something does not work right away and we need to do a second test run? That would be great!
@Verena

@Christine_Domgoergen_WMDE please see: T210832#4844737 - seems like everything is in place. Thanks @kai.nissen for feedback on T210832#4844737

@Christine_Domgoergen_WMDE No worries, I will be available continuously until the campaign starts. If we need to test again tomorrow (or at any other time), just ping me here. Thanks.

@Christine_Domgoergen_WMDE @kai.nissen @Verena @Stefan_Schneider_WMDE

IMPORTANT: On January 1st 2019, no WMDE_2019_thx banners were observed under the /beacon/impression path for either the de.wikipedia.org or de.m.wikipedia.org hosts in the pageviews table.

Did the campaign start on January 1st 2019 or not?
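A minimal sketch of how such a check could look, assuming Hive access to the wmf.webrequest table via a pre-configured beeline client (as on the WMF analytics clients); the exact table and query used for this check are not shown in the ticket:

# Sketch only: table, partitions and query are assumptions, not the check actually run.
query <- "SELECT uri_host, COUNT(*) AS impressions FROM wmf.webrequest
  WHERE webrequest_source = 'text' AND year = 2019 AND month = 1 AND day = 1
    AND uri_path = '/beacon/impression' AND uri_query LIKE '%WMDE_2019_thx%'
    AND uri_host IN ('de.wikipedia.org', 'de.m.wikipedia.org')
  GROUP BY uri_host;"
# run via beeline and read the result back into R
system(paste0('beeline --outputformat=tsv2 -e "', query, '" > bannerCheck_2019-01-01.tsv'))
bannerCheck <- read.delim('bannerCheck_2019-01-01.tsv', stringsAsFactors = FALSE)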

Sorry, the campaign has not started yet. There are some implementation issues, but I'm confident that we can start it today.

The campaign started yesterday at 14:15 UTC. You should now be able to retrieve banner impression count data.

@kai.nissen I'm on it. Reporting back as soon as I have the data.

  • The daily reporting for the Thank You 2019 campaign has started.
  • You will be able to follow the intake of daily aggregated data in this spreadsheet.
  • A full Report will be delivered as soon as the Campaign finishes.
  • NOTE: 41 users registered on Day 1 (01/02/2019) - a good start indeed!
  • 03. January 2019 update is ready.
  • 69 new editors registered on the 2nd day of the campaign.
  • 04. January 2019 update is ready.
  • 52 new editors registered on the 3rd day of the campaign.
  • 05. January 2019. update is ready.
  • 43 registrations.
  • 06. January 2019. update is ready;
  • 41 registrations.
  • 07. January 2019. update is ready;
  • 47 registrations.

@GoranSMilovanovic Thank you very much for your work Goran! This looks really good indeed :-)

@Christine_Domgoergen_WMDE I would also say it looks good. I am currently running the data acquisition procedures for 08. January and reporting back as soon as I have the update ready.

  • 08. January 2019. update is ready;
  • 52 registrations.
  • 09. January 2019. update is ready;
  • 43 users registered;
  • 10. January update running now.
  • 10. January 2019. update is ready;
  • 51 registrations.
  • 11. January 2019. update is ready;
  • 33 registrations.
  • 12. January 2019. update is ready;
  • 23 registrations.
  • 13. January 2019. update is ready;
  • 20 registrations.

@kai.nissen @Christine_Domgoergen_WMDE @Stefan_Schneider_WMDE @Verena Please let me know when the campaign ends. Thanks.

@GoranSMilovanovic Hi Goran, thank you very much for your updates! Do you have an update on the numbers for 14. and 15. January? The campaign will probably end today. Thank you!

  • 14. January 2019. update is ready;
  • 33 registrations.
  • 15. January 2019. update is ready.
  • 36 registrations.

@Christine_Domgoergen_WMDE Please make sure to ping me here and let me know if the campaign definitely ends today. If that is the case, I will start preparing the Final Report tomorrow as soon as today's update is ready. Thank you!

@GoranSMilovanovic Hi Goran, thank you for the updates! The campaign will definitely end tonight at midnight.

  • 16. January 2019. update is ready;
  • 31 user registrations.

This is the final daily update for the Thank You 2019 WMDE Banner Campaign.
The final report will be prepared and shared at the beginning of next week (see also T214042).

@GoranSMilovanovic Great, thank you so much! Looking forward to the report :-) Could you also schedule an update for us on the edit counts of the new users one and two weeks after the end of the campaign (i.e. all edits until midnight on January 23 and January 30, respectively)? It would be interesting to see whether they continue editing after the campaign or perhaps only start editing some time after registering their account.

@Christine_Domgoergen_WMDE Of course. Let's keep this ticket open and include the final report here in addition to the daily ones.

@Christine_Domgoergen_WMDE Here is the preliminary campaign report. Findings on user edits and training modules will be included on January 24th. The final report will be delivered immediately after January 30th to encompass all user edits two weeks following the end of the campaign.

@Christine_Domgoergen_WMDE Here is the report with user edits and training modules data included.

The next report will be delivered on 31. January. Please keep the ticket open until then.

@GoranSMilovanovic Hi Goran, thank you for the report! I will keep the ticket open. I have two questions:

  • When I sum up all banner impressions from 1.1.1 Banner Impressions Overview: Table, the result is 168,540,386; when I sum up all impressions in 1.1.2 Total Banner Impressions (split by device), the result is 168,540,422. Could you check why the numbers differ here?
  • In Table 4.2.4 User edits and Table 4.2.5 User edits: does "not completed" also include users who just started but didn't finish a training module, or does it only count users who did not start a module?

Thank you!

@Christine_Domgoergen_WMDE

When I sum up all banner impressions from 1.1.1 Banner Impressions Overview: Table, the result is 168,540,386; when I sum up all impressions in 1.1.2 Total Banner Impressions (split by device), the result is 168,540,422. Could you check why the numbers differ here?

The correct number is: 168,540,386 - the first one that you have reported. Both tables have the same total number of impressions reported (checked by re-running the report code). Did you add up the number correctly?

In Table 4.2.4 User edits and Table 4.2.5 User edits: does "not completed" also include users who just started but didn't finish a training module, or does it only count users who did not start a module?

"Not completed" in this table includes users who have started but not finished a training module.

@GoranSMilovanovic Hi Goran, thanks for the reply!

The correct number is: 168,540,386 - the first one that you have reported. Both tables have the same total number of impressions reported (checked by re-running the report code). Did you add up the number correctly?

I hope the spreadsheets did:
Desktop_ctrl 32,599,984
Desktop_var 32,701,749
Ipad_ctrl 3,983,621
Ipad_var 4,005,107
mobile_ctrl 47,623,662
mobile_var 47,626,299
sum 168,540,422

"Not completed" in this table includes users who have started but not finished a training module.

Okay, thank you! Could you add another line in both tables with the number of users who did not start or finish a training module for comparison?
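For reference, a quick check of the device-split figures quoted above (the values are copied verbatim from the spreadsheet sums; this is a sketch, not part of the report code):

# Device-split sums as quoted from the spreadsheet above
deviceSplit <- c(Desktop_ctrl = 32599984, Desktop_var = 32701749,
                 Ipad_ctrl    =  3983621, Ipad_var    =  4005107,
                 mobile_ctrl  = 47623662, mobile_var  = 47626299)
sum(deviceSplit)              # 168540422 - matches the spreadsheet total above
sum(deviceSplit) - 168540386  # 36 - the gap relative to the Table 1.1.1 total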

  • Prioritizing this task again;
  • all requests and issues reported in T211690#4918152 will be taken into consideration in the updated campaign report (edits until Jan 30);
  • reporting back as soon as I have the updates.

@Christine_Domgoergen_WMDE The updated campaign report is ready:

  • User edits are now tracked up to 30 Jan 2019;
  • Tables 4.2.4 and 4.2.5 now both have three rows, representing
    • users who started but did not finish a training (Started),
    • users who completed a training (Completed), and
    • users who did not start a training (No training).

As for T211690#4918152: trust the numbers in this report; they are correct. I have not checked the spreadsheets yet, but the error is certainly there and not in the dataset upon which this report is based.
The spreadsheets are produced "manually" (i.e. me copying and pasting from some intermediate tables), so I cannot provide a full consistency check there; for the report dataset, however, I can, and I did. The numbers in the report are correct (and add up nicely, as expected).

Please let me know if there is anything else that you need. Thank you for your patience.

@GoranSMilovanovic Hi Goran, thank you for the report and for adding the extra row! I will have a closer look on Monday and get back to you if I have more questions, let's keep the ticket open till then.

@GoranSMilovanovic Hi Goran, I have a question about table 3.1 User edits: daily: the days 2019-01-19, 2019-01-20, 2019-01-27 and 2019-01-29 seem to be missing - could you have a look? Thank you!

Also, it would be interesting to know when the new users edit: is there a possibility to calculate in which time interval new users are active, e.g. directly after registration, or some days later, etc.? The x-axis would be time after registration in days, with 0 being the point of registration, and the y-axis the number of edits. The diagram would then show average edits per user x-days after registration. Could this work?

@GoranSMilovanovic Hi Goran, I have a question regarding the data of the training modules: can you see in the tables from Ragesoss how many people completed or began the modules in general - regardless of whether they registered or not? The question behind the question: I would like to know how many people potentially just did the module but didn't register beforehand.

Thx in advance!

I am getting back to this analysis during the day.

@Christine_Domgoergen_WMDE

I have a question about table 3.1 User edits: daily: the days 2019-01-19, 2019-01-20, 2019-01-27 and 2019-01-29 seem to be missing - could you have a look? Thank you!

There were zero edits on those days - I had not paid attention to that. The chart is now updated to reflect it.
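A minimal sketch of that kind of fix - filling the missing calendar days with an explicit zero count - with illustrative data and assumed column names (not the report's):

library(dplyr)
library(tidyr)
# Illustrative daily edit counts; 2019-01-19 and 2019-01-20 are missing entirely
dailyEdits <- data.frame(
  date  = as.Date(c("2019-01-18", "2019-01-21", "2019-01-22")),
  edits = c(5, 3, 7)
)
# make the missing days explicit with a zero count so they appear in the chart
dailyEdits <- dailyEdits %>%
  complete(date = seq.Date(min(date), max(date), by = "day"),
           fill = list(edits = 0))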

Also, it would be interesting to know when the new users edit: is there a possibility to calculate in which time interval new users are active, e.g. directly after registration, or some days later, etc.? The x-axis would be time after registration in days, with 0 being the point of registration, and the y-axis the number of edits. The diagram would then show average edits per user x-days after registration. Could this work?

See section 3.2 User edits: when do the edits happen? in the updated Report (attached).

@Stefan_Schneider_WMDE

Can you see in the tables from Ragesoss how many people completed or began the modules in general - regardless of whether they registered or not? The question behind the question: I would like to know how many people potentially just did the module but didn't register beforehand.

Please take a look at the very end of the Training Modules section of the Report - your answer is there.

@GoranSMilovanovic Thank you! That was fast. To be clear: so 387 people in total took the trainings, including the ones tracked and registered?

@Stefan_Schneider_WMDE :) That was not really fast, Stefan - thank you for your kind words, but @Christine_Domgoergen_WMDE posed her questions a day ago. I really had to prioritize some WD-related things.

So 387 people in total took the trainings, including the ones tracked and registered?

To be precise: 387 is the number of unique usernames found in the Training Modules dataset minus 2 (yourself and the data set maintainer).
So, yes, I would say that is how many people took any training at all irrespective of whether they did or did not register with us.

@GoranSMilovanovic Hi Goran, thank you for the updated report. I have a question concerning table 3.2. The numbers seem a bit unrealistic to me: do users really make 11 edits 22 days after registration - on average? Could you tell me exactly what the algorithm behind the chart is? Thank you!

@Christine_Domgoergen_WMDE There was only one user making edits on 2019-01-25, and that user made 11 edits then; the user registered on 2019-01-03, which is a difference of 22 days.

The algorithm (R):

library(data.table)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(scales)

# - load user registrations per day, per user
userRegistrationsDaily <- fread("_analytics/fullRegistrationDataset.csv")
userRegistrationsDaily <- select(userRegistrationsDaily, 
                                 event_userId, date)
editsSinceReg <- userRegistrationsDaily 
editsSinceReg <- left_join(editsSinceReg, 
                           userEdits, 
                           by = c("event_userId" = "user_id")) %>% 
  arrange(event_userId)
colnames(editsSinceReg) <- c("user_id", "registration", "edit")
editsSinceReg <- filter(editsSinceReg, !is.na(edit))
editsSinceReg$registration <- as.POSIXct(editsSinceReg$registration)
editsSinceReg$edit <- as.POSIXct(editsSinceReg$edit)
# difference in days between each edit and the user's registration
# (difftime units made explicit; dividing a raw POSIXct difference by 60^2*24
# only yields days when the automatically chosen difftime unit is seconds)
editsSinceReg$diff <- as.numeric(difftime(editsSinceReg$edit,
                                          editsSinceReg$registration,
                                          units = "days"))
avgEditsSinceReg <- editsSinceReg %>% 
  select(diff, user_id) %>% 
  group_by(diff, user_id) %>% 
  summarise(edits = n()) %>% 
  select(diff, edits) %>% 
  group_by(diff) %>% 
  summarise(meanEdits = mean(edits))
avgEditsSinceReg$diff <- as.numeric(avgEditsSinceReg$diff)
avgEditsFrame <- data.frame(diff = seq(min(avgEditsSinceReg$diff), max(avgEditsSinceReg$diff)))
avgEditsFrame <- left_join(avgEditsFrame, avgEditsSinceReg, by = "diff")
avgEditsFrame$meanEdits[is.na(avgEditsFrame$meanEdits)] <- 0
avgEditsFrame$meanEdits <- round(avgEditsFrame$meanEdits, 2)
ggplot(avgEditsFrame, aes(x = diff, 
                    y = meanEdits, 
                    label = meanEdits)) + 
  geom_path(color = "blue", size = .25) + 
  geom_point(color = "blue", size = 1.5) + 
  geom_point(color = "white", size = 1) + 
  ggtitle('Thank You 2019: Average number of edits N days after registration') + 
  xlab("Days after registration") + ylab("Mean edits per user") + 
  theme_minimal() + 
  geom_text_repel(size = 3.5, show.legend = FALSE) + 
  scale_y_continuous(labels = comma) +
  theme(axis.text.x = element_text(angle = 0, size = 8)) +
  theme(plot.title = element_text(size = 10)) +
  theme(legend.title = element_blank()) + 
  theme(legend.position = "right")

and you can always find all my code in the notebooks (Reports) that I share with you.

@GoranSMilovanovic Hi Goran, thank you for your reply. Thanks also for the algorithm, which doesn't help me understand the chart though ;-) So, the mean number of edits should show the average number of edits of all users who edited after registration. If there is only one user making 11 edits on a certain day, we would also need to consider all the 0 edits that happened that day. Does that make sense to you?

@Christine_Domgoergen_WMDE From your formulation

So, the mean number of edits should show the average number of edits of all users who edited after registration. If there is only one user making 11 edits on a certain day, we would also need to consider all the 0 edits that happened that day. Does that make sense to you?

I understand what you want now, but you also need to be more precise in the formulation of your questions. Your first formulation was

The diagram would then show average edits per user x-days after registration.

and that is what the chart shows: on some day N, there was one user editing, who made 11 edits on day N, so the mean on day N is: 11/1 = 11.

Ok, I am now going to produce the data and the chart that you are asking for, this time including the count of all campaign-registered users up to day N - even if they made zero edits on day N.

The question here is what information makes more sense for campaign analytics:

a. Total number of edits made on day N / Total number of campaign registered users until day N, or
b. Total number of edits made on day N / Total number of users editing on day N.

I will keep both charts in the Report.
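A minimal sketch of the two variants, with illustrative data (the object and column names here are assumptions, not those of the report dataset):

library(dplyr)

# Illustrative data: all campaign registrations, including users who never edited...
registrations <- data.frame(
  user_id      = c(1, 2, 3, 4),
  registration = as.Date(c("2019-01-02", "2019-01-03", "2019-01-05", "2019-01-06"))
)
# ...and one row per edit
edits <- data.frame(
  user_id   = c(1, 1, 2, 3, 3),
  edit_date = as.Date(c("2019-01-04", "2019-01-10", "2019-01-10", "2019-01-07", "2019-01-25"))
)

perDay <- edits %>%
  group_by(edit_date) %>%
  summarise(totalEdits = n(), editingUsers = n_distinct(user_id)) %>%
  mutate(
    # (a) denominator: all users registered up to and including day N
    registeredUsers = sapply(edit_date, function(d) sum(registrations$registration <= d)),
    meanEdits_a = totalEdits / registeredUsers,
    # (b) denominator: only the users who actually edited on day N
    meanEdits_b = totalEdits / editingUsers
  )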

@Christine_Domgoergen_WMDE Chart 3.2.2 in the updated Report shows the data as per your request.

@GoranSMilovanovic Great, this was fast, thank you! Could we make one last adjustment, please? In Chart 3.2.1, could we change the table so that it divides not by all users but by the users who edited? The description would change like this:

"Note. The chart presents the mean number of edits calculated from the total number of edits per day divided by the number of active users (i.e. by how many users edited) until that day."

@Christine_Domgoergen_WMDE Chart 3.2.3 provides an answer to your latest question.
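A minimal sketch of the denominator described above - the cumulative count of distinct users who have edited up to each day - again with illustrative data and assumed column names:

library(dplyr)
# Illustrative per-edit data (column names are assumptions, not the report's)
edits <- data.frame(
  user_id   = c(1, 1, 2, 3, 3),
  edit_date = as.Date(c("2019-01-04", "2019-01-10", "2019-01-10", "2019-01-07", "2019-01-25"))
)
perDay <- edits %>%
  group_by(edit_date) %>%
  summarise(totalEdits = n()) %>%
  arrange(edit_date) %>%
  mutate(
    # cumulative number of distinct users who have edited up to (and including) day N
    activeUsersToDate = sapply(edit_date, function(d) n_distinct(edits$user_id[edits$edit_date <= d])),
    meanEdits = totalEdits / activeUsersToDate
  )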

@GoranSMilovanovic Thank you very much, this is very interesting! We can close the ticket, thanks for your help during the campaign and your patience with all the tracking questions :-)