
Campaign report
Closed, Resolved (Public)


Reporting should start on 2nd January 2018. The report must include all activity from the beginning of the campaign. The estimated campaign start is 1st January 2018.

Event Timeline

@Verena @Stefan_Schneider_WMDE

Please allow me to edit, because otherwise I cannot insert a drawing there; I need to place a sketch of a research design for the campaign and ask for your feedback on it. Thanks.

@Stefan_Schneider_WMDE Thanks. So, I will provide a full information flow chart for this campaign and then list all the critical steps for data collection. Then you can give me feedback and let me know if I am getting everything right. The moment we agree that our understandings of what needs to be done fall in line, I will start developing the analytics code, so that everything is ready well in advance.

@Verena @Stefan_Schneider_WMDE @Addshore

Let me summarize; please confirm whether each of the following points is correct:

  1. Campaign tag: ?campaign=wmde_etc2017_bt1 (tracking of the banner impressions); I tested this against the wmf.webrequest table and found no data for December 21 and 22.
  2. Campaign landing page:
  3. Landing page view = banner click (meaning: every banner click inevitably leads to the campaign landing page).
  4. Guided tours: event_tour=diskutieren and event_tour=seimutig; already tested against log.GuidedTourExited_8690566, where I found some test data for both campaigns.
  5. Training module data: @Ragesoss will provide a data set from the training module once the campaign ends.
  6. The campaign addresses both new users (those who visit but have yet to register) and existing users (those who only need to log in and get some work done)?


  1. correct
  2. correct
  3. correct
  4. correct
  5. correct
  6. correct
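
A minimal sketch of how point 1 could be checked on a sample of request URIs: it counts the URIs whose query string carries the campaign tag. This is not the analytics code used for the reports; the function names and the sample URIs (on example.org) are hypothetical, and only the tag value comes from the list above.

```python
from urllib.parse import urlparse, parse_qs

CAMPAIGN_TAG = "wmde_etc2017_bt1"  # tag value from point 1 above


def has_campaign_tag(uri, tag=CAMPAIGN_TAG):
    """Return True if the URI's query string contains campaign=<tag>."""
    query = parse_qs(urlparse(uri).query)
    return tag in query.get("campaign", [])


def count_tagged(uris, tag=CAMPAIGN_TAG):
    """Count the URIs that carry the campaign tag."""
    return sum(1 for uri in uris if has_campaign_tag(uri, tag))


# Hypothetical sample of request URIs (illustrative only, not real data).
sample_uris = [
    "https://example.org/landing?campaign=wmde_etc2017_bt1",
    "https://example.org/landing?campaign=other_campaign",
    "https://example.org/landing",
    "https://example.org/landing?foo=bar&campaign=wmde_etc2017_bt1",
]

tagged_views = count_tagged(sample_uris)
print(tagged_views)
```

Parsing the query string, rather than searching for the tag as a substring, avoids counting a different tag that merely starts with the same characters.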

@Stefan_Schneider_WMDE Thank you!

Q. What is the name of the campaign training module: wikipedia-basiswissen, or editieren-basiswissen, or artikel-bewerten, or maybe something else?


  • please ping when any test banner presentations and clicks are made
  • please ping when the campaign landing page is ready.

Essentially, I will simply modify the Autumn Banner Campaign 2017 code for this. Testing proved essential last time, so I suggest we have some test data for this campaign too. Since it begins around January 1st, it would be good to have everything 100% ready before it starts. Thanks.


You can see the names here in this file:
I guess the slugs are the relevant data for tracking.


  • there will be no test data this time, as fundraising is coordinating the banner creation and we are very limited in resources this time.
  • a first draft of the landing page was just created. It will be approved and finalized next week (just little fixes, but in general it's done). ( [campaign tag excluded])

@Stefan_Schneider_WMDE Thank you. I think that I now have everything that is needed to develop the analytics code, and I'm getting back in touch as soon as I'm done with it.

@Stefan_Schneider_WMDE @Verena

Since we don't have any test data for the campaign banner this time, it is essential to let me know the start date of the campaign as soon as possible.

That should give me enough time to make sure that everything is in place for the regular campaign reports.

Thank you.

@Stefan_Schneider_WMDE @Verena

The R script for campaign data acquisition is ready.

I cannot guarantee that everything is perfect there because we have no test data, but the code will be tested as soon as the campaign starts, and certainly no dramatic changes - if any at all - will need to be introduced.

Thank you, Goran, for your preparations. I don't really know when the campaign starts. Fundraising is deciding on that point.

Hi Kai,
In that case I have two questions:

  • Do you know when the Thank-You campaign starts?
  • Is the banner for the thank you campaign ready? Maybe I could create some test data with that banner before the campaign starts?

Thx in advance for your help!

  • Do you know when the Thank-You campaign starts?

The campaign will start on New Year's Eve.

  • Is the banner for the thank you campaign ready? Maybe I could create some test data with that banner before the campaign starts?

It is not ready, yet. If it helps, I can create a dummy banner.

I'll just point out here that T182794 is still not done.
Hopefully I will do this in the next 2/3 days during EU evening / US morning / afternoon.

@Addshore Just to remind you that I was already able to fetch some data on this campaign's Guided Tours from log.GuidedTourExited_8690566 - I do not know if this detail helps you in your work, but I thought it would be good to let you know.

Very good - Let's have a dummy to create some test data.

@Addshore Just to remind you that I was already able to fetch some data on this campaign's Guided Tours from log.GuidedTourExited_8690566 - I do not know if this detail helps you in your work, but I thought it would be good to let you know.

Yup, that is fine. The hack has nothing to do with guided tours this time; it is there to enable tracking of the registrations.


I will not be resurrecting T182797 again, but the problem that I raised there is that there are no banner impression data in the wmf.webrequest table (queried with HiveQL). That is a different source from MariaDB and the log database on analytics-slave, where you have demonstrated that the data are indeed tracked.

@Stefan_Schneider_WMDE @Verena So, we don't have the banner impression data for this campaign. I will start reporting on everything else (pageviews, user registrations and edits, guided tours, and the training module) soon.

Ahh okay, T182797 does not relate to banner impressions at all.

@kai.nissen would know more about this I believe.

@Verena @Stefan_Schneider_WMDE

Here's the first daily report on the Thank you 2018 campaign.

Hi Goran,
thank you very much! The numbers are what we expected. The patch was deployed in the afternoon of the 1st, the campaign started at 5.30 pm. Is there any data for the 1st?

Hey @Verena

Here is your update, with the 1. January 2018 data included.

@Verena @Stefan_Schneider_WMDE your most recent update is here.

Q. Did we find out anything about the banner impressions data?

@Verena @Stefan_Schneider_WMDE A fresh update is ready.

@Verena @Stefan_Schneider_WMDE Update on the Thank You 2018 campaign.

@Verena @Stefan_Schneider_WMDE

Update, 5. January, approx. 11:30 CET.

@Verena @Stefan_Schneider_WMDE Update, including the banner impressions data.

@Verena @Stefan_Schneider_WMDE Due to a bug in my R code, the data on Guided Tours were not included in the previous reports (more precisely, the reports always stated that there were no data on the Guided Tours).

However, there are some data on the Guided Tours for this campaign, and they are included in the present update. Only four registered users have started the Guided Tour at all...

@GoranSMilovanovic: Thx Goran! Four people doing the guided tours is already a lot if you take into account that the guided tours are embedded in the trainings. You have to do half of the third training to get to the guided tours. So I think that's really good!

However, in the report I cannot see how many people did the guided tour. Is it possible to see the events of the guided tour per person? Or did 11 people see the guided tour and exit at event-step?

@Stefan_Schneider_WMDE At this point the event_userId field is simply removed from the Report for reasons of data privacy protection. I will now produce an anonymized version of the data set where you will be able to see individual paths through the Guided Tours. Getting back to you ASAP.

@Stefan_Schneider_WMDE No problem. I'm currently running an update to collect all January 7th data and then I will send the update.

Q: Do you know when the campaign ends?

@GoranSMilovanovic The Fundraising team decides based on the data they get every day, so there is no final end date. I will get back to you as soon as the end is known for sure.

@Stefan_Schneider_WMDE @Verena Update, complete data up to 7. January. Currently running an update to include the new 8. January data.

Stefan: you will find an overview of the number of registered users taking Guided Tours, per tour, at the bottom of the report. All user IDs are now anonymized, but I also keep the real IDs internally in order to be able to match them against the Training Module data once the campaign ends.
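
A minimal sketch of the idea described above: replace real user IDs with stable pseudonyms in the shared data while keeping the mapping internally for later matching. This is an assumption about one way to do it, not the method used in the Report; `build_pseudonyms`, the salt, the `event_step` field name, and the sample events are all hypothetical.

```python
import hashlib


def build_pseudonyms(user_ids, salt):
    """Return {real_id: pseudonym}. This mapping stays internal."""
    mapping = {}
    for uid in sorted(user_ids):
        digest = hashlib.sha256(f"{salt}:{uid}".encode("utf-8")).hexdigest()
        mapping[uid] = "user_" + digest[:12]
    return mapping


# Hypothetical Guided Tour events (illustrative values only).
events = [
    {"event_userId": 101, "event_tour": "diskutieren", "event_step": "intro"},
    {"event_userId": 101, "event_tour": "diskutieren", "event_step": "reply"},
    {"event_userId": 202, "event_tour": "seimutig", "event_step": "intro"},
]

# Internal mapping: kept private so Training Module data can be matched later.
mapping = build_pseudonyms({e["event_userId"] for e in events}, salt="keep-this-secret")

# Shareable version: real IDs are dropped, individual paths remain visible.
public_events = [
    {"user": mapping[e["event_userId"]], "tour": e["event_tour"], "step": e["event_step"]}
    for e in events
]

for row in public_events:
    print(row)
```

Because the same real ID always maps to the same pseudonym, individual paths through a tour stay visible in the shared data set.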

@Verena @Stefan_Schneider_WMDE The update as of 8. January, approx. 20:40 CET.

@Verena @Stefan_Schneider_WMDE Fresh update, 9. January approx. 14:30 CET. Three registered users today.

@Verena @Stefan_Schneider_WMDE Update, early CET hours 10. January.

@Verena @Stefan_Schneider_WMDE Fresh update. Please let me know: did the Campaign end, or am I doing something wrong since I have no data for 11 January? Thanks.

@Verena @Stefan_Schneider_WMDE

Approx. 21:00 CET, 11 January, update.

Approx. 16:00 CET, 14. January, Campaign Report Update.

Note: no new user registrations since 10. January.

Thank you.

If we could have the final report - depending on the availability of the training data of course - by next Monday that would be totally sufficient.




@Verena @Stefan_Schneider_WMDE

Here's the last daily update for the Thank You 2018 campaign. Turns out that there were some user registrations after January 14 which were previously unaccounted for due to an I/O error.

As soon as the data on the training module become available, I will incorporate them into the Report and address all campaign research questions in the final version.

@Verena @Stefan_Schneider_WMDE

Here's the Thank You 2018 Campaign Report with the data on Training Modules included.

Please take the following into consideration: in order to answer your question "Is there a difference in users with completed and not-completed or not at all having taken the modules?", you need to let me know what the last slides of each Training Module are, so that I can differentiate between the users who did and who did not complete the Training Module. The present analysis addresses only the differences between the users who have and who have not started the Training Module.

I haven't performed any statistical hypothesis testing (i.e. testing whether the difference in the number of edits between those who have vs. those who haven't taken the Training Module is statistically significant) because (a) the number of data points is rather low, and (b) the numbers seem telling prima facie. @Jan_Dittrich However, if you insist, I will include the tests.
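
For reference, a minimal sketch of what such a test could look like with few data points: a two-sided permutation test on the difference in mean edit counts between the two groups. No test of this kind is part of the Report; the function names are hypothetical and the edit counts below are made-up illustrative values, not campaign data.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of users to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up edit counts per user (illustrative only).
took_training = [12, 0, 3, 15, 1]
no_training = [0, 0, 1, 0, 2, 0, 4]

p_value = permutation_p_value(took_training, no_training)
print(f"estimated p-value: {p_value:.3f}")
```

A permutation test makes no distributional assumptions, which suits small and skewed edit counts, but with so few users it will have little power.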

Also, any additional analytics that you might need - just let me know.

@GoranSMilovanovic Thank you for the prompt report! You can find the slides and their order in the JSON of the modules below Wikipedia Editing (German). Does that answer your question about the order?

Looks like almost all people in the editing course nearly completed the course - that's quite good!

It would be interesting to see how many edits were made by the people who did a training or, even better, by those who did the editing training. Is it possible to produce this information?

@Stefan_Schneider_WMDE Thanks!

  • If the order of slides in JSON is the real order in which they were presented, then yes, that's the information that I've been looking for.

It would be interesting to see how many edits were made by the people who did a training or, even better, by those who did the editing training. Is it possible to produce this information?

I can't see what would prevent us from finding out. Please give me just a few hours and I will get back to you on this - I am in a bit of a rush these days. Ok?

@Stefan_Schneider_WMDE @Verena New sub-sections under Section 5 Training Module of the Report are addressing your most recent questions. Please let me know if you need any additional analyses. Thanks.

@GoranSMilovanovic Looks good. And very good, that we already have 6 people who have more than 10 edits.

Nevertheless I'm a bit confused. How do you define "TRUE" in terms of the completion of a training? If I see it right, only the 7 people in editieren-basiswissen completed the training and did 36 edits (see table row last_slide_completed). In the table before (completed_training --> column TRUE) there are only 13 edits. Why do these numbers differ?

Another question: Is it possible to have a closer look at the 6 people who reached their 10th edit: which trainings did they do, and up to which slide? If this is too complicated, maybe just up to which point they did the editieren-basiswissen module.


The table that you refer to (where completed_training can be TRUE/FALSE) is described in the immediately preceding paragraph in the following way:

Let’s now take a look at the number of users who (a) made any edits at all, or (b) have reached their 10th edit, in these two groups:

It's the column names that have caused confusion (my bad, sorry). The column Edits in that table refers to the number of users who have edited at all, while the column Edits10 refers to the number of users who have reached their 10th edit.

Let me know if this helps.
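
To make the column semantics concrete, here is a minimal sketch of how such a table could be computed: Edits and Edits10 count users, not edits. The function name, the `edit_count` field, and the per-user records are hypothetical and illustrative; they are not taken from the Report.

```python
def summarize_by_completion(users):
    """Per completed_training group: number of users, users with at least
    one edit (Edits), and users with at least ten edits (Edits10)."""
    table = {}
    for user in users:
        row = table.setdefault(
            user["completed_training"], {"Users": 0, "Edits": 0, "Edits10": 0}
        )
        row["Users"] += 1
        if user["edit_count"] > 0:
            row["Edits"] += 1      # this user has edited at all
        if user["edit_count"] >= 10:
            row["Edits10"] += 1    # this user has reached their 10th edit
    return table


# Made-up per-user records (illustrative only).
users = [
    {"completed_training": True, "edit_count": 12},
    {"completed_training": True, "edit_count": 0},
    {"completed_training": False, "edit_count": 3},
    {"completed_training": False, "edit_count": 0},
    {"completed_training": False, "edit_count": 10},
]

table = summarize_by_completion(users)
for completed, row in sorted(table.items()):
    print(completed, row)
```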

@GoranSMilovanovic I have another few questions regarding the report:

  • How do I find out the total number of users per module? Is it just the sum of the numbers of exitpoints? For Example the Edits of artikel-bewerten: Did 8 users start the artikel-bewerten module?
artikel-bewerten --- fertig ---------------------- 6
artikel-bewerten --- artikel-qualitat-bewerten --- 1
artikel-bewerten --- artikel-qualitat-quiz ------- 1
  • In the table completed-Training, the column users shows how many users did (TRUE) or did not (FALSE) complete the training - right? We have 121 newly registered users, and if I sum up the numbers of TRUE and FALSE it's 144 users. Is this right? I would expect TRUE and FALSE to add up to the total number of registered users, or is there a gap in the tracking?

Thx in advance for your help!


Did 8 users start the artikel-bewerten module?


I would expect TRUE and FALSE to add up to the total number of registered users, or is there a gap in the tracking?

The numbers in the completed_Training table look strange indeed. I need to check this and then I'll get back to you. You're right about the number of registered users. Also, your understanding that TRUE means completed the training while FALSE implies the opposite is correct too. As you suggest, the TRUE and FALSE numbers in that table should sum up to 121 (the number of registered users). Let me check.

@GoranSMilovanovic Thx for the prompt reply. Let's see what the difference is about.

@Stefan_Schneider_WMDE I think the error is due to my improper handling of the Training Modules data set. I am still working to fix it, be back ASAP.

@Stefan_Schneider_WMDE Here's an updated report with correct data on the Training Modules. The conclusions are now completely different, of course.

The problems were the following:

  • many users are found in the Training Module data who have not registered via our campaign;
  • also, some users have completed more than one Training Module.

Sorry for failing to spot this!
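
A minimal sketch of the two corrections listed above: restrict the Training Module rows to users who registered via the campaign, and collapse multiple modules per user into one TRUE/FALSE flag, so that TRUE and FALSE add up to the number of registered users. The function name, field names, and sample rows are hypothetical; this is not the Report's R code.

```python
def completion_by_registered_user(training_rows, registered_ids):
    """Return {user_id: completed_any_module} for campaign registrants only."""
    completed = {uid: False for uid in registered_ids}
    for row in training_rows:
        uid = row["user_id"]
        if uid not in completed:
            continue  # user did not register via the campaign: ignore
        # One flag per user, however many modules that user took.
        completed[uid] = completed[uid] or row["completed"]
    return completed


# Made-up data (illustrative only).
registered_ids = {1, 2, 3}
training_rows = [
    {"user_id": 1, "module": "editieren-basiswissen", "completed": True},
    {"user_id": 1, "module": "artikel-bewerten", "completed": False},
    {"user_id": 2, "module": "artikel-bewerten", "completed": False},
    {"user_id": 99, "module": "editieren-basiswissen", "completed": True},
]

flags = completion_by_registered_user(training_rows, registered_ids)
n_true = sum(flags.values())
n_false = len(flags) - n_true
print(n_true, n_false)
```

Starting from the set of registered users, rather than from the training rows, is what guarantees that the two counts sum to the registration total.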

@GoranSMilovanovic Thx for the update. I will have a look at it tomorrow. I'm quite interested to see how the data changed now :)

@Stefan_Schneider_WMDE You're welcome. Well, the data didn't change, really - it's just that we are now looking at the relevant subset of the Training Module data set.

@GoranSMilovanovic There are still open questions about that new report:

  • If I have a look on the exit points I see only 5 people who completed the trainings. You displayed 10. How do you count them or are there some exitpoints missing? Or different users integrated, that don't belong to the code?
  • When I count the sum of exitpoints I see 23 people that exited in one particular step. I assume that the same people attended different Training Modules. Could you display which training modules were done per user? [User | Module1 (flag) | Module2 (flag) | Module3 (flag)]
  • In the first final report you reported six people who reached their 10th edit. Now there are two people who reached their 10th edit. Was the first information false?

@Stefan_Schneider_WMDE Responding in an e-mail, because I am sending you a table with usernames (requires NDA).

@Stefan_Schneider_WMDE My responses here, also in an e-mail where you will also find a data integrity check table:

Stefan: "If I have a look on the exit points I see only 5 people who completed the trainings. You displayed 10. How do you count them or are there some exitpoints missing? Or different users integrated, that don't belong to the code?"
Goran: as you can see, the numbers that I have from the data on Training Modules and user registrations are different. And those are the only numbers I have; Training Modules data were obtained from Russel, while user registrations originate in our SQL databases. In the final Report on this campaign (the latest share on Phab), I don't think I have reported on observing ten users completing the Training Module(s) on any occasion. Please correct me if I am wrong.

Stefan: "When I count the sum of exitpoints I see 23 people that exited in one particular step. I assume that the same people attended different Training Modules. Could you display which training modules were done per user? [User | Module1 (flag) | Module2 (flag) | Module3 (flag)]"
Goran: The table attached in my e-mail should be helpful in that respect. However, that table contains only the users who have completed any of the three Training Modules under consideration. If you need the dataset for all those who have started any of these Training Modules at all, please just let me know and I will send it to you in the blink of an eye.

Stefan: "In the first final report you reported six people who reached their 10th edit. Now there are two people who reached their 10th edit. Was the first information false?"
Goran: Yes, that information must have been false. However, I am looking at the first version of the Report that has included the analytics for the Training Modules, and I find that I have reported six users reaching their 10th edit. In order to avoid confusion, could you please send me the exact version of the Report (they are all accessible from the Phab ticket) in which you have found this information? We need to be precise here since all the Reports have the same file name, so it is not difficult to mistake one version for another. Thank you.

I hope this helps. Please contact me if you need anything beyond what I have provided here.
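
For illustration, a minimal sketch of the per-user layout requested above ([User | Module1 (flag) | Module2 (flag) | Module3 (flag)]). The three module slugs are an assumption based on the names mentioned earlier in this thread; the function names, the pseudonymous users, and the rows are hypothetical and contain no real data.

```python
# Assumed module slugs (taken from names mentioned earlier in the thread).
MODULES = ["wikipedia-basiswissen", "editieren-basiswissen", "artikel-bewerten"]


def module_flags_per_user(rows, modules=MODULES):
    """Return {user: {module: completed_flag}} with one entry per user."""
    table = {}
    for row in rows:
        flags = table.setdefault(row["user"], {m: False for m in modules})
        if row["module"] in flags and row["completed"]:
            flags[row["module"]] = True
    return table


def format_table(table, modules=MODULES):
    """Render the flags as plain-text rows: User | Module1 | Module2 | Module3."""
    lines = [" | ".join(["User"] + list(modules))]
    for user in sorted(table):
        marks = ["x" if table[user][m] else "-" for m in modules]
        lines.append(" | ".join([user] + marks))
    return "\n".join(lines)


# Made-up rows with pseudonymous users (illustrative only).
rows = [
    {"user": "user_a", "module": "editieren-basiswissen", "completed": True},
    {"user": "user_a", "module": "artikel-bewerten", "completed": True},
    {"user": "user_b", "module": "artikel-bewerten", "completed": False},
]

table = module_flags_per_user(rows)
print(format_table(table))
```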

(1) Now I understand your data better. I guess the confusion is that different data are created when users click the final button "Fertig!" ("Finished!") than when they exit at the last slide. In fact we have more people who completed the trainings (they have seen all slides, but didn't click "Fertig!").

(2) Yes, the table is exactly the information I needed

(3) I cannot find the information I referred to. I will just take the information that is correct now and communicate that to the community.

Now everything is in place. I'll write the public presentation of our findings now. Thx for your answers!

I have one last question about the final report of the campaign: You show one registration on Jan 15th. If I remember right, the campaign ended Jan 14th 23:59:59. I think that is the reason that I don't see any impressions or clicks on the landing page on Jan 15th - right? So could it be that this user still had the campaign tag in their browser and you tracked them because you took the timeframe until Jan 15th for the registrations? Or do I remember the end of the campaign incorrectly, and there are impressions and page views on Jan 15th that I am not aware of?