We need all registrations via the summer campaign with the timestamp of the registration and the information if the guided tour was shown.
Description
Related Objects
Event Timeline
I can understand what the timestamp of registration and the guided tour info mean.
To define a metric the "all registrations via summer campaign" need to be defined – what does "via summer campaign" mean?
all registrations via the summer campaign = all user who clicked on one of the two banner links (Campaign #1 link: https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/Entdeckungen-sortieren?campaign=wmde2017summer1&wmdesource=bannerclick
Campaign #2 link: https://entdecke.wikipedia.de?campaign=wmde2017summer2&wmdesource=bannerclick) and later registered
Is such a definition sufficient?
@Verena I can take care of everything in relation to the internal page: the data should be readily available from the webrequest table and event logging. As of the external page, let me remind you that the URL parameter for its banner was removed, so no related data in Hadoop will be found: you will have to ask for the respective data from your internal server.
Let me know if this makes sense. The data are obtained from log.ServerSideAccountCreation_5487345.
@Verena @Addshore @kai.nissen @Tobi_WMDE_SW
Please, if anyone has an idea on which of the following tables from log (MariaDB m2), if any, are related to the Summer Banner Campaign, let me know:
| GuidedTourButtonClick_13869649 |
| GuidedTourButtonClick_13869649_15423246 |
| GuidedTourButtonClick_8221559 |
| GuidedTourButtonClick_8690550 |
| GuidedTourExited_8208763 |
| GuidedTourExited_8690566 |
| GuidedTourExited_8690566_15423246 |
| GuidedTourExternalLinkActivation_8208762 |
| GuidedTourExternalLinkActivation_8690560 |
| GuidedTourExternalLinkActivation_8690560_15423246 |
| GuidedTourGuiderHidden_8208754 |
| GuidedTourGuiderHidden_8690549 |
| GuidedTourGuiderHidden_8690549_15423246 |
| GuidedTourGuiderImpression_8208752 |
| GuidedTourGuiderImpression_8694395 |
| GuidedTourGuiderImpression_8694395_15423246 |
| GuidedTourInternalLinkActivation_8208760 |
| GuidedTourInternalLinkActivation_8690553 |
| GuidedTourInternalLinkActivation_8690553_15423246 |
| GuidedTour_4972209 |
| GuidedTour_5222838 |
https://meta.wikimedia.org/wiki/Schema:ServerSideAccountCreation
Which is provided by the Campaigns mediawiki extension.
@Addshore That was already used to produce the initial statistics (see above). If you're reading this: which version of the software, Adam - if you remember, we've done this before, there are many ServerSideAccountCreation tables in the log database, each marked by the respective code version; which one do we use now?
I believe https://github.com/wikimedia/mediawiki-extensions-Campaigns/blob/master/Campaigns.hooks.php#L88 should help you find it.
Verena: we have checked whether the registration data were collected from the correct tables, and the answer is yes they were:
Adam: Line 88: EventLogging::logEvent( 'ServerSideAccountCreation', 5487345, $event ); while I have used exactly log.ServerSideAccountCreation_5487345.
Verena: in other words, you are looking at the correct numbers.
@Verena As soon as someone lets me know which of the guided tour tables correspond to the Summer Banner Campaign, I will send the respective data.
Okay. Unfortunately I can't help you with that. I am afraid we have to wait for Adam...
@Verena Unfortunately I can't help with that either: only our web-developers know which version of the code have they deployed for what task exactly. I can do many things with your data, but first someone needs to tell me *where* the data are :)
So this is something that we never ended up looking at for the last campaign.
Thus we would be best off contacting the GuidedTours people / Devs / users of the event logging data and see what they say.
@Addshore And who are those people, please?
I will do a short research on these Guided Tour log tables online and see whether do we have any usable documentation on them. But as of the software version used - which I guess is the information that we need to figure where are we looking for what - I can't help there. Let me know who do you think could know more and I will get in touch with them. Thanks.
Devs that might be able to help can be found @ https://github.com/wikimedia/mediawiki-extensions-GuidedTour/blob/master/extension.json#L4
The versions of schemas used appear to be here: https://github.com/wikimedia/mediawiki-extensions-GuidedTour/blob/master/GuidedTourHooks.php#L234
https://meta.wikimedia.org/wiki/Schema:GuidedTourGuiderImpression
https://meta.wikimedia.org/wiki/Schema:GuidedTourGuiderHidden
https://meta.wikimedia.org/wiki/Schema:GuidedTourInternalLinkActivation
https://meta.wikimedia.org/wiki/Schema:GuidedTourExternalLinkActivation
https://meta.wikimedia.org/wiki/Schema:GuidedTourButtonClick
The ID numbers are revision IDs of the relevant schema. I'm not sure what it means where there are two numbers in the name
Thanks @Halfak
I've seen the schemes already. The problem is related exactly to the question of what revision was active when and relevant for what. I will have to figure it out somehow.
Thanks for a prompt response!
You'll likely need to reverse engineer the data.
mysql:research@analytics-store.eqiad.wmnet [log]> SELECT MIN(timestamp), MAX(timestamp) FROM GuidedTourGuiderImpression_8694395; +----------------+----------------+ | MIN(timestamp) | MAX(timestamp) | +----------------+----------------+ | 20170329170309 | 20170726210636 | +----------------+----------------+ 1 row in set (0.05 sec) mysql:research@analytics-store.eqiad.wmnet [log]> SELECT MIN(timestamp), MAX(timestamp) FROM GuidedTourGuiderImpression_8208752; +----------------+----------------+ | MIN(timestamp) | MAX(timestamp) | +----------------+----------------+ | 20140420030428 | 20140421031440 | +----------------+----------------+ 1 row in set (0.02 sec) mysql:research@analytics-store.eqiad.wmnet [log]> SELECT MIN(timestamp), MAX(timestamp) FROM GuidedTourGuiderImpression_8694395_15423246; +----------------+----------------+ | MIN(timestamp) | MAX(timestamp) | +----------------+----------------+ | 20140530033914 | 20170329165626 | +----------------+----------------+ 1 row in set (0.17 sec)
@Halfak so your suggestion is that the revisions of these tables were never produced/aimed for specific campaigns, but simply present their evolution from one to another version of code. Ok, I'll do that and seek for the data that we need from the most recent revision.
Thanks a lot.
@Verena Could you please let me know whether any of the following Guided Tours are ours (i.e. were used in the Summer Banner Campaign 2017):
"gettingstartedtasktoolbar"
"firstedit"
"firsteditve"
"WikiEduHelp"
"RcFiltersBeta"
"gettingstartedtasktoolbarve"
"twa1"
"wbstatements"
"twa2"
"twa3"
"twa4"
"twa5"
"twa6"
"twa7"
"wbitems"
"StudentTrainingComplete"
"flowOptIn"
"finddpla"
"CourseWizard"
"RcFiltersHighlight"
"newpage"
"RcFiltersInvite"
(this is a list of all Guided Tours that are found in the latest revisions of the log.GuidedTour_Something tables; these tables have entries made during July 2017, meaning that our data should be there).
Thanks.
@GoranSMilovanovic: The Guided Tour we used is located here: https://de.wikipedia.org/wiki/MediaWiki:Guidedtour-tour-einfuhrung.js. Maybe you find the appropriate data there.
Best
@Verena @Stefan_Schneider_WMDE @Tobi_WMDE_SW @Addshore @kai.nissen
Conclusion: No guided tour related data on the Summer Banner Campaign 2017 can be found in the relevant tables of the log database.
Please check whether the mediawiki logging of the relevant events was enabled at all and then comment here before we mark the task as 'resolved'.
Thank you.
So this is the same issue that we had for the spring campaign.
If we want to track guided tour interaction we need to do further investigation before the next campaign to see why we get no data in the event logging tables.
How did we manage last time? Where was the information on GuidedTours yes/no retrieved from?
Ahh, if you simply mean the details of the users that were shown a guided tour vs not shown a guided tour then that comes from the log created by the hack as discussed in the other ticket.
@Addshore Adam, could you please send that hack generated data on guided tours for this campaign to @Verena and @Stefan_Schneider_WMDE - they manage a concise table with all the data on the Summer Banner Campaign that were collected thus far? Thank you.
Yes.
So the code that generated the data can be found @ https://gerrit.wikimedia.org/r/#/c/362379/4
The data / logs are stored on mwlog1001.eqiad.wmnet, which I don't believe you have access to.
To access this server you would need to be in the "restricted" ldap group https://github.com/wikimedia/puppet/blob/production/modules/admin/data/data.yaml#L72
Do you want to file a ticket for getting added to that group or should I move the logs to one of the stat boxes for you to see for now?
I guess adding you to the group is the long term solution here really.
@Addshore Depends. If mwlog1001.eqiad.wmnet is a production web-server, I then *I don't want to have access to it* - and it probably is, since people need special access rights there.
It is not a production web server, it a production mediawiki log server.
It sole purpose is to collect logs.
It looks like there is actually a finer grained group called "mw-log-readers" which will only allow you to read these logs from these servers.
https://github.com/wikimedia/puppet/blob/production/modules/admin/data/data.yaml#L399
@Addshore As of the Guided Tours logging, please take a look at the following discussion:
and let me know whether we set the MW parameters right for our campaign.
So, the tour is located @ https://de.wikipedia.org/wiki/MediaWiki:Guidedtour-tour-einfuhrung.js and it looks like it needs this shouldLog param added to it as in the post!
I can't do this but I am sure @Verena can get this done for next time!
Okay, I will take the adjustment of the tour itself as a todo for me. As for retrieving the data an guided tour usage for the summer campaign: As I understand it was tracked but currently only Adam hast access?
@Verena there is a text log that contains some data that currently only I can see. @GoranSMilovanovic has requested access to the log server in T171958
@Verena @Addshore Correct; I have requested access to the log server here https://phabricator.wikimedia.org/T171958, and it might take up to three working days to get it - or at least the Ops say so.
@Verena @Addshore Nothing yet in relation to https://phabricator.wikimedia.org/T171958
@Abraham @Tobi_WMDE_SW Please, whoever can sign off the request on https://phabricator.wikimedia.org/T171958, so that we can finalize the respective analyses. Thank you.
...
The ID numbers are revision IDs of the relevant schema. I'm not sure what it means where there are two numbers in the name
If there are two numbers, that means that the table dates from before the big March 2017 renaming. The appended "15423246" refers to a previous revision of the event capsule schema.
Here's the data on Guided Tours from the Summer Banner Campaign.
However, I am not sure whether your initial request: "We need all registrations via the summer campaign with the timestamp of the registration and the information if the guided tour was shown." can be met completely.
Namely, if the anonymized user names from the logs that were used to find out about Guided Tours match anything that refers to user registration, than yes, otherwise - no. I will check this out and get back to you.
In the meantime, here's the aggregated data on Guided Tours.
@Verena The user registration timestamps and the data on whether the Guided Tour was shown or not can be matched.
Here's the table. Everything will be also sent via e-mail.
Hint: 'GuidedTour' column (yes/no) = whether the Guided Tour was presented or not; 'timestamp' = timestamp, of course; 'Code' and 'event_campaign' carry the same information from two different sources, and that is whether the user has registered via the wmde2017summer1 or wmde2017summer2 campaign. You need to take a look at these three columns only.
Please let me know if anything else needs to be done here.
N.B. The 'NA' code in the event_campaign column stands for 'Not Available' and signifies missing data. There are three users on whom we have data in the logs on mw1001.eqiad.wmnet server, but miss their registration data in the log.ServerSideAccountCreation_5487345 table. Consistency check: the 'Code' and 'event_campaign' column must match, and they do.
Thank you. We also need the editing behaviour as indicated in this sheet for the complete analysis.
@Verena Which sheet for the complete analysis? And what editing behavior do you mean exactly - edits of new users (after the campaign, as indicated in No. 2 on the list below)?
As of the analysis guidelines and data summaries for the Summer Banner Campaign, I know about the following:
- Summary Data: https://docs.google.com/spreadsheets/d/14iCkcTmDhuPPVTTCz6SEhmarjH-01mpsuwlFmPTfHwk/edit#gid=0
- Results/Analysis Doc: https://docs.google.com/document/d/1Luv1bXDvp3zV7ydgR8Ave3iXIPAm8RLGoxrP1AD6iDA/edit#
- User registration data collected
- Proceeding with collecting post-registration edits and campaign analysis on: https://phabricator.wikimedia.org/T168637 (Analysis of summer banner-campaign)
- Closing this task

