
Create daily tracking reports for occasional editors campaign 2021
Closed, Resolved · Public

Description

Provide a daily tracking report for the banner campaign running in October (18th - 31st). Please see the tracking document for all details.

Provide an interim report after the end of the campaign.

Provide a final report after analyzing the editing behavior of tracked users four weeks after the campaign.

Timeline for the whole project

  • Start of the banner campaign: October 18th
  • End of the banner campaign: October 31st
  • Tracking test: October 6th - 8th
  • Preliminary report for tracking part 1: End of October
  • Track editing behavior four weeks after end of campaign: November 23rd
  • Final report for tracking part 1 and 2: End of November

Campaign Tags and Landing Pages

Campaign tag: WMDE_oceditors_fall_2021
Landing page: https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement

Target group

  • logged-in users with 1 - 200 edits

There will be no A/B-Testing.

Event Timeline


@Christine_Domgoergen_WMDE @kai.nissen

Please confirm if we will be using Schema:WMDEBannerInteractions in this campaign as we did for the Occasional Editors Campaign in 2020.
Thank you.

EDIT: OK, I see from the tracking doc that we will use Schema:WMDEBannerInteractions.

@Christine_Domgoergen_WMDE

The campaign analytics code is in place and we are ready to test this.

Please check the key campaign parameters once again before testing:

Pages to track:

  • Landing Page 1: https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement
  • Landing Page 2b: https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia
  • Landing Page 2c: https://de.wikipedia.org/wiki/Wikipedia:F%C3%B6rderung/F%C3%B6rderangebote

Campaign tags to track:

WMDE_oceditors_fall_2021

@kai.nissen Since we are using Schema:WMDEBannerInteractions in this campaign, unlike in the same campaign in 2020, should I assume that the banner impressions (bannerImpressions field in Schema:WMDEBannerInteractions) from this schema are correct?

@Christine_Domgoergen_WMDE The tracking document requests that banner impressions, clicks, and closing rates be separated by desktop/mobile. This will not be possible, since Schema:WMDEBannerInteractions does not provide device info.

@GoranSMilovanovic perfect, thanks. I am OoO today; let's test on Monday!

@GoranSMilovanovic Hi, I just did two banner clicks and produced one page view for each sub-landing page - can you check whether you can find them?

We had to change the campaign dates: the campaign will now run from next Monday, October 18th, to October 31st. I updated the task description accordingly.

@Christine_Domgoergen_WMDE

Test date: 2021/10/11

Banner Interactions Data

"","banner","action","count","day","campaign"
"1","WMDE_oceditors_fall_2021_ctrl","banner-clicked",2,2021-10-11,"OccasionalEditors_2021"
"2","WMDE_oceditors_fall_2021_ctrl","banner-seen",2,2021-10-11,"OccasionalEditors_2021"

The WMDE_oceditors_fall_2021_ctrl campaign banner was seen twice and clicked twice on October 11 2021.

Pageviews Data

"","Tag","Page","Pageviews","date","campaign"
"1","?campaign=WMDE_oceditors_fall_2021_ctrl","de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement",2,2021-10-11,"OccasionalEditors_2021"

Two pageviews of de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement, both tagged with WMDE_oceditors_fall_2021_ctrl, were observed on October 11, 2021.

No other data matched the campaign tag.
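For reference, a daily summary like the one above can be reproduced from the CSV export with a short script. The column layout is taken from the test output shown in this thread; the parsing itself is an illustrative sketch, not the actual pipeline code:

```python
import csv
import io
from collections import Counter

def aggregate_interactions(csv_text):
    """Sum banner interaction counts per (banner, action) pair.

    Expects the CSV layout from the test output above: an unnamed
    row-number column, then banner, action, count, day, campaign.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        counts[(row["banner"], row["action"])] += int(row["count"])
    return counts

sample = '''"","banner","action","count","day","campaign"
"1","WMDE_oceditors_fall_2021_ctrl","banner-clicked",2,2021-10-11,"OccasionalEditors_2021"
"2","WMDE_oceditors_fall_2021_ctrl","banner-seen",2,2021-10-11,"OccasionalEditors_2021"'''

totals = aggregate_interactions(sample)
```

Run against the test export, totals holds a count of 2 for each action, matching the "seen twice and clicked twice" summary above.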

@GoranSMilovanovic Okay, thank you! So the first half seems to be working fine, that's good :-) For the second half:

There should be two pageviews, one for https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia and one for https://de.wikipedia.org/wiki/Wikipedia:F%C3%B6rderung/F%C3%B6rderangebote also, can you find them?

I just tested again; can you please check if you see them now? If not, why not?

@Christine_Domgoergen_WMDE

There should be two pageviews, one for https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia and one for https://de.wikipedia.org/wiki/Wikipedia:F%C3%B6rderung/F%C3%B6rderangebote also, can you find them?

I will, let's give them some time to show up in the wmf.webrequest table.

But I know the answer to the following question with certainty:

If not, why not?

The answer is: I have no idea, but if they are not found in the database, then the problem is on the tracking side.

@Christine_Domgoergen_WMDE

Test date: 2021-10-13

Pageviews

"","Tag","Page","Pageviews","date","campaign"
"1","?campaign=WMDE_oceditors_fall_2021_ctrl","de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement",2,"2021-10-13","OccasionalEditors_2021"

Result: DeinEngagement had two pageviews from WMDE_oceditors_fall_2021_ctrl today.

Test date: 2021-10-12

Pageviews

Result: No pageviews with the WMDE_oceditors_fall_2021_ctrl campaign tag.

@GoranSMilovanovic Okay, so the page views for the second-level landing pages are not being tracked. Can you change something in the tracking setup in order to track them? Or could it be a cookie issue?

Kai is not available this week, so we have to find a solution on our own.

@Christine_Domgoergen_WMDE

Can you change something in the tracking set-up in order to track it? Or could it be a cookie issue?

I have never set up a campaign myself; @MartinRulsch was recently involved in campaign setup and might be able to help you.

@AbbanWMDE Hi Abban, do you have an idea why the page views for the second landing pages are not being tracked? Could this be a cookie issue? Or is there any other reason why the tracking tag/cookie does not seem to stick when we go from the first landing page to the second?

@GoranSMilovanovic Do I understand correctly that you are looking for page views of Wikipedia:Wikimedia_Deutschland/LerneWikipedia and Wikipedia:Förderung/Förderangebote using the query string campaign=WMDE_oceditors_fall_2021_ctrl?

That won't work, because the links on the first page don't pass that query string. If the cookies are logged in the webrequest table, we could take the referrer info for the 2nd page from there.
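The referer fallback described above boils down to recovering the campaign tag from the query string of the first landing page's URL as carried in the referer header. A minimal sketch of that extraction in Python (illustrative only; the actual filtering runs against the wmf.webrequest table):

```python
from urllib.parse import urlparse, parse_qs

def campaign_from_referer(referer):
    """Extract the ?campaign= value from a referer URL, if present."""
    if not referer:
        return None
    params = parse_qs(urlparse(referer).query)
    tags = params.get("campaign", [])
    return tags[0] if tags else None

# The referer a second-level pageview should carry when the visit
# started from a banner, per the discussion above:
ref = ("https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/"
       "DeinEngagement?campaign=WMDE_oceditors_fall_2021")
```

A pageview of a second-level landing page whose referer yields a non-empty campaign tag can then be attributed to the campaign even though its own URL carries no query string.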

@kai.nissen

The campaign uses the following landing pages:

https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement
https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia
https://de.wikipedia.org/wiki/Wikipedia:F%C3%B6rderung/F%C3%B6rderangebote

and I am filtering out all requests whose query strings do not contain WMDE_oceditors_fall_2021.

@Christine_Domgoergen_WMDE

Would that be a possible solution?

Let me check with @kai.nissen first

@kai.nissen

If the cookies are logged in the webrequest table, we could take the referrer info for the 2nd page from there.

Just to confirm: you would like us to grab the uri_query from the referer of Wikipedia:Förderung/Förderangebote?

@GoranSMilovanovic Oh, the referer might also be an option. It should be https://de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement?campaign=WMDE_oceditors_fall_2021 when initially coming from a banner.

@kai.nissen @Christine_Domgoergen_WMDE

Got it. The results below follow from inspecting the referer field.

Test date: 2021-10-13

Pageviews

"","Tag","Page","Pageviews","date","campaign"
"1","WMDE_oceditors_fall_2021_ctrl","de.wikipedia.org/wiki/Wikipedia:F%C3%B6rderung/F%C3%B6rderangebote",1,"2021-10-13","OccasionalEditors_2021"
"2","WMDE_oceditors_fall_2021_ctrl","de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/DeinEngagement",2,"2021-10-13","OccasionalEditors_2021"
"3","WMDE_oceditors_fall_2021_ctrl","de.wikipedia.org/wiki/Wikipedia:Wikimedia_Deutschland/LerneWikipedia",1,"2021-10-13","OccasionalEditors_2021"

Result: DeinEngagement had two (2) pageviews, Förderung/Förderangebote had one (1), and LerneWikipedia had one (1), all from WMDE_oceditors_fall_2021_ctrl.

@Christine_Domgoergen_WMDE

Ok, I am putting this on a daily update schedule as of tomorrow, October 18, and until October 31.
I will share a public directory for daily updates in the ticket.

@kai.nissen Thank you for your assistance.

@GoranSMilovanovic Great, thanks! If possible please share the spreadsheet, thank you.

@Christine_Domgoergen_WMDE

I will see to transferring the files from the public directory into a Google Spreadsheet for your team.

@Christine_Domgoergen_WMDE

No. At this point, some other tasks for the New Editors team are prioritized. I will get in touch as soon as the opportunity to work on this ticket opens again.

@Christine_Domgoergen_WMDE

I will take care of this today (Nov 3), so I think that you can expect to see the results in the evening hours.

@Christine_Domgoergen_WMDE

Unfortunately, I will only be able to complete this report tomorrow.

@Christine_Domgoergen_WMDE Your interim report is here:

What remains:

Track editing behavior four weeks after end of campaign: November 23rd
final report for tracking part 1 and 2: End of November

The editing behavior summary will be provided on November 23rd.

@GoranSMilovanovic Great, thank you! I've just had the time for a quick glance so far but could you please add the data points in all graphs in section 1 so we have the numbers for total banner clicks etc.? And do you also have the numbers in a spreadsheet?

@Christine_Domgoergen_WMDE

The report (with all data points included - I have no idea why I missed including them here in the first place; they are present in all previous reports, or at least I think they are) is here:

Spreadsheets, one for banners and one for pageviews:

@GoranSMilovanovic Hello Goran, I just had time to look at the report in depth. I have only one small request:

2.3. Pageviews per Tag: pageviews per device. Could you separate the data for landing page 1 vs. 2a and 2b here? That would mean listing the total number of pageviews per device for landing page 1 only in graph 2.3, and then adding another graph listing the total number of pageviews per device for landing pages 2a and 2b.

You can add this when you add the data for the editing behavior next week.

Thank you!

@Christine_Domgoergen_WMDE Noted T291635#7506437; it will be included in the final report, which I will be able to deliver by the end of November 2021 as planned.

@Christine_Domgoergen_WMDE

Your final report is here:

Comments

If possible: Split up in groups compared to their editing behavior before the campaign (see question 3), e.g. xx users edit xx percent more than before the campaign

I guess this is in relation to the following question too:

Set up cohorts (account age and editing behaviour regarding activity in the defined time frame)

I am not sure what exactly you mean by Set up cohorts in this context. I have the data on the user revision counts before the campaign for all users who have engaged (i.e. clicked) in this campaign, but I am not sure how you would like the cohorts to be defined. Please describe the cohorts more precisely. Also: how would you like me to analyse the relationship between account age, number of edits before the campaign, and number of edits following the campaign engagement? Precisely: given the data that we have, how would you expect to arrive at e.g. xx users edit xx percent more than before the campaign? Please advise. Thanks.

Edit quality: revert rate of the edits the users made before, during and after the campaign

From the revision_actor_temp table, which we have been using for some time already, I don't see a way to find out anything about reverted revisions.
@Tobi_WMDE_SW Maybe you could help with this:

... The revision_actor_temp table is a temporary table used for the Actor migration

(from the revision_actor_temp table docs) - do we have anyone in WMDE who is following the Actor migration process more closely? Our analytics for the WMDE New Editors team might be improved if I had someone who follows this process to rely on. Thanks.

@GoranSMilovanovic Hi Goran, thank you for the report! Since the reporting requirements are the same as in last year's campaign maybe you can have a look at the report in https://phabricator.wikimedia.org/T249617 and then analyze this year's data in the same way? I hope this clarifies a bit, if not please let me know!

Concerning the revision rate: here we would need the revision rate of the edits made:

  • 4 weeks before the campaign
  • during the campaign
  • 4 weeks after the campaign

as well as the average revision rate in those time periods in dewp among editors from 1 - 200 edits. Would that be possible? Thank you!
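The three requested windows, and the rate itself, are straightforward to state in code. A sketch with the campaign dates from the task description; the revert flag and the shape of the input are illustrative, since (as noted in this thread) revert information is not readily available in the tables used so far:

```python
from datetime import date, timedelta

# Campaign dates from the task description; 4-week windows as requested.
CAMPAIGN_START = date(2021, 10, 18)
CAMPAIGN_END = date(2021, 10, 31)
WINDOW = timedelta(weeks=4)

def window_for(edit_date):
    """Classify an edit date into the three reporting windows (else None)."""
    if CAMPAIGN_START - WINDOW <= edit_date < CAMPAIGN_START:
        return "before"
    if CAMPAIGN_START <= edit_date <= CAMPAIGN_END:
        return "during"
    if CAMPAIGN_END < edit_date <= CAMPAIGN_END + WINDOW:
        return "after"
    return None

def revert_rate(edits):
    """edits: iterable of (edit_date, was_reverted) pairs from one window."""
    edits = list(edits)
    if not edits:
        return 0.0
    return sum(1 for _, reverted in edits if reverted) / len(edits)
```

The same two functions would apply unchanged to the dewp-wide comparison group, given a per-editor edit count filter for the 1 - 200 range.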

@Christine_Domgoergen_WMDE

Hi Goran, thank you for the report!

You are welcome!

Since the reporting requirements are the same as in last year's campaign maybe you can have a look at the report in https://phabricator.wikimedia.org/T249617 and then analyze this year's data in the same way?

Got it - I am on it as soon as I finish something that I am working on right now.

@Christine_Domgoergen_WMDE

Concerning the revision rate: here we would need the revision rate of the edits made: - 4 weeks before the campaign - during the campaign - 4 weeks after the campaign

Done; here is the updated Final Report:

as well as the average revision rate in those time periods in dewp among editors from 1 - 200 edits. Would that be possible? Thank you!

This request was not a part of the initial ticket description; it calls for a new ETL procedure to be developed in Pyspark or SQL, because I do not think that we have any similar "off the shelf" code ready for this. Also, you did not define editors from 1 - 200 edits precisely: 1 - 200 edits as of when?
@Tobi_WMDE_SW Please prioritize; I am not sure, given the circumstances, that I will have the time to develop this procedure right now.

@Tobi_WMDE_SW @Christine_Domgoergen_WMDE I've got the numbers for your request:

as well as the average revision rate in those time periods in dewp among editors from 1 - 200 edits. Would that be possible? Thank you!

They are found below the last chart in the following update of this campaign's Final Report:

@GoranSMilovanovic Thank you! Could you look into the following:

  • 3.5. Daily user edits: Could you add a table as in the 2020 report 3.2.1 (Total and Mean user edits per day: before campaign, during the campaign, and after the campaign)?
  • 3.4. Edit classes: Could you split the category 0-1 into two categories: 0 edits and 1 edit? Also, could you add a table with the edit classes before, during, and after the campaign, as in the 2020 report 3.4.4 (Edit Classes: before, during, and after the campaign)?
  • Revision rate: I don't see the numbers in the last chart; am I overlooking something?

@GoranSMilovanovic One more question: the edit numbers on November 2nd and 3rd right after the campaign seem really high, can you double-check if they are correct? Can you see who made the edits? Was it just one user, or a few? Can you see what kinds of edits they were (e.g. in which namespace? different pages or the same page?)?

@GoranSMilovanovic Hi Goran, I guess you are quite busy but still I would like to check if you have any news on this? Thank you!

@Christine_Domgoergen_WMDE Hi! I should be able to get back to this ticket on Monday 2021/12/20.

@Christine_Domgoergen_WMDE

3.5. Daily user edits: Could you add a table as in the 2020 report 3.2.1 (Total and Mean user edits per day: before campaign, during the campaign, and after the campaign)

Done.

3.4. Edit classes: Could you split the category 0-1 into two categories: 0 edits and 1 edit?

This does not make sense since we are only considering users who have made at least one edit.

3.4. Also, could you add a table with the edit classes before, during, and after the campaign, as in the 2020 report 3.4.4 (Edit Classes: before, during, and after the campaign)?

Done.

revision rate: I don't see the numbers in the last chart, am I overlooking something?

All charts include data point labels.

One more question: the edit numbers on November 2nd and 3rd right after the campaign seem really high, can you double-check if they are correct?

I am working with raw user edits data - the numbers are exactly what you see.

@Tobi_WMDE_SW

Can you see who made the edits? Was it just one or a few users? Can you see what kinds of edits they were (e.g. in which name space? different pages or the same page?)?

This would call for a thorough revision of the user edits ETL procedure. At this point, unfortunately, there is no time for this: until the end of December I need to stay focused entirely on Wikidata tasks. Requests like this should be formulated well in advance of the campaign onset.

@GoranSMilovanovic Thank you, this looks good.

This does not make sense since we are only considering users who have made at least one edit.

Then why do the tables say "edit class 0-1"? Also, it could be that the users had 1+ edits before the campaign, as we indeed determined in our target group settings, but then didn't edit during the campaign (edit class 0 during the campaign) and then made one edit after the campaign (edit class 1). So it would make sense to split the classes like we did last year.
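The split described here amounts to bucketing per-user edit counts within each window, with users who clicked a banner but never edited falling into class 0. An illustrative sketch (the class boundaries beyond 0 and 1 are hypothetical, not taken from the report):

```python
from collections import Counter

def edit_class(n_edits):
    """Bucket a per-window edit count; 0 and 1 are separate classes."""
    if n_edits == 0:
        return "0"
    if n_edits == 1:
        return "1"
    if n_edits <= 5:
        return "2-5"
    if n_edits <= 20:
        return "6-20"
    return "21+"

def class_table(edits_per_user):
    """edits_per_user: dict of user id -> edit count in one window."""
    return Counter(edit_class(n) for n in edits_per_user.values())
```

For example, class_table({"u1": 0, "u2": 1, "u3": 3}) puts one user in each of the classes 0, 1, and 2-5, so the table totals always sum to the full set of users who clicked.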

All charts include data point labels.

Yes, but there is no chart or table with the revision rate.

@Christine_Domgoergen_WMDE

Then why do the tables say "edit class 0-1"? Also, it could be that the users had 1+ edits before the campaign, as we indeed determined in our target group settings, but then didn't edit during the campaign (edit class 0 during the campaign) and then made one edit after the campaign (edit class 1). So it would make sense to split the classes like we did last year.

Now it makes sense. I will make sure to include the tables with the 0 - 1 split in the report.

Yes, but there is no chart or table with the revision rate.

Then I am not sure which chart or table you mean by "revision rate". Please clarify: what data should go there? Thank you.

@GoranSMilovanovic Analyzing the numbers I noticed the following:

  • Number of users who edited: when I sum the numbers of users in the edit classes (from the three tables in 3.5), I get a different number of users who edited for each period (before: 384, during: 419, after: 457). When I sum the number of users who edited from the table in 3.4, the result is 457. However, I would expect the sum of users who edited (including 0 edits, see above) to always be the same; the data basis should be all users who clicked on the banner. To be sure, I checked last year's report, and there the numbers did not differ. Can you check, please?

@GoranSMilovanovic Hi Goran, did you finish the report and could you please share it? Thank you.

@Christine_Domgoergen_WMDE

Hey, no - didn't @Tobi_WMDE_SW tell you that I have been sick since December 25th or so?

Also, as of January 1st I will be working a minimal number of hours, and on Wikidata-related projects only.

Perhaps I can share the raw data with you and then the new analyst for the WMDE New Editors team could wrap this up?

@GoranSMilovanovic No, we didn't get this information.

I am afraid not finishing the report is not an option. The requirements and tasks were shared very early and haven't changed, so we expect the report to be delivered as agreed in this task. We need this report quite urgently now, and I think finishing it does not require much work anymore. So it would be great if you could finish it and share the result. Thank you!

@Christine_Domgoergen_WMDE Ok, at some point I will, but you really need to coordinate priorities with @Tobi_WMDE_SW as of now. It is January 3rd, and I am not supposed to invest any more hours in analytics for the WMDE New Editors team. What you say makes sense - we started this, we should finish this - but at this point it cannot be a priority in my work. Thank you for your understanding.

No, we didn't get this information.

How did the rest of the WMDE New Editors team get this information, then? Because they did. I really cannot coordinate your team while working remotely on Wikidata-related tasks. Please get in touch with your management and my EM @Tobi_WMDE_SW; they can share all the details with you. Thanks.

@GoranSMilovanovic

What you say makes sense - we started this, we should finish this, but at this point it cannot be a priority in my work.

Okay, then let's finish this. It does not need to be a priority, but it needs to be done. I am sure it's understandable that we need the deliverable we agreed on (and which we also coordinated internally) months ago. Could you give me a timeline for when we can expect the final report?

How did the rest of the WMDE New Editors team get this information, then? Because they did. I really cannot coordinate your team while working remotely on Wikidata-related tasks. Please get in touch with your management and my EM @Tobi_WMDE_SW; they can share all the details with you. Thanks.

No worries about that, we will catch up and coordinate internally.

@Christine_Domgoergen_WMDE @Tobi_WMDE_SW

Could you give me a timeline for when we can expect the final report?

Maybe by the end of the week. I am starting a new full-time job as of tomorrow and have an agreement with WMDE on a minimal number of working hours until March 2022, for Wikidata-related tasks only - and I still have open, unfinished Wikidata things to work on. So, fast is not an option at this point.

Please keep @Tobi_WMDE_SW informed and tagged here, because prioritising is now (obviously) critical.

@GoranSMilovanovic @Tobi_WMDE_SW Okay, end of the week is fine. Thanks and congrats for the new job!

@Christine_Domgoergen_WMDE @Tobi_WMDE_SW

The whole user edits ETL procedure had to be refactored and re-run in order to answer all of the newly formulated demands in this ticket.

@Tobi_WMDE_SW This has already taken approximately eight (8) hours of the first week of January 2022. Should I put these hours on my Wikidata invoice for this month, since we agreed that no additional work for the WMDE New Editors team would be done in 2022? Please advise.

@Christine_Domgoergen_WMDE The following dataset encompasses all relevant user edits from the users who have clicked on a campaign banner. You will need to use this dataset to check any numbers in your report once I deliver it.

The columns are:

  • ts_click - date when the user first clicked on any of the campaign banners
  • banner - first campaign banner that the user has clicked
  • impressions - number of banner impressions before the first banner click for the respective user
  • ts_revision - date of user revision
  • campaignDay - classification of ts_revision: beforeCampaign, Campaign, afterCampaign (4 weeks before/after, as described in the ticket)
  • anon_userid - anonymized user id

In order to understand the data, you need to consider the following:

  • if a user has an NA (i.e. Not Available) value in the ts_revision column, then that user (a) has never edited in the relevant time span (4 weeks before/after the campaign and during the campaign), (b) thus appears only once in the dataset, and (c) necessarily has an NA value in the campaignDay column (no user edits -> no before/during/after campaign classification);
  • in order to analyze per-campaign-tag edits (which was not requested, but I always include that analysis since it is related to campaign channels), we consider each user as assigned to a single campaign tag only, namely the banner they clicked first (be it desktop, ipad, or mobile); this is important to consider since I have just realized that multiple clicks on the same banner were recorded, and thus the numbers will now differ from before (while the rank order of edits per tag remains the same).
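Under the NA convention above, per-window user counts have to treat each NA row as one clicked-but-never-edited user. A minimal bookkeeping sketch using the column names from the dataset description (purely illustrative; the real dataset also carries ts_click, banner, impressions, and ts_revision):

```python
def users_per_window(rows):
    """rows: dicts with anon_userid and campaignDay; campaignDay is None
    for users who clicked a banner but never edited (the NA rows)."""
    editors = {"beforeCampaign": set(), "Campaign": set(), "afterCampaign": set()}
    never_edited = set()
    for row in rows:
        if row["campaignDay"] is None:
            # NA row: the user appears exactly once and made no edits
            never_edited.add(row["anon_userid"])
        else:
            editors[row["campaignDay"]].add(row["anon_userid"])
    counts = {window: len(users) for window, users in editors.items()}
    return counts, never_edited
```

Each window's distinct-editor count plus the never-edited set then reconciles to the full set of banner clickers, which is exactly the consistency check raised earlier about the 3.4 and 3.5 tables.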

In the next step I will rework the whole section 3. Editing Behaviour of the report - that is how we will know from this new dataset that everything is in place.

As agreed with @Tobi_WMDE_SW, no new demand for analytics in this campaign will be accepted.

@Christine_Domgoergen_WMDE No 0/1 edit split was used for this campaign in 2020; I have checked that campaign's final report. Introducing this split has taken eight (8) hours of work thus far, since our standard user edits ETL procedure does not take into account users who have never edited - in any of the previously analyzed WMDE banner campaigns. Those hours should have been dedicated solely to Wikidata-related tasks.

@Christine_Domgoergen_WMDE @Tobi_WMDE_SW

The final report is revised (section 3. Editing Behaviour) and attached:

  • Some numerical errors were corrected;
  • Section 3.2 How many banner impressions before the first edit? was removed since it really had nothing to do with user edits (it is the number of impressions before the first click, not edit...)
  • The 0 vs 1 edit class split was introduced.

Please refer to and use the dataset shared in T291635#7599973 for all manual checks, if any are needed. All the raw numbers used to produce Section 3. Editing Behavior are there. Please do not forget to take into consideration everything said in T291635#7599973 while checking the results.

@Christine_Domgoergen_WMDE

Okay, end of the week is fine. Thanks and congrats for the new job!

Thank you! It was a pleasure to work with you in the previous years.

@GoranSMilovanovic @Tobi_WMDE_SW Hi Goran, thank you, I'll have a look right away. I understand it must be very stressful with being sick and the new job, but just to be clear: there were no new demands formulated in this ticket. The numbers of users who clicked on the banner in the report didn't match, so for quality assurance the double check was necessary. To evaluate the success of the campaign we need the editing behavior of all users who clicked on the banner, and this also includes users who clicked on the banner but then never edited. In my understanding, this is exactly what we did in 2020. Thanks for understanding.

Thank you! It was a pleasure to work with you in the previous years.

Likewise!

@Christine_Domgoergen_WMDE @Tobi_WMDE_SW

Likewise!

Thank you.

Do not forget: the raw numbers are in T291635#7599973; everything can be checked there (even in Excel, Google Spreadsheets, or Libre Calc).

@GoranSMilovanovic @Tobi_WMDE_SW Hi Goran, thanks for the revision, the report looks good now. So we can resolve the ticket :) Take care and all the best for your new job!