
[REQUEST] 1Lib1Ref May 2020 data analysis
Closed, Resolved · Public

Description

What's requested:

We would like to evaluate the May edition of the #1Lib1Ref campaign. We have data from the Hashtags tool and would like to understand the breakdown of new, returning, and active editors across the participating projects. In this past campaign we also saw a higher-than-normal rate of blocked users; we would like to understand how many editors were blocked (and in some cases later unblocked), and how many of their additions were reverted.

From the Hashtags tool, we can provide a CSV file containing data on edits, including the usernames, pages, and projects for each edit made as part of the campaign which was tagged with #1lib1ref or #1bib1ref in the edit summary.
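As a quick illustration of working with that export, here is a minimal sketch that tallies edits and distinct editors per project. The column names (username, page_title, project) are assumptions for the sake of the example; the actual CSV headers from the Hashtags tool may differ.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the Hashtags CSV export (column names are assumed).
sample = """username,page_title,project
Alice,Library,en.wikipedia.org
Alice,Archive,en.wikipedia.org
Bob,Bibliotheque,fr.wikipedia.org
"""

edits = defaultdict(int)       # project -> number of tagged edits
editors = defaultdict(set)     # project -> distinct usernames

for row in csv.DictReader(io.StringIO(sample)):
    edits[row["project"]] += 1
    editors[row["project"]].add(row["username"])

for project in sorted(edits):
    print(f"{project}: {edits[project]} edits, {len(editors[project])} editors")
```

With a real export, `io.StringIO(sample)` would be replaced by an open file handle on the downloaded CSV.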

In the past we have done simple analysis based primarily on querying the replica databases to check users' activity levels per project. Reports from this analysis are linked below.

Why it's requested:

#1lib1ref is a campaign that we run every year to engage librarians in the Wikimedia movement. #1lib1ref May 2020 had an unusually high participation of new editors from the African context. We want to continue to monitor how that recruitment changes and affects the impact and relative effectiveness of the campaign, so that we can continue to make tweaks to make the initiative more effective and diverse.

This analysis was previously done by Sam on the Wikipedia Library team, when 1Lib1Ref lived with that team, but we'd now like to find a more sustainable and thorough way of getting these metrics.

When it's requested:

We would prefer to have this data this quarter so that we can use it when thinking about the design and communication of the campaigns. However, if it takes longer, that is fine.

Other helpful information:

  • Report for January 2020
  • Sam on the Wikipedia Library team did this kind of analysis in the past and would be happy to answer questions about the data

Delivery

  • editor cohort analysis for the May campaign (5/15/2020-6/5/2020), using the same cohort definitions as in Report for January 2020

Cover the 10 wikis below, plus any other wiki with more than 20-30 edits or more than 2-3 editors:

Code    Language
EN      English
ES      Spanish
CA      Catalan
PL      Polish
SV      Swedish
SR      Serbian
FR      French
SW      Swahili
IT      Italian
HE      Hebrew

  • blocks and unblocks on editors acquired in the May campaign (5/15/2020-6/5/2020)
  • reverted edits in the May campaign (5/15/2020-6/5/2020)
  • blocks and unblocks on editors acquired in the Jan campaign (1/15/2020-2/5/2020)
  • reverted edits in the Jan campaign (1/15/2020-2/5/2020)
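For the block/unblock deliverables, one plausible approach (an assumption on my part, not necessarily the method actually used) is to scan the replica's logging table for block events matching the cohort's usernames. The toy in-memory sketch below only illustrates the query shape; the real Wikimedia replica schema differs in detail.

```python
import sqlite3

# Toy SQLite stand-in for the replica's logging table. The column names
# (log_type, log_action, log_title, log_timestamp) follow the MediaWiki
# schema, but this simplified layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (log_type TEXT, log_action TEXT,"
    " log_title TEXT, log_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?)",
    [
        ("block", "block", "Alice", "20200520000000"),
        ("block", "unblock", "Alice", "20200525000000"),
        ("block", "block", "Bob", "20200521000000"),
        ("move", "move", "Alice", "20200522000000"),  # unrelated log entry
    ],
)

cohort = ["Alice", "Bob"]  # usernames acquired during the campaign
placeholders = ",".join("?" * len(cohort))
events = conn.execute(
    f"""
    SELECT log_title, log_action, log_timestamp
    FROM logging
    WHERE log_type = 'block'
      AND log_action IN ('block', 'unblock')
      AND log_title IN ({placeholders})
    ORDER BY log_timestamp
    """,
    cohort,
).fetchall()
print(events)
```

Reverted edits could likewise be pulled from the change-tag data (e.g. the mw-reverted tag on recent wikis), though that too is an assumption about the method rather than something stated in this task.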

Report
https://docs.google.com/document/d/1frP_aps64CTckhvUs6NQrPRT2WQqh8PCyvS28INQeGU/edit?usp=sharing

Event Timeline

@Samwalton9, I am the owner of this task now. Can I schedule a meeting with you to learn more about your request?

Absolutely! Please feel free to schedule a call with me and @Sadads.

Data file for the campaign's list of tagged edits, downloaded from the relevant Hashtags tool query:

Some details on how I did the previous analysis:

  • I used the revision_userindex table (which I think is now deprecated) to get information on editors' edit counts, i.e. counting the number of edits between the relevant dates
  • I got editor account ages by looking at user_registration
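The two steps above can be sketched against a toy in-memory database. This is only an illustration of the query shapes, under the assumption of a much-simplified schema; the real replicas differ (for one thing, rev_user_text has since been superseded by the actor table).

```python
import sqlite3

# In-memory SQLite stand-in for the replica tables (schemas simplified).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_user_text TEXT, rev_timestamp TEXT)")
conn.execute("CREATE TABLE user (user_name TEXT, user_registration TEXT)")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [
        ("Alice", "20200516120000"),  # inside the 5/15-6/5 window
        ("Alice", "20200601090000"),  # inside the window
        ("Bob", "20200410000000"),    # before the window
    ],
)
conn.execute("INSERT INTO user VALUES ('Alice', '20200510000000')")

# Step 1: per-user edit counts between the relevant dates.
counts = dict(conn.execute(
    """
    SELECT rev_user_text, COUNT(*)
    FROM revision
    WHERE rev_timestamp BETWEEN '20200515000000' AND '20200605235959'
    GROUP BY rev_user_text
    """
))

# Step 2: account age via user_registration (MediaWiki-style timestamp).
registration = conn.execute(
    "SELECT user_registration FROM user WHERE user_name = 'Alice'"
).fetchone()[0]
print(counts, registration)
```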

As you dive into this, please let me know if you have any other questions about how I arrived at the previous figures :)

@Samwalton9 Thank you! Since I also need to analyze the blocks for the Jan 2020 campaign, I downloaded the 1/15-2/5 data from the Hashtags tool:

https://hashtags.wmflabs.org/?query=1lib1ref%2C+1bib1ref&project=&startdate=2020-01-15&enddate=2020-02-05&search_type=or&user=

Jan. 15, 2020 - Feb. 5, 2020

14778 revisions
7973 pages
720 users
59 projects

Let me know if the cutoff date is not correct.

Hi @Samwalton9, @Sadads,

I documented our discussion about the expected deliveries here. Feel free to add if anything is missing.
Delivery

  • editor cohort analysis for the May campaign (5/15/2020-6/5/2020), using the same cohort definitions as in Report for January 2020

Cover the 10 wikis below, plus any other wiki with more than 20-30 edits or more than 2-3 editors:

Code    Language
EN      English
ES      Spanish
CA      Catalan
PL      Polish
SV      Swedish
SR      Serbian
FR      French
SW      Swahili
IT      Italian
HE      Hebrew

  • blocks and unblocks on editors acquired in the May campaign (5/15/2020-6/5/2020)
  • reverted edits in the May campaign (5/15/2020-6/5/2020)
  • blocks and unblocks on editors acquired in the Jan campaign (1/15/2020-2/5/2020)
  • reverted edits in the Jan campaign (1/15/2020-2/5/2020)

Let me know if the cutoff date is not correct.

That looks accurate and the deliverables look good!

Editor cohort analysis for the May campaign (5/15/2020-6/5/2020) is posted at Report for May 2020. Feel free to comment.

Analysis for the 4 requests below is done and added to Report for May 2020. Let me know if you have any questions.

  • blocks and unblocks on editors acquired in the May campaign (5/15/2020-6/5/2020)
  • reverted edits in the May campaign (5/15/2020-6/5/2020)
  • blocks and unblocks on editors acquired in the Jan campaign (1/15/2020-2/5/2020)
  • reverted edits in the Jan campaign (1/15/2020-2/5/2020)

The report looks great - super appreciate the work you did on this!

We also talked about potentially recycling the workflow to perform the same analysis on the #WPWP campaign. The input data structure would be the same (a CSV from Hashtags), the time frame would be different, and the outputs would be the same - cohort analysis, revert rate, and block rate. There wouldn't be a need to compare to another set of data. We'd be very happy with the raw data outputs and we could write any reporting around that ourselves, to save you time.

Would that need to be a fresh analysis request, or is it something that might not take much time that we could add to this one?

You're welcome. The manual work in the workflow still requires some time. If you are familiar with Phabricator, you can create a similar ticket for the #WPWP campaign. If not, you can fill in the analysis request form, and a member of our team will create a ticket based on the information in the request.

Thanks for the info. In that case I think this is complete! Appreciate your work on this :)