
English translation of banner content and user tasks
Closed, Resolved · Public

Description

BT1 - Specific Task Banner wmde_abc2017_bt1 (blue):
Text: You can make Wikipedia more vivid! CTA: Learn how to add pictures to articles
The landing page text briefly explains how to add pictures to articles and provides a video. The focus is not to upload new images but to include them in articles. However, both are fine. Either way, these users make an edit that involves linking to a file on Commons.

BT2 - Specific Task Banner wmde_abc2017_bt2 (pink)
Text: You can improve the accuracy of Wikipedia! CTA: Learn how to improve articles
The landing page leads to a category that lists articles in need of improvement. So ideally, edits were made to articles (formerly) in this category.

BT3 - Specific Task Banner wmde_abc2017_bt3 (green)
Text: You can improve the reliability of Wikipedia! CTA: Learn how to add citations
The landing page leads to a page that lists a number of pages which include a certain template. So ideally, edits were made to pages that have or had this template.

GIB - General Invitation Banner
Text: Contribute to Wikipedia CTA: Create a user account
no specific task

Event Timeline

@Verena @Stefan_Schneider_WMDE @Jan_Dittrich

Additional analyses to be provided:

  • Did the change in banners that took place on 10/10/2017 influence (a) the number of user registrations and (b) the number of user edits? (DONE) This is already completed: the change has probably influenced the number of registrations negatively, while it cannot be ruled out that it had a positive influence on the number of user edits. I am not aware of the nature of the change. Please take a look at the attached charts. N.B. The details will be provided in the updated Report when everything else is finished.

Edits.png (432×700 px, 6 KB)

Registrations.png (432×700 px, 7 KB)

  • Did the registered users really follow the instructions provided in the Specific Task Banner Campaigns in their edits?
  • How many reverted edits were there (a) per campaign and (b) per user?
  • The effect of having or not having completed the Guided Tour on the number of edits made
  • Include the list of abbreviations used in the Report (DONE).

Estimate: I think I can finish this by the end of the week, depending on the complexity of the first and second tasks in this list.

@Verena @Stefan_Schneider_WMDE @Jan_Dittrich

A new version of the ABC 2017 Report is here:

  • Section 4.7, Guided Tour and the number of user edits, answers your question on the number of edits made by those who did and did not complete the Guided Tour;
  • Section 6, Post-Campaign Analytics, answers all remaining questions, except for
  • the incomplete Section 6.2, Did the registered users really follow the instructions provided in the Specific Task Banner Campaigns in their edits?, where I still need to invest some work.

@Verena @Stefan_Schneider_WMDE @Jan_Dittrich

As for the BT2 specific task: 9.68% of the edits made by BT2-registered users were made on pages in the Wikipedia:Überarbeiten category.

BT1 and BT3 specific tasks: I need to parse the revision content to do this, which means I have to get it back from the SHA1 compression in our databases into plain wikitext; I will have to consult someone on how to do that.
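For reference, a minimal sketch of the kind of category check reported above for BT2, assuming Python with the requests library; the set of edited page titles is a made-up placeholder, and only current category membership is checked (not former membership).

```
# Sketch: list the pages currently in the maintenance category and check
# which of the BT2 users' edited pages fall into it. The edited-title data
# (bt2_edited_titles) is assumed to come from the campaign edit query.
import requests

API = "https://de.wikipedia.org/w/api.php"

def category_members(category):
    """Collect all page titles in the given category via the MediaWiki API."""
    titles = set()
    params = {
        "action": "query", "list": "categorymembers",
        "cmtitle": category, "cmlimit": "500", "format": "json",
    }
    while True:
        data = requests.get(API, params=params).json()
        titles.update(m["title"] for m in data["query"]["categorymembers"])
        if "continue" not in data:
            return titles
        params.update(data["continue"])

# Hypothetical input: titles of pages edited by BT2-registered users.
bt2_edited_titles = {"Beispielartikel", "Noch ein Artikel"}
in_category = bt2_edited_titles & category_members("Kategorie:Wikipedia:Überarbeiten")
print(f"{len(in_category)} of {len(bt2_edited_titles)} edited pages are in the category")
```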

Thank you for the update.

How many of the new users' edits have been reverted, and from which banner did these edits/users come? If I understood the code correctly, the query also includes post-campaign activity? The result of 0 reverts surprises me.

I am still missing the answer to the question 'Are users registered via GIB more inclined to take a Guided Tour?' It would be sufficient if you could create a second version of the diagram in Section 3.1B with slices for the banners in each bar, or something similar. A table would also be sufficient if a diagram is too much work.

Looking forward to the specific task results. BT2 is interesting already.

@Verena

"If I understood the code correctly the query includes also post-campaign activity?"

Well, no: AND (rev_timestamp >= 20171004220000) AND (rev_timestamp <= 20171014220000) in the WHERE clause should cover exactly the CEST scope of the campaign in UTC server time.
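To make the conversion explicit, a minimal sketch (Python, zoneinfo) deriving those two bounds, assuming the campaign window was 5-15 October 2017, midnight to midnight CEST; the dates are inferred from the bounds themselves rather than documented elsewhere.

```
# Sketch: derive the UTC rev_timestamp bounds (MediaWiki format YYYYMMDDHHMMSS)
# from the assumed CEST campaign window.
from datetime import datetime
from zoneinfo import ZoneInfo

cest = ZoneInfo("Europe/Berlin")
utc = ZoneInfo("UTC")

start_cest = datetime(2017, 10, 5, 0, 0, tzinfo=cest)
end_cest = datetime(2017, 10, 15, 0, 0, tzinfo=cest)

fmt = "%Y%m%d%H%M%S"
print(start_cest.astimezone(utc).strftime(fmt))  # 20171004220000
print(end_cest.astimezone(utc).strftime(fmt))    # 20171014220000
```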

"The result of 0 reverts surprises me."

Let me check the analytics code.

"I am still missing the answer for the question 'Are users via GIB more inclined to take a Guided Tour?'"

I didn't even know that you need this; my apologies. I will provide this in the following hours.

"Looking forward to the specific task results. BT2 is interesting already."

As soon as I figure out how to get back from the SHA1 compression in the revision table to wikitext and parse the revisions.

@Verena

Please find attached the following update:

  • Code for reverted edits (Section 6.3) corrected (sorry about that): 4 edits were reverted;
  • Section 6.4, Exiting the Guided Tour vs Registration Campaign;
  • Section 6.5, Point of Guided Tour Exit per Registration Campaign.

Again, as for: "Looking forward to the specific task results. BT2 is interesting already." - as soon as I figure out how to get back from the SHA1 compression in the revision table to wikitext and parse the revisions.

@Verena @Stefan_Schneider_WMDE @Jan_Dittrich @Addshore

Here's the situation:

  • In order to discover whether the BT1 (work with multimedia) and BT3 (use citation templates) registered users did what we suggested they do, we need to parse the text of their revisions - if you know any way around this, please let me know.
  • In order to parse the text of their revisions, we need to have that text, and the text is stored under SHA1 encryption in our databases.
  • Apparently, SHA1 cannot be reversed, which is, as far as my understanding goes, the essence of something being an encryption algorithm. I was hoping that there is some trick to use SHA1 simply as a compression algorithm that produces messages that can be reversed, but I don't think so.

@Addshore If you know any way around this - again, to complete the analyses here I need to parse the wikitext - please let me know. SHA1 etc. is really not my thing.

@Verena @Stefan_Schneider_WMDE @Jan_Dittrich @Addshore

Of course, tables like templatelinks and imagelinks are not what we are looking for: from them we could learn whether a particular user edited a page that uses a template and/or multimedia, not whether that user made an edit that itself involved template or multimedia usage. So, parsing the text of user revisions is a must. I've also checked some Hadoop tables, and of course they too keep only the SHA1 of the revision.

From the revision history as managed by MediaWiki on-site (what you can get from View History on Wikipedia), I am not sure whether I would be able to figure out the changes in templates and/or multimedia, not to mention that scraping and parsing those histories would be overkill if we need it for this task only.

I will continue to research for a while and see if there's a way to get to the data that we need for this. However, advice would be more than welcome at this point.

> In order to discover whether the BT1 (work with multimedia) and BT3 (use citation templates) registered users did what we suggested they do, we need to parse the text of their revisions

What do we need to find out by parsing the wikitext?

> In order to parse the text of their revisions, we need to have that text, and the text is stored under SHA1 encryption in our databases.
> Apparently, SHA1 cannot be reversed, which is, as far as my understanding goes, the essence of something being an encryption algorithm. I was hoping that there is some trick to use SHA1 simply as a compression algorithm that produces messages that can be reversed, but I don't think so.
> @Addshore If you know any way around this - again, to complete the analyses here I need to parse the wikitext - please let me know. SHA1 etc. is really not my thing.

There might have been a misunderstanding here: the wikitext is not stored in the SHA1 hash; rather, the SHA1 hash is unique to a version of wikitext.
When we were talking about the SHA1 hash, we were talking about detecting rollbacks/reverts.
The wikitext can just be retrieved from the API or dumps.
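To illustrate the revert-detection point: a minimal sketch that flags identity reverts by matching content hashes, assuming a chronologically ordered list of (rev_id, sha1) pairs for a single page; the sample history is invented.

```
def find_identity_reverts(revisions):
    """Flag revisions whose content hash matches an earlier revision of the page,
    i.e. the page was restored to a previous state."""
    first_seen = {}   # sha1 -> rev_id of the first revision with that content
    reverts = []
    for rev_id, sha1 in revisions:
        if sha1 in first_seen:
            reverts.append((rev_id, first_seen[sha1]))  # (reverting rev, reverted-to rev)
        else:
            first_seen[sha1] = rev_id
    return reverts

# Invented page history: revision 103 restores the content of revision 101.
history = [(101, "a9f3"), (102, "77c0"), (103, "a9f3")]
print(find_identity_reverts(history))   # [(103, 101)]
```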

@Addshore The motivation behind the task: @Verena and the team would like to learn whether the ABC2017 registered users really did what the respective banners were suggesting to them (upload an image or use an image on Wikipedia, learn how to use citations, edit a page in a particular category).

In order to find out whether they really followed those suggestions, I need to parse the edits (revisions) that they've made. That's it. For example, I need to take a look at the wikitext of a particular edit and parse it to know whether a link to a multimedia file on Commons was used or not.

"There might have been a misunderstanding here: the wikitext is not stored in the SHA1 hash; rather, the SHA1 hash is unique to a version of wikitext."

There is a misunderstanding, and it's on my side: I thought that there was some transform(SHA1) that would bring back the revision text. Obviously, very wrong.

"The wikitext can just be retrieved from the API or dumps."

Thanks for the hint. Could you please advise which API that would be? Parsing the dumps just to figure out a few edits would be obvious overkill.
Thanks a lot, @Addshore

You should be able to use the user ID to look up a list of contributions, for example https://en.wikipedia.org/w/api.php?action=help&modules=query%2Ballrevisions

You can get the revision content using: https://en.wikipedia.org/w/api.php?action=help&recursivesubmodules=1#query+revisions

Take a look at the examples at the bottom of the sections linked to!
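A minimal sketch of putting the two steps together, assuming Python with the requests library and a placeholder username; it uses list=usercontribs for the listing step (a close relative of the allrevisions module linked above) and prop=revisions for the content.

```
import requests

API = "https://de.wikipedia.org/w/api.php"

def user_revision_texts(username):
    """Yield the wikitext of revisions made by the given user (first batch only;
    continuation and error handling are omitted for brevity)."""
    # Step 1: list the user's contributions to obtain revision ids.
    contribs = requests.get(API, params={
        "action": "query", "list": "usercontribs", "ucuser": username,
        "ucprop": "ids", "uclimit": "50", "format": "json",
    }).json()["query"]["usercontribs"]

    # Step 2: fetch the content of each of those revisions.
    for contrib in contribs:
        pages = requests.get(API, params={
            "action": "query", "prop": "revisions", "revids": contrib["revid"],
            "rvprop": "content", "rvslots": "main", "format": "json",
        }).json()["query"]["pages"]
        for page in pages.values():
            for rev in page.get("revisions", []):
                yield rev["slots"]["main"]["*"]

# Placeholder username - not a real campaign user.
for text in user_revision_texts("ExampleUser"):
    print(text[:80])
```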

@Addshore Thanks.

@Verena As of the following:

BT3 - Specific Task Banner wmde_abc2017_bt3 (green)
Text: You can improve the reliability of Wikipedia! CTA: Learn how to add citations
The landing page leads to a page that lists a number of pages which include a certain template. So ideally, edits were made to pages that have or had this template.

since you haven't specified which template the pages belonged to, I've looked at the user revisions to find out whether they used any citations (defined as: having used the <ref> syntax in their wikitext). The answer is: no users who registered via wmde_abc2017_bt3 made any citations at all.
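For transparency, a minimal sketch of that check, assuming the revision wikitext has already been retrieved (e.g. as sketched earlier); the sample strings are invented.

```
import re

# Working definition from above: an edit counts as a citation if its wikitext
# contains a <ref ...> tag.
REF_TAG = re.compile(r"<ref[\s>]", re.IGNORECASE)

def contains_citation(wikitext):
    return bool(REF_TAG.search(wikitext))

print(contains_citation("Eine Behauptung.<ref>Quelle</ref>"))        # True
print(contains_citation('Noch eine.<ref name="q1">Quelle</ref>'))    # True
print(contains_citation("Kein Beleg hier."))                         # False
```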

@Verena And finally, as for:

BT1 - Specific Task Banner wmde_abc2017_bt1 (blue):
Text: You can make Wikipedia more vivid! CTA: Learn how to add pictures to articles
The landing page text briefly explains how to add pictures to articles and provides a video. The focus is not to upload new images but to include them in articles. However, both are fine. Either way, these users make an edit that involves linking to a file on Commons.

No wmde_abc2017_bt1-registered users have used the [[File:... syntax in their page revisions.

The final version of the Report is here:

GoranSMilovanovic lowered the priority of this task from High to Low. Nov 16 2017, 12:51 AM

Hi @GoranSMilovanovic,
Thanks for your investigations. As for wmde_abc2017_bt1, the tag for linking a file would be the following: [[Datei:Name.jpg|...|...]]. Did you also check this version of the syntax? Maybe it will show other results. Best, Stefan

Hi @Stefan_Schneider_WMDE, thank you for the hint; I will check it out immediately and let you know.

@Stefan_Schneider_WMDE Well, my "immediately" had obviously acquired some general relativistic property of time dilation here. It is due to a complicated dental intervention that I had to undergo. Reporting back as soon as I can.

@Stefan_Schneider_WMDE No usage of [[Datei:Name.jpg|...|...]] can be detected in the wmde_abc2017_bt1 user revisions.

@Verena @Stefan_Schneider_WMDE Please let me know whether any additional analyses are needed for the Autumn Banner Campaign 2017.

@Verena @Stefan_Schneider_WMDE Please: do you need any additional analytics for the Autumn Banner Campaign 2017, or shall we close this task?

I would like to go for T171990 next and document all necessary procedures and standards for future banner campaign analytics.

@GoranSMilovanovic

Did you use the exact text [[Datei:Name.jpg|...|...]]?
I would suggest using just the first part, like [[Datei:. That would be the pattern that could match any file that might have been used.

Thanks in advance!

Kindly,
Stefan

@Stefan_Schneider_WMDE That is exactly what I did. The first step was to use only [[Datei: to see if it appears at all (and it does not). The second step would be a regex pattern to count the exact number of files used, but there was no need for that.
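For completeness, a minimal sketch of the two steps just described, assuming the wikitext is already available; the second-step count is included even though it turned out not to be needed.

```
import re

# Step 1: does the revision contain a file link at all? Both the localized
# (Datei:) and the canonical (File:) namespace prefixes are checked.
FILE_LINK = re.compile(r"\[\[\s*(?:Datei|File):", re.IGNORECASE)

def uses_file_link(wikitext):
    return bool(FILE_LINK.search(wikitext))

# Step 2 (only needed if step 1 succeeds): count the file links.
def count_file_links(wikitext):
    return len(FILE_LINK.findall(wikitext))

print(uses_file_link("Text mit [[Datei:Beispiel.jpg|mini|Bildunterschrift]]."))  # True
print(count_file_links("[[File:A.png]] und [[Datei:B.jpg|mini]]"))               # 2
```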