Fri, Mar 27
Fri 27 Mar 2020 11:16:16 AM UTC
Thu, Mar 26
Thu 26 Mar 2020 11:35:59 PM UTC
Tue, Mar 24
Here are your pageviews; data collected from 2020-03-10 to 2020-03-23 (two weeks):
- Data collection for 2020-03-10 to 2020-03-23 started from stat1005 (HiveQL).
Mon, Mar 23
@Ottomata Not as far as I am concerned (I have switched to Pyspark, and SparkR would do if I ever need it).
- fetch a random sample of SPARQL queries from the /sparql and /bigdata/namespace/wdq/sparql paths in wmf.webrequest;
- perform exploratory data analysis to identify the most frequently used exactly identical queries;
- perform simple feature engineering to characterize the queries;
- perform query clustering, characterize large clusters, and then study the distribution of query frequency across the clusters;
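A minimal Python sketch of the first analysis steps (counting exactly identical queries and deriving a few simple features), assuming the sampled queries have already been pulled from wmf.webrequest into a plain list of strings. The function names and the keyword-based features are hypothetical illustrations, not the actual analysis code, and the clustering step is not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude features: a keyword is flagged if it
# appears anywhere in the upper-cased query text.
KEYWORDS = ("OPTIONAL", "FILTER", "SERVICE", "UNION", "LIMIT", "ORDER BY")


def top_identical_queries(queries, n=10):
    """Return the n most frequent exactly identical query strings with their counts."""
    return Counter(queries).most_common(n)


def query_features(query):
    """Compute a few simple features of a single SPARQL query string."""
    upper = query.upper()
    features = {
        "length": len(query),
        # distinct SPARQL variables such as ?item or $x
        "n_variables": len(set(re.findall(r"[?$]\w+", query))),
    }
    for kw in KEYWORDS:
        features["has_" + kw.lower().replace(" ", "_")] = kw in upper
    return features


if __name__ == "__main__":
    sample = ["SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"] * 3
    sample.append("SELECT ?x WHERE { ?x ?p ?o }")
    print(top_identical_queries(sample, 1))
    print(query_features(sample[-1]))
```

The feature dictionaries can then be turned into a matrix for whatever clustering method is chosen.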
- WDCM (S)itelinks and (T)itles still use a Hive-based ETL procedure;
- it shouldn't take too much time to switch to Pyspark.
- WDCM (T)itles and WDCM (S)itelinks - HiveQL: DONE.
- WDCM Biases: DONE.
Tue, Mar 10
Just to confirm, as a Data Scientist for Wikidata at WMDE: we absolutely want to have this available from our stat100* machines.
Mon, Mar 9
@Christine_Domgoergen_WMDE Found it, there were two pageviews marked by the following tags:
Thu, Mar 5
@Christine_Domgoergen_WMDE You are welcome!
What do you mean by that? Do you know if there is documentation about the standards?
Wed, Mar 4
Regarding T240351#5937402:
Regarding T240351#5936973:
Mon, Mar 2
The reporting spreadsheets are now tidier and fully consistent with the data in the Report.
In the updated version of the report (shared here), the banner action rates are included in section 1.1.4, immediately following the full data set for banner actions.
Sun, Mar 1
@Christine_Domgoergen_WMDE Also, I have noticed that some charts are not properly sorted on the horizontal axis representing date.
I need to calculate the closing and extension rate of the banner from the banner actions table - do you have the data from 1.14 in a spreadsheet?
With respect to T240351#5911537:
Feb 26 2020
I am currently working on T239200 and T239199, and some tasks are due tomorrow (e.g. updating the Wikidata Languages Landscape system). As soon as I can switch focus to this campaign I will get back to you.
Feb 15 2020
@Christine_Domgoergen_WMDE The end of February is fine, thank you for getting back to me on this one.
Feb 12 2020
As time passes I am increasingly focused on other projects, and at some point it might become difficult (and more time-consuming) for me to get back into this campaign and provide additional analysis.
@Janina_Ottma_WMDE - could we meet and discuss what else needs to be done in the near future?
Feb 4 2020
@Christine_Domgoergen_WMDE Report with updated data from our recent work is here:
Feb 3 2020
@Nuria Thank you very much for your assessment and your suggestions. Closing the ticket.
Feb 2 2020
- however, the reason why goransm.wdcm_clients_wb_entity_usage when it should not have been is still unclear;
- closing the ticket, monitoring continues.
Feb 1 2020
- WDCM_Sqoop_Clients.R seems to be running smoothly from stat1004
- first check: enwiki data set produced successfully.
Jan 31 2020
Jan 30 2020
@Nuria Thank you for a prompt reply.
Jan 29 2020
@Dydimusz and I would really like to proceed with his research project.
Could anyone take a look at these data sets and let us know if they comply with the WMF policy?
Thank you very much.
Jan 27 2020
@Janina_Ottma_WMDE: sorry - I forgot to submit the report file here ^^ T240351#5833762.
It is an R Markdown Notebook - essentially an HTML file - with all the data, cross-tabulations and visualizations included. Just download it and open it in your browser.
We use the Google Spreadsheets only for daily reporting during the campaign.
When you read the Report, please get back in touch with any questions that you might have. The A/B tests, for example, are not included (hence: Interim Report), because many tests are possible, so we need to discuss exactly which tests we want to run (i.e. which tests make sense).
It would be best to have a 1:1 Google Hangouts session to discuss the Report, especially if this is your first WMDE Banner Campaign. Thanks!
@Janina_Ottma_WMDE Here is the Interim Campaign Report for the WMDE Thank You 2020 Campaign.
@Christine_Domgoergen_WMDE Just to stay in touch on this: I have just completed the report for the WMDE Thank You 2019/2020 campaign (T240351) and I will now focus on the re-do of this campaign (WMDE Autumn Banner Campaign 2019). As soon as the report is ready it will be shared here.
Jan 23 2020
Because of the WMDE_ vs. WPDE_ thing, I will have to re-run all data collection procedures.
Reporting back ASAP. @Janina_Ottma_WMDE The Campaign Report will be delivered tomorrow - re-running data collection will take some time.
@Janina_Ottma_WMDE from what I see in the data I can confirm what @kai.nissen is saying in the following way: the only registrations that we have from desktop have occurred on the first day of the campaign. I will re-check the data sets but I am pretty sure that this is the case. I should also be able to deliver the Campaign Report tonight, and then we can discuss it whenever you are ready.
Jan 22 2020
@kai.nissen So it's a Hive table. Ok. Thank you.
Jan 21 2020
@Jan_Dittrich Great! Would you like to have the ETL procedure put on a crontab to run a regular monthly update, or would you rather just ask me when you need the data again?
We have used the Training Modules for the WMDE Thank You 2019/2020 campaign and would need the data set.
The campaign ran between 1 and 20 January 2020.
Jan 20 2020
From T240361: the campaign was disabled yesterday at 18:00 -> On to the Final Report now.
@Janina_Ottma_WMDE The 2020/01/18 and 2020/01/19 updates are ready; no new user registrations.
Could you include the new data in the final report so we have a final version with all the available data? This is not a priority, but it is important for documentation.
Jan 18 2020
@Lydia_Pintscher Is any additional work needed here, or can the ticket be resolved?
@Janina_Ottma_WMDE The 2020/01/17 update is complete; no new user registrations.
One other piece of information I still need (or did I miss it somewhere?) is the page views via the tag WMDE_neweditors_autumn_2019_flyer from October 7th to November 1st.
@Christine_Domgoergen_WMDE What I can tell you right now is that we will be able to process the pageviews from flyer from 2019/10/20 onwards (that would be 90 days in the past indeed).
Reporting back as soon as the query is done.
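For the record, the 90-day arithmetic as a small Python sketch. It assumes wmf.webrequest keeps exactly 90 full days counted back from the day the query is run; the real purge boundary may be off by a day, and the helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: wmf.webrequest retains roughly 90 days of data.
RETENTION_DAYS = 90


def earliest_available_day(today, retention_days=RETENTION_DAYS):
    """First day that should still be present in wmf.webrequest on `today`."""
    return today - timedelta(days=retention_days)


def available_window(start, end, today, retention_days=RETENTION_DAYS):
    """Clip a requested [start, end] window to the retained period, or None if fully purged."""
    earliest = earliest_available_day(today, retention_days)
    if end < earliest:
        return None
    return max(start, earliest), end


if __name__ == "__main__":
    # Flyer window 2019/10/07 - 2019/11/01, query run on 2020/01/18
    print(available_window(date(2019, 10, 7), date(2019, 11, 1), date(2020, 1, 18)))
```

With a query run on 2020/01/18, the usable part of the flyer window starts on 2019/10/20, so the days from October 7th to 19th are no longer recoverable from this table.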
@Christine_Domgoergen_WMDE Sorry for not being able to respond any sooner than this.
Jan 17 2020
@Janina_Ottma_WMDE The 2020/01/16 update is complete; two new user registrations.
@Janina_Ottma_WMDE Running the 2020/01/16 update now.
How are these edits split by edit class (as in final report 3.1.)? Could you add this information?
Also, if you say the numbers in the spreadsheet are the ones to trust, we would need the timeframe from October 28th onwards, because the newsletter was sent out on October 28. Could you add them to the table?
So just to be sure: these are all page views of Wikipedia_vor_Ort in November from the tag WMDE_neweditors_autumn_2019_nl_lp1?
Jan 16 2020
In reference to: T235839#5809630
In reference to: T235839#5809565
@Christine_Domgoergen_WMDE I will try to respond to T235839#5809565 and T235839#5809630, but only later tonight.
If I make it, I will also email you, just to make sure you have the data in the morning at least.
@Christine_Domgoergen_WMDE And here's the November 2019 pageviews data for the WMDE_neweditors_autumn_2019_nl_lp1 newsletter banner:
@Christine_Domgoergen_WMDE We have the October 2019 data (or what is left of them because of the wmf.webrequest purge after 90 days):
@Christine_Domgoergen_WMDE I think I will be able to deliver the pageviews by this evening, and probably even earlier, say around 18:00 CET.
@Christine_Domgoergen_WMDE The pageviews thing will take a while, sorry - the code needs to search through a lot of data and was hitting the cluster resources heavily, so I had to switch it to search day by day through October and November 2019. I will report back as soon as I have something.
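A rough Python sketch of the day-by-day approach: instead of one query over two months, generate one HiveQL query per day so that each run only touches a single day's partitions. The partition columns (webrequest_source, year, month, day) and is_pageview exist in wmf.webrequest, but the selected columns, the `campaign=` filter on uri_query and the helper names are assumptions for illustration, not the query actually used.

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """Yield every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def daily_query(day, tag):
    """Build a hypothetical per-day HiveQL query restricted to one day's partitions.

    `tag` is interpolated as-is and is assumed to be a trusted campaign tag.
    """
    return (
        "SELECT uri_path, COUNT(*) AS pageviews "
        "FROM wmf.webrequest "
        f"WHERE webrequest_source = 'text' AND year = {day.year} "
        f"AND month = {day.month} AND day = {day.day} "
        "AND is_pageview = TRUE "
        f"AND uri_query LIKE '%campaign={tag}%' "
        "GROUP BY uri_path"
    )


if __name__ == "__main__":
    days = list(daily_partitions(date(2019, 10, 1), date(2019, 11, 30)))
    print(len(days), "daily queries")
    print(daily_query(days[0], "WMDE_neweditors_autumn_2019_nl_lp1"))
```

Each generated query can then be submitted one at a time (e.g. via beeline or Pyspark) and the daily results appended, which keeps the load per run small.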
@Christine_Domgoergen_WMDE In order to compare the results, we will pick up everything on:
Page Views Wikipedia_vor_Ort and LerneWikipedia: the page view tool for Wikipedia pages shows quite different numbers (see links). Some difference is expected, but on some days they are far higher than the page views you found in the database. Can you check again whether the numbers in the report are correct?
@Christine_Domgoergen_WMDE Regarding the following:
In an early comment in the phab ticket you mentioned some user registrations that happened before the start of the banner campaign (28.-31.10.19) via the newsletter tags (WMDE_neweditors_autumn_2019_nl_lp2 and WMDE_neweditors_autumn_2019_nl_lp1). These registrations do not show up in the final report, maybe because the time span of the report starts on the 1st of November? Could you have a look at it again and, if you can confirm those registrations, update the final report and the spreadsheet if necessary? The campaign time frame does not start with the banner on Nov 1st but already with the Flyers on October 7th (see dates in the graphic in the tracking doc). Could you double-check that all the figures from this whole time span are included in the report? This is also relevant, for example, for the page views coming from flyers. Thank you!
@Christine_Domgoergen_WMDE The ticket is re-opened in respect to the following requests:
@Janina_Ottma_WMDE The 2020/01/14 and 2020/01/15 updates are ready, no new user registrations.
Jan 14 2020
@Janina_Ottma_WMDE The 2020/01/13 update is ready, no new user registrations.
The only thing that I do not understand here is the following planned column:
Jan 13 2020
Jan 12 2020
@Janina_Ottma_WMDE 2020/01/11 update is ready.
Jan 11 2020
Update for 2020/01/10 is in the Spreadsheet, one fresh user registration on January 10.
@Janina_Ottma_WMDE Please let me know until when we are running this campaign. Thank you!
Jan 10 2020
Updates for 2020/01/07 and 2020/01/08 included, no new user registrations since January 5.
@Janina_Ottma_WMDE When does this campaign end?
Jan 8 2020
Updates for 2020/01/05, 2020/01/06, and 2020/01/07 are now included.