Initial analysis by platform shows a significant drop in init events on Desktop but not on Phone. @Catrope has confirmed that this could have affected desktop. I will run a few more checks before concluding my findings.
Fri, Dec 6
Amanda has also requested that, in addition to the number of files, they want to compare the metadata in those files. Here's the mapping of template vs. SD metadata...
We are not entirely sure whether it is possible to get data on template parameters from the data sources we currently use (Data Lake, MariaDB replicas, etc.). Will discuss with everyone next week to decide on this.
Here is the GitHub notebook that has numbers for files containing common templates like Information and Artwork.
I have yet to merge it into wikimedia-research/SDC-metrics-2019, as it is pending review from @nettrom_WMF.
Wed, Dec 4
Completed QA for Q2 and Q3. Everything looks good, except that when I checked the Wiki field I found 1 init event recorded on 10-01-2019 for commonswiki, which is not part of the experiment as mentioned in the link. A single event is not an issue in itself, but whatever caused it to be bucketed could cause many more events from this wiki to be bucketed into the experiment in the future, so I wanted to bring it to everyone's attention.
A summary of the checks performed is posted in this QA Document.
Will post notebook soon.
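A minimal sketch of the kind of check that surfaces a stray event like the commonswiki one, assuming events are available as dictionaries with hypothetical `wiki` and `action` keys. The enrolled-wiki list and the sample events are placeholders for illustration, not the real experiment configuration or data.

```python
from collections import Counter

# Hypothetical set of wikis enrolled in the experiment (placeholder values).
EXPERIMENT_WIKIS = {"frwiki", "jawiki"}


def unexpected_wiki_counts(events, allowed_wikis):
    """Count init events per wiki for wikis that are not in allowed_wikis."""
    counts = Counter(
        event["wiki"]
        for event in events
        if event["action"] == "init" and event["wiki"] not in allowed_wikis
    )
    return dict(counts)


# Made-up sample events, only to show the shape of the input.
sample_events = [
    {"wiki": "frwiki", "action": "init"},
    {"wiki": "commonswiki", "action": "init"},
    {"wiki": "jawiki", "action": "ready"},
]

stray = unexpected_wiki_counts(sample_events, EXPERIMENT_WIKIS)
print(stray)
```

Any non-empty result points at wikis that are being bucketed without being part of the experiment.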
Tue, Dec 3
Meeting with Amanda and Ramsey helped clarify some outstanding questions we had. Meeting minutes are provided in this document.
Some highlights from the discussion:
- For the quarterly comparison of metadata on files with a common template: Amanda and Ramsey are specifically looking at the Information and Artwork templates only.
- For the quarterly comparison of when metadata on file pages is edited: Did the addition of SD encourage users to visit the file page and update it after 60 days? Does that number vary between quarters? [Task description has been changed accordingly.]
- The SDC team is interested to know: Does the presence of structured data features make people come back to old files? “Commonists add more and richer metadata to 1% of Common’s media files by the end of FY18-19.”
- Amanda was okay with going ahead with the grant proposal without these numbers, so we have a little more time beyond Dec 6.
- Priority of the metrics: first the quarterly number-of-files metrics, then the Search metrics.
Mon, Dec 2
Tue, Nov 26
Mon, Nov 25
Fri, Nov 22
Discussed the quarterly comparison metrics with Mikhail and Morten. A meeting with Amanda is scheduled for next week (11/26) for a few clarifications on the metrics.
Completed discussions on:
Thu, Nov 21
You should have an email with the tmp pass! @Mayakp.wiki please check in your spam folder if you don't find the email, I have tested my script to create identities only with @wikimedia.org accounts so far :)
Roan deployed the fix on 11/19 and I QAed the Init events, also checking them against the Ready and Loaded events, but didn't find a dip in Init events. Will check again tomorrow, comparing data from 11/1 to 11/18 against 11/19 to 11/21.
Discussed with the team during Status and Planning on 11/19 and opened 2 tickets:
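A rough sketch of the before/after comparison, assuming daily Init event counts have already been pulled into a dictionary keyed by date. The function names and the counts are hypothetical and purely illustrative; only the 11/19 fix date comes from the notes above.

```python
from datetime import date
from statistics import mean


def mean_daily_counts(daily_counts, fix_date):
    """Return (mean before fix_date, mean on or after fix_date)."""
    before = [n for day, n in daily_counts.items() if day < fix_date]
    after = [n for day, n in daily_counts.items() if day >= fix_date]
    return mean(before), mean(after)


def relative_change(before_mean, after_mean):
    """Relative change of the post-fix mean versus the pre-fix mean."""
    return (after_mean - before_mean) / before_mean


# Made-up daily Init counts around the fix date (not real data).
sample_counts = {
    date(2019, 11, 17): 100,
    date(2019, 11, 18): 100,
    date(2019, 11, 19): 80,
    date(2019, 11, 20): 80,
}

before_mean, after_mean = mean_daily_counts(sample_counts, date(2019, 11, 19))
change = relative_change(before_mean, after_mean)
print(f"before={before_mean}, after={after_mean}, change={change:.1%}")
```

A clearly negative change would indicate the expected dip; a value near zero matches the "no dip found" observation.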
- T238682 for getting durable source of global registration dates into the data lake.
Tue, Nov 19
Hello! Requesting Kerberos credentials for Hadoop access on stat100X and notebook100X.
My username is Mayakpwiki
Moving to Backlog, as this isn't urgent and will be taken up at a later time.
Sat, Nov 16
Checked that the buckets are balancing correctly, both overall and by browser and country. Updated values are provided in this document.
Will add notebook to Github as well.
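One way the per-segment balance check could look, as a sketch only: it assumes each row carries a hypothetical `bucket` key plus a segment key such as `browser` or `country`, and it computes the share of each bucket within every segment. The sample rows are invented.

```python
from collections import defaultdict


def bucket_shares_by_segment(rows, segment_key):
    """Return {segment: {bucket: share of that segment's rows}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row[segment_key]][row["bucket"]] += 1

    shares = {}
    for segment, buckets in counts.items():
        total = sum(buckets.values())
        shares[segment] = {bucket: n / total for bucket, n in buckets.items()}
    return shares


# Made-up rows: Firefox is evenly split, Chrome is skewed toward control.
sample_rows = (
    [{"browser": "Firefox", "bucket": "control"}] * 2
    + [{"browser": "Firefox", "bucket": "test"}] * 2
    + [{"browser": "Chrome", "bucket": "control"}] * 3
    + [{"browser": "Chrome", "bucket": "test"}] * 1
)

shares = bucket_shares_by_segment(sample_rows, "browser")
print(shares)
```

Passing `"country"` as `segment_key` would give the same breakdown by country, provided the rows carry that key.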
Fri, Nov 15
We were able to identify the root cause: as part of T222101, Morten changed the whitelisting to track isApi and removed userName from the list.
PA will be looking into the next steps and this will not require any further investigation from Analytics.
Working with the team on consensus for next steps:
Long-term option: use the MediaWiki application table to get account creation information instead of event logging.
Short-term (immediate) option: submit another request to change the whitelisting and add userName back to the list?
Discussed with Kate and Max Binder about shipping this around:
@nettrom_WMF: We did change the whitelisting for serversideaccountcreation earlier this year, maybe around that time, to get isApi stored as well (it was somehow not listed), and somehow ended up removing userName from the whitelist, which is probably what caused this issue. Here is the phab ticket link.
Hi Nuria, we are using data from the event_sanitized database and observed this issue in event_sanitized.serversideaccountcreation.
Thu, Nov 14
Ran this query with date filters and found that registrations with null usernames started on June 5, 2019 (2019-06-05T17:00:12Z).
Steps for Data POC: as discussed with Jason / Mikhail during the PA offsite
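The query itself is not reproduced here. As a hedged illustration of the logic, the sketch below finds the earliest timestamp among rows with a null `userName`, assuming rows are dictionaries with a hypothetical `dt` key holding ISO 8601 UTC timestamps in a uniform format (which sort correctly as plain strings). The sample rows are invented.

```python
def first_null_username(rows):
    """Return the earliest dt among rows whose userName is None, or None."""
    null_timestamps = [row["dt"] for row in rows if row["userName"] is None]
    return min(null_timestamps) if null_timestamps else None


# Made-up registration rows (not real data).
sample_rows = [
    {"dt": "2019-01-02T08:00:00Z", "userName": "ExampleUserA"},
    {"dt": "2019-03-04T12:30:00Z", "userName": None},
    {"dt": "2019-02-01T09:15:00Z", "userName": None},
    {"dt": "2019-03-05T10:00:00Z", "userName": "ExampleUserB"},
]

first_seen = first_null_username(sample_rows)
print(first_seen)
```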
Nov 2 2019
QA for Q2 and Q3 is in progress. Completed checks for a few of the metrics.
There are a few observations that I will discuss with Megan next week. Updates on the checks are documented here.
Oct 31 2019
Discussed with Megan on 10/30 regarding:
- Clickthrough rate
- Clicks or scrolls to more results
Presented QA Assessment Round 1 findings to the Product Analytics team on 10/30. Next steps: add the implementation slide, incorporate feedback from everyone, and plan presentations to the different product teams and the Analytics team.
Thanks @ppelberg for the update.
Oct 25 2019
Oct 24 2019
Presented a quick high-level overview of the different responses to PA QA Assessment Round 1 during the Team sharing meeting on 10/24. Next, I will work on a formal presentation of this to define the QA process, support further discussions with other teams (Product, Analytics, etc.), and outline the support we need from them.
Oct 22 2019
All checks completed. Data for the instrumented Actions is getting logged correctly in VisualEditorFeatureUse (for mwSave feature) and EditAttemptStep schemas. Marking this ticket as resolved.
Will post notebook link later this week.
Oct 21 2019
Megan has recently shared with me the list of fields that will be used to answer the research questions (T221195 and T232175#5545364). I will be checking the data in those fields this week to answer Q2 and Q3.
Oct 17 2019
Oct 16 2019
Thanks @DLynch for that explanation. It was really helpful, and I now have much better clarity.
Oct 15 2019
Completed Round 1 of QA assessments with analysts. Will begin the deep dive, i.e. Round 2, tomorrow (10/14).
Checked the distribution using Neil/Megan's initial code, for data starting from 10/10 (the date the fix was applied). The buckets seem to be balancing now: data is being collected 50-50 across both buckets.
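This is not Neil/Megan's code, just a minimal sketch of one common way to test a 50-50 split: a chi-square goodness-of-fit statistic with one degree of freedom, compared against the 0.05 critical value of 3.841. The function names and counts are hypothetical.

```python
# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def chi_square_even_split(count_a, count_b):
    """Chi-square statistic for two bucket counts against an even split."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def looks_balanced(count_a, count_b):
    """True if the counts do not differ significantly from 50-50 at alpha = 0.05."""
    return chi_square_even_split(count_a, count_b) < CRITICAL_VALUE_DF1_ALPHA_05


# Made-up bucket counts (not real data).
print(chi_square_even_split(5050, 4950), looks_balanced(5050, 4950))
print(chi_square_even_split(5200, 4800), looks_balanced(5200, 4800))
```

With large event volumes even small imbalances become statistically significant, so the raw shares are worth reading alongside the test result.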
Oct 14 2019
The following are in Passed status as mentioned in my QA document:
Oct 12 2019
Oct 10 2019
During my discussion with Connie and Megan, identified that adding these filters to the dashboards or monthly metrics is a pain point. Resolving this issue would greatly help everyone!
Currently working on Round 1 of QA assessment discussions with the analysts: Morten, Megan, Connie, Mikhail, and Neil.
Oct 9 2019
Megan will check with Peter tomorrow on the following observations. Checking when the above events started firing will help us get better clarity.