- Providing analytics support to Moderator Tools, Language and Community Tech teams.
- For my volunteer profile, please visit: KCVelaga
User Details
- User Since
- Sep 15 2021, 11:36 AM (139 w, 6 d)
- Availability
- Available
- LDAP User
- KCVelaga
- MediaWiki User
- KCVelaga (WMF) [ Global Accounts ]
Today
@ngkountas I am not sure about schemaId, I will check and get back to you.
The revision tags are: contenttranslation or contenttranslation-v2
@Samwalton9-WMF I have expanded the analysis to the whole month of April.
I was able to individually trigger and check the events on the console (except return_from_section_selection) on tewiki. However, the events are not being processed due to EventGate validation errors. This is because the instrumentation points to an older version of the schema, /analytics/mediawiki/content_translation_event/1.2.0, whereas the latest version after the updates to the event source is 1.4.0.
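The version mismatch described above can be spotted by comparing the version at the end of the schema URI with the latest published version. The following is only a minimal sketch: the helper names `schema_version` and `is_outdated` are hypothetical and are not part of EventGate or the instrumentation code.

```python
def schema_version(schema_uri: str) -> tuple[int, ...]:
    """Return the trailing version of a schema URI as a tuple of ints."""
    version = schema_uri.rstrip("/").rsplit("/", 1)[-1]
    return tuple(int(part) for part in version.split("."))


def is_outdated(instrumented_uri: str, latest_version: str) -> bool:
    """True if the instrumented schema version is older than the latest one."""
    latest = tuple(int(part) for part in latest_version.split("."))
    return schema_version(instrumented_uri) < latest


# Schema URI and latest version as quoted in the comment above.
instrumented = "/analytics/mediawiki/content_translation_event/1.2.0"
print(schema_version(instrumented))        # (1, 2, 0)
print(is_outdated(instrumented, "1.4.0"))  # True
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "1.10.0" sorting before "1.4.0".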
@Samwalton9-WMF I am resolving the task as the work is complete within the given scope. Feel free to re-open if needed.
@Pginer-WMF I am resolving this task as the work is complete within the given scope. But we can continue the discussion as needed.
Yesterday
Adding a note for future reference: this pipeline requires looping through a list of wikis. Some example DAGs for reference:
Thinking beyond just this event, this is a use case that will pop up across various schemas related to the Language team. The ability to capture both language code and page id for both source and language will be beneficial in the longer term. The approach you suggested sounds good. It will also make analysis easier, compared to extracting values from a string in action_context.
- Stream name: mint_for_readers - I am not sure whether adding the mediawiki prefix has any significance; if it does, the name can be 'mediawiki.mint_for_readers'.
- Regarding action_context for auto_translation_card: yes, you are right, it should be language codes separated by a semicolon. In the spec document, @phuedx proposed using a page object (a new schema fragment?), which can also capture page_id if required, as something like source_page: {lang: 'en', id: 1234}. Sam: can you explain more about that here?
- For a user searching a topic: you are right, this should be initiated when the user types something in the search input. Thinking about this again, click doesn't make sense here; instead we can restructure it as action: search.
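To illustrate the two options discussed above for auto_translation_card, here is a rough sketch of how the same information might look as a semicolon-separated action_context string versus a structured page object. The event values, the "source;target" ordering, and the `parse_language_pair` helper are assumptions made for illustration only; the actual field layout depends on the final spec.

```python
def parse_language_pair(action_context: str) -> tuple[str, str]:
    """Split a 'source;target' string into its two language codes."""
    source, target = action_context.split(";")
    return source.strip(), target.strip()


# Option 1: language codes packed into a string (hypothetical values,
# assuming the order is source language followed by target language).
event_with_string = {
    "action": "click",
    "action_context": "en;te",
}

# Option 2: a structured page object, following the shape quoted above.
event_with_page = {
    "action": "click",
    "source_page": {"lang": "en", "id": 1234},
}

# The string form needs parsing before analysis...
print(parse_language_pair(event_with_string["action_context"]))  # ('en', 'te')
# ...while the object form exposes each value directly.
print(event_with_page["source_page"]["lang"])  # en
print(event_with_page["source_page"]["id"])    # 1234
```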
Update: There is nothing left to do from the analytics side, we can resolve this task after the legal review is complete.
@Arinaigu has migrated the tables, pe_ed_campaigns & pe_ed_courses from kcv db to programs_events_dashboard
Yes, I confirm all of them can be deleted.
Sun, May 5
Mon, Apr 29
Update: I have restructured the events into MP compatible format.
@Wangombe I have drafted the instrumentation spec, in accordance with the Metrics Platform schema. Please review and let me know your thoughts.
Thu, Apr 25
Tue, Apr 23
@Pginer-WMF the draft of the analysis report is ready for review at https://kcvelaga.quarto.pub/cx-deletion-rate-factors-2024/
Apr 18 2024
@Samwalton9-WMF as we discussed (and as in the original baseline), I have added another table to show the average number of pageviews per potentially vandalized revision.
Apr 17 2024
@Wangombe Sorry for the delayed response. For this, it is better to use Metrics Platform which has been built on Event Platform. Eventually, all schemas will be migrated to Metrics Platform, as that will be the standard. We are currently working on the same for MinT for Readers instrumentation: T341185. I will start working on creating a schema next week.
Apr 16 2024
@Samwalton9-WMF I updated the notebook so that the count for a given threshold is cumulative across all thresholds above it.
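As a rough illustration of what cumulative counts across thresholds can look like, here is a small sketch. It is not the notebook's code: the function name and the counts are made up, and it assumes each cumulative count includes the given threshold's own bucket plus every higher bucket.

```python
def cumulative_counts(counts_by_threshold: dict[float, int]) -> dict[float, int]:
    """For each threshold, sum the counts at that threshold and all higher ones."""
    running = 0
    result = {}
    # Walk from the highest threshold down, carrying a running total.
    for threshold in sorted(counts_by_threshold, reverse=True):
        running += counts_by_threshold[threshold]
        result[threshold] = running
    return dict(sorted(result.items()))


# Made-up counts of revisions falling into each threshold bucket.
buckets = {0.95: 50, 0.97: 30, 0.99: 10}
print(cumulative_counts(buckets))  # {0.95: 90, 0.97: 40, 0.99: 10}
```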
@Arinaigu I did some clean-up and published the processing notebook at https://gitlab.wikimedia.org/kcvelaga/pe_dashboard_programs_processing/-/blob/main/monthly_programs.ipynb?ref_type=heads
I will share the secrets folder with you by email. We can discuss the next steps when we meet next week.
@fnegri that'll be amazing, thank you! Also, a quick question: does this also enable Superset's SQL Lab access to tool dbs, or is that separate?
@phuedx thanks for the review.
Apr 12 2024
Could you add an extra column to the data, with the % of total pageviews to vandalised content which we would be preventing? i.e. for the enwiki @ 0.99 line, if I'm understanding correctly we have 38,527 pageviews. 92.7% of them are prevented, giving us 35,715 pageviews prevented. What is this as a % of the 2.37 million pageviews which we found were to potentially vandalised content on enwiki?
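The requested column can be worked through with the figures quoted in the question. This is only a sketch of the arithmetic: the function name is hypothetical, and because the 2.37 million total is itself a rounded figure, the resulting percentage is approximate.

```python
def prevented_share(pageviews_at_threshold: int,
                    prevention_rate: float,
                    total_pageviews: int) -> tuple[int, float]:
    """Return (pageviews prevented, prevented as a % of total pageviews)."""
    prevented = round(pageviews_at_threshold * prevention_rate)
    share_pct = 100 * prevented / total_pageviews
    return prevented, share_pct


# enwiki @ 0.99: 38,527 pageviews, 92.7% prevented, out of about 2.37 million.
prevented, share_pct = prevented_share(38_527, 0.927, 2_370_000)
print(prevented)            # 35715
print(round(share_pct, 2))  # 1.51
```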
Apr 10 2024
Apr 8 2024
@nshahquinn-wmf has reviewed the analysis (thank you!). I have made some improvements based on the feedback, and it is updated at the same link.
Adding a note that this task: T354514: Transfer P&E dashboard API processing workflow (from Product Analytics to Community Growth) should be a part of this transition as well.
Apr 3 2024
@nshahquinn-wmf thanks for the review and the suggestions.
@Pginer-WMF thanks for the review and the comments.
- You are right, the names assume some familiarity with the entry points, which is not ideal. Anyone with little to no familiarity with the CX workflow should be able to understand the report. For now, I have added a link to the documentation. For the next iteration, we can add images and use more descriptive names, rather than technical variable names, for the entry points.
Apr 2 2024
This is currently blocked by the availability of revert risk scores for Feb 2024 and beyond (T341777#9679533).
Apr 1 2024
Documentation has been added wherever needed (comments, docstrings, README).
Mar 28 2024
I am happy to update the code once the classification and the naming are finalized.