- Currently providing product analytics support to the Language and Product Localization team, focusing on analytics for the Content Translation (CX) tool and related areas.
- For my volunteer profile, please visit: KCVelaga
User Details
- User Since
- Sep 15 2021, 11:36 AM (240 w, 16 h)
- Availability
- Available
- LDAP User
- KCVelaga
- MediaWiki User
- KCVelaga (WMF)
Tue, Apr 21
I would like to announce the new stewardship of this project as a followup to this email:
And what names would you like me to use for you?
Theoretically, we can also answer most of these questions with Superset. However, Superset can be slow with webrequest and is prone to timeouts. Something that could be helpful in the long term is T413952: [Request] Sample of refined webrequest table in data lake. It would reduce the dependency on fields being added to Turnilo each time.
Fri, Apr 17
- Analytics support provided during Q4
- Analytics support provided during Q3
All looks good to me. The maintenance DAGs have been running successfully. Thank you
Tue, Apr 14
@brennen hi! thanks for taking up the other request, wondering if you can create this one as well? Just wanted to check as it's been a few days, totally okay if not.
Thanks, all good now.
Mon, Apr 13
Hi @brennen, thanks for creating. I think I have not yet been added to the repo; I am getting a 404 error when trying to access it.
Fri, Apr 10
Thu, Apr 9
Tue, Apr 7
Just adding another usecase to have this data in Turnilo.
Thu, Apr 2
Wed, Mar 25
This request has been created for documentation; it has already been completed by @JAllemandou.
Mar 4 2026
I tried to copy over the query for that graph to Quarry, but I was getting an error. Despite the error, I have the impression that Quarry seems quite SQL-centric which may make it hard to serve usecases where communities may need an intuitive overview they can easily adjust without SQL or database knowledge. I'm less familiar with Quarry and maybe the error is preventing me from making a fair comparison. So maybe someone else may be able to illustrate how to support this better with Quarry.
My understanding is that this is not currently a high priority in WE5, so we have a little bit of freedom to schedule it in the coming weeks. Let me know if this is wrong, or it changes.
Mar 3 2026
@JAllemandou @GGoncalves-WMF I put together an initial list on this spreadsheet.
Feb 27 2026
@GGoncalves-WMF the updated task description captures all the details well, thank you!
Thanks @GGoncalves-WMF for creating this.
Feb 6 2026
If this is high priority, let us know?
@bd808 I believe having a public instance is still useful for the usecases I mentioned above, even if that was not the originally intended goal. We can try Cloud VPS, but recreating the dashboards, configuring the DB, etc. will take some time, and I am not sure how much effort it would be.
Feb 3 2026
@elukey didn't realize the newly labelled levels, nice! Gerard would want analytics-privatedata-users level 1 for now, as he needs to access Superset and also the datasets restricted to analytics-privatedata-users (no need for Kerberos and SSH for now). Thanks.
Hi @taavi! I understand that the original intention of having a public Superset instance was to replace Quarry, which didn't happen. We have been using Quarry to develop and run queries against Wiki replicas, while Superset is really intended for visualization and dashboards. Unless someone needs the latter, they have no need to go to the Superset instance, which may have been the case for many Quarry users.
Jan 7 2026
@CDanis Please share if you have any other notes / details that would be useful from SRE's approach. I think round-robin per CDN hostname across the stream would be better. I have also proposed a higher sampling rate, as Spark should be able to handle it without many issues.
Nov 28 2025
Marking this as completed as the first version is functional. Additional tasks can be filed for additions/improvements.
Nov 11 2025
Nov 10 2025
Seems like it coincides; the jobs were scheduled to run on the 5th of every month.
I re-ran manually, everything is back up again.
Okay, I don't know what happened, I re-ran the scripts manually again, they ran fine. Not sure if this needs to be further investigated, or probably a one-off connection issue at the time of the monthly update.
Triaging this as high as it is affecting at least two public-facing dashboards.
Oct 27 2025
The data is now updated and the charts are working.
@santhosh The Airflow pipeline is fixed, I will re-run the Toolforge update scripts.
Oct 24 2025
I have updated the dashboard to include rowiki.
Oct 22 2025
Oct 20 2025
Oct 8 2025
Thanks Miriam. I will update this task with more concrete support items we might need for Q2.
The analysis and the outputs have been shared with Halley.
cc @cchen @OSefu-WMF
Sep 18 2025
Sep 12 2025
Sep 11 2025
Yes, this is not relevant. Reportupdater is no longer in use (I think), and all the pipelines are now based on Airflow.
Sep 9 2025
@GGoncalves-WMF I have been able to work with the scripts Marcel and Jennifer developed for processing the logs - super helpful! A quick question: the logs are read from /srv/log/webrequest/archive/dumps.wikimedia.org/ and then processed. I am curious about what process publishes logs to that location. Is it solely from webrequests or some other source?
Sep 8 2025
Sep 4 2025
I don't think it is possible with any of the available contextual attributes. I have to check with Exp Platform on this one.
Hi @Miriam! I can definitely say until the end of Q2. At the moment, I don't know yet what needs might evolve for Q3 and beyond. If needed, I can make a new request or update this one - will that work?
Aug 30 2025
cc @OSefu-WMF
Aug 28 2025
@MartinRulsch I completely missed your question, sorry about that. In the future, feel free to ping again or send an email if something hasn't been replied to for long.
Aug 27 2025
@GGoncalves-WMF thank you!
Hi all, as this analysis will inform a decision within the WE5 scope of work, I can take this up (confirmed with @HCoplin-WMF on Slack).
Aug 12 2025
Posting the A/A test and A/B test experiment flow here as well, and when a pagevisit event should be logged.
Aug 11 2025
Tagging @cchen for QA.
Aug 4 2025
For the current Synthetic A/A test, are we ONLY interested in tracking the page visit event, and not the entry points to visit MinT wiki readers (which would be handled in a later A/B test)?
Are we interested in whether users leave the page, possibly because of page loading time?
If yes, then we need to log the "duration" as well when the page unloads.
Note: Currently, there is no "unload" log event.
Jul 27 2025
Jul 24 2025
@hueitan thanks for pointing that out. Please use fy25-26-we-3-1-5-mint-readers
Jul 23 2025
The dashboard is functional at: https://superset.wmcloud.org/superset/dashboard/unified-cx-metrics/
Jul 22 2025
No schema changes are required, just updating the ID is fine. 1.4.2 includes the experiment-related fields, which will be used when the experiment is live.