User Details
- User Since
- Nov 7 2014, 8:52 PM (531 w, 3 d)
- Availability
- Available
- IRC Nick
- mforns
- LDAP User
- Mforns
- MediaWiki User
- Mforns (WMF)
Yesterday
The job finished successfully and after vetting the data for a while, I couldn't find any inconsistency or wrong data.
The MediaWiki History data had already been vetted extensively, and this job only transforms it to a TSV format.
So, I trust a simple data check to determine that this works fine :-)
Will move to done.
Fri, Jan 10
I tested the changes to the MediaWiki History reduced query by running it against the 2024-11 snapshot of the newly generated test temp account MWH data.
Just to clarify, IIUC, this will only break logging dashboards, right?
So, if we had no associated logging dashboards, then this would be a no-op?
🙏
One potential improvement regarding the clock ticks that we never tackled was to send them at progressively increasing intervals, instead of every minute.
The current session tick code sends ticks every minute, starting at 0, which gives us too coarse a granularity for short sessions and too fine a granularity for long sessions.
For instance, most sessions (>50%) are less than 2 minutes long, but we can only tell whether they are 0min long (rounded down) or 1min long (rounded down), which is very little information.
At the same time, we can tell the difference between a 33min session and a 34min session, which is not that important and certainly does not justify sending 34 events.
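A minimal sketch of what progressively increasing intervals could look like, assuming a doubling schedule. The function names, the 1-second first interval and the factor of 2 are illustrative assumptions, not the current session tick code:

```python
from itertools import islice
from typing import Iterator


def tick_offsets(first_s: float = 1.0, factor: float = 2.0) -> Iterator[float]:
    """Yield the elapsed-time offsets (in seconds) at which a tick would be sent.

    Hypothetical schedule: one tick at session start, then ticks whose
    offsets grow geometrically (1s, 2s, 4s, 8s, ... with the defaults).
    """
    yield 0.0  # tick at session start
    offset = first_s
    while True:
        yield offset
        offset *= factor


def ticks_sent(session_length_s: float, **schedule) -> int:
    """Count the ticks emitted during a session of the given length."""
    count = 0
    for offset in tick_offsets(**schedule):
        if offset > session_length_s:
            break
        count += 1
    return count


# First offsets of the default schedule.
print(list(islice(tick_offsets(), 6)))  # [0.0, 1.0, 2.0, 4.0, 8.0, 16.0]

# A 90-second session and a 34-minute session.
print(ticks_sent(90))       # 8
print(ticks_sent(34 * 60))  # 12
```

With this kind of schedule, a 90-second session would be placed in the 64s to 128s bucket rather than just "1min", while a 34-minute session would send 12 ticks in total. The trade-off is that the resolution for long sessions gets coarser as they go on, which matches the point above that minute-level detail matters little there.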
Thu, Jan 9
After some discussions, we think that the current code does *not* set event_user_is_anonymous to true in this line of code, since actor_name would be null, and so event_user_is_anonymous is also null.
So, the conclusion is that the current code is implementing the version that we thought should be implemented.
Since it seems we are on the right track, will move this to done.
@Lydia_Pintscher do you have any updates on this? 🙏
Wed, Jan 8
This has been deployed and the calculations have started; the results should be available soon.
Tue, Jan 7
I think there's a tag war between us and Herald...
In any case, this data is available now as part of the Commons Impact Metrics dumps. See:
https://wikitech.wikimedia.org/wiki/Commons_Impact_Metrics/Data_Model#Media_file_metrics_snapshot
Some sorting and filtering still needs to be done manually, but the metric is there.
Mon, Dec 23
@VirginiaPoundstone Yes, I think this is fixed.
Thu, Dec 19
Dec 11 2024
Dec 10 2024
Looking into this.
Dec 4 2024
Hehe, 1 and 2 were intended to be a sequence, rather than 2 options. Sorry, I think my question in the end was misleading.
But knowing that you are OK with (1), (2) is just the application of that in the form of a schema (not a fragment).
And to clarify, the reason to structure the fields in a fragment rather than directly in a schema would be to be able to re-use the fragment for both the app and the web base schemas.
Dec 3 2024
- maintain a different data store for historical metrics (preferably something we already have like an Iceberg table)
+1
I tried this some years ago with the anomaly_detection table, but there were problems with Hive partitioning affecting Superset's querying performance.
Now, with Iceberg, that should be cool and useful!
@kai.nissen so sorry for having missed all the pings for so long.
Nov 27 2024
Nov 26 2024
This happened again from 2024-11-23, 07:35:00 UTC to 2024-11-23, 09:02:00 UTC.
Many mapped events had multiple retries, some as many as 7: https://airflow-analytics.wikimedia.org/dags/canary_events/grid?search=canary_events&num_run[…]00%3A00&tab=mapped_tasks&task_id=produce_canary_event
Nov 25 2024
I think this issue affects several endpoints.
All CIM AQS endpoints are using aqsassist's CreateCategoryNotFoundProblem method to generate a problem message.
But not all of them have a category as a parameter.
We could choose a better problem generator from aqsassist, like: CreateMediaFileNotFoundProblem.
But there are some endpoints that have more than one parameter, like category and wiki, and aqsassist doesn't have such a problem generator method.
We could pick the most important parameter and only modify the commons-impact-analytics code,
or we could create more problem generators in aqsassist, or even generalize the existing ones to include parameters as needed.
Nov 19 2024
@Ottomata Is there anything we Data Products could do to help with this? Or is it just as a heads up? 🙏
Nov 14 2024
Nov 11 2024
Nov 7 2024
Nov 5 2024
@JEbe-WMF Hi! I saw you created the patch with a new query and a new DAG, as if we were going for running a track of pipelines parallel to production.
I thought that strategy was discarded, and that we would go for regular modifications to existing queries and DAGs, but maybe I misunderstood it...? 🙏
@Dbrant Thanks a lot! I merged the patch, will be deployed in the next train.
Hey all! I was asked to review another unrelated sanitization patch, and I saw that the sanitization snippet for app_patroller_experience was not hashing the fields app_install_id and app_session_id.
Those are long-lived identifiers and should ideally be hashed. Is there a reason not to? 🙏
Nov 4 2024
@SGupta-WMF I retried the whole process and both unit and integration tests work for me, as does data ingestion on the cassandra test env side.
I asked Emeka to try it too, to see if he could reproduce your errors, but we managed to ingest the data and pass the tests fine.
Could it be something on your side?
Oct 30 2024
Oct 29 2024
We fixed the issue that was preventing your category from being queried via the API endpoints.
It is working now, see: https://wikimedia.org/api/rest_v1/metrics/commons-analytics/pageviews-per-category-monthly/Files_provided_by_Centro_de_Fotograf%C3%ADa_de_Montevideo/shallow/en.wikipedia/00000101/99991231
Oct 28 2024
I think the timing depended on when data is going to start flowing in. Probably Q3?
Oct 25 2024
I finished testing the changes.
Oct 24 2024
Did you run make bootstrap?
Yes, I:
- Dropped the related docker containers altogether.
- Deleted the data-gateway/ directory, since it has significant changes.
- Executed make startup.
- Executed make bootstrap.
Oct 23 2024
@SGupta-WMF could you provide more details about the failure?
(before I pushed the changes, it was working for me)
Oct 21 2024
Oct 18 2024
@Stevietheman Hi! Thanks for letting us know about this issue.
Oct 16 2024
Oct 15 2024
@SGupta-WMF Thanks a lot! I have pushed the changes on the cassandra test env, plus some more changes on the AQS service MR, related to the integration tests. 🙏 🙏 🙏
Oct 11 2024
Oh my. Just realized I've been neglecting this task for months. Sorry for that.
Oct 9 2024
Thanks for the ping @Aklapper!