User Details
- User Since
- Feb 15 2022, 2:51 PM (198 w, 2 d)
- Availability
- Available
- IRC Nick
- aiko
- LDAP User
- Unknown
- MediaWiki User
- AChou-WMF
Mon, Dec 1
Thanks for everyone's help. This task is resolved. :)
Fri, Nov 28
Weekly Report
Wed, Nov 26
Tue, Nov 25
Fri, Nov 21
Weekly Report
Thu, Nov 20
Tue, Nov 18
T363725 might be related, though that task focuses on handling redirects.
A continuation of this task is T408341: Q1 FY2025-26 Goal: Task generation engine for Revise Tone task
Hi, thanks all for the input. :) Due to our tight timeline, the ML team has decided to move forward with Option D for now.
That said, we can later define a set of performance expectations (e.g. for latency), which will then help us assess whether any of the other options would provide sufficient benefit to justify the additional effort. Thoughts?
I agree! We should follow up on this and revisit the topic in the future. The ML team would really like to see this work happen, as we will have other similar use cases that could benefit from mediawiki.page_content_change.v1 in Kafka main.
Mon, Nov 17
Fri, Nov 14
Weekly Report
Thu, Nov 13
After meeting with @Michael today, we agreed to first enable Testwiki for more controlled experimentation with both the update pipeline and the Newcomer Task integration. This means we will (1) load the initial Testwiki dataset to staging Cassandra and Search weight tags, and (2) enable the Revise Tone Task Generator on Lift Wing for Testwiki.
cc @BWojtowicz-WMF
@BWojtowicz-WMF We have the initial dataset for frwiki. We can use this dataset to test our new service.
Once the Cassandra <-> Lift Wing connection is built, we can load this data to staging Cassandra from Lift Wing. Then we can use test events to trigger Lift Wing updates in Cassandra and verify that our Cassandra integration works.
We have the initial dataset for frwiki.
Hi @Joe! The Machine Learning and Growth teams are collaborating on a GrowthExperiments newcomer task for revising tone (associated hypotheses are WE1.1.2 & WE1.1.17).
Wed, Nov 12
With respect to GRANTs, is it safe to assume that MODIFY is sufficient? There is no requirement to do reads here, is there?
When the service starts, Lift Wing will validate whether the target table exists, so we'll need SELECT as well. @BWojtowicz-WMF, is that correct?
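To make the permission question concrete, here is a minimal sketch that builds the CQL GRANT statements for MODIFY plus SELECT on a single table. The role, keyspace, and table names are hypothetical placeholders, not the real ones for this service, and the actual grants would be applied by whoever administers the cluster.

```python
import re

# Only plain identifiers are accepted, since the values are interpolated into CQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def grant_statements(role, keyspace, table, permissions=("MODIFY", "SELECT")):
    """Build one CQL GRANT statement per permission for a single table."""
    for name in (role, keyspace, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    resource = f"TABLE {keyspace}.{table}"
    return [f"GRANT {perm} ON {resource} TO {role};" for perm in permissions]


# Hypothetical names, for illustration only.
for statement in grant_statements("example_role", "example_keyspace", "example_table"):
    print(statement)
```

If the startup check for the table were dropped, passing `permissions=("MODIFY",)` would produce the write-only variant.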
Tue, Nov 11
Mon, Nov 10
@DPogorzelski-WMF The service that will connect to Cassandra is the revise-tone-task-generator that @BWojtowicz-WMF is working on in T408538. Currently, it is only deployed in the experimental namespace on ml-staging. We're thinking of either creating a new namespace for this service or deploying it under the edit-check namespace.
Option A would require some discussion with SRE, but given the size of the topic and the current /srv usage in main-eqiad / codfw, I don't see any big opposition to having the stream hosted there (especially if we advertise that, as a direct consequence, ML will not need to query the MediaWiki API for this use case). It would probably be the cleanest and most reliable option, in my opinion.
Regarding 2: would flipping egress to true here be sufficient? https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/kserve-inference/values.yaml#43 Or perhaps a specific policy in GlobalNetworkPolicies in ml-serve.yaml?
Fri, Nov 7
Weekly Report
Thu, Nov 6
@DPogorzelski-WMF Yes, Cassandra is on the prod network, and @Eevans should be able to provide more info about this.
Hi @klausman, I'd love to hear your thoughts on what we need to do to set up this Cassandra integration.
Wed, Nov 5
Done. This is the notebook that demonstrates how to generate tasks.
@dcausse Thanks a lot! I found that it was also missing $schema (EventGate complained about it).
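A small pre-flight check like the one below can catch a missing `$schema` before a test event is sent. This is only a sketch: the `$schema` requirement comes from the error mentioned above, while the `meta.stream` check is my assumption about what the event envelope needs, and the schema URI and stream name in the example are hypothetical.

```python
from typing import Any, Dict, List


def missing_event_fields(event: Dict[str, Any]) -> List[str]:
    """Return the names of envelope fields that are absent or empty."""
    missing = []
    if not event.get("$schema"):
        missing.append("$schema")
    meta = event.get("meta")
    # Assumption: the event also carries its stream name under meta.stream.
    if not isinstance(meta, dict) or not meta.get("stream"):
        missing.append("meta.stream")
    return missing


# Hypothetical test event, for illustration only.
test_event = {
    "meta": {"stream": "example.test.stream"},
    "page": {"page_title": "Example"},
}
print(missing_event_fields(test_event))
```

Running the check on every test event before posting it keeps this kind of envelope error out of the producer logs.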
I've collected articles in English (en), French (fr), Arabic (ar), and Japanese (ja), then generated paragraph data using Spark.
- Article topics
"Culture.Biography.Biography*", "Culture.Biography.Women", "Culture.Sports",
- Data cleaning
- Sections to skip
"en": [
'See also',
'References',
'External links',
'Further reading',
'Notes',
'Additional sources',
'Sources',
'Bibliography'
],
"fr": [
'Notes et références',
'Annexes',
'Bibliographie',
'Articles connexes',
'Liens externes',
'Voir aussi',
'Notes',
'Références'
],
"ar": [
'وصلات خارجية',
'قراءة موسَّعة',
'الهوامش',
'انظر أيضاً',
'الاستشهاد بالمصادر',
'انظر أيضًا',
'مراجع',
],
"ja": [
'脚注',
'参考文献',
'関連項目',
],
- Prefixes for links/files/category to remove
"en": ("file:", "image:", "category:"),
"fr": ("fichier:", "image:", "catégorie:"),
"ar": ("صورة" ,"ملف" ,"تصنيف"),
"ja": ("file:", "image:", "category:"),
- Skip paragraphs/plaintext that start with:
- *: list items
- |: table or template leftovers
- <blockquote>
- <ref>
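The cleaning rules above can be sketched as a single filter function. This is not the notebook's actual code: it uses plain Python rather than Spark, includes only the English and French lists for brevity, and assumes that a paragraph beginning with one of the link/file/category prefixes is dropped whole (the original may instead strip just the link). The name `keep_paragraph` is hypothetical.

```python
# Subset of the skip lists quoted above (English and French only).
SECTIONS_TO_SKIP = {
    "en": {"See also", "References", "External links", "Further reading",
           "Notes", "Additional sources", "Sources", "Bibliography"},
    "fr": {"Notes et références", "Annexes", "Bibliographie",
           "Articles connexes", "Liens externes", "Voir aussi",
           "Notes", "Références"},
}
LINK_PREFIXES = {
    "en": ("file:", "image:", "category:"),
    "fr": ("fichier:", "image:", "catégorie:"),
}
# List items, table/template leftovers, and leading <blockquote>/<ref> tags.
SKIP_STARTS = ("*", "|", "<blockquote>", "<ref>")


def keep_paragraph(lang: str, section: str, text: str) -> bool:
    """Return True if the paragraph passes all three cleaning rules."""
    stripped = text.strip()
    if not stripped:
        return False
    if section in SECTIONS_TO_SKIP.get(lang, set()):
        return False
    # Prefix match is case-insensitive so "File:" and "file:" both count.
    if stripped.lower().startswith(LINK_PREFIXES.get(lang, ())):
        return False
    if stripped.startswith(SKIP_STARTS):
        return False
    return True


paragraphs = [
    ("en", "Career", "She won the national title in 2019."),
    ("en", "References", "Smith, J. (2020)."),
    ("fr", "Carrière", "Fichier:Photo.jpg|thumb"),
    ("en", "Career", "* First list item"),
]
kept = [text for lang, section, text in paragraphs if keep_paragraph(lang, section, text)]
print(kept)
```

In a Spark job the same predicate could be wrapped in a UDF or expressed with column functions; the plain function is shown here only to make the rules easy to read and test.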
Nov 4 2025
Given our tight timeline, we'd like to have Cassandra and the Data Gateway ready this week, so we can begin integrating with Lift Wing soon. I need to make the final call to move things forward. I've read through both of your points, and they're all valid.
Nov 3 2025
I think so, yes. If you have specific mock data in mind, a csv-formatted file should work.
Oct 31 2025
Weekly Report
@Eevans I also wanted to follow up on the next step for this task.
Oct 30 2025
It’s good that we’re discussing this! I've learned a lot :)
Oct 29 2025
Trying to address the blocking ones: