Jan 24 2024

klausman moved T355757: Drain & shutdown ml-serve2005.codfw.wmnet for physical move from Unsorted to In Progress on the Machine-Learning-Team board.
Jan 24 2024, 3:20 PM · Machine-Learning-Team
klausman added a comment to T355437: Relocating servers out of A1 in codfw.

ml-serve2005 is off and ready

Jan 24 2024, 2:15 PM · Data-Persistence, SRE, ops-codfw
klausman updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 24 2024, 2:13 PM · Data-Persistence, SRE, ops-codfw
klausman updated the task description for T355757: Drain & shutdown ml-serve2005.codfw.wmnet for physical move.
Jan 24 2024, 2:03 PM · Machine-Learning-Team
klausman updated the task description for T355759: Drain and silence ml-serve2002.codfw.wmnet.
Jan 24 2024, 2:03 PM · Machine-Learning-Team
klausman created T355759: Drain and silence ml-serve2002.codfw.wmnet.
Jan 24 2024, 10:40 AM · Machine-Learning-Team
klausman renamed T355757: Drain & shutdown ml-serve2005.codfw.wmnet for physical move from Drain & shudtown ml-serve2005.codfw.wmnet for physical move to Drain & shutdown ml-serve2005.codfw.wmnet for physical move.
Jan 24 2024, 10:38 AM · Machine-Learning-Team
klausman created T355757: Drain & shutdown ml-serve2005.codfw.wmnet for physical move.
Jan 24 2024, 10:35 AM · Machine-Learning-Team
klausman updated the task description for T355437: Relocating servers out of A1 in codfw.
Jan 24 2024, 10:29 AM · Data-Persistence, SRE, ops-codfw

Jan 23 2024

isarantopoulos awarded T354516: Requesting write access to ml-staging-codfw for ML team a Yellow Medal token.
Jan 23 2024, 5:34 PM · Patch-For-Review, SRE, Machine-Learning-Team
klausman added a comment to T354516: Requesting write access to ml-staging-codfw for ML team.

This has been solved for now, though it needs better docs and possibly simplification, as an extra step is needed:

Jan 23 2024, 5:00 PM · Patch-For-Review, SRE, Machine-Learning-Team

Jan 18 2024

klausman triaged T354516: Requesting write access to ml-staging-codfw for ML team as High priority.
Jan 18 2024, 10:32 AM · Patch-For-Review, SRE, Machine-Learning-Team
klausman moved T354516: Requesting write access to ml-staging-codfw for ML team from Ready To Go to In Progress on the Machine-Learning-Team board.
Jan 18 2024, 10:32 AM · Patch-For-Review, SRE, Machine-Learning-Team
klausman moved T347262: Set SLO for the recommendation-api-ng service hosted on LiftWing from 2023-2024 Q3 Done to In Progress on the Machine-Learning-Team board.
Jan 18 2024, 10:31 AM · Machine-Learning-Team
klausman moved T347262: Set SLO for the recommendation-api-ng service hosted on LiftWing from In Progress to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Jan 18 2024, 10:30 AM · Machine-Learning-Team
klausman added a comment to T347262: Set SLO for the recommendation-api-ng service hosted on LiftWing.

SLO Dashboard now available here: https://grafana.wikimedia.org/d/slo-Lift_Wing_Recommendation_API_NG/lift-wing-recommendation-api-ng-slo-s?orgId=1

Jan 18 2024, 10:29 AM · Machine-Learning-Team

Jan 16 2024

klausman moved T337213: Update to KServe 0.11 from In Progress to Ready To Go on the Machine-Learning-Team board.
Jan 16 2024, 3:39 PM · Machine-Learning-Team

Jan 12 2024

klausman created P54717 (An Untitled Masterwork).
Jan 12 2024, 3:09 PM

Jan 10 2024

klausman claimed T352756: Gap in metrics rendered from Thanos Rules.
Jan 10 2024, 2:22 PM · SRE Observability (FY2023/2024-Q4), Observability-Metrics, Machine-Learning-Team

Jan 9 2024

klausman removed a project from T354516: Requesting write access to ml-staging-codfw for ML team: SRE-Access-Requests.

I think we can remove the SRE-Access-Requests tag, since this can likely be covered entirely at the k8s permission level.

Jan 9 2024, 3:25 PM · Patch-For-Review, SRE, Machine-Learning-Team
klausman added a comment to T347262: Set SLO for the recommendation-api-ng service hosted on LiftWing.

Unfortunately, Istio does not export a le=7500 bucket, but only these (the base unit is milliseconds):

Jan 9 2024, 2:12 PM · Machine-Learning-Team
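
When a latency target falls between two exported histogram buckets, as with the le=7500 case in the comment above, one option is to use the nearest exported bound on either side. A minimal sketch, assuming a hypothetical list of bucket bounds in milliseconds (the values are illustrative and are not the list from the comment; `surrounding_buckets` is a made-up helper name):

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence, Tuple


def surrounding_buckets(
    target_ms: float, bounds_ms: Sequence[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (largest bound <= target, smallest bound >= target).

    Either element is None when no such bound exists.
    """
    bounds = sorted(bounds_ms)
    # Index of the first bound strictly greater than the target.
    i = bisect_right(bounds, target_ms)
    lower = bounds[i - 1] if i > 0 else None
    # Index of the first bound greater than or equal to the target.
    j = bisect_left(bounds, target_ms)
    upper = bounds[j] if j < len(bounds) else None
    return lower, upper


# Hypothetical bucket bounds (assumption, for illustration only).
EXAMPLE_BOUNDS_MS = [1000, 2500, 5000, 10000]
lower, upper = surrounding_buckets(7500, EXAMPLE_BOUNDS_MS)
```

Using the lower bound makes the latency objective stricter than the intended target, while the upper bound makes it looser.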
klausman added a comment to T349632: Add deprecation warnings to ORES-related repositories on Github.

Amir set the mentioned repo to archived, so I think we're now done here.

Jan 9 2024, 9:22 AM · Patch-For-Review, ORES, Machine-Learning-Team

Jan 8 2024

klausman added a comment to T349632: Add deprecation warnings to ORES-related repositories on Github.

I've created https://gerrit.wikimedia.org/r/c/mediawiki/services/ores/deploy/+/988488 for the deploy repo. While it's unlikely anyone would ever want to use that repo, better safe than sorry.

Jan 8 2024, 1:33 PM · Patch-For-Review, ORES, Machine-Learning-Team
klausman added a comment to T353622: Improve Istio's mesh traffic transparent proxy capabilities for external domains accessed by Lift Wing.

Both changes have been merged and pushed to ml-staging.

Jan 8 2024, 10:21 AM · Machine-Learning-Team

Dec 14 2023

klausman added a comment to T347262: Set SLO for the recommendation-api-ng service hosted on LiftWing.

We decided to use 95% for experimental new services, see https://wikitech.wikimedia.org/wiki/SLO/Lift_Wing#Calculate_the_realistic_targets

99% seems to be a lot for the first step, even if the ceiling is higher.

Dec 14 2023, 12:24 PM · Machine-Learning-Team
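
To make the difference between the 95% and 99% targets concrete, the error budget each one leaves can be computed directly. A minimal sketch, assuming a hypothetical volume of 1,000,000 requests in the SLO window (the helper name `allowed_failures` and the request count are illustrative and not taken from the task):

```python
from fractions import Fraction


def allowed_failures(target_percent, total_requests: int) -> int:
    """Number of requests that may fail without breaching the target.

    Fractions are used so that targets such as 99.9 do not pick up
    floating-point rounding errors.
    """
    target = Fraction(str(target_percent)) / 100
    return int((1 - target) * total_requests)


# Hypothetical request volume (assumption, for illustration only).
TOTAL_REQUESTS = 1_000_000
budget_95 = allowed_failures(95, TOTAL_REQUESTS)  # 50000
budget_99 = allowed_failures(99, TOTAL_REQUESTS)  # 10000
```

At this assumed volume, the 95% target tolerates five times as many failed requests as the 99% target, which is why it is the gentler first step for an experimental service.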
klausman moved T347263: Create external endpoint for recommendation-api-ng hosted on LiftWing from In Progress to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Dec 14 2023, 11:25 AM · Machine-Learning-Team
klausman moved T349632: Add deprecation warnings to ORES-related repositories on Github from In Progress to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Dec 14 2023, 11:25 AM · Patch-For-Review, ORES, Machine-Learning-Team
klausman added a comment to T347262: Set SLO for the recommendation-api-ng service hosted on LiftWing.

For the availability of the service, I think an SLO similar to our Revscoring and Revertrisk services would be a good fit.

Dec 14 2023, 11:18 AM · Machine-Learning-Team
klausman added a comment to T347263: Create external endpoint for recommendation-api-ng hosted on LiftWing.

This is complete. I'll track the slash-vs-no-slash matter mentioned above in a separate ticket.

Dec 14 2023, 11:10 AM · Machine-Learning-Team
klausman moved T347263: Create external endpoint for recommendation-api-ng hosted on LiftWing from 2023-2024 Q3 Done to In Progress on the Machine-Learning-Team board.
Dec 14 2023, 11:04 AM · Machine-Learning-Team
klausman moved T347263: Create external endpoint for recommendation-api-ng hosted on LiftWing from In Progress to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Dec 14 2023, 10:59 AM · Machine-Learning-Team
klausman moved T347262: Set SLO for the recommendation-api-ng service hosted on LiftWing from Ready To Go to In Progress on the Machine-Learning-Team board.
Dec 14 2023, 10:59 AM · Machine-Learning-Team
klausman moved T349180: Discuss caching strategies for Lift Wing from Ready To Go to In Progress on the Machine-Learning-Team board.
Dec 14 2023, 10:59 AM · Machine-Learning-Team, Lift-Wing
klausman added a comment to T349632: Add deprecation warnings to ORES-related repositories on Github.

The above-mentioned GH PRs have all been merged.

Dec 14 2023, 10:59 AM · Patch-For-Review, ORES, Machine-Learning-Team

Dec 8 2023

klausman added a comment to T347263: Create external endpoint for recommendation-api-ng hosted on LiftWing.

From my workstation at home, both the API endpoint and the spec query work, but there are still things that could improve (more on that below):

Dec 8 2023, 9:55 AM · Machine-Learning-Team

Dec 7 2023

klausman created P54269 (An Untitled Masterwork).
Dec 7 2023, 11:20 AM
klausman created P54268 (An Untitled Masterwork).
Dec 7 2023, 11:05 AM

Dec 6 2023

klausman added a comment to T349632: Add deprecation warnings to ORES-related repositories on Github.

Opened these pull requests:

Dec 6 2023, 11:22 AM · Patch-For-Review, ORES, Machine-Learning-Team

Dec 4 2023

klausman added a comment to T349632: Add deprecation warnings to ORES-related repositories on Github.

Final version. Unless there are any objections today, I will update the linked PR today and, after review, we can merge it tomorrow. I'll then add this text to the other repos Aiko has mentioned as well.

Dec 4 2023, 4:06 PM · Patch-For-Review, ORES, Machine-Learning-Team
klausman added a comment to T349632: Add deprecation warnings to ORES-related repositories on Github.

New version:

Warning: The ORES infrastructure and Revscoring models are being deprecated by the WMF Machine Learning team, please check https://wikitech.wikimedia.org/wiki/ORES for more info.

Dec 4 2023, 3:10 PM · Patch-For-Review, ORES, Machine-Learning-Team
klausman added a comment to T349632: Add deprecation warnings to ORES-related repositories on Github.

My proposed big warning for those repos:

Dec 4 2023, 2:53 PM · Patch-For-Review, ORES, Machine-Learning-Team
klausman moved T347278: Decommission ORES configurations and servers from Ready To Go to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Dec 4 2023, 2:46 PM · Patch-For-Review, Machine-Learning-Team
klausman moved T351114: Transient error while running lift wing topic model from In Progress to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Dec 4 2023, 2:45 PM · Product-Analytics, User-Iflorez, Machine-Learning-Team
klausman closed T347278: Decommission ORES configurations and servers as Resolved.

We're all done here. We'll handle the to-be-archived repos in the separate ticket.

Dec 4 2023, 2:45 PM · Patch-For-Review, Machine-Learning-Team
klausman committed rLPRIf78f24f25fa1: hiera: clean up more ORES leftovers.
hiera: clean up more ORES leftovers
Dec 4 2023, 2:33 PM
klausman added a comment to T347278: Decommission ORES configurations and servers.

These are the remaining hits in the puppet repo.

Dec 4 2023, 11:20 AM · Patch-For-Review, Machine-Learning-Team

Nov 29 2023

klausman created P53946 (An Untitled Masterwork).
Nov 29 2023, 12:50 PM
klausman created P53940 (An Untitled Masterwork).
Nov 29 2023, 11:07 AM
klausman created P53939 (An Untitled Masterwork).
Nov 29 2023, 10:56 AM

Nov 28 2023

klausman created T352189: Document procedure for updating the underlying distro for model servers (e.g. Bullseye -> Bookworm).
Nov 28 2023, 3:40 PM · Machine-Learning-Team
klausman created P53921 (An Untitled Masterwork).
Nov 28 2023, 11:49 AM
klausman updated the language for P53919 (An Untitled Masterwork) from autodetect to python.
Nov 28 2023, 11:12 AM
klausman created P53919 (An Untitled Masterwork).
Nov 28 2023, 11:12 AM
klausman created P53918 (An Untitled Masterwork).
Nov 28 2023, 10:48 AM

Nov 21 2023

klausman moved T276438: Establish processes for running the dataset pipeline from Unsorted to Watching on the Machine-Learning-Team board.
Nov 21 2023, 3:44 PM · Growth-Team, Machine-Learning-Team, Growth-Scaling, Add-Link
klausman added a comment to T344010: Discuss potential migration from toolforge to liftwing.

@calbon I think this task is outdated; should we just close it?

Nov 21 2023, 3:33 PM · Machine-Learning-Team
klausman moved T351278: Improving error message for Revertrisk models from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 21 2023, 3:31 PM · Patch-For-Review, Machine-Learning-Team
klausman moved T351390: Istio recording rules for Pyrra and Grizzly from Unsorted to In Progress on the Machine-Learning-Team board.
Nov 21 2023, 3:17 PM · Machine-Learning-Team, observability

Nov 20 2023

klausman added a comment to T337213: Update to KServe 0.11.

Images have been built and published:

Nov 20 2023, 5:48 PM · Machine-Learning-Team
klausman added a comment to T337213: Update to KServe 0.11.

0.11.2 was released recently. I will update the images to that version before proceeding.

Nov 20 2023, 3:40 PM · Machine-Learning-Team
klausman updated the task description for T349619: Migrate roles to puppet7.
Nov 20 2023, 11:28 AM · Patch-For-Review, Data-Platform-SRE (2024.06.17 - 2024.07.07), serviceops, collaboration-services, SRE-tools, Puppet-Core, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE
klausman updated the task description for T349619: Migrate roles to puppet7.
Nov 20 2023, 10:59 AM · Patch-For-Review, Data-Platform-SRE (2024.06.17 - 2024.07.07), serviceops, collaboration-services, SRE-tools, Puppet-Core, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

Nov 17 2023

klausman added a comment to T347278: Decommission ORES configurations and servers.

@klausman everything should be done, except the work in T349632, lemme know if anything is missing, otherwise this is done.

Nov 17 2023, 2:50 PM · Patch-For-Review, Machine-Learning-Team

Nov 15 2023

klausman triaged T349632: Add deprecation warnings to ORES-related repositories on Github as Medium priority.
Nov 15 2023, 2:52 PM · Patch-For-Review, ORES, Machine-Learning-Team
klausman added a comment to T351114: Transient error while running lift wing topic model.

Ah, correction: Mikhail mentions that the actual querying of LW is only done one request at a time (the Spark/Large bits are for getting the input data, IIUC). So the delay option seems the most promising.

Nov 15 2023, 11:21 AM · Product-Analytics, User-Iflorez, Machine-Learning-Team
klausman updated subscribers of T351114: Transient error while running lift wing topic model.

@klausman: would introducing a slight delay (e.g. 0.5s) between each of the 34K requests help?

Nov 15 2023, 11:09 AM · Product-Analytics, User-Iflorez, Machine-Learning-Team
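
The delay suggested in the comment above can be sketched as a simple sequential loop that pauses between requests. This is a minimal sketch, not the script discussed in the task: `send_with_delay` and its `send` callback are hypothetical names, and the caller is assumed to supply whatever function actually performs one Lift Wing request.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def send_with_delay(
    items: Iterable[T],
    send: Callable[[T], R],
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call send(item) one item at a time, sleeping delay_s between calls.

    No sleep happens before the first call or after the last one.
    The sleep function is injectable so the loop can be tested quickly.
    """
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_s)
        results.append(send(item))
    return results
```

With 34K requests and a 0.5 s delay, the pauses alone add roughly 17,000 seconds (about 4.7 hours) to the run, so the delay value is a trade-off between total runtime and load on the service.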
klausman closed T350762: Fix the Lift Wing documentation about how to decode the ACCESS TOKEN as Resolved.

Added to the talk page:

Nov 15 2023, 11:02 AM · Machine-Learning-Team
klausman added a comment to T350137: importOresTopics script fails to import topics.

For technical/simplicity reasons, the model_info query was simplified and doesn't support all the parameters anymore. I am not all that familiar with the PHP script, but does the output of https://ores.wikimedia.org/v3/scores/?model_info (note absence of =version) still have all the necessary information? If so, this should be easily fixable.

Nov 15 2023, 10:47 AM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Growth-Team (Sprint 3 (Growth Team)), Machine-Learning-Team, GrowthExperiments

Nov 14 2023

klausman moved T344537: Fast Vandalism Detection from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 14 2023, 3:59 PM · Machine-Learning-Team
klausman moved T345233: Check ores2008's cable from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 14 2023, 3:59 PM · SRE, ops-codfw, Machine-Learning-Team
klausman moved T345320: Fatal Exception (MWException) in arwiki when opening prefrences from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 14 2023, 3:59 PM · Wikimedia-production-error, Machine-Learning-Team, MediaWiki-extensions-ORES
klausman moved T345898: Utilize ChatGPT for categorizing and extracting metadata from files on Commons from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 14 2023, 3:59 PM · Commons, Machine-Learning-Team
klausman moved T346032: Elevate LiftWing access to WME tier for development and production environment from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 14 2023, 3:58 PM · Wikimedia Enterprise, Machine-Learning-Team
klausman reopened T264774: Create three Phab Projects for Machine Learning: Lift Wing, Pilot Flag, Test Grounds as "Open".
Nov 14 2023, 3:58 PM · Machine-Learning-Team, Project-Admins
klausman moved T264774: Create three Phab Projects for Machine Learning: Lift Wing, Pilot Flag, Test Grounds from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:58 PM · Machine-Learning-Team, Project-Admins
klausman moved T347243: ORES article quality is gone from euwiki in Mozilla Firefox 117.0.1 from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 14 2023, 3:57 PM · Machine-Learning-Team, Wikimedia-Site-requests, ORES
klausman moved T349180: Discuss caching strategies for Lift Wing from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:56 PM · Machine-Learning-Team, Lift-Wing
klausman moved T349274: Apply multi-processing to preprocess() in isvcs that suffer from high latency from Unsorted to In Progress on the Machine-Learning-Team board.
Nov 14 2023, 3:56 PM · Machine-Learning-Team
klausman assigned T349274: Apply multi-processing to preprocess() in isvcs that suffer from high latency to elukey.
Nov 14 2023, 3:56 PM · Machine-Learning-Team
klausman moved T349632: Add deprecation warnings to ORES-related repositories on Github from Unsorted to In Progress on the Machine-Learning-Team board.
Nov 14 2023, 3:55 PM · Patch-For-Review, ORES, Machine-Learning-Team
klausman moved T349635: Special:NewPagesFeed broken on beta cluster testwiki from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:51 PM · Machine-Learning-Team, Beta-Cluster-Infrastructure, ORES, PageTriage
klausman moved T349722: URI to use when hitting the Pageviews API on rest-gateway from Unsorted to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 14 2023, 3:50 PM · Machine-Learning-Team, serviceops, Data-Engineering
klausman moved T349844: Increased latencies with Kserve 0.11.1 (cgroups v2) from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:48 PM · Patch-For-Review, Machine-Learning-Team
klausman moved T349919: Apply common settings to publish events from Lift Wing staging to EventGate from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:46 PM · Machine-Learning-Team
klausman moved T349968: Add language support for Malay language (ms) from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:40 PM · artificial-intelligence, Machine-Learning-Team, Bad-Words-Detection-System, revscoring
klausman assigned T349968: Add language support for Malay language (ms) to calbon.

@calbon Can you weigh in on this? AIUI, this would be a nontrivial update to Revscoring.

Nov 14 2023, 3:40 PM · artificial-intelligence, Machine-Learning-Team, Bad-Words-Detection-System, revscoring
klausman moved T349996: Globally fix ores.wikipedia.org/ui to new legacy domain from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:38 PM · Patch-For-Review, Machine-Learning-Team
klausman moved T350137: importOresTopics script fails to import topics from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:36 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Growth-Team (Sprint 3 (Growth Team)), Machine-Learning-Team, GrowthExperiments
klausman claimed T350137: importOresTopics script fails to import topics.
Nov 14 2023, 3:35 PM · MW-1.42-notes (1.42.0-wmf.7; 2023-11-28), Growth-Team (Sprint 3 (Growth Team)), Machine-Learning-Team, GrowthExperiments
klausman moved T350389: Upgrade xgboost in knowledge_integrity from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:32 PM · Research, Machine-Learning-Team
klausman moved T330148: Support the Revert-Review API/tool on Toolforge from Unsorted to Backlog/Lift Wing on the Machine-Learning-Team board.
Nov 14 2023, 3:31 PM · Machine-Learning-Team, Lift-Wing
klausman moved T350762: Fix the Lift Wing documentation about how to decode the ACCESS TOKEN from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:29 PM · Machine-Learning-Team
klausman claimed T350762: Fix the Lift Wing documentation about how to decode the ACCESS TOKEN.
Nov 14 2023, 3:29 PM · Machine-Learning-Team
klausman moved T350986: Use expression builder instead of raw SQL in ORES from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:28 PM · MW-1.42-notes (1.42.0-wmf.21; 2024-03-05), MediaWiki-extensions-ORES, Machine-Learning-Team, Technical-Debt
klausman moved T351021: Revertrisk models are unable to provide scores for single-revision pages from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:27 PM · Machine-Learning-Team, Lift-Wing
klausman moved T351022: Multilingual model fails from python from Unsorted to Ready To Go on the Machine-Learning-Team board.
Nov 14 2023, 3:26 PM · Machine-Learning-Team, Lift-Wing
klausman moved T351114: Transient error while running lift wing topic model from Unsorted to In Progress on the Machine-Learning-Team board.
Nov 14 2023, 3:22 PM · Product-Analytics, User-Iflorez, Machine-Learning-Team
klausman updated the task description for T349619: Migrate roles to puppet7.
Nov 14 2023, 1:57 PM · Patch-For-Review, Data-Platform-SRE (2024.06.17 - 2024.07.07), serviceops, collaboration-services, SRE-tools, Puppet-Core, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE
klausman updated the task description for T349619: Migrate roles to puppet7.
Nov 14 2023, 11:17 AM · Patch-For-Review, Data-Platform-SRE (2024.06.17 - 2024.07.07), serviceops, collaboration-services, SRE-tools, Puppet-Core, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE
klausman updated the task description for T349619: Migrate roles to puppet7.
Nov 14 2023, 10:49 AM · Patch-For-Review, Data-Platform-SRE (2024.06.17 - 2024.07.07), serviceops, collaboration-services, SRE-tools, Puppet-Core, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE