
elukey (Luca Toscano)
Site Reliability Engineer - Analytics/Data engineering

User Details

User Since
Jan 5 2016, 9:54 PM (344 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
LToscano (WMF) [ Global Accounts ]

Recent Activity

Yesterday

elukey added a comment to T313915: Move revscoring isvcs to async architecture.

Tests for articlequality:

Wed, Aug 10, 2:18 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey committed rMLIS51f74fe4e61c: articlequality: move preprocess() to async (authored by elukey).
articlequality: move preprocess() to async
Wed, Aug 10, 1:37 PM
elukey updated the task description for T310146: (Need By:TBD) rack/setup/install row D new PDUs.
Wed, Aug 10, 10:26 AM · Patch-For-Review, SRE-swift-storage, DBA, SRE, ops-codfw

Tue, Aug 9

elukey added a comment to T301878: Send score to eventgate when requested.

Hello! Separate streams for different models seem fine, but perhaps what you want are separate events for each model, not necessarily different streams? I guess it depends on how you expect people to consume these events. If you expect a given consumer to only ever be interested in a single model, then separate streams make sense. However, if you expect a consumer to want some or all model scores, then keeping them in the same stream, or even in the same event, might be better. It makes reasoning about ordering easier. If a page is edited often, you'll want to make it as easy as possible to consume the scores in the order that the edits happen. The more events and streams you have, the more you have to worry about ordering.

Tue, Aug 9, 3:10 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T301878: Send score to eventgate when requested.

All revscoring-based models are now able to accept a revision-create event, generate a revision-score one and send it to EventGate! \o/

Tue, Aug 9, 3:05 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T314835: wdqs space usage on thanos-swift.
root@thanos-fe1001:/home/elukey# source /etc/swift/account_AUTH_wdqs.env
root@thanos-fe1001:/home/elukey# swift list 
rdf-streaming-updater-codfw
rdf-streaming-updater-codfw+segments
rdf-streaming-updater-eqiad
rdf-streaming-updater-eqiad+segments
rdf-streaming-updater-staging
thanos-swift
updater
updater+segments
updater-zbyszko
updater-zbyszko-v2
Tue, Aug 9, 1:33 PM · Patch-For-Review, Data Engineering Planning, wdwb-tech, Wikidata, SRE-swift-storage, SRE, Wikidata-Query-Service
elukey committed rMLISa3718224df8e: Update README.md files after the recent Blubber config refactor (authored by elukey).
Update README.md files after the recent Blubber config refactor
Tue, Aug 9, 1:20 PM
elukey committed rMLIS3503f36546de: python: Add more info about Docker image rebuild (authored by elukey).
python: Add more info about Docker image rebuild
Tue, Aug 9, 1:20 PM
elukey committed rMLIS80aa7ea6b0df: outlink: move Blubber config to the new standard (authored by elukey).
outlink: move Blubber config to the new standard
Tue, Aug 9, 1:19 PM
elukey committed rMLIS9a563c99a766: drafttopic: add code to send events to EventGate (authored by elukey).
drafttopic: add code to send events to EventGate
Tue, Aug 9, 10:27 AM
elukey committed rMLIS5bbd8b446e98: draftquality: add code to send events to EventGate (authored by elukey).
draftquality: add code to send events to EventGate
Tue, Aug 9, 8:41 AM
elukey added a comment to T287056: Deploy Outlinks topic model to production.

@Isaac I can reproduce the error from stat1004, I think that you are going through the http(s) proxy for a .discovery.wmnet domain (internal one). Try with unset https_proxy, it should work afterwards!

Tue, Aug 9, 7:40 AM · Machine-Learning-Team (Active Tasks), Lift-Wing
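
A minimal sketch of the same idea on the client side, in Python: skip the proxy whenever the target host is an internal one. The `.wmnet` suffix comes from the comment above; the helper names and the example hostnames used with it are hypothetical, and this is not the code Isaac was running.

```python
import urllib.request
from urllib.parse import urlsplit

# Assumption: hosts ending in this suffix are internal and must not go
# through the HTTP(S) proxy.
INTERNAL_SUFFIXES = (".wmnet",)


def is_internal(url: str) -> bool:
    """Return True if the URL's host looks like an internal domain."""
    host = urlsplit(url).hostname or ""
    return host.endswith(INTERNAL_SUFFIXES)


def build_opener_for(url: str) -> urllib.request.OpenerDirector:
    """Build an opener that ignores proxy settings for internal hosts."""
    if is_internal(url):
        # Passing an empty mapping to ProxyHandler disables the proxies
        # that would otherwise be picked up from https_proxy and friends.
        return urllib.request.build_opener(urllib.request.ProxyHandler({}))
    return urllib.request.build_opener()
```

This has the same effect as `unset https_proxy`, but only for the requests that need it, so external calls made by the same process keep using the proxy.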

Mon, Aug 8

elukey added a comment to T301878: Send score to eventgate when requested.

@Ottomata Hi! I am slowly rolling out the code to allow all revscoring-based models to push mediawiki.revision-score events to EventGate main (precisely, to a test stream). Now I'd like to take the next step, namely having something that:

  • listens to mediawiki.revision-create events
  • based on some easy rules, decide what Lift Wing endpoint/model to call (for example, call articlequality/editquality/etc.. when an enwiki revision is created, etc..)
Mon, Aug 8, 4:19 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
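
The "easy rules" idea in the comment above could look roughly like the Python sketch below. The rule table, the function name and the use of a `database` field on the incoming event are assumptions made for illustration; the real component had not been designed at this point.

```python
from typing import Dict, List

# Hypothetical rule table: wiki database name -> model families to call
# when a revision-create event for that wiki arrives.
ROUTING_RULES: Dict[str, List[str]] = {
    "enwiki": ["articlequality", "editquality"],
}


def models_for_event(event: dict) -> List[str]:
    """Return the model families to call for a revision-create event.

    Assumes the event carries the wiki name in a "database" field;
    events for wikis without a rule are ignored (empty list).
    """
    return ROUTING_RULES.get(event.get("database", ""), [])
```

Keeping the rules in a plain mapping means that adding a wiki or a model is a config change rather than a code change.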
elukey committed rMLISb40a5043e1b0: articlequality: add code to send events to EventGate (authored by elukey).
articlequality: add code to send events to EventGate
Mon, Aug 8, 10:09 AM
elukey committed rMLIS8b446eb1fbec: editquality: move rev-id preprocess functions to a separate module (authored by elukey).
editquality: move rev-id preprocess functions to a separate module
Mon, Aug 8, 9:32 AM

Thu, Aug 4

elukey committed rMLIS3d437e99dbfa: editquality: fix .pipeline's config settings (authored by elukey).
editquality: fix .pipeline's config settings
Thu, Aug 4, 2:11 PM
elukey committed rMLIS37dba3c9ea16: editquality: refactor Blubber config to share code (authored by elukey).
editquality: refactor Blubber config to share code
Thu, Aug 4, 8:35 AM

Wed, Aug 3

elukey committed rMLIS636fd0c3bd8d: Remove Dockerfiles from model-server directories (authored by elukey).
Remove Dockerfiles from model-server directories
Wed, Aug 3, 9:16 AM

Tue, Aug 2

elukey added a reverting change for rMLIS27a7dda823a5: editquality: use a Ray worker for model serving: rMLIS0c98179f61c6: Revert "editquality: use a Ray worker for model serving".
Tue, Aug 2, 12:55 PM
elukey committed rMLIS0c98179f61c6: Revert "editquality: use a Ray worker for model serving" (authored by elukey).
Revert "editquality: use a Ray worker for model serving"
Tue, Aug 2, 12:55 PM
elukey added a comment to T313915: Move revscoring isvcs to async architecture.

Some high level numbers in staging for the non-async docker image of editquality-goodfaith, enwiki:

Tue, Aug 2, 12:53 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey committed rMLIS27a7dda823a5: editquality: use a Ray worker for model serving (authored by elukey).
editquality: use a Ray worker for model serving
Tue, Aug 2, 10:09 AM
elukey closed T307349: Accidental removal of some files under /srv/deployment on deploy1002 as Resolved.

We can close this task and see if any clean up is needed in the follow up task :)

Tue, Aug 2, 9:35 AM · Parsoid (Tracking), Deployments, Release-Engineering-Team (Doing), bacula, SRE
elukey added a comment to T300130: Move Kafka logging to the new intermediate PKI.

@colewhite hi! Periodic ping to see if we can move forward with this task. IIRC there were some clients to move to the new bundle, what's the status? Thanks :)

Tue, Aug 2, 7:04 AM · Patch-For-Review, observability, SRE

Mon, Aug 1

elukey committed rLPRI32b0febb0942: Add fake config for ml-service drafttopic (authored by elukey).
Add fake config for ml-service drafttopic
Mon, Aug 1, 2:22 PM
elukey added a comment to T312626: Replace RAID controller battery in an-worker1082.

silenced the alert in alerts.wikimedia.org for a couple of weeks :)

Mon, Aug 1, 9:44 AM · SRE, ops-eqiad, DC-Ops
elukey committed rMLIS0964e12bfebb: editquality - add MWAPICache to preprocess (authored by elukey).
editquality - add MWAPICache to preprocess
Mon, Aug 1, 9:04 AM

Thu, Jul 28

elukey moved T313915: Move revscoring isvcs to async architecture from Parked to In Progress on the Machine-Learning-Team (Active Tasks) board.
Thu, Jul 28, 2:49 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey claimed T313915: Move revscoring isvcs to async architecture.
Thu, Jul 28, 2:49 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey closed T311982: Upgrade ml clusters to kserve 0.8 as Resolved.

ml-serve-eqiad completed as well.

Thu, Jul 28, 1:15 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks)

Wed, Jul 27

elukey added a comment to T311982: Upgrade ml clusters to kserve 0.8.

ml-serve-codfw upgraded, all good up to now, waiting a day and some deployments before proceeding with eqiad as well.

Wed, Jul 27, 4:13 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks)
elukey created T313915: Move revscoring isvcs to async architecture.
Wed, Jul 27, 2:08 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T312518: Migrate ORES clients to LiftWing.

The ORES extension runs PHP code that calls ORES for damaging and goodfaith only (but others are supported, see the extension.json file). The function that returns the HTTP URL to hit is:

Wed, Jul 27, 1:54 PM · Machine-Learning-Team
elukey closed T309623: Test async preprocess on kserve as Resolved.

This has been worked on in various tasks, we decided to:

Wed, Jul 27, 9:17 AM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey closed T309623: Test async preprocess on kserve, a subtask of T296173: Load test the Lift Wing cluster, as Resolved.
Wed, Jul 27, 9:16 AM · Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T313822: codfw: ml-serve2001 memmory issue DIMM A2.

@Papaul host rebooted! It is not running any K8s pods at the moment so if any maintenance is needed, feel free to downtime and go ahead :)

Wed, Jul 27, 9:11 AM · Machine-Learning-Team, SRE, ops-codfw

Tue, Jul 26

elukey added a comment to T313493: Add support for async session to python-mwapi.

https://pypi.org/project/mwapi/0.6.1/ :)

Tue, Jul 26, 4:21 PM · Machine-Learning-Team (Active Tasks)
elukey closed T313816: Add nokafor to receive analytics-alerts emails and have sudo -u hdfs rights in hdfs as Resolved.
Tue, Jul 26, 3:54 PM · Data-Engineering
elukey added a comment to T313816: Add nokafor to receive analytics-alerts emails and have sudo -u hdfs rights in hdfs.

Added to the analytics-alerts@ mailing list :)

Tue, Jul 26, 3:53 PM · Data-Engineering
elukey added a comment to T312550: uwsgi socket/UDP logger is broken if no other logger uses the same format.

Important note: we moved from 2.0.14+20161117-3+deb9u2+wmf1 (custom version on wikimedia-stretch) to 2.0.18-1 (upstream version on Debian Buster).

Tue, Jul 26, 9:45 AM · SRE
elukey added a comment to T311982: Upgrade ml clusters to kserve 0.8.

The new storage-initializer image works! KServe 0.8 is deployed in staging and so far everything works fine. The next step is to plan and execute the deployment to production.

Tue, Jul 26, 8:49 AM · Patch-For-Review, Machine-Learning-Team (Active Tasks)

Mon, Jul 25

elukey added a comment to T311982: Upgrade ml clusters to kserve 0.8.

Tested all Docker images locally (and added documentation on Wikitech). Merged the change and updated the isvc images for articlequality and editquality in staging, all tests passed.

Mon, Jul 25, 9:37 AM · Patch-For-Review, Machine-Learning-Team (Active Tasks)
elukey committed rMLIS7fd064807810: Update Python model servers and requirements to KServe 0.8 (authored by elukey).
Update Python model servers and requirements to KServe 0.8
Mon, Jul 25, 9:02 AM
elukey added a comment to T313493: Add support for async session to python-mwapi.

@achou let's create a pull request when you are ready, I'll ask Aaron to review and cut 0.6.1 :)

Mon, Jul 25, 8:53 AM · Machine-Learning-Team (Active Tasks)

Fri, Jul 22

elukey added a comment to T313493: Add support for async session to python-mwapi.

Aiko's patch has been merged!

Fri, Jul 22, 4:33 PM · Machine-Learning-Team (Active Tasks)
elukey added a comment to T313386: archiva1002 is running low on space left in the root partition.

@BTullis another way could be to add a new disk of say 200G, format it and then mount /var/lib/archiva on it.

Fri, Jul 22, 1:17 PM · Data Engineering Planning (Sprint 01), wmde-team-b-tech, SRE, Discovery-ARCHIVED
elukey added a comment to T313386: archiva1002 is running low on space left in the root partition.

Previous occurrence: https://phabricator.wikimedia.org/T304224

Fri, Jul 22, 7:19 AM · Data Engineering Planning (Sprint 01), wmde-team-b-tech, SRE, Discovery-ARCHIVED

Thu, Jul 21

elukey committed rLPRI24f131807cf3: Add fake secrets for the ml revscoring-articletopic k8s ns (authored by elukey).
Add fake secrets for the ml revscoring-articletopic k8s ns
Thu, Jul 21, 9:20 AM

Wed, Jul 20

elukey created T313386: archiva1002 is running low on space left in the root partition.
Wed, Jul 20, 8:22 AM · Data Engineering Planning (Sprint 01), wmde-team-b-tech, SRE, Discovery-ARCHIVED
elukey added a comment to T311982: Upgrade ml clusters to kserve 0.8.

@achou thanks a lot! I have tested revscoring on stat1004 in the following way:

Wed, Jul 20, 8:06 AM · Patch-For-Review, Machine-Learning-Team (Active Tasks)

Tue, Jul 19

elukey claimed T311982: Upgrade ml clusters to kserve 0.8.
Tue, Jul 19, 2:30 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks)
elukey moved T311982: Upgrade ml clusters to kserve 0.8 from Parked to In Progress on the Machine-Learning-Team (Active Tasks) board.
Tue, Jul 19, 2:29 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks)

Mon, Jul 18

elukey added a comment to T309623: Test async preprocess on kserve.

New version (still missing tests, will add them tomorrow): https://github.com/elukey/revscoring/commit/962d336a5b2b84d7c60a639c3a1e6fd4b38b266c

Mon, Jul 18, 3:14 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T301878: Send score to eventgate when requested.

I was able to generate a revision-score-test event from the enwiki editquality goodfaith model in ml-staging (verified that the event landed correctly on kafka).

Mon, Jul 18, 10:41 AM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey created T313202: Improve EventGate's error message when the client's HTTP Content-Type is not the one expected.
Mon, Jul 18, 8:41 AM · Data-Engineering, Event-Platform Value Stream
elukey committed rMLIS68d25df840ae: editquality: set Content-type when sending events to EventGate (authored by elukey).
editquality: set Content-type when sending events to EventGate
Mon, Jul 18, 8:36 AM

Thu, Jul 14

elukey committed rMLIS6a7e089706fd: editquality: use json.dumps instead of urlencode for EventGate (authored by elukey).
editquality: use json.dumps instead of urlencode for EventGate
Thu, Jul 14, 10:53 AM
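
The two editquality commits about EventGate payloads (JSON body instead of urlencode, explicit Content-Type) boil down to something like the sketch below. It uses only the Python standard library; the function name and the URL in the usage note are hypothetical, and the actual model-server code may use a different HTTP client.

```python
import json
import urllib.request


def build_eventgate_request(url: str, event: dict) -> urllib.request.Request:
    """Build a POST request carrying the event as a JSON document."""
    # Serialize with json.dumps: a urlencoded form body is not JSON.
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # State the content type explicitly so the server parses JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_eventgate_request("https://eventgate.example/v1/events", {"rev_id": 1})` returns a request object that can be passed to `urllib.request.urlopen`; nothing is sent until then.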

Wed, Jul 13

elukey committed rMLISd1a48e55be91: editquality: add support for revision-score events (authored by elukey).
editquality: add support for revision-score events
Wed, Jul 13, 1:27 PM
elukey added a comment to T309623: Test async preprocess on kserve.

Another experiment that I ran over the past few days, namely adding a simple HTTP cache to revscoring: https://github.com/elukey/revscoring/commit/aaa8a59c6f25ff9ba5c6ac718010e0837cdd8d3d

Wed, Jul 13, 8:26 AM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)

Tue, Jul 12

elukey added a comment to T312843: Possible ORES outage for PageCuration tags "vandalism", "spam", "attack".

Hi! From the ORES infrastructure point of view there is nothing in our metrics that indicates a problem. It would be useful to find a clear vandalism change that was not flagged correctly, then we can start from there.

Tue, Jul 12, 1:12 PM · Machine-Learning-Team, ORES, PageTriage, Growth-Team

Jul 11 2022

elukey created P31003 (An Untitled Masterwork).
Jul 11 2022, 2:32 PM

Jul 10 2022

elukey added a project to T312722: Thumbor units failing / service general slowness: SRE.
Jul 10 2022, 2:06 PM · SRE, Thumbor
elukey added a comment to T312722: Thumbor units failing / service general slowness.

https://github.com/netblue30/firejail/issues/5222#issuecomment-1172925721 references a similar problem, and there seems to be a patch available.

Jul 10 2022, 2:05 PM · SRE, Thumbor
elukey added a comment to T312722: Thumbor units failing / service general slowness.

I found a lot of the following logs:

Jul 10 2022, 2:00 PM · SRE, Thumbor

Jul 7 2022

elukey added a comment to T311982: Upgrade ml clusters to kserve 0.8.
root@build2001:/srv/images/production-images# build-production-images 
== Step 0: scanning /srv/images/production-images/images ==
Will build the following images:
* docker-registry.discovery.wmnet/kserve-build:0.8.0-1
* docker-registry.discovery.wmnet/kserve-controller:0.8.0-1
* docker-registry.discovery.wmnet/kserve-agent:0.8.0-1
* docker-registry.discovery.wmnet/kserve-storage-initializer:0.8.0-1
== Step 1: building images ==
* Built image docker-registry.discovery.wmnet/kserve-build:0.8.0-1
* Built image docker-registry.discovery.wmnet/kserve-controller:0.8.0-1
* Built image docker-registry.discovery.wmnet/kserve-agent:0.8.0-1
* Built image docker-registry.discovery.wmnet/kserve-storage-initializer:0.8.0-1
== Step 2: publishing ==
Successfully published image docker-registry.discovery.wmnet/kserve-controller:0.8.0-1
Successfully published image docker-registry.discovery.wmnet/kserve-agent:0.8.0-1
Successfully published image docker-registry.discovery.wmnet/kserve-build:0.8.0-1
Successfully published image docker-registry.discovery.wmnet/kserve-storage-initializer:0.8.0-1
== Build done! ==
You can see the logs at ./docker-pkg-build.log
== Step 0: scanning /srv/images/production-images/istio ==
Will build the following images:
== Step 1: building images ==
== Step 2: publishing ==
== Build done! ==
You can see the logs at ./docker-pkg-build.log
== Step 0: scanning /srv/images/production-images/cert-manager ==
Will build the following images:
== Step 1: building images ==
== Step 2: publishing ==
== Build done! ==
You can see the logs at ./docker-pkg-build.log
Jul 7 2022, 3:31 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks)
elukey added a comment to T309623: Test async preprocess on kserve.

@achou thank you for digging into the async-mediawiki library. Following yesterday's chat in the meeting, I wonder whether we would benefit more from adding async/await to the preprocess method in model.py or adding async support on the mwapi library — considering we wouldn't want to disrupt other mwapi library users that don't have our specific use-case.

Jul 7 2022, 3:29 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
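
To make the first option concrete, here is a rough Python sketch of an async preprocess step. `fetch_revision` is a hypothetical stand-in for an awaitable MW API call and the returned dictionaries are made up; this is not the actual model.py code.

```python
import asyncio


async def fetch_revision(rev_id: int) -> dict:
    """Hypothetical stand-in for an awaitable MediaWiki API call."""
    await asyncio.sleep(0)  # yields to the event loop, as real I/O would
    return {"rev_id": rev_id}


async def preprocess(inputs: dict) -> dict:
    """Fetch revision data without blocking other in-flight requests."""
    revision = await fetch_revision(inputs["rev_id"])
    return {"rev_id": revision["rev_id"], "features": []}


async def preprocess_many(rev_ids: list) -> list:
    """Run several preprocess calls concurrently on one event loop."""
    return await asyncio.gather(*(preprocess({"rev_id": r}) for r in rev_ids))
```

The point of the design is that while one request waits on the network, the event loop can serve others; the CPU-bound scoring step is not helped by this and would still need separate handling.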
elukey updated the task description for T312518: Migrate ORES clients to LiftWing.
Jul 7 2022, 1:47 PM · Machine-Learning-Team
elukey added a comment to T301878: Send score to eventgate when requested.

The curl command above works now! I can see the eqiad.mediawiki.revision-score-test topic in Kafka main too.

Jul 7 2022, 1:24 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey created T312518: Migrate ORES clients to LiftWing.
Jul 7 2022, 9:54 AM · Machine-Learning-Team
elukey added a comment to T301878: Send score to eventgate when requested.

Ack! The link for full service restart may be broken, is it https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate/Administration#Roll_restart_all_pods right? Can I do it anytime?

Jul 7 2022, 8:51 AM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)

Jul 6 2022

elukey added a comment to T301878: Send score to eventgate when requested.

I tried to send an event manually with curl (see below) and I am getting:

Jul 6 2022, 2:35 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T311390: Github's wikimedia/ores not mirroring to Gerrit's scoring/ores/ores.

@MarcoAurelio Thanks a lot for the help! I hope to be able to deprecate all these repos soon-ish :)

Jul 6 2022, 1:40 PM · User-MarcoAurelio, Machine-Learning-Team (Active Tasks), Phabricator, Release-Engineering-Team
elukey added a comment to T301878: Send score to eventgate when requested.

@Ottomata if you have time I'd need some help in deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/810007, I haven't done this in a while :) (otherwise I can add it to the deployment window schedule).

Jul 6 2022, 1:30 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T302232: Set up the ml-cache clusters.

Informed @LSobanski via email as well, so Data Persistence is aware of this extra new cluster :) I think that, if everybody agrees, this task can be closed and the ML testing phase can start. Once we have all our results we'll show them to people and decide how to proceed, does that make sense?

Jul 6 2022, 1:23 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)

Jul 5 2022

elukey added a comment to T302232: Set up the ml-cache clusters.

Just to reiterate for posterity's sake. The Platform team's Generated Data Platform (aka AQS) isn't the only option for (Cassandra-based) storage, we have another multi-tenant storage cluster as well (the cluster formerly exclusive to RESTBase). Obviously we'd want to do due diligence with respect to establishing ORES score caching storage size and throughput requirements, but I suspect we have more than enough capacity.

We're in the process of trying to pull the various Cassandra clusters together under the umbrella of SRE Data Persistence. I don't know that that means a team-owned special-purpose Cassandra cluster would be discouraged (that's not for me to say), but it does seem like we'd want to be explicit about that if it's the case.

Jul 5 2022, 2:59 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T301878: Send score to eventgate when requested.

Ah ok I was assuming that the revision score streams would have changed their name anyway after T308017, but it may not be the case so keeping the names available is good.

Jul 5 2022, 2:53 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey moved T311982: Upgrade ml clusters to kserve 0.8 from Unorganized to Active Tasks on the Machine-Learning-Team board.
Jul 5 2022, 2:10 PM · Patch-For-Review, Machine-Learning-Team (Active Tasks)
elukey committed rORES587d00fa4eb2: Update requirements.txt to avoid Github's security alerts (authored by elukey).
Update requirements.txt to avoid Github's security alerts
Jul 5 2022, 12:18 PM
elukey added a comment to T302195: Create the ml-serve-staging k8s cluster.
elukey@ml-serve-ctrl1001:~$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict" -X POST -d @input.json -i -H "Host: enwiki-articlequality.revscoring-articlequality.wikimedia.org" --http1.1
HTTP/1.1 200 OK
content-length: 225
content-type: application/json; charset=UTF-8
date: Tue, 05 Jul 2022 08:53:20 GMT
server: istio-envoy
x-envoy-upstream-service-time: 317
Jul 5 2022, 8:54 AM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T302195: Create the ml-serve-staging k8s cluster.

articlequality pods up and running! The swift credentials are working as expected.

Jul 5 2022, 8:48 AM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey closed T311628: Create Swift account for readonly access to ML models as Resolved.

All working! Added the new account to the ML staging cluster, and it worked nicely. We'll move away from the admin account in prod as well. Thanks!

Jul 5 2022, 8:47 AM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing
elukey added a comment to T311628: Create Swift account for readonly access to ML models.

Tried to upload a model with the new read only account and I got access denied (good):

Jul 5 2022, 8:34 AM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing
elukey added a comment to T311628: Create Swift account for readonly access to ML models.

Filippo applied the following rule and everything now works:

Jul 5 2022, 8:31 AM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing

Jul 4 2022

elukey added a comment to T311628: Create Swift account for readonly access to ML models.

The new mlserve:ro account has been added, but if I try to use s3cmd with the new credentials I get an error:

Jul 4 2022, 3:30 PM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing
elukey committed rLPRId49724705063: profile::thanos::swift: add mlserve_ro account (authored by elukey).
profile::thanos::swift: add mlserve_ro account
Jul 4 2022, 2:11 PM
elukey added a comment to T310643: Build Bigtop 1.5 Hadoop packages for Bullseye.

I'll start work on this today. I have come across this ticket: https://issues.apache.org/jira/browse/BIGTOP-3600 in which @elukey has confirmed that the build process now works with bullseye.

I believe that the build command I need will be:

docker run --rm  -v `pwd`:/ws --workdir /ws bigtop/slaves:trunk-debian-11 bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean hadoop-pkg'

I'll then transfer the built packages to apt1001 and then add them to our repo with reprepro. I'll check to see which packages are already present in the repo for buster and only add the same for bullseye.

@elukey does this procedure seem right to you?

Jul 4 2022, 1:34 PM · Data Engineering Planning (Sprint 02)
elukey added a comment to T309623: Test async preprocess on kserve.

Reporting a summary of what has been discussed over IRC. The extractor calculates most of the above features, so one way forward could be to instruct it to have a sort of http_cache parameter able to cache raw results from the MW API.

Jul 4 2022, 1:22 PM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
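
A minimal sketch of what such a cache could look like, in Python. The class name, the `fetch` callable and the cache key are assumptions for illustration; this is not the revscoring extractor's API, and the linked experiments may work differently.

```python
from typing import Callable, Dict, Tuple


class CachedFetcher:
    """Cache raw API responses keyed by URL and query parameters."""

    def __init__(self, fetch: Callable[[str, dict], dict]):
        self._fetch = fetch  # hypothetical function doing the real HTTP call
        self._cache: Dict[Tuple, dict] = {}
        self.misses = 0

    def get(self, url: str, params: dict) -> dict:
        # Sort the parameters so that equal queries map to the same key.
        key = (url, tuple(sorted(params.items())))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(url, params)
        return self._cache[key]
```

With a cache like this, features that need the same raw API response trigger a single HTTP call instead of one per feature.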
elukey updated subscribers of T309623: Test async preprocess on kserve.

I think that I got, more or less, how the ORES feature injection works (https://www.mediawiki.org/wiki/ORES/Feature_injection#Feature_injection:_playing_with_what_ORES_sees).

Jul 4 2022, 9:32 AM · Patch-For-Review, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey created T311982: Upgrade ml clusters to kserve 0.8.
Jul 4 2022, 8:17 AM · Patch-For-Review, Machine-Learning-Team (Active Tasks)
elukey added a comment to T301878: Send score to eventgate when requested.

Looks good. My only worry is that making these new streams now, and planning to refactor their data model later based on T308017, might confuse folks? How do you expect people to use 'mediawiki.revision-score-editquality' now?

Jul 4 2022, 8:07 AM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)

Jul 1 2022

elukey added a comment to T307389: Upgrade wikilabels databases to buster/bullseye.

@taavi we are going to ask the community if it is ok to drop these, would it be ok to wait a few more days?

Jul 1 2022, 10:09 AM · Wikilabels, Machine-Learning-Team, cloud-services-team (Kanban), Data-Services, Cloud-VPS (Debian Stretch Deprecation)

Jun 30 2022

elukey added a comment to T310980: Allow Cassandra to be deployed on Bullseye nodes.

I would propose that the way to think about this might be to ask ourselves how much runway we want/need from here to 4.x. 3.11.x is scheduled to be EOL mid-2023, about the same time we should be off of Buster. That's kind of tight given everything that needs to be done (including Bullseye and Cassandra 4 upgrades of all nodes). Backporting that changeset costs time/effort but lets us use 3.11 on Bullseye, is that a net win (not a rhetorical question)?

Jun 30 2022, 3:23 PM · Cassandra, SRE
elukey added a comment to T302232: Set up the ml-cache clusters.

@lbowmaker @Eevans I had a long chat with my team about the AQS cluster and our use cases, we reached some consensus about how to proceed, lemme try to summarize and then I'd love some feedback :)

Jun 30 2022, 1:51 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey added a comment to T302232: Set up the ml-cache clusters.

codfw cluster up and running on Buster :)

Jun 30 2022, 1:26 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
elukey updated the task description for T311628: Create Swift account for readonly access to ML models.
Jun 30 2022, 8:41 AM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing
elukey updated the task description for T311628: Create Swift account for readonly access to ML models.
Jun 30 2022, 8:41 AM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing
elukey updated subscribers of T311628: Create Swift account for readonly access to ML models.

@MatthewVernon hi! Do you have any guidance about how to proceed?

Jun 30 2022, 8:40 AM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing
elukey closed T279271: ORES gives internal error on an invalid model_info parameter as Resolved.

@Gethan your change has been deployed today, thanks a lot for your contribution!

Jun 30 2022, 8:36 AM · Machine-Learning-Team, ORES
elukey added a comment to T310980: Allow Cassandra to be deployed on Bullseye nodes.

I checked in the jira that was pointed out earlier, and I noticed two things:

Jun 30 2022, 8:08 AM · Cassandra, SRE
elukey added a comment to T310980: Allow Cassandra to be deployed on Bullseye nodes.

The main worry that I have now is that moving to Bullseye for Cassandra nodes will mean upgrading to 4.x at this point, unless we find a way to move cqlsh.py to python 3 in our 3.x packages. For all our clusters that seems like overkill, and risky given the timeframe :(

Jun 30 2022, 7:58 AM · Cassandra, SRE