Tests for articlequality:
Tue, Aug 9
All revscoring-based models are now able to accept a revision-create event, generate a revision-score one and send it to EventGate! \o/
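For the record, the emission step looks roughly like this (a minimal sketch assuming requests; the schema URI, stream name and EventGate URL are placeholders, not the production values):

import datetime
import requests

# Minimal sketch: build a revision-score event from the incoming
# revision-create event and POST it to EventGate. Schema version, stream
# name and URL below are assumptions for illustration only.
def send_revision_score(revision_create_event, scores, eventgate_url):
    event = {
        "$schema": "/mediawiki/revision/score/2.0.0",  # assumed schema version
        "meta": {
            "stream": "mediawiki.revision-score-test",  # the test stream
            "domain": revision_create_event["meta"]["domain"],
            "dt": datetime.datetime.utcnow().isoformat() + "Z",
        },
        "rev_id": revision_create_event["rev_id"],
        "scores": scores,
    }
    # EventGate accepts a JSON array of events on /v1/events.
    response = requests.post(eventgate_url + "/v1/events", json=[event])
    response.raise_for_status()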
root@thanos-fe1001:/home/elukey# source /etc/swift/account_AUTH_wdqs.env
root@thanos-fe1001:/home/elukey# swift list
rdf-streaming-updater-codfw
rdf-streaming-updater-codfw+segments
rdf-streaming-updater-eqiad
rdf-streaming-updater-eqiad+segments
rdf-streaming-updater-staging
thanos-swift
updater
updater+segments
updater-zbyszko
updater-zbyszko-v2
@Isaac I can reproduce the error from stat1004; I think you are going through the http(s) proxy for a .discovery.wmnet domain (an internal one). Try unsetting https_proxy, it should work afterwards!
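For anyone hitting the same thing from Python, a minimal sketch of the workaround (the URL is illustrative):

import requests

# With https_proxy set, requests routes even internal .discovery.wmnet hosts
# through the proxy, which fails. trust_env=False makes the session ignore
# the proxy environment variables (equivalent to unsetting https_proxy).
session = requests.Session()
session.trust_env = False
response = session.get("https://inference.discovery.wmnet:30443/healthz")  # illustrative URL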
Mon, Aug 8
@Ottomata Hi! I am slowly rolling out the code that allows all revscoring-based models to push mediawiki.revision-score events to EventGate main (more precisely, to a test stream). Now I'd like to take the next step, namely having something that:
- listens to mediawiki.revision-create events
- based on some simple rules, decides which Lift Wing endpoint/model to call (for example, articlequality/editquality/etc. when an enwiki revision is created); a rough sketch follows below
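Something like this (a rough sketch only, assuming kafka-python; the routing table, broker, endpoint and Host header values are illustrative, not a committed design):

import json
import requests
from kafka import KafkaConsumer

# Hypothetical routing rules: wiki database -> Lift Wing models to call.
ROUTES = {
    "enwiki": ["enwiki-articlequality", "enwiki-goodfaith"],
}

consumer = KafkaConsumer(
    "eqiad.mediawiki.revision-create",
    bootstrap_servers="kafka-main1001.eqiad.wmnet:9092",  # illustrative broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    for model in ROUTES.get(event.get("database"), []):
        # Call the Lift Wing inference endpoint for each matching model
        # (hostname and Host header here are illustrative).
        requests.post(
            "https://inference.svc.codfw.wmnet:30443/v1/models/%s:predict" % model,
            json={"rev_id": event["rev_id"]},
            headers={"Host": "%s.revscoring.wikimedia.org" % model},
        )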
Tue, Aug 2
Some high level numbers in staging for the non-async docker image of editquality-goodfaith, enwiki:
We can close this task and see if any clean up is needed in the follow up task :)
@colewhite hi! Periodic ping to see if we can move forward with this task. IIRC there were some clients to move to the new bundle; what's the status? Thanks :)
Mon, Aug 1
silenced the alert in alerts.wikimedia.org for a couple of weeks :)
Thu, Jul 28
ml-serve-eqiad completed as well.
Wed, Jul 27
ml-serve-codfw upgraded, all good up to now; waiting a day and a few deployments before proceeding with eqiad as well.
The ORES extension runs PHP code that calls ORES for damaging and goodfaith only (but others are supported, see the extension.json file). The function that returns the HTTP URL to hit is:
This has been worked on in various tasks; we decided to:
@Papaul host rebooted! It is not running any K8s pods at the moment so if any maintenance is needed, feel free to downtime and go ahead :)
Tue, Jul 26
Added to the analytics-alerts@ mailing list :)
Important note: we moved from 2.0.14+20161117-3+deb9u2+wmf1 (custom version on wikimedia-stretch) to 2.0.18-1 (upstream version on Debian Buster).
The new storage-initializer image works! KServe 0.8 is deployed in staging and so far everything works fine. The next step is to plan and execute the deployment to production.
Mon, Jul 25
Tested all Docker images locally (and added documentation on Wikitech). Merged the change and updated the isvc images for articlequality and editquality in staging, all tests passed.
@achou let's create a pull request when you are ready, I'll ask Aaron to review and cut 6.1 :)
Fri, Jul 22
Aiko's patch has been merged!
@BTullis another way could be to add a new disk of, say, 200G, format it, and then mount /var/lib/archiva on it.
Previous occurrence: https://phabricator.wikimedia.org/T304224
Wed, Jul 20
@achou thanks a lot! I have tested revscoring on stat1004 in the following way:
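Sketching the flow here (a minimal sketch of the standard revscoring API usage; the model path and rev id are placeholders, not the exact ones used):

import mwapi
from revscoring import Model
from revscoring.extractors import api

# Load a trained model and set up an API-based feature extractor.
model = Model.load(open("enwiki.nettrom_wp10.gradient_boosted.model"))  # placeholder path
session = mwapi.Session("https://en.wikipedia.org", user_agent="revscoring test (stat1004)")
extractor = api.Extractor(session)

# Extract the model's features for a revision via the MW API, then score.
feature_values = list(extractor.extract(1097728000, model.features))  # placeholder rev id
print(model.score(feature_values))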
Mon, Jul 18
New version (still missing tests, will add them tomorrow): https://github.com/elukey/revscoring/commit/962d336a5b2b84d7c60a639c3a1e6fd4b38b266c
I was able to generate a revision-score-test event from the enwiki editquality goodfaith model in ml-staging (verified that the event landed correctly on kafka).
Wed, Jul 13
Another experiment that I made in these days, namely adding a simple HTTP cache to revscoring: https://github.com/elukey/revscoring/commit/aaa8a59c6f25ff9ba5c6ac718010e0837cdd8d3d
Tue, Jul 12
Hi! From the ORES infrastructure point of view, there is nothing in our metrics that indicates a problem. It would be useful to find a clear vandalism edit that was not flagged correctly; then we can start from there.
Jul 11 2022
Jul 10 2022
https://github.com/netblue30/firejail/issues/5222#issuecomment-1172925721 references a similar problem, and there seems to be a patch available.
I found a lot of the following logs:
Jul 7 2022
root@build2001:/srv/images/production-images# build-production-images
== Step 0: scanning /srv/images/production-images/images ==
Will build the following images:
* docker-registry.discovery.wmnet/kserve-build:0.8.0-1
* docker-registry.discovery.wmnet/kserve-controller:0.8.0-1
* docker-registry.discovery.wmnet/kserve-agent:0.8.0-1
* docker-registry.discovery.wmnet/kserve-storage-initializer:0.8.0-1
== Step 1: building images ==
* Built image docker-registry.discovery.wmnet/kserve-build:0.8.0-1
* Built image docker-registry.discovery.wmnet/kserve-controller:0.8.0-1
* Built image docker-registry.discovery.wmnet/kserve-agent:0.8.0-1
* Built image docker-registry.discovery.wmnet/kserve-storage-initializer:0.8.0-1
== Step 2: publishing ==
Successfully published image docker-registry.discovery.wmnet/kserve-controller:0.8.0-1
Successfully published image docker-registry.discovery.wmnet/kserve-agent:0.8.0-1
Successfully published image docker-registry.discovery.wmnet/kserve-build:0.8.0-1
Successfully published image docker-registry.discovery.wmnet/kserve-storage-initializer:0.8.0-1
== Build done! ==
You can see the logs at ./docker-pkg-build.log
== Step 0: scanning /srv/images/production-images/istio ==
Will build the following images:
== Step 1: building images ==
== Step 2: publishing ==
== Build done! ==
You can see the logs at ./docker-pkg-build.log
== Step 0: scanning /srv/images/production-images/cert-manager ==
Will build the following images:
== Step 1: building images ==
== Step 2: publishing ==
== Build done! ==
You can see the logs at ./docker-pkg-build.log
The curl command above works now! I can see the eqiad.mediawiki.revision-score-test topic in Kafka main too.
Ack! The link for the full service restart may be broken; is https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate/Administration#Roll_restart_all_pods the right one? Can I do it anytime?
Jul 6 2022
I tried to send an event manually with curl (see below) and I am getting:
@MarcoAurelio Thanks a lot for the help! I hope to be able to deprecate all these repos soon-ish :)
If you have time I'd need some help deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/810007, I haven't done this in a while :) (otherwise I can add it to the deployment window schedule).
Informed @LSobanski via email as well, so Data Persistence is aware of this extra new cluster :) I think that, if everybody agrees, this task can be closed and the ML testing phase can start. Once we have all our results, we'll show them to people and decide how to proceed; does that make sense?
Jul 5 2022
Ah ok, I was assuming that the revision-score streams would change their names anyway after T308017, but that may not be the case, so keeping the names available is good.
elukey@ml-serve-ctrl1001:~$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/enwiki-articlequality:predict" -X POST -d @input.json -i -H "Host: enwiki-articlequality.revscoring-articlequality.wikimedia.org" --http1.1
HTTP/1.1 200 OK
content-length: 225
content-type: application/json; charset=UTF-8
date: Tue, 05 Jul 2022 08:53:20 GMT
server: istio-envoy
x-envoy-upstream-service-time: 317
articlequality pods up and running! The swift credentials are working as expected.
All working! Added the new account to the ML staging cluster, and it worked nicely. We'll move away from the admin account in prod as well. Thanks!
Tried to upload a model with the new read only account and I got access denied (good):
Filippo applied the following rule and everything now works:
Jul 4 2022
The new mlserve:ro account has been added, but if I try to use s3cmd with the new credentials I get an error:
Reporting a summary of what has been discussed over IRC. The extractor calculates most of the above features, so one way forward could be to give it a sort of http_cache parameter that caches raw results from the MW API.
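A toy illustration of the idea (CachedSession and http_cache are hypothetical names, not the actual patch):

import json
import mwapi

class CachedSession:
    """Wraps an mwapi.Session so identical GET requests are served from a cache."""

    def __init__(self, session, http_cache):
        self.session = session
        self.http_cache = http_cache  # e.g. a plain dict, or something with a TTL

    def get(self, **params):
        key = json.dumps(params, sort_keys=True, default=str)
        if key not in self.http_cache:
            self.http_cache[key] = self.session.get(**params)
        return self.http_cache[key]

# The extractor would call .get() as usual; repeated feature extractions for
# the same revision would then hit the cache instead of the MW API.
session = CachedSession(mwapi.Session("https://en.wikipedia.org", user_agent="test"), {})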
I think I've understood, more or less, how ORES feature injection works (https://www.mediawiki.org/wiki/ORES/Feature_injection#Feature_injection:_playing_with_what_ORES_sees).
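A toy example of what the injection looks like, assuming the query-parameter form described on that page (the feature name and rev id are illustrative):

import requests

response = requests.get(
    "https://ores.wikimedia.org/v3/scores/enwiki/12345/damaging",
    params={
        "features": "true",
        # Inject a feature value, overriding what ORES would extract itself:
        "feature.revision.user.is_anon": "true",
    },
)
print(response.json())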
Jul 1 2022
@taavi we are going to ask the community if it is ok to drop these; would it be ok to wait a few more days?
Jun 30 2022
codfw cluster up and running on Buster :)
@MatthewVernon hi! Do you have any guidance about how to proceed?
@Gethan your change has been deployed today, thanks a lot for your contribution!
I checked the Jira issue that was pointed out earlier, and I noticed two things:
The main worry I have now is that moving Cassandra nodes to Bullseye will mean upgrading to 4.x, unless we find a way to move cqlsh.py to Python 3 in our 3.x packages. For all our clusters that seems overkill and risky given the timeframe :(