
ORES deployment - Winter 2022 - nlwiki articlequality/hiwiki editquality/ores observability
Closed, Resolved (Public)

Description

This is a task to track the deployment of the nlwiki articlequality model, the hiwiki editquality models, and some improvements to ORES observability. We will need to update each model repo to get the changes into Gerrit, and then we can deploy https://gerrit.wikimedia.org/r/c/mediawiki/services/ores/deploy/+/755731

Thanks @ACraze! It looks like I no longer have permission to manually mirror changes into the gerrit model repos. See https://wikitech.wikimedia.org/wiki/ORES/Deployment#Updating_model_repositories

Can you either restore my rights or perform these operations for me? It looks like the git lfs push gerrit master operation won't do everything we need anymore, because basic mirroring is also broken. So you'll need to be able to run git push gerrit master (note I dropped the "lfs") as well in order to get all the changes in. This needs to be done for all of the model repos:

  • editquality
  • articlequality
  • draftquality
  • drafttopic

Event Timeline


Thanks @ACraze! I've been testing the deployment configuration and ran into a surprising compatibility issue with the current enwiki articlequality model (built with revscoring 2.8.2). I'm digging in to figure out what might have caused the issue and will submit some rebuilt model PRs using the new revscoring 2.11.1 as they finish. Sorry for the delay, folks.

I have rebuilt the English Wikipedia model. It now loads fine with revscoring 2.11.1. https://github.com/wikimedia/articlequality/pull/171

I still haven't been able to work out why the version of the model built with 2.8.2 didn't load correctly. For some reason, the unpickling process couldn't find revscoring.datasources.meta.extractors.trie, yet it clearly did exist in 2.8.2. See https://github.com/wikimedia/revscoring/blob/20cdfedae7317023feff846089d279d64fe08829/revscoring/datasources/meta/extractors.py#L71
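
For illustration, here is a minimal sketch of that failure mode (the Widget class is made up): pickle stores import paths rather than code, so unpickling fails when a referenced attribute no longer resolves.

import pickle

class Widget:
    pass

blob = pickle.dumps(Widget())  # records the path "__main__.Widget"

# Simulate the attribute disappearing from the installed library:
del globals()["Widget"]

try:
    pickle.loads(blob)
except AttributeError as err:
    print(err)  # Can't get attribute 'Widget' on <module '__main__' ...>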

Regardless, it seems that all other models load as expected in my test environment. I think it is fine to move forward with just this one model rebuilt. What do y'all think?

The articlequality PR has been merged and the repo has been mirrored to Gerrit again.

Config updated. "(WIP)" removed. I think we're good to go.

Change 755731 merged by Accraze:

[mediawiki/services/ores/deploy@master] nlwiki articlequality, hiwiki editquality, ores observability

https://gerrit.wikimedia.org/r/755731

The deploy CR has been merged and we are scheduling a deployment for Wednesday morning; I'll update here once it is complete.

Just checking -- will this be going to beta first? I'd like to poke the system in a prod-like environment a little bit before the actual deployment goes out.

Tried to deploy in Beta (deployment-prep) and scap failed after:

Installing setuptools, pkg_resources, pip, wheel...done.
gensim-3.8.3-cp37-cp37m-manylinux1_x86_64.whl is not a supported wheel on this platform.

The wheel does indeed look weird: cp37 means CPython 3.7, and we have 3.5 on Stretch. Did we change dependencies by any chance?
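
As a quick sanity check, the set of wheel tags an interpreter accepts can be listed with the packaging library (a sketch; assumes a reasonably recent packaging release is available):

from packaging.tags import sys_tags

supported = {str(tag) for tag in sys_tags()}
print("cp37-cp37m-manylinux1_x86_64" in supported)
# False on Python 3.5 / Stretch, which is exactly pip's
# "not a supported wheel on this platform" condition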

@Halfak there is a problem that I just noticed in https://gerrit.wikimedia.org/r/c/mediawiki/services/ores/deploy/+/755731, namely it seems that you changed the version of mwparserfromhell (downgrading it). We spent a lot of time around Thanksgiving tracking down a segfault in celery; deploying this would take us back to the previous state.

Yes, the dependencies did indeed get bumped to Python 3.7: https://gerrit.wikimedia.org/r/c/research/ores/wheels/+/748390

@Halfak @ACraze is there a reason behind bumping to Python 3.7?

To be clear: I would be extremely happy to move to python 3.7 but we are on Stretch for the moment :(

Oh! The move to 3.7 is kind of old. It was a (I think) two-year-old request from @akosiaris that we move to 3.7. We can go back though. I'll take a look at that, and at getting the most recent mwparserfromhell, today.

@Halfak my understanding is that moving to Python 3.7 may cause trouble in various parts of ORES (like un-pickling, etc.) and that some models may need to be retrained/rebuilt. We are currently running on Stretch (Debian 9) and Python 3.5 for this reason; moving the target to Python 3.7 would (most likely) also require us to jump to a different OS. Can we revert https://gerrit.wikimedia.org/r/c/research/ores/wheels/+/748390 entirely and start from a more stable build env?

It shouldn't cause problems for unpickling. But it is a good idea to stick to the versions in the prod environment regardless. We'll want new versions of the wheels built with Python 3.5 anyway, so I don't think reverting will get us much. I'll start the process now. Luckily, it's pretty easy. I should have a new patchset ready in a few hours.

It shouldn't cause problems for unpickling. But it is a good idea to stick to the versions in the prod environment regardless. We'll want new versions of the wheels built with Python 3.5 anyway, so I don't think reverting will get us much. I'll start the process now. Luckily, it's pretty easy. I should have a new patchset ready in a few hours.

Thanks a lot for the work!

It looks like the mwparserfromhell version was manually changed without changing any of the requirements for revscoring. That's going to be an issue any time we try to rebuild the wheels.

I've created https://github.com/wikimedia/revscoring/pull/515 to address that going forward.

Change 761015 had a related patch set uploaded (by Halfak; author: Halfak):

[research/ores/wheels@master] Reverts back to python 3.5 wheels.

https://gerrit.wikimedia.org/r/761015

OK I think that patchset is good for review. We end up rolling back a lot of versions, but a quick spot check suggests these versions were present before we switched to 3.7.

OK I think that patchset is good for review. We end up rolling back a lot of versions, but a quick spot check suggests these versions were present before we switched to 3.7.

Thanks for the work Aaron! I left some comments because I see new dependencies that weren't there before (at least from a quick check of the commits before the last one), and celery gets bumped from 4.1.1 to 4.4.7 (it should be OK, but it is a critical component of ORES).

OK updates made.

I'm not sure why gunicorn got pulled in. I re-ran the pipeline for building the wheels and it didn't get pulled in this time so I removed it and its dependencies.

I've also downgraded celery to 4.1.1.

@Halfak rechecked and it looks better now, thanks! I'll let @ACraze do a final check, but we should be in a better position to merge and re-test the deployment.

I have to say that I am a little worried about deploying revscoring 2.11.1 this way; it could backfire, with models failing to load. I'd have preferred to keep 2.8.

2.11.1 has some useful improvements. I've tested the loading of models. But you're right, there can always be issues that pop up with any difference in versions.

I've kicked off a full model rebuild process. So far, I have fresh models built for all of the repos except for editquality.

Change 761015 merged by Elukey:

[research/ores/wheels@master] Reverts back to python 3.5 wheels and includes mwparserfromhell 0.6.3

https://gerrit.wikimedia.org/r/761015

Change 761946 had a related patch set uploaded (by Elukey; author: Elukey):

[mediawiki/services/ores/deploy@master] Update wheels submodules with latest changes

https://gerrit.wikimedia.org/r/761946

Change 761974 had a related patch set uploaded (by Halfak; author: Halfak):

[operations/puppet@production] Adds aspell-hi to ores/manifests/base.pp

https://gerrit.wikimedia.org/r/761974

Change 761974 merged by Dzahn:

[operations/puppet@production] Adds aspell-hi to ores/manifests/base.pp

https://gerrit.wikimedia.org/r/761974

Aha! I caught something while rebuilding the models. Cheers @elukey for your insightful comment. See https://gerrit.wikimedia.org/r/761974. It looks like we somehow had the Hindi dictionary (aspell-hi) installed on our model-building server (ores-misc-01), but it wasn't included in the puppet config, so it wouldn't be in production. That would have made the new hiwiki editquality models unusable. This patch should fix that.

Mentioned in SAL (#wikimedia-operations) [2022-02-11T19:13:58Z] <mutante> running puppet on all ores machines to install aspell-hi (gerrit:761974) which for some reason was installed on a random subset of ores servers (1002,2001,2005 but not the other 19 ones) T300195 T252581 - after this the package is now installed on 18 servers (1001-1009, 2001-2009)

@Halfak @elukey For some reason aspell-hi was already installed on ores1002, 2001, 2002 (a manual test?) but not on the rest of the servers.

I deployed the change and ran puppet on all ores*. Now it's installed on 1001-1009 and 2001-2009.

https://debmonitor.wikimedia.org/packages/aspell-hi

Thanks @Dzahn! Can you also run puppet on stat1007? I'm using that server to rebuild one of the models that needs this package. I believe it draws from the same puppet config.

Thanks a lot @Dzahn!

@Halfak the package should be installed on stat1007; you are free to test. When you have a moment, lemme know if you want https://gerrit.wikimedia.org/r/c/mediawiki/services/ores/deploy/+/761946/ to be merged+deployed on beta for the weekend or not :)

@elukey thanks for asking. It would be great to get that on beta for the weekend. I'll be able to blow some smoke through it in the meantime.

Also, confirmed that I now have the package on stat1007. Continuing the broad model rebuild with revscoring 2.11.1. I should have a set of PRs to review by EOD (which can obviously wait until Monday).

Change 761946 merged by Elukey:

[mediawiki/services/ores/deploy@master] Update wheels submodules with latest changes

https://gerrit.wikimedia.org/r/761946

@Halfak this time the deployment to beta was better, but it failed during pip install due to:

Processing ./submodules/wheels/importlib_resources-5.4.0-py3-none-any.whl
importlib-resources requires Python '>=3.6' but the running Python is 3.5.3

In https://gerrit.wikimedia.org/r/c/research/ores/wheels/+/748390 importlib-resources (version 5.4) was added; it wasn't there before. The importlib-metadata wheel was bumped from 1.6 to 4.9, and I think importlib-resources is a dependency of it.
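
For reference, pip enforces a wheel's Requires-Python metadata with a version specifier check; a minimal sketch of the comparison using the packaging library:

from packaging.specifiers import SpecifierSet

requires_python = SpecifierSet(">=3.6")  # from importlib-resources 5.4.0
print(requires_python.contains("3.5.3"))  # False -> pip refuses to install
print(requires_python.contains("3.7.3"))  # True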

Also, confirmed that I now have the package on stat1007. Continuing the broad model rebuild with revscoring 2.11.1. I should have a set of PRs to review by EOD (which can obviously wait until Monday).

I'd stop adding changes to this deployment, if possible; it is getting bigger and bigger. If 2.11.1 works with the existing set of models, we can go forward with what we have and think about future changes later on. I don't want to stop changes to ORES, but deployments are more complicated when a lot of things change at once, so I'd like to avoid too many changes in one round. Lemme know what you think :)

Also, last but not least: stat1007 runs Debian 10 Buster with Python 3.7. If you are testing on it and rebuilding models there, they will probably not work on the ores workers (which run Debian Stretch and Python 3.5).

I understand your desire to minimize changes to ORES deployments. I think the real issue is that ORES gets a deployment once every 6 months, and I'm blowing out cobwebs every time I ask for one to go out! I don't have the resources to fix any of our workarounds, so I've just been adding and documenting new workarounds every time we turn the deployment crank.

Revscoring 2.11.1 is revscoring 2.8.x with a better way of detecting references and some bug fixes. 2.9 and 2.10 got reverted because of overly complex language utilities that didn't get us where we wanted to be. While the version jump may be intimidating, it's really quite similar.

Python 3.7 does not change the way that pickle works from 3.5. Our pickle situation (and everything related to models) is tied to a specific version of sklearn that we keep pinned. The pickle protocols haven't been updated between python 3.4 and 3.8. See https://docs.python.org/3/library/pickle.html
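
A quick way to see this from the interpreters themselves (protocol 5 only arrived in 3.8):

import pickle
import sys

print(sys.version_info[:3], pickle.HIGHEST_PROTOCOL)
# Python 3.4 through 3.7 all report protocol 4, so a model pickled on 3.5
# loads on 3.7 as long as the referenced libraries (e.g. sklearn) are pinned.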

I'm not sure how we scooped up that version of importlib_resources. I'll look into that and report back.

I understand your desire to minimize changes to ORES deployments. I think the real issue is that ORES gets a deployment once every 6 months, and I'm blowing out cobwebs every time I ask for one to go out! I don't have the resources to fix any of our workarounds, so I've just been adding and documenting new workarounds every time we turn the deployment crank.

I really appreciate all the work that you still put in ORES, and we are grateful for that. I have to disagree on the procedure though, since during this round of deployment:

  1. We started with a new model running on revscoring 2.8 and some logging changes. On paper, a few easy things, siloed and easy to control/check post-deployment.
  2. Revscoring 2.8 didn't correctly load the new model during tests (IIUC), so we decided to bump revscoring from 2.8 to 2.11.1. This was a big, unscheduled change added in.
  3. In the meantime, there was a big change waiting to be deployed that bumped all Python wheels to 3.7 (and being on Stretch, that is something we can't do easily now). We only realized it after deploying to Beta, and probably not a lot of testing was done at the time. We discussed pushing these changes out, another big change added in, but decided not to proceed and to revert to Python 3.5.
  4. The procedure to generate wheels seems not very precise: this is the second round of changes (that we have to review/merge/etc.) that doesn't work in Beta and doesn't even reach the deployed state.

To be clear: I am not blaming anybody. I think there were little procedural errors from multiple people, but it is clear that we started with something small and are now talking about much bigger things (like rebuilding models, etc.). I am happy to help deploy changes to ORES while we build Lift Wing, but at the same time I'd prefer small, well-tested changes to be deployed at any given time, to ease everybody's work.

Revscoring 2.11.1 is revscoring 2.8.x with a better way of detecting references and some bug fixes. 2.9 and 2.10 got reverted because of overly complex language utilities that didn't get us where we wanted to be. While the version jump may be intimidating, it's really quite similar.

I think it is great, and surely reassuring, but at the same time we may have said the same thing for 2.9 and 2.10. There may be issues waiting for us in 2.11 due to unforeseen corner cases, maybe only with some models/requests, and it would be great to deal with them without thinking about other variables at the same time.

Python 3.7 does not change the way that pickle works from 3.5. Our pickle situation (and everything related to models) is tied to a specific version of sklearn that we keep pinned. The pickle protocols haven't been updated between python 3.4 and 3.8. See https://docs.python.org/3/library/pickle.html

Yep, I think on the pickle side, on paper, we should be good. We haven't really tested it, though, so I still think it is very risky. Moving to Python 3.7 would also imply upgrading all nodes to Buster, something time-consuming (18 worker nodes), and we don't have a lot of time for it at the moment (due to Lift Wing).

I'm not sure how we scooped up that version of importlib_resources. I'll look into that and report back.

Thanks a lot! Let me know if I can help :)

I think there were little procedural errors from multiple people

I totally agree. This should be automated and testing should happen through that automated process. Having someone turn the keys is a workaround. I've added a little bit of automation (see the update_wheels.py script in the wheels repo) but there should be more.

The new model was built with 2.11.1. The improvements to ref extraction were designed specifically with nlwiki in mind. I'm not sure how you would like to schedule a bump from revscoring 2.8.2 to 2.11.1. Any guidance here would be well received.

we may have said the same thing for 2.9 and 2.10

Those versions changed the way we tokenized text in a big way. That is part of the reason we didn't put those into production. 2.11.0 was a direct copy of 2.8.2. A clean revert.

The upgrade to 3.7 was a very old request and I thought I was just getting us caught up. No worries though. I'm happy to have clarity there and to bring things back.

Ultimately, I'm working on this during breaks in my work day. So, I'm not the right person to manage or improve our deployment process. But for what it is worth, pushing these things to beta and running tests against it has always been part of our process. As the first place that all models, code, and requirements come together in a prod-like environment, it has proven useful for discovering issues that are hard to see before full integration. It would be nice if we had a test environment that did this for individual bits of code/repos. There is an old task somewhere about doing that for the wheels repo, because the wheels are often where we discover issues, but I can't seem to find it right now.

Regardless of everything said above, I'm happy to work with your direction. It's hard for me to adjust now to feedback like "I'd stop adding changes to this deployment, if possible" when I was doing something in response to a concern you had raised previously. I'm not trying to add anything. I'm trying to listen to the concerns you are raising and address them as fast as I can. If you'd rather leave those updated models for later, that's cool with me. Just tell me how you want to move forward, and I'll try to get things organized in a way that aligns with how you want to proceed.

I think there were little procedural errors from multiple people

I totally agree. This should be automated and testing should happen through that automated process. Having someone turn the keys is a workaround. I've added a little bit of automation (see the update_wheels.py script in the wheels repo) but there should be more.

The new model was built with 2.11.1. The improvements to ref extraction were designed specifically with nlwiki in mind. I'm not sure how you would like to schedule a bump from revscoring 2.8.2 to 2.11.1. Any guidance here would be well received.

If it is compatible with the other pre-existing models, you just deploy the version bump by itself, without adding other models or changes alongside it. That's my idea; it is a little late to implement it this round, so let's do it next time.

we may have said the same thing for 2.9 and 2.10

Those versions changed the way we tokenized text in a big way. That is part of the reason we didn't put those into production. 2.11.0 was a direct copy of 2.8.2. A clean revert.

The upgrade to 3.7 was a very old request and I thought I was just getting us caught up. No worries though. I'm happy to have clarity there and to bring things back.

Ultimately, I'm working on this during breaks in my work day. So, I'm not the right person to manage or improve our deployment process. But for what it is worth, pushing these things to beta and running tests against it has always been part of our process. As the first place that all models, code, and requirements come together in a prod-like environment, it has proven useful for discovering issues that are hard to see before full integration. It would be nice if we had a test environment that did this for individual bits of code/repos. There is an old task somewhere about doing that for the wheels repo, because the wheels are often where we discover issues, but I can't seem to find it right now.

Again, we thank you a lot for all the time that you put into ORES; we are very grateful, and we understand that this is not your day job. You are aware, though, that our future is Lift Wing, and during the next months we hope to transition to the new system. This means that the current way of deploying ORES models, including tests/storage/etc., will be deprecated in favor of something else. We are building test capabilities and staging environments; in the future it will be much easier to deploy and add models. We will keep working on the revscoring-based models, but the procedure will be quite different. Docker containers will also ease the job of whoever wants to contribute and test locally. So we are working on improving our infrastructure, and we are currently not planning to extend ORES' capabilities in this regard due to time constraints (we are a small team and cannot concentrate on multiple big projects at the same time).

Regardless of everything said above, I'm happy to work with your direction. It's hard for me to adjust now to feedback like "I'd stop adding changes to this deployment, if possible" when I was doing something in response to a concern you had raised previously. I'm not trying to add anything. I'm trying to listen to the concerns you are raising and address them as fast as I can. If you'd rather leave those updated models for later, that's cool with me. Just tell me how you want to move forward, and I'll try to get things organized in a way that aligns with how you want to proceed.

I think that you are not being constructive anymore, so I'll avoid commenting on the sentences above. Let's just postpone any updated models to the next deployment, please.

I really am trying to be constructive. I'm sorry but I think I came off badly.

I really am trying to be constructive. I'm sorry but I think I came off badly.

Not a problem, we can keep going with the deployment :) The next step is, IIUC, to fix the wheels issue, so that we can deploy to Beta and see how it goes. It seems mostly a problem with the extra importlib-resources pulled in, nothing else afaics.

Change 762868 had a related patch set uploaded (by Halfak; author: Halfak):

[research/ores/wheels@master] Removes importlib_resources that was picked up in python 3.7

https://gerrit.wikimedia.org/r/762868

Change 762868 merged by Elukey:

[research/ores/wheels@master] Removes importlib_resources that was picked up in python 3.7

https://gerrit.wikimedia.org/r/762868

Change 762872 had a related patch set uploaded (by Elukey; author: Elukey):

[mediawiki/services/ores/deploy@master] Bump the submodule wheel to include the latest changes

https://gerrit.wikimedia.org/r/762872

I'm not sure this change alone will unblock deployment to beta. I'm running some tests.

Change 762872 merged by Elukey:

[mediawiki/services/ores/deploy@master] Bump the submodule wheel to include the latest changes

https://gerrit.wikimedia.org/r/762872

Processing ./submodules/wheels/typing_extensions-4.0.1-py3-none-any.whl
typing-extensions requires Python '>=3.6' but the running Python is 3.5.3

Same issue but with another wheel!

Change 762894 had a related patch set uploaded (by Halfak; author: Halfak):

[research/ores/wheels@master] Removes old unused packages.

https://gerrit.wikimedia.org/r/762894

I'm trying to work through these now. It turns out sticking with Python 3.5 is a pain point because many libraries have dropped support for it in recent versions. I needed to identify these issues and manually pin versions for some of our libraries.

One other issue and the reason we're getting bit by these wheels is that the process I implemented in "update_wheels.py" doesn't account for wheels that should be removed. So I'm doing that manually now.

After we get past this stage, I can make some recommendations for a process change. I think just deleting the old wheels before copying the new wheels over will resolve the issue because the deletions will be picked up by git.
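
A minimal sketch of that suggested change (the directory names are assumptions, not the actual repo layout):

import pathlib
import shutil

WHEELS_DIR = pathlib.Path("wheels")        # assumed: committed wheels
NEW_WHEELS = pathlib.Path("build/wheels")  # assumed: fresh build output

# Delete first so wheels absent from the new build show up as
# deletions in git, then copy the fresh set over.
for old in WHEELS_DIR.glob("*.whl"):
    old.unlink()
for new in NEW_WHEELS.glob("*.whl"):
    shutil.copy2(new, WHEELS_DIR / new.name)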

I was able to start the server and load all of the models after removing those packages. It looks like a mixture of packages that were used in testing (e.g. pytest) and packages that got picked up for python 3.7.

Change 762901 had a related patch set uploaded (by Halfak; author: Halfak):

[mediawiki/services/ores/deploy@master] Updates frozen-requirements after wheel cleanup.

https://gerrit.wikimedia.org/r/762901

I included an update to the frozen-requirements.txt that matches the updated wheels. I generated that set of requirements by running make deployment_wheels in the repo (which is the normal process). But I needed to manually adjust the requirements.txt for each submodule to make sure it referenced yamlconf==0.2.4, because yamlconf==0.2.5 (the latest) requires PyYAML==5.4, which is not compatible with python 3.5.

Note that this is related to a security issue that existed in PyYAML==4.2. We aren't adding a new issue here. But the whole point of switching to PyYAML==5.4 and releasing yamlconf==0.2.5 was to address this security issue. See https://github.com/halfak/yamlconf/pull/6 and https://github.com/halfak/yamlconf/commit/39bdf0ee32c09c14dab2a6623cdd47412bdb95fb

Change 762894 merged by Elukey:

[research/ores/wheels@master] Removes old unused packages.

https://gerrit.wikimedia.org/r/762894

Change 763178 had a related patch set uploaded (by Elukey; author: Elukey):

[mediawiki/services/ores/deploy@master] Bump the wheels submodule to pick up the latest changes

https://gerrit.wikimedia.org/r/763178

Change 763178 merged by Elukey:

[mediawiki/services/ores/deploy@master] Bump the wheels submodule to pick up the latest changes

https://gerrit.wikimedia.org/r/763178

@Halfak great work, deployment to Beta done, all good afaics. If you have time to test and check things there, let me know :)

Thanks also for checking PyYAML; unfortunately we have the same problem with celery, so we cannot easily upgrade at the moment. Moving to Python 3.7 is a sizeable amount of work; we'll see what we can do.

Recap of the next steps (as I understand them):

  • Summarize the final list of changes to deploy
  • Test in Beta
  • Deploy to prod

Hey folks! Just got back from some vacation and I'm taking a look at this. It seems like the deployment on beta is broken. The redis server is complaining about passwords:

E.g. go to https://ores-beta.wmflabs.org/v3/scores/nlwiki and get:

redis.exceptions.AuthenticationError: Client sent AUTH, but no password is set

I'm guessing some puppet run reinstalled redis and we need to set a password again. The beta password should be in a config file either in the deployment location or in /etc/ores/. I have some other pressing deadlines, but I can take a look on Friday if no one beats me to it.
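
For reference, this is what the mismatch looks like from the client side with redis-py (host and password are made up): the client config still carries a password, the reinstalled server has none, so AUTH is rejected.

import redis

client = redis.StrictRedis(host="localhost", port=6379, password="stale-secret")
try:
    client.ping()
except redis.exceptions.RedisError as err:
    print(err)  # Client sent AUTH, but no password is set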

@Halfak already fixed, sorry: I was rebooting the instance.

I was trying to trigger the new logging, so I tested calls like curl -i localhost:8081/v3/scores/enwiki/1, but I keep getting a 5s timeout from the local proxy to mwapi (the same call works fine on ores1001).

Aha! What are the chances!

I wonder if the local proxy isn't working in beta? Can you hit the local proxy manually?

I wonder if the local proxy isn't working in beta?

No-one had ever set it up there! I just did that and it seems to be working now.

In T300195#7732748, @Majavah wrote:

I wonder if the local proxy isn't working in beta?

No-one had ever set it up there! I just did that and it seems to be working now.

There is a manually set up nginx instance that we created back when deploying the envoy proxy there was not possible (there were some puppet constraints).

Stopped nginx (it was bound to port 6500) and restarted envoy; definitely a better option now. Thanks @Majavah!

Edit: @Majavah the code that you committed works nicely, but I just realized that it points to deployment-prep's mediawiki. Normally that would be perfect for integration testing, but in our case it greatly limits the revisions that we can test (compared to prod).

It looks like we're still not getting JSON back from the local proxy API endpoint; instead, we're getting HTML. Is this because we're still pointing to deployment-prep's mediawiki and it's returning an HTML error page?

Temporarily "fixed" it by disabling puppet on deployment-ores01 and using text-lb.eqiad.wikimedia.org in envoy's config. We shouldn't really use prod endpoints for beta testing, but for this use case (we need to test several models and fetch rev ids from a lot of wikis that are not in Beta) we need to add an exception. We could add a special mwapi-async-prod endpoint to cloud's puppet for the ORES use case, but it might be misused by others, so I'm not sure what's best.

I just realized that a RevisionNotFound error is a subclass of MissingResources, so nothing will really be logged if I try to score a non-existent revision. I'm not sure how to trigger a feature extraction error, so we may have to skip this test (it is a minor one).
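
A schematic of why the error is swallowed (stand-in classes mirroring the names above; the handler shape is illustrative, not the actual ORES code):

class MissingResources(Exception):
    pass

class RevisionNotFound(MissingResources):
    pass

try:
    raise RevisionNotFound("revision 999999999 does not exist")
except MissingResources:
    # RevisionNotFound is caught as an expected condition,
    # so it never reaches the error-logging path.
    pass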

@Halfak besides specific testing, I am trying to come up with a list of URIs to execute as a script, to test all models and verify that a deployment doesn't inadvertently change behavior. I'll work on it with @achou :)

For example, it would be nice to use https://wikitech.wikimedia.org/wiki/Httpbb

Quick example from deployment-deploy03:

elukey@deployment-deploy03:~$ cat httpbb.yaml 
http://ores-beta.wmflabs.org:
- path: /v3/scores/hiwiki/555
  assert_status: 200
  assert_body_contains: probability

elukey@deployment-deploy03:~$ httpbb /home/elukey/httpbb.yaml --hosts=deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud --http_port=8081
Sending to deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud...
PASS: 1 request sent to deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud. All assertions passed.

Hi, I generated a list of URIs to test all models and they all passed!

aikochou@deployment-deploy03:~$ httpbb /home/aikochou/httpbb2.yaml --hosts=deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud --http_port=8081
Sending to deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud...
PASS: 124 requests sent to deployment-ores01.deployment-prep.eqiad1.wikimedia.cloud. All assertions passed.

I checked /srv/log/ores/main.log and it looks normal. But /srv/log/ores/app.log shows many warning lines like the following:

2022-02-24 16:15:40,342 WARNING ores.scoring_systems.scoring_system: Can not lock in lock manager
2022-02-24 16:15:40,367 WARNING ores.scoring_systems.scoring_system: Can not lock in lock manager
2022-02-24 16:15:40,379 WARNING ores.scoring_systems.scoring_system: Can not lock in lock manager

I'm not sure what that is. :/

https://github.com/wikimedia/ores/blob/6ef6d22f8bc2800e4c7a25f99f5f4d9ab437cd72/ores/scoring_systems/scoring_system.py#L292

This is where that warning is coming from. The "lock_manager" is used to rate limit requests based on the incoming IP address. I don't recognize this code. It looks like @awight and @Ladsgroup originally committed it. I'm not sure if they are available to comment. See https://github.com/wikimedia/ores/pull/260 for the original pull request.
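
The general shape of that pattern, as I read it (the function names and the lock API are assumptions, not the actual ORES code):

import logging

logger = logging.getLogger("ores.scoring_systems.scoring_system")

def score_with_rate_limit(lock_manager, client_ip, compute_score):
    try:
        lock = lock_manager.lock(client_ip)  # hypothetical call
    except Exception:
        lock = None
    if lock is None:
        # The request still proceeds, just without rate limiting,
        # which is why this is a warning rather than an error:
        logger.warning("Can not lock in lock manager")
        return compute_score()
    try:
        return compute_score()
    finally:
        lock_manager.unlock(lock)  # hypothetical call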

Sometimes it happens in prod as well; it doesn't seem to be a regression as far as I can see. @achou nice work :)

@achou I just noticed that https://gerrit.wikimedia.org/g/operations/puppet/%2B/production/modules/profile/files/httpbb holds config files for other projects, which puppet automatically deploys. Can you send a patch with your work so it gets saved?

My spot checking looks good on Beta. I see performance improvements where expected and consistency elsewhere.

Thanks a lot @Halfak for the tests!

At this point I think that we are good to think about a prod deployment. I am going to discuss this with the team in today's meeting and write back to the task.

Change 766590 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/puppet@production] httpbb: Add some tests for ores

https://gerrit.wikimedia.org/r/766590

Change 762901 merged by Elukey:

[mediawiki/services/ores/deploy@master] Updates frozen-requirements after wheel cleanup.

https://gerrit.wikimedia.org/r/762901

Change 766590 merged by Elukey:

[operations/puppet@production] httpbb: Add some tests for ores

https://gerrit.wikimedia.org/r/766590

Last commit on ores1001:

elukey@ores1001:/srv/deployment/ores/deploy$ git log
commit 69ed06126017034d1b3f8ad68b13365b19467514 (HEAD, tag: scap/sync/2021-11-28/0001)
Author: Luca Toscano <ltoscano@wikimedia.org>
Date:   Sat Nov 27 17:46:24 2021 +0100

    Bump mwparserfromhell dependency to 0.6.3
    
    Bug: T296563
    Change-Id: I541cdc3d0832cd0137eb5e5b88d8a22288e842bf

Mentioned in SAL (#wikimedia-operations) [2022-03-01T15:35:14Z] <elukey@deploy1002> Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195

Mentioned in SAL (#wikimedia-operations) [2022-03-01T16:11:27Z] <elukey@deploy1002> Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195 (duration: 36m 13s)

Deployment completed!

The new hiwiki goodfaith/damaging models seem to work.

I also see new logs flowing in like:

Feature extraction error for model 1572393231 and revision itemquality due to: JSONDecodeError: Failed to process datasource.wikibase.revision.entity_doc: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
...

Feature extraction error for model 1074690851 and revision articletopic due to: Timed out after 15 seconds.
...

These are the new observability logs that we added; they will be very useful for debugging errored scores. The error message isn't right: my code change swapped model/revision in the msg. We'll fix it :)
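
For illustration, the swap looks something like this (variable names are made up):

import logging

logger = logging.getLogger("ores")

model_name, rev_id = "articletopic", 1074690851  # values from the log above
error = "Timed out after 15 seconds."

# Buggy order: rev_id lands in the "model" slot and vice versa.
logger.error("Feature extraction error for model %s and revision %s due to: %s",
             rev_id, model_name, error)

# Fixed order:
logger.error("Feature extraction error for model %s and revision %s due to: %s",
             model_name, rev_id, error)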

This is great, thank you so much elukey! It already works so much better!!

I informed the community and asked for their feedback in the local Village Pump, and will bring it back here if there are any problems.
https://nl.wikipedia.org/wiki/Wikipedia:De_kroeg#Update_ORES-kwaliteitsmodel_voor_artikelen

@Ciell thanks a lot!

I created https://github.com/wikimedia/ores/pull/357 as a follow-up for the logging issue (not really urgent).

elukey claimed this task.

Closing this task since the deployment seems to have been successful. Please re-open if anything pops up over the next few days or if you see anything weird going on with ORES scores. Thanks all for the work!