There are three ways to handle this, and I'm planning to do all of them to make sure it won't happen again. Each of them might take a day or two to implement and deploy.
With T160692: Use poolcounter to limit number of connections to ores uwsgi implemented, this is basically done.
It's done :]
Sun, Sep 23
It's a little hard to understand the query in P7570 (for example, do you mean page_title.judgment_page instead of judgment_page.page_title?), but I suggest selecting from the jade tables first (as they are smaller) and then joining them with the revision table, especially since we are using the PK index. It's worth noting that, from my basic knowledge and according to my bible, it should not matter whether you join jade with revision or the other way around: the optimizer should work that out and change the order of the join. In reality things might be different, though, and it's better not to risk it.
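To make the suggested join order concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schemas are simplified and hypothetical: `judgment_page` and its `jp_*` columns only stand in for a small JADE table, and `revision` keeps just `rev_id` (its primary key) and `rev_page`. In SQLite, writing CROSS JOIN forces the left-hand table to be the outer loop, so each row of the small table does one primary-key lookup into `revision`; on MySQL/MariaDB the closest analogue is STRAIGHT_JOIN. This is an illustration of the idea, not the query from P7570.

```python
import sqlite3

# Hypothetical, simplified schemas: the real revision and JADE tables have
# many more columns. judgment_page / jp_* are illustrative names only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER NOT NULL
    );
    CREATE TABLE judgment_page (
        jp_id INTEGER PRIMARY KEY,
        jp_rev_id INTEGER NOT NULL
    );
    """
)
# A "large" revision table and a small judgment table.
conn.executemany(
    "INSERT INTO revision (rev_id, rev_page) VALUES (?, ?)",
    [(i, i % 10) for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO judgment_page (jp_id, jp_rev_id) VALUES (?, ?)",
    [(1, 42), (2, 500), (3, 999)],
)

# Drive the join from the small table. In SQLite, CROSS JOIN keeps the
# left-hand table (judgment_page) as the outer loop, so each judgment row
# is matched by a primary-key lookup on revision.rev_id.
query = """
    SELECT jp.jp_id, r.rev_id, r.rev_page
    FROM judgment_page AS jp
    CROSS JOIN revision AS r ON r.rev_id = jp.jp_rev_id
    ORDER BY jp.jp_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Running `EXPLAIN QUERY PLAN` on the same query in your own environment is a reasonable way to confirm which table ends up as the outer loop.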
Fri, Sep 21
Thanks. I'll take it over from here.
Tested with my hammer: if you go above the limit, you get it; if you go below the limit, you never get it. Hence this is resolved.
That's basically sending 429 to random users.
Thu, Sep 20
I have one question. Do I have to add words with and without accents?
Yes please. That would be great.
@Elisardojm Thank you! It's great. One thing left for me before moving on is a list of informal words that you can extract from the generated list. Per the help page:
This is pretty funny because for one revision it works as expected: https://ores.wikimedia.org/v3/scores/enwiki/?models=articlequality&revids=64196208800&features
Random observation: should we be using a discovery address rather than en.wikipedia.org?
We don't have enwiki.discovery.wmnet. We have LVS nodes for the MW nodes, and we'd need to hit the LVS ones with enwiki set as the Host header. That's lots of work :(((
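A minimal sketch of what "hit the LVS address with enwiki as the Host header" means at the HTTP level, using only Python's standard library. The load-balancer hostname below is a made-up placeholder (the real internal service name isn't given here), and the sketch only builds the request without sending it.

```python
import urllib.request

# Hypothetical placeholder for the internal LVS address of the MW app servers.
LVS_URL = "http://lvs.example.internal/w/api.php?action=query&format=json"


def build_request(url: str, wiki_host: str) -> urllib.request.Request:
    """Build a request that connects to the load balancer but names one wiki.

    The TCP connection goes to the host in `url`, while the Host header tells
    the app servers which wiki the request is for.
    """
    # urllib keeps a caller-supplied Host header instead of deriving it from the URL.
    return urllib.request.Request(url, headers={"Host": wiki_host})


req = build_request(LVS_URL, "en.wikipedia.org")
print(req.host)               # where the connection goes
print(req.get_header("Host"))  # which wiki the request targets
# Actually sending it would be: urllib.request.urlopen(req, timeout=10)
```

The extra work mentioned above is mostly operational (picking the right LVS endpoint per wiki and keeping it configured), not the request itself.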
I don't think it should be a blocker for the train, as it's due to changes happening to the service (T160692: Use poolcounter to limit number of connections to ores uwsgi), but it should be pretty high priority (and I will get it fixed).
Yes, basically for each deployment we get an overload error spike: https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&panelId=9&fullscreen&orgId=1&from=now-2d&to=now-1m
The reason is that all restarts basically happen at the same time. Is there a way around that in scap?
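I'm not certain which scap setting (if any) controls this, so the sketch below only illustrates the idea of a rolling restart in plain Python: restart hosts in small batches and pause between batches, so only part of the cluster is cold at any moment. The host names and the `restart` callable are hypothetical stand-ins, not scap's API.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],
    batch_size: int = 2,
    pause_seconds: float = 30.0,
) -> List[List[str]]:
    """Restart hosts in batches, pausing between batches.

    Returns the batches in the order they were restarted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    host_list = list(hosts)
    batches = [host_list[i:i + batch_size] for i in range(0, len(host_list), batch_size)]
    for index, batch in enumerate(batches):
        for host in batch:
            restart(host)
        # Let the restarted workers warm up before touching the next batch;
        # no need to wait after the final one.
        if index < len(batches) - 1:
            time.sleep(pause_seconds)
    return batches


# Hypothetical host names; restarted.append stands in for a real restart command.
restarted: List[str] = []
batches = rolling_restart(
    ["ores1001", "ores1002", "ores1003", "ores1004", "ores1005"],
    restart=restarted.append,
    batch_size=2,
    pause_seconds=0,
)
print(batches)
```

The trade-off is a slower deployment in exchange for never restarting every uwsgi worker at once.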
This is done.
Wed, Sep 19
The reason was that it just locks the IP and then immediately releases it: https://github.com/wikimedia/ores/pull/262
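To show why "lock and immediately release" defeats the limit, here is a minimal in-process sketch of a per-IP concurrency limit. PoolCounter itself is a separate network daemon, and `PerClientLimiter` / `TooManyRequests` are hypothetical names; this does not reproduce the code in the linked PR. The key point is that the slot is held for the whole request, so a client with `limit` requests in flight gets rejected (the 429 case).

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Raised when a client already has `limit` requests in flight (HTTP 429)."""


class PerClientLimiter:
    """Hypothetical in-process stand-in for a PoolCounter-style per-IP limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self._active: Dict[str, int] = defaultdict(int)
        self._mutex = threading.Lock()

    def _acquire(self, client_ip: str) -> None:
        with self._mutex:
            if self._active[client_ip] >= self.limit:
                raise TooManyRequests(client_ip)
            self._active[client_ip] += 1

    def _release(self, client_ip: str) -> None:
        with self._mutex:
            self._active[client_ip] -= 1

    @contextmanager
    def slot(self, client_ip: str) -> Iterator[None]:
        # Hold the slot for the whole request. Releasing it right after
        # acquiring would leave the counter at zero, so nobody is ever limited.
        self._acquire(client_ip)
        try:
            yield
        finally:
            self._release(client_ip)

    def handle(self, client_ip: str, work: Callable[[], T]) -> T:
        with self.slot(client_ip):
            return work()


limiter = PerClientLimiter(limit=2)
# Two requests from the same IP are "in flight" inside the outer with-block.
with limiter.slot("198.51.100.7"), limiter.slot("198.51.100.7"):
    try:
        with limiter.slot("198.51.100.7"):
            pass
        third_allowed = True
    except TooManyRequests:
        third_allowed = False
    other_ip_allowed = limiter.handle("203.0.113.9", lambda: True)
print(third_allowed, other_ip_allowed)  # False True
```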
I reviewed all the graphs and can't find what you're describing. The job failure rate is around 0.4% and it has never been zero: https://grafana.wikimedia.org/dashboard/db/ores-extension?orgId=1&from=now-30d&to=now
This also doesn't show any jumps after the switchover or the deployment of wmf.20: https://logstash.wikimedia.org/goto/85e622ce496419c9cd87483793aeabf5
Do you have a link to logstash handy?
It seems strange, because Grafana says we didn't have any overload errors except one spike (that was caused by a deployment, which is fine): https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&panelId=9&fullscreen&orgId=1&from=now-30d&to=now-1m
The colors the design team has defined are different from the ones used by the website. For example, this page on colors in the design style guide, written by the WMF design team, defines #36c as blue (main accent), but I can't find any trace of that color on the website. We also have M82 (basically the same). Did you know about these color guidelines?
I think the nodes belong to the multimedia team, not ours. Correct me if I'm wrong.
tofawiki is done. That was a beast; I need to puppetize it.
Tue, Sep 18
Merging this patch would resolve this task: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/413891
Mon, Sep 17
So for Zuul, we can control it using the tox.ini file. I already did it for two Python repos: https://github.com/wikimedia/wikibase-property-suggester-scripts/blob/master/tox.ini https://github.com/wikimedia/pywikibot/blob/master/tox.ini
I think that would suffice, and even if we need root access to install something related to LFS, we can simply install it on all nodes using puppet rules. It's pretty straightforward. I'm not sure we really need to do anything here for now.
Tell me if I can do anything. Right now, it hasn't been determined which way we should go or what the RDF exports should look like.
Fri, Sep 14
It's fixed now.
Wed, Sep 12
Tue, Sep 11
This is a basic POC:
I'll give this a try.