Based on feedback from Research and Legal, our requirements for this are:
Thu, Oct 15
Approved! Thank you so much, Amir, for the help!
Oh awesome. I was just working through the docs and the next part was the floating IP section, which I didn't have a quota for.
Fri, Oct 9
I was thinking of something simpler: using Fabric as a local script (run manually) to SSH into the bastion, jump-proxy to an ORES box, and restart the service. Then I can just loop that if we need to restart ORES. It isn't a long-term solution, but it beats SSHing into each one manually.
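A minimal sketch of that restart loop, under some assumptions: it swaps Fabric for plain OpenSSH ProxyJump (ssh -J) driven from Python's subprocess module so it has no third-party dependencies, and the bastion and ORES hostnames are hypothetical placeholders, not the real ones. It defaults to a dry run that only returns and prints the commands.

```python
import subprocess
from typing import List

# Hypothetical placeholders: the real bastion and ORES hostnames are not
# given in the thread, so substitute your own.
BASTION = "bastion.example.org"
ORES_HOSTS = [f"ores200{i}.example.org" for i in range(1, 4)]

# The command run by hand on each box in this thread.
RESTART_CMD = "sudo service uwsgi-ores restart"


def restart_command(host: str, bastion: str = BASTION) -> List[str]:
    """Build an ssh invocation that hops through the bastion (-J, OpenSSH's
    ProxyJump) and restarts uwsgi-ores on a single ORES box."""
    # -t allocates a TTY in case sudo prompts for a password.
    return ["ssh", "-t", "-J", bastion, host, RESTART_CMD]


def restart_all(hosts: List[str], dry_run: bool = True) -> List[List[str]]:
    """Restart the service on each host in turn. With dry_run=True the
    commands are only collected and returned, never executed."""
    commands = []
    for host in hosts:
        cmd = restart_command(host)
        commands.append(cmd)
        if not dry_run:
            # check=True raises on the first box whose restart fails,
            # which stops the loop there.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in restart_all(ORES_HOSTS, dry_run=True):
        print(" ".join(cmd))
```

Fabric 2's Connection also accepts a gateway argument for the same bastion hop, so the loop should port over to Fabric without much change.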
Thu, Oct 8
Since T263910 (ORES redis: max number of clients reached...) is, I suspect, continuing to be a problem, I am bringing this task back. Cumin isn't an option because it is SRE-specific. I'll get started on coding it up in Fabric.
Something suspiciously similar just happened to ORES on eqiad. I restarted the uwsgi service on all ores100x boxes and will monitor.
Wed, Oct 7
That is a good point. I should move this to a backlog.
Tue, Oct 6
@Aklapper Not yet! That will be covered in some upcoming Phabricator tickets. :)
Mon, Oct 5
Thanks @ACraze. I am talking to legal about it. They'll have the final say.
Hey! Yep, the modeling work needs to be completed first.
Fri, Oct 2
I just had to do another service restart.
Thu, Oct 1
No objection from me. The ORES data referred to in this ticket is not sensitive or PII.
Wed, Sep 30
I'm going to close this ticket because the issue is being caused by Okapi and will be resolved soon enough that this script isn't required.
I think an even simpler solution would be to change the payment method to a Wikimedia credit card; that would remove any individual's liability if something goes wrong.
Tue, Sep 29
Nope, no deploys have happened recently. It has been happening every few hours since the 24th.
If nobody else has worked on this yet I can take it.
Mon, Sep 28
@Ladsgroup Let's figure out a way to get your personal PayPal out of there. That is unfair to you and a vulnerability in the system. Now we have about 10 different reasons to have our call!
Sun, Sep 27
Any ideas on causes and solutions @Halfak?
I've run "sudo service uwsgi-ores restart" on all ORES200x boxes again. This fixes the problem temporarily (~24 hours), and tomorrow we'll start on a permanent solution.
Sat, Sep 26
It started happening again, so I went into each ORES200x box and manually ran "sudo service uwsgi-ores restart".
It starts at 16:00 every time.
Thu, Sep 24
I'm not certain what this ticket is about, so I'll admit I have no thoughts, smart or otherwise. Give me a bit to dig into the ticket discussion and figure it out.
I've reopened it. We have been clearing out the backlog, and this must have gotten caught up in it. Apologies from me.
@Ghassanmas Huge thanks for the offer to help. To give context, we are planning on retraining all the models in the future from new data which would solve this particular task.
@kevinbazira where did we end up with this?
@Aklapper @Chtnnh @Nuria For the next 9 months, the team will be focused on building model training, deployment, and management infrastructure. We can deploy models other folks create, but we don't have the bandwidth to tackle this ourselves.