Tue, Nov 28
Am now mostly convinced this is because of hairpin mode problems - a pod that talks to its own service IP can't reach itself, which causes the 599 errors.
Oct 4 2017
On a personal note, at some point in the debugging I was ready to give up and let PAWS die - thankfully, with some encouragement from @bd808 and @chasemp, that did not happen. However, without stronger institutional support, I'm unsure how much longer PAWS can survive. @bd808, @chasemp and @madhuvishy need more help & resources if PAWS is to continue to thrive.
Sep 12 2017
From my experience with uptimerobot, it's quite flaky - we got one or two false positive alerts each day from it. I've switched to using Pingdom for that reason.
Sep 6 2017
is this still the case?
It would be great if someone could make a PR on https://github.com/yuvipanda/paws fixing it :) If not I'll try to fix this next week.
Aug 29 2017
Is this still the case with paws.wmflabs.org now? It's running on a completely different backend!
Aug 20 2017
It's possible that you just ran out of RAM and the kernel died? I think we have a 1G memory limit...
Aug 19 2017
I've switched the backend of paws completely as of today, and that should help fix most of these!
I'm going to say 'too late' and close this, unfortunately.
Heya! This should be fixed now, since I deployed a new backend for paws.wmflabs.org. Let me know if it isn't!
I deployed a new backend for paws.wmflabs.org now, and it should allow you to download PDFs. Please re-open if it doesn't!
What browser are you on?
Fixed and deployed. paws.wmflabs.org should have it in about 20 minutes or so.
Done now! \o/
I've shut it down.
Feel free to junk the mattermost repo.
Aug 9 2017
I don't think anyone is using this project, and I've no time, so am happy to just shut the tool :)
Aug 3 2017
Here's what I want to do:
@Halfak should be all sorted out now!
Aug 2 2017
I'm trying to set up auto deploys to PAWS, so I'll wait till I'm done with that.
@Halfak that's strange. I totally do see an 'R' in the new dropdown menu....
@MarcoAurelio - is that showing up at paws.tools.wmflabs.org and not at paws.wmflabs.org? Or is it showing up at both?
can you try https://paws.tools.wmflabs.org? It's a new installation of PAWS (you get to keep all your old files!). I'll switch out the URL in a couple days if it goes well :)
Heya! https://paws.tools.wmflabs.org is the newer version of PAWS (URLs will switch shortly!). It does have an R kernel! Can you check it out and confirm?
@MisterSynergy can you (and others!) try out a new install at https://paws.tools.wmflabs.org? All your old files will still be here, but the authentication (and other code) is vastly improved and stable. I'll point paws.wmflabs.org to this in the next few days. Try it out and see if that solves your problem?
@Ebraminio cool! I'm going to upgrade Python to 3.6 as well (it's currently 3.5 I think) :) npm will also be available by default.
@Ebraminio can you try https://paws.tools.wmflabs.org? It's going to replace paws.wmflabs.org shortly. Same everything, just a different underlying base.
Since labs in general hates it when I try to create new instances, it took me almost 20 tries with lots of instances failing, but I now have ten nodes! Hopefully I'll not have to touch instance creation again for a long time. It's quite frustrating... :(
It's at 5 nodes now, and I've run out of quota in tools now.
Not worth it.
Aug 1 2017
Going to close this for now!
Closing, since the model of pasting won't really ever work with remote kernels like PAWS :)
Heya! Thanks for filing this!
To be clear, I don't actually care too much about the size of Tools' k8s cluster :) I only want about 8-10 nodes for the PAWS cluster :) Am happy to let toolforge admins decide if they wanna downsize tools' cluster.
Things seem to be working now! Thank you very much, everyone involved! I'll leave it to someone else to close this ticket :)
Nah, don't care for now. Can move whenever other stuff moves :)
Yup, new cluster!
Note that I'm now running into errors like:
Jul 30 2017
As a note, @zhuyifei1999 has graciously offered to look at some of the outstanding Quarry issues. I've given them merge rights + labs admin.
Jul 18 2017
Yup, we can keep a static version running forever.
I still want it, but I'm part of the ops group anyway. So in the interest of keeping this list clean you might remove me (I was added to this group before I became ops)
Jul 12 2017
All of the things I was thinking of and can think of now seem terrible and seem to enable our current set of terrible practices. I'll leave it to the current stewards of tools to figure out what to do :) I do agree that enforcing a vcs seems the best possible option.
Jul 10 2017
Note that 1.7 landed https://kubernetes.io/docs/admin/extensible-admission-controllers/ which will allow us to remove all of our custom patches used in tools.
Jul 6 2017
You can get a list of all pods with kubectl get --all-namespaces pods and then do bash magic from there.
When I was doing it, I'd just do some shell scripting to delete all the pods in all namespaces that aren't paws. k8s will start them back up.
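The "delete everything that isn't paws" step above can be sketched roughly like this, in Python rather than bash. This is only an illustration, not the script that was actually used: the function name `delete_commands` and the sample pod names are made up, and it assumes the input is the text printed by `kubectl get pods --all-namespaces --no-headers`, where the first two columns are the namespace and the pod name. It only builds the commands; running them is left to you.

```python
import shlex


def delete_commands(pod_listing, keep_namespace="paws"):
    """Build one `kubectl delete pod` command per pod outside keep_namespace.

    pod_listing is assumed to be the output of
    `kubectl get pods --all-namespaces --no-headers`: one pod per line,
    with the namespace in the first column and the pod name in the second.
    """
    commands = []
    for line in pod_listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            # Skip blank or malformed lines.
            continue
        namespace, name = fields[0], fields[1]
        if namespace == keep_namespace:
            # Leave the paws pods alone.
            continue
        commands.append(
            ["kubectl", "delete", "pod", name, "--namespace", namespace]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical listing, only for illustration.
    sample = (
        "paws      hub-1    1/1  Running  0  1d\n"
        "sometool  web-2    1/1  Running  0  2d\n"
    )
    for command in delete_commands(sample):
        # Print the commands instead of running them, so they can be reviewed.
        print(" ".join(shlex.quote(part) for part in command))
```

Printing the commands first makes it easy to eyeball the list before piping it to a shell. k8s should then recreate the deleted pods that are managed by a controller.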
Jul 4 2017
Jul 1 2017
You'd have to do some amount of proxy magic to get it to work with mediawiki auth. We should have an authenticator running as a separate app. The proxy should check for a cookie, and try to validate it (with a HMAC, shouldn't be too hard). If it fails, or there's no cookie, we'll redirect to our authenticator, which will do the MW OAuth flow and set a cookie. On second attempt, we'll see the valid cookie in the proxy, and set a trusted header for Redash to consume.
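A minimal sketch of the cookie check described above, to make the HMAC part concrete. Everything named here is an assumption for illustration: the `username|mac` cookie layout, the `SECRET` value, the `/authenticator` redirect path and the `X-Trusted-User` header name are not from any existing setup, and a real proxy would also want an expiry in the signed value.

```python
import hashlib
import hmac

# Hypothetical shared secret, known to both the authenticator and the proxy.
SECRET = b"change-me"


def _mac(username, secret):
    """Hex HMAC-SHA256 of the username under the shared secret."""
    return hmac.new(secret, username.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(username, secret=SECRET):
    """What the authenticator would set as the cookie after the OAuth flow."""
    return f"{username}|{_mac(username, secret)}"


def verify(cookie, secret=SECRET):
    """Return the username if the cookie's HMAC is valid, else None."""
    username, separator, mac = cookie.rpartition("|")
    if not separator or not username:
        return None
    expected = _mac(username, secret)
    # Constant-time comparison on bytes, to avoid leaking timing information.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return username
    return None


def route(cookie):
    """Decide what the proxy does with a request carrying this cookie."""
    user = verify(cookie) if cookie else None
    if user is None:
        # No cookie, or a bad one: send the user to the authenticator.
        return ("redirect", "/authenticator")
    # Valid cookie: forward to the backend with a trusted header.
    return ("forward", {"X-Trusted-User": user})
```

For example, `route(sign("SomeUser"))` forwards with the trusted header set, while `route(None)` or a cookie with a tampered MAC redirects to the authenticator. The proxy must also strip any incoming `X-Trusted-User` header from clients, otherwise the header can't be trusted.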
k8s 1.7 was released yesterday with support for node-local persistent storage: https://kubernetes.io/docs/concepts/storage/volumes/#local
Jun 30 2017
Jun 22 2017
Jun 20 2017
Going to keep it inside tools!
I just deleted these :)
Jun 19 2017
Jun 16 2017
I've a working cluster on the PAWS project now! https://paws.deis.youarenotevena.wiki/hub/login :D
https://phabricator.wikimedia.org/T168039 for an IP quota increase :)
Jun 13 2017
On chatting more, if I have to use puppet then using the tools puppetmaster will make my life easier. So I'm going to prototype this on the paws project and move it to tools if using the puppetmaster will make my life easier in any way.
One of the things I'd like to do is to use an nginx ingress directly for getting traffic into the cluster, instead of using the tools-proxy machinery. My plan is to totally not use puppet at all and try a coreos type setup - you set up an image once with cloud-init type thing, and then there are no changes to it ever (except base puppet in our case, which won't have any roles related to k8s). You just make new instances for upgrades, and run everything in containers.
Jun 12 2017
It looks like everyone's onboard with this plan, so I'll start poking at it in a week or so.
Jun 8 2017
Jun 7 2017
@madhuvishy +1, I'd love it to be a separate share!
@madhuvishy if this happens, I'll also need to transfer the entire contents of the paws tool dir on tools NFS to this new project's NFS. And you (I think?) need to be ok with the paws project getting NFS enabled :)
@Andrew right. However, that's already the case - PAWS right now is still mostly reliant on me, mostly for resourcing reasons. The way I'd do this is to make it quite easy for people to just follow kubeadm upstream tutorials on setting up on labs (maybe even make it into a wikitech page) so other people who might want to use it can. I also believe that kubeadm is the correct long term solution for both tools and prod, so more people playing with that doesn't sound bad...