seems related to T201924: Can't create new proxies, so I'll close; the cloud folks can reopen if it's really still not resolved
novaproxy domains that should go along with these
Mon, Aug 13
| f4d46a5c-f8cd-405d-9cbf-82710a9564e2 | two-factor | SHUTOFF | public=10.68.21.34 |
| a82089e3-b0c7-4f01-a2d3-4d15dc113ade | jobs | SHUTOFF | public=10.68.17.248 |
| 49fb74ed-734a-4332-9fb3-6d2c4903e98e | openvas | SHUTOFF | public=10.68.16.85 |
| b0859108-799d-4dbc-a4cc-d2179686bcf4 | keystone2 | SHUTOFF | public=10.68.21.241 |
| fbed895f-ccfe-4403-ba13-5a12183daab9 | xsstest | SHUTOFF | public=10.68.17.178, 22.214.171.124 |
| 1f5637e7-ecc8-482c-8b3b-6ded44878c06 | torproxy | SHUTOFF | public=10.68.18.15 |
Since no one seems to know what most of these are for I removed one in https://phabricator.wikimedia.org/T194150#4499280 and am shutting down the others to flush anyone who cares out of the woodwork :)
Considering https://phabricator.wikimedia.org/T161107#3123690 I am going to purge scanner00.security-tools.eqiad.wmflabs
Wed, Jul 25
Small note: we had major trouble today with 1007 going away. Though by all accounts instances should not have cared, they did, and load was out of control. It sure seemed like instances were trying to mount 1007, but from what directive I don't understand; /etc/fstab had the entry removed and puppet was disabled. Rebooting a tools-worker and cheating to make it appear no NFS mounts were at that IP seemed to settle things, but I don't feel like I know much about what is going on
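For the record, this is roughly how I'd hunt for mounts still pointing at a gone NFS server. A minimal sketch only: the function name and the server address in the usage comment are illustrative, not anything that exists on our hosts.

```shell
# Print mountpoints whose device points at a given NFS server.
# Reads /proc/mounts-format lines on stdin. (Sketch; name is made up.)
nfs_mounts_for() {
  awk -v srv="$1" '$3 ~ /^nfs/ && index($1, srv ":") == 1 { print $2 }'
}

# On an affected instance one might run (not executed here; IP illustrative):
#   nfs_mounts_for 10.64.37.0 < /proc/mounts | while read -r mp; do
#     umount -f -l "$mp"   # force + lazy unmount so hung I/O can drain
#   done
```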
Sun, Jul 22
Sat, Jul 21
Thu, Jul 19
After a bit of poking and feeling like things /should/ be working if the right scripts were run I issued:
I did a survey of instance count and load on the existing hypervisors in the scheduler, then took stock of how the existing spread looked for these nodes; it already looked fairly good. In the end I created the 6th node on a distinct hypervisor, and the current situation is:
Wed, Jul 18
Oh I see they are created :). Could we start from scratch/are they configured?
It is indeed easier to distribute them at the time of creation but yeah that's a root function. I am not sure if we have 6 virts that have headroom. What size instances and would 3 virts with 2 each work?
We are going to review the Security workflow soon (august or sept?) so I'll keep this in mind
Jul 12 2018
+ /* Cloud public prefix via labnet100 */
+ route 126.96.36.199/25 next-hop 10.64.22.4;
Jul 11 2018
I have been hit too :)
Jul 10 2018
Jul 9 2018
wait, can we make these cloudvirt1023 and cloudvirt1024? I think @aborrero is getting into the naming adjustments now.
Jul 8 2018
Jul 5 2018
Want to reimage as cloudvirt? Would need subtasks to handle the long list of things to update :)
@Andrew which labvirt should we put in eqiad1 deployment?
Jul 2 2018
Jun 30 2018
This also needs to be removed from fstab. I really suspect we should manage NFS mounts outside of the normal puppet mount provider: not automounted on boot, but mounted through some explicit puppet action, as otherwise a missing NFS server is really tricky to recover from.
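The fstab side of that could be sketched like this. The function name and the `.bak` suffix are my own invention for illustration, not an existing tool.

```shell
# Comment out NFS entries in an fstab-format file so a dead server
# cannot hang the boot. Edits in place, keeping a .bak copy.
# (Sketch only; function name and backup suffix are illustrative.)
comment_nfs_mounts() {
  sed -i.bak -E 's/^([^#].*[[:space:]]nfs4?[[:space:]].*)$/#\1/' "$1"
}
```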
Jun 28 2018
labstore1007 has been restored to service and NFS clients and web users are pointed at it (https://gerrit.wikimedia.org/r/c/operations/puppet/+/442913)
We ran into trouble here:
possibly related to T196651: rack upgraded storage capacity in labstore100.eqiad.wmnet?
@Volans can you help make sense of this?
and labstore1006 as well? [from irc]
I don't quite understand this. Is this trying to say 6 failed drives?
Jun 27 2018
Dug up an old task that said rootdelay is the way to address this on jessie, with permanent fixes having landed in stretch. So far that seems to have corrected the issue. Both labnet1003 and 1004 are now running jessie using 10G NICs. I need to get a bit further to confirm the instance-facing interface works, but I'm considering this closed until I know better ;)
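For anyone landing here later, the rootdelay workaround is a kernel command-line flag set via grub; the delay value below is illustrative, not necessarily what we used.

```shell
# /etc/default/grub -- add rootdelay to the kernel cmdline (value illustrative)
GRUB_CMDLINE_LINUX="... rootdelay=10"
# then regenerate the grub config and reboot:
#   update-grub && reboot
```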
We moved past the DHCP/NIC issue and are now failing with
trying https://gerrit.wikimedia.org/r/c/operations/puppet/+/442295 to see if there is any change
Attempting Boot From NIC
Jun 26 2018
openstack role add --project admin --user novaadmin admin
Jun 25 2018
I don't think there is anything for us to do other than notify cloud-announce, is there?
Please use the template from https://phabricator.wikimedia.org/project/view/2880/
Jun 20 2018
Jun 14 2018
We met today to sync up on moving the remaining lab* servers. Hopefully these days/times all work for @Cmjohnson (I added him to the calendar invites to confirm)
timeout 180s bash -x /data/project/paws/paws-userhomes-hack.bash
+ true
+ find /data/project/paws/userhomes/ -maxdepth 1 -user root
+ xargs -L1 chown -v tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/51990302’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/8631757’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54479680’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/1281’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54533986’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54469129’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54485747’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54475927’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/52360419’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/53338127’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/46482467’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54491825’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/5424947’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54476398’ from root:root to tools.paws:tools.paws
changed ownership of ‘/data/project/paws/userhomes/54485761’ from root:root to tools.paws:tools.paws
+ sleep 1
+ true
+ find /data/project/paws/userhomes/ -maxdepth 1 -user root
+ xargs -L1 chown -v tools.paws:tools.paws
chown: missing operand after ‘tools.paws:tools.paws’
Try 'chown --help' for more information.
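The failure at the end of that trace is worth noting: once the second pass of find matches nothing, `xargs -L1` still invokes chown with no file arguments, producing "chown: missing operand". A minimal sketch of one pass with that case handled via `xargs -r` (`--no-run-if-empty`); the directory and owner come from the log, but the wrapper function is my own illustration, not the actual hack script.

```shell
# One pass of the userhomes ownership hack, with the empty-input case
# fixed: -r stops xargs from invoking chown at all when find matches
# nothing, and -print0/-0 keeps odd filenames safe.
fix_userhome_owners() {
  # $1 = homes directory, $2 = new owner (user or user:group)
  find "$1" -maxdepth 1 -user root -print0 \
    | xargs -0 -r chown -v "$2"
}

# The original loop would then be roughly (not run here):
#   while true; do
#     fix_userhome_owners /data/project/paws/userhomes/ tools.paws:tools.paws
#     sleep 1
#   done
```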
Jun 13 2018
ping @aborrero who indicated he had seen a similar issue in the past
@Cmjohnson could you describe a bit what you've tried to get the 10G ports to work?
Nope, it's entirely possible that multi-region glance is just that simple, because glance isn't too picky. I think, though, that what you want is for glance running on labtestcontrol2003 (debian) to use the databases from labtestcontrol2001's glance. Small change here, and if this works I imagine that's no big deal.
Jun 12 2018
We decided to mark this as stalled for our next meeting, as we need input from team members who were not present. Nothing bad here, just a very busy few weeks.
Created sciencesource project. Let me know if you have issues!
created opencr project with you as a member. Let me know if you have any issues.
I think this is ready for OS install and such? I spoke with @Bstorm who is going to take this on and may need some advice.
approved as 'general-k8s'
By chance was there an update to some global ferm variable?
I believe tools-worker-1029.tools.eqiad.wmflabs needs to be deleted, but I'm unsure why these three are cordoned. A relic of forgotten maintenance?
Jun 11 2018
A space for this type of work does exist but was more narrowly scoped at the time. All the stakeholders there have since left. Instead of creating a new space I did some shuffling to reuse S11 seen in T150046. S11 is owned by https://phabricator.wikimedia.org/project/profile/2356/ for editing the space itself, and visible to those folks and acl*operations_team. I also disabled the old herald rule H205.
root@labstore1003:~# /usr/local/lib/nagios/plugins/check_raid megacli
OK: optimal, 5 logical, 34 physical OK
Jun 8 2018
I totally forgot to tell @Bstorm we have announced the invasive parts of this in the past, old example: https://lists.wikimedia.org/pipermail/labs-l/2016-May/004493.html