T311290 has been named as the reason for that issue with the cookbook. It should be fixed already.
Thu, Jun 23
Notice: /Stage[main]/Profile::Mediawiki::Deployment::Server/Package[siege]/ensure: removed
Notice: /Stage[main]/Profile::Mediawiki::Deployment::Server/Package[wrk]/ensure: removed
Notice: /Stage[main]/Profile::Mediawiki::Deployment::Server/Package[lua-cjson]/ensure: removed
@BTullis I tried to create one for you but the cookbook failed at the DNS update step:
dzahn@cumin1001:~$ sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 4 --disk 20 --network private eqiad_C dse-k8s-ctrl1001
Ready to create Ganeti VM dse-k8s-ctrl1001.eqiad.wmnet in the ganeti01.svc.eqiad.wmnet cluster on row C with 2 vCPUs, 4GB of RAM, 20GB of disk in the private network.
Fake secrets were needed to be able to puppet-compile scap changes such as https://gerrit.wikimedia.org/r/c/operations/puppet/+/806397
We have had the "mgmt flapping"-issue in other DCs. In codfw a bunch of them were fixed after Papaul did firmware upgrades on the DRACs.
Thank you @BTullis for all the details. Now I know what DSE means. If the doc could be public, even better. The project description at https://phabricator.wikimedia.org/project/profile/5959/ is also helpful for the casual observer, though.
Thank you for the examples. That makes sense to me. Especially if Dell advises to keep them secret.
Wed, Jun 22
Before we talk about technical implementation and putting this on ice, I am wondering: has anyone even had specific concerns or data fields in mind that should be hidden?
FYI: the design document isn't accessible, and from the tickets alone it's unclear what this is about.
In T310738 there is a request to revert this and move the domain back to WMF infra.
Tue, Jun 21
@Zabe I see. thank you for that!
P.S. What would actually be useful for me is if those only got created _after_ the wiki has been created. Or ideally, if they started out "stalled" and the actual wiki creation changed them to "open". Currently I see them but still need to manually watch the "newprojects" list or so to know when they are _really_ ready to go.
@Zabe Curious why those tickets start with a custom policy in the first place. Is that something we should try to change in the bot creating those?
@dduvall Well, you could try to use it to build a Docker image from a Dockerfile. So far it's just "the buildkitd service is running", and the follow-ups are about making sure it survives reboots and that the next time we set up a gitlab-runner it works more automatically. Don't worry about that part. But I don't think anyone has actually let it build an image yet.
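For the record, a smoke test of that could look roughly like the following. This is only a sketch: it assumes buildctl is installed on the runner and buildkitd is listening on its default socket, and the directory and image name are made up for illustration.

```shell
# Hypothetical smoke test: build a trivial Dockerfile through buildkitd.
mkdir -p /tmp/buildkit-test
cat > /tmp/buildkit-test/Dockerfile <<'EOF'
FROM debian:bullseye-slim
RUN echo "built by buildkitd" > /hello.txt
EOF

# Ask buildkitd to build it and load the result into the local docker daemon.
buildctl build \
  --frontend dockerfile.v0 \
  --local context=/tmp/buildkit-test \
  --local dockerfile=/tmp/buildkit-test \
  --output type=docker,name=buildkit-smoke-test | docker load
```

If that succeeds, we'd know the daemon does more than just run; it can actually complete a build.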
Sat, Jun 18
Fri, Jun 17
The remaining SRE aliases in the file can now be separated into:
- deleted store@ and merchandise@ after they were created in Google; coordinated with Brendan of ITS and Sandra Hust, store manager
Thu, Jun 16
added to DNS
added to DNS:
Thank you for the very quick response!
It's deployed but we have some follow-ups. I guess lowering the prio a bit is appropriate for this state.
buildkitd is now running on all (6) gitlab-runners. It's 6 because the VMs 1001 and 2001 were decommissioned earlier today, and there are 3 physical hosts per DC.
The run of the decom book was at:
After this I ran only the DNS cookbook directly, and this time it finished without such an error. I am not sure whether it actually tried anything, though, because it said "nothing to sync".
Wed, Jun 15
There are incoming redirects into policy.wikimedia.org:
Looks like T310738 would make this obsolete.
just a note for serviceops: policy.wikimedia.org is not currently under the control of SRE/prod servers at WMF. It's hosted at Wordpress VIP.
- deleted aql-sms@ not needed anymore
- deleted order@, orders@, return@ and returns@ after Sandra Hust, manager of store.wikimedia.org, confirmed they aren't public knowledge on the store page and she wasn't even aware of them. They only use merchandise@ and store@, which both go to a single Zendesk email. So first simplify, then move the remaining redirects to ITS (in progress)
Tue, Jun 14
there is always moar :)
The other day I deleted cpt-leads@ (after Tim told me it's ok and hasn't been used for a while) and techcom@ (after asking ITS to create it on the Google side and agreeing with Timo that he is the new admin of that Google group).
Mon, Jun 13
I also saw certificate errors pop up in a different project that uses a local puppetmaster, and we felt like we had not touched anything. I did not get to look yet, but this seemed similar enough, and I was already suspecting some change related to self-hosted puppetmasters.
Confirming @XCollazo-WMF exists and was introduced in the SRE meeting today :) Welcome to WMF. Confirmed the signature and checked all other boxes; just the clinic-duty one is still open.
ah ACK, ok, in that case we will just move forward as planned. Thanks Papaul
A1: serviceops: gitlab2002 is still in state "in setup". While we were going to change that, we will hold back until this is done.
Unfortunately the purchase date is 2016-12-12, so we probably can't get it fixed.
/admin1-> racadm serveraction powercycle
04:32 <+logmsgbot> !log dzahn@cumin2002 conftool action : set/pooled=no; selector: dc=codfw,name=thumbor2004.codfw.wmnet
Sun, Jun 12
Fri, Jun 10
I ACKed the Icinga alerts with a link to this so they are not in "unhandled CRIT" anymore.
@Arnoldokoth The last change we uploaded in our meeting the other day is now merged. I would say we can call this resolved and close the ticket (but also create a new one for the migration to bullseye sometime in the future, which mentions that we should _then_ also do the remaining rename of:
Thu, Jun 9
I talked with Jesse about all this. We agreed I will follow up about the last few things you, Faidon, also mentioned in our mail: cpt-leads@, techcom@ and the remaining fr-tech ones. I just sent mails about these. Then, after that is done, I'll close this ticket as resolved and tell ITS that everything related to wikiPedia.org (the ongoing discussion about dropping things like jimmy@, personal aliases in wikiPedia.org etc.) should be seen as a separate task, and I will hand that over.
@Krinkle Yep, that summary sounds right to me; that's what we had in mind. It's just that some time ago you had said that change, https://gerrit.wikimedia.org/r/c/operations/dns/+/650625/, was not ready to be switched on yet. I don't recall the specific reasons it wasn't ready, but if there is no concern anymore, this should be ready to go anytime. Feel free to invite me via calendar to make this happen; I can deal with a reasonably early time in my timezone.
First and foremost though, the reason gitlab has all public IPs is that we were trying to emulate the Gerrit setup. Gerrit has public IPs and is not behind LVS because we wanted it that way: we wanted to be able to still use Gerrit and merge changes even if the caching layer is down for some reason. For the same reason Icinga has a public IP. Certain services were not supposed to rely on load balancers.
moving gitlab1001.wikimedia.org to gitlab1001.eqiad.wmnet
Wed, Jun 8
@Sabrecalyx If this is a legit request, please replace <groupname> with the actual group name requested and fill out the rationale section.
What happened here is:
mw1415 does not serve 500s anymore. T307755#7990623
This caused T310225, because setting it to pooled=inactive does not mean monitoring will stop checking it, and when the host came back unexpectedly it caused new alerts for 500s on this box, which had not received scap updates. But setting it to pooled=no would have meant deployers would have gotten warnings about an unreachable host for a month. The deeper issue is that there is no right status to set hosts to while they are waiting for hardware repair.
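As a sketch, the trade-off between the two states looks like this with the conftool CLI (hostname is just the one from this task; the state semantics in the comments are as described above):

```shell
# pooled=yes      -> host is in service
# pooled=no       -> depooled, but deployers get scap warnings about the host
# pooled=inactive -> ignored by scap, but monitoring still checks the host,
#                    so a surprise reboot can fire alerts like the 500s here
sudo confctl select 'name=mw1415.eqiad.wmnet' set/pooled=inactive

# Verify the current state of the host:
sudo confctl select 'name=mw1415.eqiad.wmnet' get
```

Neither state really means "parked for hardware repair", which is the gap described above.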
21:13 < mutante> !log mw1415 - scap pull, restart apache, /usr/local/sbin/restart-php7.2-fpm (INFO: The server is depooled from all services. Restarting the service directly)
Something between resolved and declined. Please feel free to reopen though if you feel differently about it.
Does this only affect this instance, or maybe all users who have a local puppetmaster in their VPS project? It seems like we haven't touched anything and it was working before, so the error makes me think something changed somewhere upstream, or alternatively that someone tried to switch between the local project puppetmaster and the regular global puppetmaster.
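For reference, a typical cert reset when a project-local puppetmaster and its client disagree looks roughly like this (Puppet 5-era commands; the instance name is a made-up example, and if the puppetmaster autosigns, the explicit sign step is unnecessary):

```shell
# On the affected instance: discard local SSL state so a new cert is requested
sudo rm -rf /var/lib/puppet/ssl

# On the project puppetmaster: remove the stale cert for that instance
sudo puppet cert clean instance-name.project.eqiad1.wikimedia.cloud

# Back on the instance: re-run the agent, which submits a new cert request
sudo puppet agent --test

# On the puppetmaster again: sign the new request
sudo puppet cert sign instance-name.project.eqiad1.wikimedia.cloud
```

That said, if the root cause is an upstream change rather than local state, this would only paper over it.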
Mon, Jun 6
ok, thank you IF team! Assigning back to me for the moment to follow up. Yes, there was a specific person. I will re-add this with a specific group after discussion.
This is a better link, since it's directly upstream and the latest docs, from 2022:
@Cmjohnson Alright, gotcha! Thanks for the updates and Dell request.
Fri, Jun 3
bundle exec rake 'spdx:convert:module[MODULENAME]'