Thank you @herron! Yes, confirmed. No more duplicates right now.
To SRE doing clinic duty: This is an easy one. Really just need to move the existing user from group "restricted" to group "deployment" and the other steps are all done from the previous request.
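As a rough illustration of that one remaining step, here is a minimal sketch, assuming group membership is kept as a simple mapping from group name to member list. The `groups` structure, the `move_user` helper and the username `exampleuser` are all hypothetical and do not reflect the real admin data.

```python
def move_user(groups, user, source, target):
    """Move `user` from group `source` to group `target`, in place.

    `groups` is a hypothetical mapping of group name -> list of usernames.
    """
    if user not in groups[source]:
        raise ValueError(f"{user} is not a member of {source}")
    groups[source].remove(user)
    # Avoid adding a duplicate if the user is already in the target group.
    if user not in groups[target]:
        groups[target].append(user)
    return groups


# Hypothetical data, for illustration only.
groups = {"restricted": ["exampleuser"], "deployment": []}
move_user(groups, "exampleuser", "restricted", "deployment")
```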
percentage of jessie systems left as of today (because we were asked): 4.2%
mw2187, mw2188 are new canary appservers, replacing mw2271, mw2272
I'm afraid this is one of those tickets that you can never really close because at any given time there will be 1 or more feeds with errors. Some will be fixed later, some will go away, some will change URLs due to new software... It is just constant maintenance and one has to check the logs every once in a while. I am not sure I want to keep a ticket open forever though, so at some point it will have to be "good enough for now".
The 2 boxes not checked on this ticket are meanwhile working again. A good example of why deleting all feeds with an occasional 404 too quickly is not the best idea.
Fri, May 22
Technically resolved because we made more than enough room for the 5 (not 15 anymore, 10 were used for T252185) servers.
23 servers from rack C3 have been decom'ed. mw2150 through mw2172. (lower part of the rack)
@Papaul 20 servers from rack C3 have been decom'ed. mw2150 through mw2169. (lower part of the rack)
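The two counts can be checked against the hostname ranges with a short sketch. The `expand_hosts` helper is hypothetical and written only for this check.

```python
def expand_hosts(prefix, first, last):
    """Return the hostnames prefix<first> .. prefix<last>, inclusive."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


full_range = expand_hosts("mw", 2150, 2172)   # mw2150 through mw2172
short_range = expand_hosts("mw", 2150, 2169)  # mw2150 through mw2169
print(len(full_range), len(short_range))  # prints "23 20"
```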
reopening because i am decom'ing servers in T247018 and that included some canaries.
These were changed in https://gerrit.wikimedia.org/r/c/operations/dns/+/595959/2/templates/wmnet
As discussed on IRC, I'd rather not use /srv/docroot and would prefer instead if we can clean out the repo called "docroot" to actually just contain files for the docroot.
Is there a specific thing we are waiting for?
@Mjohnson_WMF I temporarily added an alias for grants@ to myself on the mail servers and then hit the password recovery form again. I see it has now delivered an email to you. Please check if you received it and you are now unblocked.
It seems like the previously existing email@example.com Google group has been deleted as part of T191881 or otherwise.
Thu, May 21
wmcs team reported not being able to do installs from the cloudvirt VLAN (cloudnet1004)
please also see T246945 which might conflict with this effort and introduce yet another place for documentation rather than having a single place for developers to go to
This has happened and an announcement has been sent to ops and wikitech-l lists.
@lmata Welcome! This is done. You should now be able to log into Icinga, Grafana etc.
unassigning from me because i got the "cookie-licking" warning from T228575
I think option 3) is the easiest of all, has no risk to break existing checks and we already have many different check_commands using check_http in different ways so another one should not hurt us.
Giving it back to the pool and setting to stalled because of ongoing discussion whether this should be on a dedicated VM or on mwmaint.
Thanks for taking care of this, @RLazarus. I can confirm Max's user exists on the Phabricator prod server, is in the new group, and that group has the sudo privileges to run the requested command.
from the log files on bast1002 and cumin1001 I can see there are 2 different keys involved.
Wed, May 20
added to Wikidata: https://www.wikidata.org/wiki/Q94952969
Added to wikistats.wmflabs.org
re-added now that the wiki has been created.
Thanks for reporting. Likely it will be similar to:
[randomart images for the RSA 2048, ECDSA 256 and ED25519 256 keys, SHA256]
A VM called malmok.wikimedia.org has been created and can be used now. Currently it has the "insetup" role in site.pp.
@ssingh The VM has been created (now with public IP).
@soworu I found the following comment in the Apache config of superset:
@soworu Looks like it works now. I think I just saw you log in in the log files, am I right?
@soworu Try again now, please. Looks like I forgot to add the -01 part when adding you to the right group. Sorry about that.
Please try the following variations of the username including the capitalization:
This is apparently stalled because T252910 has been declined on the grounds that there were only a few commits in a specific repo. That seems to cause a dependency cycle though, which I find pretty unfortunate because what EddieGP describes here is totally legit and really the worst possible outcome.
These 4 hosts have been reimaged and now have RAID5 instead of RAID1 after gerrit:597261
Handing this back over for the "init" command steps you mentioned are needed next.
@akosiaris All of these hosts have RAID5 now:
Enabled remote IPMI on these machines which was disabled but is needed. (wikitech how to
Ready to create Ganeti VM malmok.codfw.wmnet in the ganeti01.svc.codfw.wmnet cluster on row A with 2 vCPUs, 8GB of RAM, 30GB of disk in the private network.
@RobH Remote IPMI was disabled on these hosts which popped up when i tried to run the reimage cookbook (to change software RAID level from 1 to 5) and it failed.
@QChris I added you to the nda group. You should now be able to log in to Icinga.
Tue, May 19
Deleted it again for now because adding it before the wiki has been created causes more issues like the one linked above and then T253115.
@Xqt I have deleted "awa" from the wikistats db again. I will reopen the task to re-add it once it has actually been created.
I think the best solution is probably that i don't add sites before they are created and simply delete it for now.
done! the module has been removed
I did the latter and set total= and good= to 0 in the db. Did that fix it, @Xqt?
I could also just set "total" to 0 manually for now so that it has a number.
I did not even know "wikistats_tests" is a thing.
Is the requested hostname "homer" a copy/paste error?
This has happened. The module is gone now.
I think both Commons and YouTube, depending on licenses, would be good places to store the actual video. The wikiworkshop site could then link to them or embed them directly.
Mon, May 18
I fixed this by creating "role::simplelamp2" which uses mariadb instead of mysql. This has been applied on glampipe instead of the old simplelamp role and puppet runs fine again.
This is done.
Ok, resolving. Note: The thresholds are currently set to 240 hours (10 days) for WARN and 480 hours (20 days) for CRIT.
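For reference, a minimal sketch of how such thresholds map an age in hours to a state. The `classify_age` function is hypothetical and is not the actual Icinga check; treating an age exactly equal to a threshold as already crossing it is also an assumption.

```python
WARN_HOURS = 240  # 10 days
CRIT_HOURS = 480  # 20 days


def classify_age(age_hours):
    """Return "OK", "WARN" or "CRIT" for a given age in hours."""
    # Assumption: reaching a threshold exactly already triggers that state.
    if age_hours >= CRIT_HOURS:
        return "CRIT"
    if age_hours >= WARN_HOURS:
        return "WARN"
    return "OK"
```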
I think "how to handle Icinga warnings" is not something specific to this task about monitoring screens.