Change 247589 had a related patch set uploaded (by Ottomata):
Remove uid setting from file_mover user. enforce-users-groups-cleanup was removing this
Oct 20 2015
Change 247587 had a related patch set uploaded (by Alex Monk):
Change star.wmflabs.org to beta certificate
Change 247584 merged by Ottomata:
Set file_mover primary gid by name rather than number
Change 247584 had a related patch set uploaded (by Ottomata):
Set file_mover primary gid by name rather than number
Hm, actually, let's make the gid for the file_mover user use the name rather than the gid. file_mover group exists as gid 997 on erbium.
Hm, no idea. Can we just do
I have renamed the table on all wikis to delete_user_daily_contribs; I will leave it as is for some time and delete it after checking that no code requires it.
Seems ok now
Looks fixed, will leave it alone for a day or so to see if icinga state stabilizes, then un-downtime it for a day or two and see if we get alerts, then repool.
Yes, my initial debug rules simply used syslog, but for a more complete fleet-wide solution I was thinking of ulogd(2).
SSL OK - Certificate wikitech.wikimedia.org valid until 2016-01-25 04:44:13 +0000 (expires in 96 days)
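For reference, the "expires in 96 days" figure in a check like this is the whole-day difference between the certificate's notAfter timestamp and the time of the check. A minimal sketch, assuming the check ran around midday UTC on Oct 20 2015 (the feed does not record the exact time); days_until_expiry is a hypothetical helper, not the actual monitoring plugin:

```python
from datetime import datetime, timezone

def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days remaining before not_after (negative once expired)."""
    return (not_after - now).days

# notAfter taken from the check output above
not_after = datetime(2016, 1, 25, 4, 44, 13, tzinfo=timezone.utc)
# Assumed check time; the feed only gives the date
checked_at = datetime(2015, 10, 20, 12, 0, 0, tzinfo=timezone.utc)
remaining = days_until_expiry(not_after, checked_at)
```

An alert threshold would then be a simple comparison against remaining.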
Yeah that's a good idea. If we do, should we consider userspace logging with ulogd instead of spamming dmesg? That way we could potentially collect it in the future as well and e.g. detect anomalies.
done
Removed
Swapped the fiber and sfp+'s for Juniper 10G copper cable.
I made a certificate for beta on deployment-puppetmaster and replaced the star.wmflabs.org cert with it there (also had a mess around with some other settings to get it to work), then went and changed all the cache instances so they could get the new cert, start nginx and get puppet working again. We probably want to separate that cert from star.wmflabs.org so we can get the patch into the operations/puppet repository.
Change 247564 merged by Ottomata:
Deploy VarnishReqstats diamond collector on remaining cache hosts
[stat1002:~] $ id bd808
uid=3518(bd808) gid=500(wikidev) groups=500(wikidev),731(analytics-privatedata-users)
on stat1002:
Change 247295 merged by Dzahn:
admin: add bd808 to analytics-privatedata-users
Change 244610 merged by Faidon Liambotis:
wikitech: add SSL cert expiry monitoring
Change 247564 had a related patch set uploaded (by Ottomata):
Deploy VarnishReqstats diamond collector on remaining cache hosts
FWIW you should be able to obtain the same with a diamond collector and export data into graphite for graphing (and possibly alerting)
There are 30 million rows on these tables (on enwiki, fewer on the others). This makes this a slightly more complex issue due to potential impact on the 5.5 masters. We will have to do it slowly, especially for s1 and s3.
According to https://gerrit.wikimedia.org/r/#/c/246689/1/wmf-config/InitialiseSettings.php, this should have been enabled on all production wikis.
Change 247542 had a related patch set uploaded (by Jcrespo):
[WIP] Script to generate openssh TLS keys for mysql replication
So, I have some questions here for the TLS experts:
In T50501#1669896, @Chmarkine wrote: Let's Encrypt provides free trusted(*) DV non-wildcard certs. We have 31 domains listed here. If you think it's plausible, we can obtain 31 certs (one for each domain) from Let's Encrypt at zero cost.
(*) They will have their CA certificate cross-signed by IdenTrust next month, so the certs they issued won't be trusted until then.
Do we really care about serving status.wikimedia.org over TLS? I am not sure it is worth it (and the price of a host cert), so I would rather disable HTTPS and just use http.
@Dzahn I do not see anything in a critical state - I suppose you meant the "lag" between the servers.
space there because it was reinstalled with a larger partition. sorry, I should have closed this. doing so now.
Well as those puppet runs and logs prove, user trebuchet belongs in group wikidev only for about 1 minute every 30. I think we can pretty safely conclude that it's not strictly needed.
Is this task being tracked on two tickets? Anyway, you can make the change as far as I'm concerned as long as the campaign names are included in the schema as before (unless there is some other way to map banners to campaigns in one of the other tables). Also, make sure to multiply the counts by 10 or whatever the rate ends up being.
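The scaling mentioned above could look roughly like this. It is a sketch only: the 1-in-10 rate is the figure floated in the comment, while the campaign and banner names and the scale_counts helper are made up for illustration.

```python
def scale_counts(sampled_counts, sample_rate=10):
    """Estimate true totals from counts collected at 1-in-sample_rate sampling."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    return {key: count * sample_rate for key, count in sampled_counts.items()}

# Keys are (campaign, banner) pairs so banners stay mapped to campaigns
sampled = {("campaign_a", "banner_1"): 12, ("campaign_a", "banner_2"): 3}
estimated = scale_counts(sampled)
```

Keeping the rate as a parameter means the multiplier can follow whatever sampling rate is finally chosen.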
@VBaranetsky Hi, any response from Doneva yet?
group added, but needs to be added to hosts. and there is the question on https://gerrit.wikimedia.org/r/#/c/246850/
check for gerrit cert added:
made this an access-request
Change 244618 merged by Dzahn:
gerrit: add cert expiry check
Oct 19 2015
@akosiaris wanna review https://gerrit.wikimedia.org/r/#/c/244627/ ?
Change 246848 merged by Dzahn:
admin: add new group for datacenter ops
Change 247005 merged by Dzahn:
ntp: do not 'ensure latest'
These are identical to the other mw* systems in codfw, so it makes sense to simply append these to the end of the mw system range and use them. (I'll create the onsite tasks shortly, as well as the new dns entries for mgmt.)
What's still missing?
Indeed, thx for the linking (I've now resolved the second of the two due to link ;)
See also T115950 and https://gerrit.wikimedia.org/r/#/c/242127/
(See also T84823 and T84812)
Change 247481 had a related patch set uploaded (by Legoktm):
Send bounce and unsubscribe counts to graphite
We can send the data to graphite and put up some graphs on grafana.
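As a rough illustration of what sending to graphite involves: Graphite's plaintext protocol takes one "path value timestamp" line per data point. The metric names below are hypothetical, and this sketch only formats the lines; it does not reflect how the patch itself reports the counts.

```python
import time

def graphite_line(path, value, timestamp=None):
    """Format one data point for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

# Hypothetical metric names for the bounce and unsubscribe counts
lines = [
    graphite_line("example.bounces.count", 42, 1445299200),
    graphite_line("example.unsubscribes.count", 7, 1445299200),
]
payload = "".join(lines)
```

The resulting payload is what would be written to the carbon listener, after which the series can be graphed in grafana.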
Thank you!
Change 247480 had a related patch set uploaded (by Alex Monk):
Removed mgmt DNS for virt20[0-1][1-9], pc200[1-3], labsdb200[1-3] and WMF5709
looks like it was reinstalled, yep
Filesystem                       Size  Used Avail Use% Mounted on
/dev/sda1                         59G   25G   32G  44% /
..
dataset1001.wikimedia.org:/data   57T   40T   17T  71% /mnt/data
Over 2 years later, and we still have pages like status.watchmouse.com giving
I see enough disk space on snapshot1001 now. I don't know how it was resolved, but it looks like it is.
@Andrew Did that answer the question or should the ticket stay open?