User Details
- User Since: Apr 13 2015, 10:09 PM
- Roles: Disabled
- LDAP User: Unknown
- MediaWiki User: MViswanathan (WMF)
Apr 17 2018
@Ottomata Thanks for fixing up the rsync jobs! Can we close this task now?
Apr 11 2018
@hoo, alright, thanks! Feel free to ping me or the team if you need to re-enable NFS for some reason.
Apr 10 2018
The rsync config that allows @ezachte's old-style sync to labstore1006 & 7 already exists. We haven't talked about switching on the old setup for the new servers, since I thought the jobs were being changed so we can sync from stat1005. I'm ready to turn on the rsync jobs whenever, but I don't see the data in /srv/dumps yet either.
Apr 6 2018
Fixed by running sudo exportfs -ra on the NFS servers and remounting on notebook*.
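A minimal sketch of that fix; the client mount point below is an illustrative placeholder, not necessarily the real path:

```bash
# On the NFS server: re-export everything in /etc/exports so export
# changes take effect without restarting nfsd.
sudo exportfs -ra

# On each notebook* client: remount the share so it picks up the export.
# /mnt/data is a placeholder for whichever mount point puppet manages.
sudo umount /mnt/data
sudo mount /mnt/data    # relies on the existing fstab/puppet-managed entry
```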
Apr 4 2018
This is all done. Leaving it open until all existing connections to dataset1001 drop off and we stop the web server there.
Notes from migration etherpad:
Y'all I'd like to gently point out the primary goal here: we want the rsyncs to happen on the labstores and not from stat1005. To that end, I'm just looking for a directory(ies) to pull from on stat1005. I think we've all agreed on /srv/dumps as the container at least once. The directory already exists. /srv/public-other seems even more generic to me. I'm happy to add a README to /srv/dumps that says this is the container directory for things that are shipped to the dumps distribution servers.
Let's just go with /srv/dumps since we already have that set up then.
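For reference, a rough sketch of what the pull could look like on the labstore side; the rsync module name, destination path, and exact flags here are illustrative, not the actual puppetized job:

```bash
# Illustrative only: pull /srv/dumps from stat1005 into the labstore's
# public tree; the real job is defined in puppet.
rsync -a --delete stat1005.eqiad.wmnet::srv-dumps/ /srv/dumps/other/
# ...or over SSH instead of an rsync daemon module:
rsync -a --delete stat1005.eqiad.wmnet:/srv/dumps/ /srv/dumps/other/
```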
Apr 2 2018
You'd need to apply class https://github.com/wikimedia/puppet/blob/production/modules/statistics/manifests/dataset_mount.pp, and add the servers to https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/dumps/distribution.yaml#L14 (nfs_clients), to get this to work.
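For anyone following along, a rough sketch of how you might apply and verify that on a new client; the class name follows the usual puppet path-to-class convention, and the mount path is a guess:

```bash
# After adding the host to nfs_clients in the hiera file and including
# statistics::dataset_mount, run the agent and check the mount.
sudo puppet agent --test
mount -t nfs,nfs4          # the dumps share should show up here
df -h /mnt/data            # path is illustrative; use whichever mount point puppet manages
```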
Nothing to actively do here; we've let the mirrors know that they should use the dumps.wikimedia.org URL and that dataset1001 and its associated IPs will be going away after the switchover.
@ezachte Hello, after chatting with Andrew a bit, here's the direction we have in mind (pretty similar to what we talked about, with some naming adjustments).
This went well. Clean-up task pending: remove the dumps NFS export from dataset1001.
This went pretty well! To-dos for cleanup:
Notes from migration plan doc:
Notes from migration plan:
Comparing the two kernels on NFS and raw-disk performance, I can see a small loss in performance on both reads and writes with the new Spectre-patched kernel. Looking at the load graphs from the fio tests, there's no significant difference in how the kernels perform under heavy load. These patterns don't look like what we saw when we upgraded labstore1004 & 5 to 4.9 kernels, and my suspicion is that NFS isn't the issue there.
@Ottomata Thanks so much, /srv/wikistats_1 seems fine. There are also media and pagecounts-ez; cool to have those at the top level in /srv too?
Update based on my discussion with @ezachte over email:
For the web service migration, broader email blast:
Running some load/performance tests. All tests from local machine.
To fail over between the two labstores for the web service:
Mar 30 2018
I also ran various tests using fio across the 2 kernels over NFSd - https://tools.wmflabs.org/labstore-profiling/
Reporting back here on what I found.
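For the curious, the fio runs looked roughly like the below; the job parameters and test directory are illustrative rather than the exact ones used:

```bash
# Illustrative fio invocation over an NFS-mounted directory; run the same
# job against both kernels and compare the reported IOPS/latency.
fio --name=nfs-randrw --directory=/mnt/nfs-test \
    --rw=randrw --bs=4k --size=1G --numjobs=4 \
    --runtime=60 --time_based --group_reporting
```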
Mar 27 2018
@Nehajha Thanks, this looks good to me. Good luck!
@djff This looks great! Good luck :)
Mar 26 2018
I've reviewed and +1-ed the microtask. Thank you! Looking forward to seeing your proposal.
Mar 25 2018
@djff Hello! If you are having trouble with the microtask, do ask us questions at #wikimedia-cloud on IRC. Looking forward to your patch and proposal!
@APerson Hello! If you are having trouble with the microtask, do ask us questions at #wikimedia-cloud on IRC. Looking forward to your patch and proposal!
I've +1-ed the patch and am resolving this task! /me waves at @Nehajha, looking forward to your proposal; do hang out at #wikimedia-cloud and ask questions if anything comes up.
Mar 21 2018
@Legoktm cool! Thanks for weighing in. Looks like we're good to go ahead with deprecating serving these from the dumps servers, then.
Mar 20 2018
Note: the slow-parse logs in other/ are being deprecated (T189284).
Update: I've removed all rsync-related jobs and code from puppet on both the dumps servers and the mwlog servers. To do: stop serving at https://dumps.wikimedia.org/other/slow-parse/, and clean up existing data from other/ on the dumps servers.
Mar 16 2018
Also pinging @Krinkle
Chatted with @Ottomata today in #wikimedia-analytics, and we decided to use a similar strategy for the stat/notebook mounts. We'll mount shares from labstore1006/7 in /mnt, and symlink the active NFS one to /mnt/data (which is the current access point for stat users).
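A minimal sketch of that layout, assuming hypothetical mount-point names and export paths (the real ones are whatever the puppet change ends up managing):

```bash
# Mount the shares from both labstores under /mnt (normally puppet-managed);
# the hostnames' domain suffix and the /dumps export path are placeholders.
sudo mkdir -p /mnt/nfs/dumps-labstore1006 /mnt/nfs/dumps-labstore1007
sudo mount -t nfs labstore1006.wikimedia.org:/dumps /mnt/nfs/dumps-labstore1006
sudo mount -t nfs labstore1007.wikimedia.org:/dumps /mnt/nfs/dumps-labstore1007

# Point the stable access path stat users already use at the active share;
# -n keeps ln from descending into the existing symlinked directory.
sudo ln -sfn /mnt/nfs/dumps-labstore1006 /mnt/data
```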
Initial PoC patch for nfsclient.pp changes https://gerrit.wikimedia.org/r/#/c/403767/1
@Volans Indeed, I fixed up the script based on the comments; we can close this task when the patch is merged! Thank you.
I think the firewall rules already exist now; we can do this as part of the dumps NFS migration on April 2 and have the shares available from labstore1006|7 on notebook*. I added T188644 as a parent task.
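One quick way to sanity-check the firewall/export side from a notebook host (the domain suffix on the hostname is an assumption; the commands just probe NFS reachability and the export list):

```bash
# Check that the NFS port is reachable and that the dumps export is visible.
nc -zv labstore1006.wikimedia.org 2049
showmount -e labstore1006.wikimedia.org
```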
@Ottomata Hey! Do you know anything about these logs? :) I'd like to make it so that we can fetch from the mwlog server when we move to the new dumps setup.
Mar 13 2018
@Kolossos I see utilization has climbed back up to over 600G. How can we ensure we don't have to keep filing these clean-up tickets? We are happy to help figure out long-term strategies!
Resolving this for now. This project still has high utilization, albeit less than before. We can discuss strategies to mitigate in T159930.