jbond (John Bond)
User

User Details

User Since
Jan 7 2019, 1:06 PM (15 w, 11 h)
Availability
Available
IRC Nick
jbond42
LDAP User
Jbond
MediaWiki User
JBond (WMF)

Recent Activity

Thu, Apr 18

jbond added a comment to T221265: Discussion: Explore push notifications options .

I have some doubts/questions about the specific choice of Prowl.

For the record, I'm not stuck on this as a solution; it's just something I have used before

Thu, Apr 18, 1:09 PM · Operations
jbond added a comment to T221343: puppet fails to run in cp1008 under certain conditions.

I have noticed you can trigger this bug by using a locale not present on the server

Thu, Apr 18, 10:27 AM · Packaging, Puppet, Operations
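
A minimal sketch of how one might check whether a given locale is present on a host, as context for the comment above. It does not reproduce the puppet failure itself; the helper name `locale_available` is hypothetical and the check relies only on Python's standard `locale` module.

```python
import locale


def locale_available(name):
    """Return True if the C library on this host can load the named locale."""
    # Remember the current setting so the probe leaves the process unchanged.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        # Raised when the locale is not generated/installed on this system.
        return False
    finally:
        locale.setlocale(locale.LC_ALL, saved)
```

Running this against the locale exported by the client session (for example the value of `LANG`) would show whether the server can load it.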
jbond added a comment to T221343: puppet fails to run in cp1008 under certain conditions.

I had a quick look at this and was unable to recreate it. I did come across the following, though, and wonder if the workaround there may work

Thu, Apr 18, 10:18 AM · Packaging, Puppet, Operations
jbond closed T216995: Off board Adam Wight as Resolved.
Thu, Apr 18, 10:01 AM · Patch-For-Review, WMF-NDA-Requests
jbond closed T216995: Off board Adam Wight, a subtask of T216425: Volunteer NDA for AWight, as Resolved.
Thu, Apr 18, 10:00 AM · WMF-NDA-Requests
jbond closed T216425: Volunteer NDA for AWight as Resolved.

Congratulations, I'll close this ticket now

Thu, Apr 18, 10:00 AM · WMF-NDA-Requests

Wed, Apr 17

jbond updated the task description for T221265: Discussion: Explore push notifications options .
Wed, Apr 17, 5:11 PM · Operations
jbond triaged T221265: Discussion: Explore push notifications options as Normal priority.
Wed, Apr 17, 5:00 PM · Operations
jbond added a comment to T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch.

sounds good

Wed, Apr 17, 3:50 PM · Wikimedia-Logstash, Security, Operations
jbond created P8413 (An Untitled Masterwork).
Wed, Apr 17, 3:12 PM
jbond triaged T221226: Create canary roles for all canaries as Low priority.
Wed, Apr 17, 12:07 PM · Patch-For-Review, Puppet, Operations
jbond created P8412 (An Untitled Masterwork).
Wed, Apr 17, 10:48 AM

Tue, Apr 16

jbond updated the task description for T221083: puppet fact: migrate away from the uniqueid fact.
Tue, Apr 16, 7:03 PM · Puppet, Operations
jbond updated the task description for T221083: puppet fact: migrate away from the uniqueid fact.
Tue, Apr 16, 7:02 PM · Puppet, Operations
jbond created P8409 lookup.
Tue, Apr 16, 5:08 PM
jbond created P8405 raid.
Tue, Apr 16, 3:04 PM
jbond created T221083: puppet fact: migrate away from the uniqueid fact.
Tue, Apr 16, 1:35 PM · Puppet, Operations
jbond added a comment to T219803: upgrade facter and puppet across the fleet.

uniqueid fact is also missing

Tue, Apr 16, 12:29 PM · Patch-For-Review, Packaging, Puppet, Operations

Mon, Apr 15

jbond added a comment to T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch.

It's also worth pointing out that ulogd supports native JSON output[1], but only to a separate log file, not syslog. For Prometheus it was simpler to keep things raw and in (r)syslog, which may be the case here as well. In relation to writing our own grok rules, I suspect the mtail patterns will be a good starting point[2]

Mon, Apr 15, 6:44 PM · Wikimedia-Logstash, Security, Operations
jbond updated the task description for T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch.
Mon, Apr 15, 2:32 PM · Wikimedia-Logstash, Security, Operations
jbond triaged T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch as Normal priority.
Mon, Apr 15, 2:32 PM · Wikimedia-Logstash, Security, Operations

Fri, Apr 12

jbond added a comment to T220787: Fix RAID handler alert and puppet facter to work with Gen10 hosts and ssacli tool.

I have created a series of changes starting with 503332, which adds ssacli to the raids array fact if a "Smart Storage PQI 12G SAS/PCIe 3" device is detected. This is based on the pci-id [1].

Fri, Apr 12, 1:27 PM · Patch-For-Review, Operations, Icinga, monitoring

Thu, Apr 11

Krenair awarded T213546: Prepare puppet infrastructure for Debian buster a Like token.
Thu, Apr 11, 4:07 PM · Patch-For-Review, Packaging, Puppet, Operations

Wed, Apr 10

jbond closed T220003: Add security apt security suites to pbuilder base images as Resolved.
Wed, Apr 10, 12:11 PM · Patch-For-Review, Packaging, Operations

Tue, Apr 9

jbond added a comment to T219803: upgrade facter and puppet across the fleet.

Below is a diff between facter2 and facter3. Most things are the same, but a few are different, and of course there are many more new structured facts.

Tue, Apr 9, 12:53 PM · Patch-For-Review, Packaging, Puppet, Operations

Mon, Apr 8

jbond added a comment to T220377: Check PPI leftovers - awight.

@elukey he is remaining as a volunteer, so I agree this probably doesn't need an action. However, I'm not familiar enough with HDFS/PPI stuff to know if there is a difference between the WMF and NDA groups, so I thought I would raise this ticket to check. If you are confident then please close the ticket and I will know for next time :)

Mon, Apr 8, 12:52 PM · Analytics, WMF-NDA-Requests
jbond triaged T220377: Check PPI leftovers - awight as Normal priority.
Mon, Apr 8, 12:27 PM · Analytics, WMF-NDA-Requests
jbond added a comment to T216995: Off board Adam Wight.

awight is remaining as a volunteer and has signed the NDA. I have created a subtask and tagged analytics to check the dir for PPI stuff as per the off boarding doc[1]

Mon, Apr 8, 12:26 PM · Patch-For-Review, WMF-NDA-Requests
jbond created T220377: Check PPI leftovers - awight.
Mon, Apr 8, 12:24 PM · Analytics, WMF-NDA-Requests
jbond added a comment to T216425: Volunteer NDA for AWight.

hi adam,

Mon, Apr 8, 12:21 PM · WMF-NDA-Requests

Fri, Apr 5

jbond added a comment to T219803: upgrade facter and puppet across the fleet.

@CDanis thanks, I will try a patch with some of the other maps. However, the problem is that std::unordered_map is available but has a bug in the library[1], so the bug may trigger with them as well. Further, I didn't think the performance difference would be that noticeable in facter, and as it's only facter on jessie the risk is even smaller

Fri, Apr 5, 4:37 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond added a comment to T219803: upgrade facter and puppet across the fleet.

A bit more to the picture: I managed to get facter to build by updating all references of std::unordered_map to std::map. However, I'm now getting the following test failures

Fri, Apr 5, 11:05 AM · Patch-For-Review, Packaging, Puppet, Operations
jbond added a comment to T166066: Integrate the puppet compiler in the puppet CI pipeline.

In addition, Jenkins doesn't seem to like having more than Change-id and Bug in the footer:

Fri, Apr 5, 9:47 AM · Puppet, puppet-compiler, Release-Engineering-Team (Watching / External), Operations

Thu, Apr 4

jbond closed T219333: apt-get update broken on jessie: jessie-updates and jessie-backports removed by Debian as Resolved.
Thu, Apr 4, 2:03 PM · Patch-For-Review, Operations
jbond triaged T220003: Add security apt security suites to pbuilder base images as Normal priority.
Thu, Apr 4, 2:02 PM · Patch-For-Review, Packaging, Operations
jbond added a comment to T220003: Add security apt security suites to pbuilder base images .

We will also need to configure an HTTP proxy for the security updates

Thu, Apr 4, 9:08 AM · Patch-For-Review, Packaging, Operations

Wed, Apr 3

jbond added a comment to T219803: upgrade facter and puppet across the fleet.

Both rapidjson and catch build fine using libc++; however, even using these packages we get the above error

Wed, Apr 3, 8:58 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond added a comment to T219803: upgrade facter and puppet across the fleet.

Looks like we may need to rebuild everything with stdc++[1]. I already tried leatherman and got similar errors relating to boost; hopefully we don't need to rebuild boost

Wed, Apr 3, 4:10 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond added a comment to T219803: upgrade facter and puppet across the fleet.

have created https://phabricator.wikimedia.org/T220003

Wed, Apr 3, 3:22 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond created T220003: Add security apt security suites to pbuilder base images .
Wed, Apr 3, 3:22 PM · Patch-For-Review, Packaging, Operations
jbond added a comment to T219803: upgrade facter and puppet across the fleet.

Clang-4.0 is provided by security jessie/updates, and I have managed to get pbuilder working by adding the following

Wed, Apr 3, 3:05 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond created P8334 (An Untitled Masterwork).
Wed, Apr 3, 11:40 AM

Tue, Apr 2

jbond added a comment to T219803: upgrade facter and puppet across the fleet.

Thanks to moritz all dependencies have been built, but I am now getting the following error while building facter

Tue, Apr 2, 6:35 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond added a subtask for T184564: Plan Puppet 5 upgrade: T219803: upgrade facter and puppet across the fleet.
Tue, Apr 2, 12:01 PM · Puppet, Operations
jbond added a parent task for T219803: upgrade facter and puppet across the fleet: T184564: Plan Puppet 5 upgrade.
Tue, Apr 2, 12:01 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond added a comment to T219803: upgrade facter and puppet across the fleet.

Ok, I hit another roadblock. leatherman depends on debhelper 11. I manually updated debian/compat and debian/control to try and build with debhelper 10.
The first build led to the following error

Tue, Apr 2, 11:50 AM · Patch-For-Review, Packaging, Puppet, Operations

Mon, Apr 1

jbond added a comment to T219803: upgrade facter and puppet across the fleet.
  1. notes building facter3 for debian:
Mon, Apr 1, 3:59 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond triaged T219803: upgrade facter and puppet across the fleet as Normal priority.
Mon, Apr 1, 3:55 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond created T219803: upgrade facter and puppet across the fleet.
Mon, Apr 1, 3:54 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond added a comment to T219280: sort out jessie vs jesse-backports vs openssl pinning issue .

Is this still an issue? The openssl package is provided via jessie-wikimedia/main

Mon, Apr 1, 10:16 AM · fundraising-tech-ops

Wed, Mar 27

jbond added a comment to T219333: apt-get update broken on jessie: jessie-updates and jessie-backports removed by Debian.

Once the jessie-backports repo has been removed, I suggest running the following on cumin

Wed, Mar 27, 1:23 PM · Patch-For-Review, Operations
jbond claimed T219333: apt-get update broken on jessie: jessie-updates and jessie-backports removed by Debian.
Wed, Mar 27, 9:58 AM · Patch-For-Review, Operations

Mar 20 2019

jbond added a comment to T212774: Upgrade jenkins-debian-glue to v0.20.0.

Patch is merged, let me know if there is anything else from my side

Mar 20 2019, 12:01 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Operations, Packaging, Continuous-Integration-Infrastructure

Mar 15 2019

jbond created P8208 (An Untitled Masterwork).
Mar 15 2019, 4:19 PM
jbond closed T217646: wmf-auto-restart occasionally errors on fuse mounts, a subtask of T132324: Tracking and Reducing cron-spam to root@ , as Resolved.
Mar 15 2019, 10:51 AM · Patch-For-Review, Operations
jbond closed T217646: wmf-auto-restart occasionally errors on fuse mounts as Resolved.
Mar 15 2019, 10:51 AM · Patch-For-Review, Operations
jbond added a comment to T217646: wmf-auto-restart occasionally errors on fuse mounts.

This has been deployed to all nodes which mount HDFS.

Mar 15 2019, 10:51 AM · Patch-For-Review, Operations

Mar 14 2019

jbond added a comment to T98006: Anycast (Auth)DNS.

Thanks for the response

Mar 14 2019, 10:49 AM · Performance-Team (Radar), Patch-For-Review, netops, Operations, Traffic

Mar 13 2019

jbond added a comment to T98006: Anycast (Auth)DNS.

Some comments for consideration

Mar 13 2019, 6:53 PM · Performance-Team (Radar), Patch-For-Review, netops, Operations, Traffic
jbond added a comment to T125170: Internal DNS resolver responds with NXDOMAIN for localhost AAAA.

FYI this got fixed in pdns-recursor version 4.1.0 (it's actually in 4.1.0-alpha1) via https://github.com/PowerDNS/pdns/pull/5223

Mar 13 2019, 3:50 PM · Traffic, Patch-For-Review, DNS, Operations
jbond added a comment to T212219: wmf-auto-restart fails on certain legacy services.

didn't notice it wasn't merged :)

Mar 13 2019, 3:02 PM · Patch-For-Review, Operations
jbond added a comment to T212219: wmf-auto-restart fails on certain legacy services.

Was this resolved with moritz's patch? If not, I came across a similar issue and created the following patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/494210

Mar 13 2019, 2:57 PM · Patch-For-Review, Operations
jbond added a comment to T153940: Logrotate fails for: "$FILE No such file or directory".

I had a variant of this during clinic

Mar 13 2019, 2:41 PM · Patch-For-Review, Operations

Mar 12 2019

jbond created T218117: Request spare systems to test ipmi password reset cookbook.
Mar 12 2019, 3:57 PM · hardware-requests, DC-Ops, Operations
jbond added a comment to T212774: Upgrade jenkins-debian-glue to v0.20.0.

@hashar I have taken another look at the patch I created yesterday and I now think it is incorrect. As far as I can tell, profile::ci::package_builder, which is the only class that attempts to install jenkins-debian-glue, is never run on any server. profile::ci::package_builder is included in role::ci::slave::labs however

Mar 12 2019, 11:53 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Operations, Packaging, Continuous-Integration-Infrastructure

Mar 11 2019

jbond created P8178 (An Untitled Masterwork).
Mar 11 2019, 5:17 PM
jbond created P8177 (An Untitled Masterwork).
Mar 11 2019, 5:10 PM
jbond added a comment to T212774: Upgrade jenkins-debian-glue to v0.20.0.

I have built this and added it to {jessie,stretch}-wikimedia in component/ci. It appears that the CI servers did not have components/ci configured, so I have also created a change to add the repo to those servers. As far as I can tell this only needs to go to contint2001.wikimedia.org and contint1001.wikimedia.org, which both run Jessie. Have I missed any nodes which may also need this repo?

Mar 11 2019, 1:19 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Operations, Packaging, Continuous-Integration-Infrastructure

Mar 7 2019

jbond triaged T217758: prometheus-openldap-exporter: Request.write called on a request after Request.finish was called as Normal priority.
Mar 7 2019, 1:21 PM · LDAP, monitoring, Operations
jbond triaged T120085: Serve Main Page of WMF wikis from a consistent URL as Normal priority.
Mar 7 2019, 1:21 PM · Core Platform Team Backlog (Watching / External), Performance-Team, Operations, Traffic, TechCom-RFC, SEO, Wikimedia-Site-requests
jbond triaged T217813: Grant root on MediaWiki maintenance hosts to perf-roots as Normal priority.
Mar 7 2019, 11:13 AM · Patch-For-Review, Operations, SRE-Access-Requests
jbond moved T217813: Grant root on MediaWiki maintenance hosts to perf-roots from Untriaged to Manager/NDA Approval/Confimation on the SRE-Access-Requests board.
Mar 7 2019, 11:12 AM · Patch-For-Review, Operations, SRE-Access-Requests
jbond updated subscribers of T217813: Grant root on MediaWiki maintenance hosts to perf-roots.

@kchapman can you approve this request

Mar 7 2019, 11:12 AM · Patch-For-Review, Operations, SRE-Access-Requests
jbond updated the task description for T217813: Grant root on MediaWiki maintenance hosts to perf-roots.
Mar 7 2019, 10:54 AM · Patch-For-Review, Operations, SRE-Access-Requests
jbond updated the task description for T217813: Grant root on MediaWiki maintenance hosts to perf-roots.
Mar 7 2019, 10:54 AM · Patch-For-Review, Operations, SRE-Access-Requests
jbond updated the task description for T217813: Grant root on MediaWiki maintenance hosts to perf-roots.
Mar 7 2019, 10:52 AM · Patch-For-Review, Operations, SRE-Access-Requests

Mar 6 2019

jbond added a comment to T151304: tmpreaper possible race condition.

When monitoring the tmp dir I see many short-lived tmp files and a few long-lived files. Running lsof shows that the short-lived files get created by hhvm (at least on mw1347). My assumption is that the long-lived ones were also created by hhvm, and it is these files (i.e. the ones created on Mar 2 in the output below) which are causing the problem; however, I'm unsure where the race condition is. tmpreaper is configured to only remove files with a ctime > 7 days, with a 256-second splay. My best guess is that something in MediaWiki (or something else, although I checked all the other cron jobs and couldn't see anything) is also reaping these files after 7 days. If it's doing so without the 256-second splay, it would account for the somewhat spasmodic nature of our alerts

Mar 6 2019, 9:01 PM · Operations
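
The age check described in the comment above can be sketched roughly as follows. This is an illustration only, not tmpreaper's implementation: the function name `stale_files` and the `now` parameter are hypothetical, and the 7-day threshold is taken from the comment. The `FileNotFoundError` branch shows the point where a second reaper removing the same file first would surface as a race.

```python
import os
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # age threshold from the comment, in seconds


def stale_files(directory, max_age=SEVEN_DAYS, now=None):
    """Return sorted paths of regular files whose ctime is older than max_age."""
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                age = now - entry.stat(follow_symlinks=False).st_ctime
            except FileNotFoundError:
                # The file vanished between listing and stat, e.g. because
                # another process reaped it first; skip it quietly.
                continue
            if age > max_age:
                stale.append(entry.path)
    return sorted(stale)
```

Passing an explicit `now` makes the threshold easy to exercise without waiting a week; a cleaner that does not tolerate the vanished-file case would instead raise an error at that point.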
jbond added a comment to T217758: prometheus-openldap-exporter: Request.write called on a request after Request.finish was called.

i suspect this is related to T130593: investigate slapd memory leak.

Mar 6 2019, 6:28 PM · LDAP, monitoring, Operations
jbond added a subtask for T130593: investigate slapd memory leak: T217758: prometheus-openldap-exporter: Request.write called on a request after Request.finish was called.
Mar 6 2019, 6:27 PM · LDAP, cloud-services-team (Kanban), Operations, Cloud-VPS
jbond added a parent task for T217758: prometheus-openldap-exporter: Request.write called on a request after Request.finish was called: T130593: investigate slapd memory leak.
Mar 6 2019, 6:27 PM · LDAP, monitoring, Operations
jbond placed T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients up for grabs.
Mar 6 2019, 6:06 PM · cloud-services-team (Kanban), Patch-For-Review, Operations, Cloud-VPS, LDAP, Toolforge
jbond closed T217736: Please create an All Affiliates mailing list as Resolved.

Here are URLs for listinfo, admin. As mentioned the list password has already been sent to the initial list administrator's email address, we recommend they communicate the password with the secondary list administrator.

Mar 6 2019, 11:54 AM · Operations, Wikimedia-Mailing-lists
jbond claimed T217736: Please create an All Affiliates mailing list.
Mar 6 2019, 11:32 AM · Operations, Wikimedia-Mailing-lists
jbond added a comment to T217736: Please create an All Affiliates mailing list.

Hi Erica, as I think you have noticed, the list has been created and you should have received the admin password. I'm just holding off resolving the ticket as this is the first time I have created a list and I just wanted someone else (likely @herron ) to validate my work and ensure I have not made a mistake

Mar 6 2019, 11:31 AM · Operations, Wikimedia-Mailing-lists
jbond triaged T217679: Graphite returning server errors (out of memory?) as High priority.
Mar 6 2019, 10:43 AM · Patch-For-Review, Operations, Graphite
jbond closed T217447: Add bmansurov to archiva-deployers LDAP group, a subtask of T210844: Generate article recommendations in Hadoop for use in production, as Resolved.
Mar 6 2019, 10:41 AM · Patch-For-Review, Analytics-Kanban, Article-Recommendation, Research, Analytics
jbond closed T217447: Add bmansurov to archiva-deployers LDAP group as Resolved.

This should be in place now, let me know if there are any issues

Mar 6 2019, 10:41 AM · LDAP-Access-Requests, Operations
jbond reopened T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients as "Open".

sorry updated wrong ticket

Mar 6 2019, 10:40 AM · cloud-services-team (Kanban), Patch-For-Review, Operations, Cloud-VPS, LDAP, Toolforge
jbond closed T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients as Resolved.

This should be in place now, let me know if there are any issues

Mar 6 2019, 10:40 AM · cloud-services-team (Kanban), Patch-For-Review, Operations, Cloud-VPS, LDAP, Toolforge
jbond added a comment to T217646: wmf-auto-restart occasionally errors on fuse mounts.

The last option excludes mount points, which would work for this case. As far as I can see, you can only remove directories from the output, which wouldn't stop the warning from triggering

Mar 6 2019, 10:34 AM · Patch-For-Review, Operations

Mar 5 2019

jbond closed T215275: Improve CI checks to cover more of the code base as Resolved.
Mar 5 2019, 3:05 PM · Patch-For-Review, Continuous-Integration-Config, Puppet, Operations
jbond updated subscribers of T217447: Add bmansurov to archiva-deployers LDAP group.

@Reedy thanks, I was going from https://office.wikimedia.org/wiki/Contact_list. I need authorization from bmansurov's manager. Looking at https://wikimediafoundation.org/role/staff-contractors/ I'm guessing that would be @leila

Mar 5 2019, 1:32 PM · LDAP-Access-Requests, Operations
jbond added a comment to T216425: Volunteer NDA for AWight.

@RStallman-legalteam did you receive a response? I don't see them in the spreadsheet

Mar 5 2019, 1:28 PM · WMF-NDA-Requests
jbond updated subscribers of T217447: Add bmansurov to archiva-deployers LDAP group.

@DarTar can you please authorize this request

Mar 5 2019, 1:25 PM · LDAP-Access-Requests, Operations
jbond triaged T217557: Socket timeout on wdqs.svc.eqiad.wmnet as Normal priority.
Mar 5 2019, 1:24 PM · Wikidata, Operations, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint
jbond triaged T217407: Indexing of https://www.wikidata.org in the Yandex Search Engine as Normal priority.
Mar 5 2019, 1:09 PM · Operations, Traffic
jbond closed T217457: Intermittent slowness on gerrit as Resolved.

paladox confirmed via IRC this can be closed

Mar 5 2019, 1:04 PM · Operations, Gerrit
jbond closed T217247: Close the grwp-wici mailing list as Resolved.

This list has now been removed

Mar 5 2019, 12:09 PM · Operations, Wikimedia-Mailing-lists
jbond added a comment to T217646: wmf-auto-restart occasionally errors on fuse mounts.

This is reproducible, but not reliably. Some file operations taking place on fuse, e.g. ls -la /mnt/hdfs/tmp, seem to cause lsof to fail. It is almost certainly to do with hdfs fuse stability issues. I think we could remove this noise with any of the following options

Mar 5 2019, 11:57 AM · Patch-For-Review, Operations
jbond triaged T217646: wmf-auto-restart occasionally errors on fuse mounts as Normal priority.
Mar 5 2019, 11:29 AM · Patch-For-Review, Operations