jbond (John Bond)
User

User Details

User Since
Jan 7 2019, 1:06 PM (36 w, 6 d)
Availability
Available
IRC Nick
jbond42
LDAP User
Jbond
MediaWiki User
JBond (WMF)

Recent Activity

Fri, Sep 20

jbond added a comment to T232489: Request access to 'deployment' user group for phedenskog.

@Peter this has been deployed now. Please allow up to 30 minutes for the change to fully propagate; if you are still seeing issues after that, please reopen this ticket.

Fri, Sep 20, 8:44 AM · Operations, SRE-Access-Requests, Performance-Team

Wed, Sep 18

jbond added a comment to T233203: Analyses octocatalog-diff output.

nodes with changes

Wed, Sep 18, 12:22 PM · Patch-For-Review, Puppet
jbond updated the task description for T233203: Analyses octocatalog-diff output.
Wed, Sep 18, 10:54 AM · Patch-For-Review, Puppet
jbond added a comment to T233203: Analyses octocatalog-diff output.

There are a few nodes which have diffs like the following. This is caused when an attribute has a value of undef.

Wed, Sep 18, 10:30 AM · Patch-For-Review, Puppet
jbond created T233203: Analyses octocatalog-diff output.
Wed, Sep 18, 10:28 AM · Patch-For-Review, Puppet

Mon, Sep 16

jbond created P9109 (An Untitled Masterwork).
Mon, Sep 16, 4:22 PM
jbond closed T232609: Reset inactive admin of offline-l mailing list as Resolved.

Hello All,

Mon, Sep 16, 10:24 AM · Operations, Wikimedia-Mailing-lists

Fri, Sep 13

jbond created P9103 502.txt.
Fri, Sep 13, 12:03 PM
jbond created P9102 mkaps1003.
Fri, Sep 13, 11:49 AM
jbond created P9101 /srv/log/kartotherian/syslog.log.
Fri, Sep 13, 11:33 AM
jbond created P9100 (An Untitled Masterwork).
Fri, Sep 13, 10:32 AM
jbond created P9099 (An Untitled Masterwork).
Fri, Sep 13, 10:28 AM
jbond created P9098 maps1003.
Fri, Sep 13, 8:54 AM

Thu, Sep 12

jbond added a comment to T232177: Please create engprod@lists.wikimedia.org.

wiki has been updated https://wikitech.wikimedia.org/w/index.php?title=Mailman&type=revision&diff=1837413&oldid=1826755

Thu, Sep 12, 11:16 AM · Operations, Wikimedia-Mailing-lists
jbond triaged T232654: eqiad: three clouvirt-wdqs servers for WDQS testing as Normal priority.
Thu, Sep 12, 11:08 AM · hardware-requests, Operations
jbond triaged T175691: Geoip lookup - Misidentifying country due to travelling as Normal priority.
Thu, Sep 12, 11:07 AM · Operations, Traffic, FR-Q2-FY2019-20-cleanup-list, Fundraising-Backlog, MediaWiki-extensions-CentralNotice
jbond triaged T232679: Images served with text/html content type as Normal priority.
Thu, Sep 12, 11:06 AM · Traffic, Operations, Analytics
jbond triaged T232711: Deploy ripe-atlas-tools for ad-hoc network tests as Normal priority.
Thu, Sep 12, 11:06 AM · Operations, netops, observability
jbond added a comment to T232711: Deploy ripe-atlas-tools for ad-hoc network tests.

I think this is a great idea. As to which host, the cumin server makes sense to me, or perhaps a bastion? The user is a bit of a pain: it would be nice if we could have a global config, e.g. /etc/ripe-atlas-tools/config, but a quick look at the code suggests all config is per user. Long term it would be nice to send a PR to add support for a global config; short term perhaps use the user atlas. If we install the ripe-atlas package with either --user or in a virtualenv we can then create some wrapper scripts e.g.

Thu, Sep 12, 9:47 AM · Operations, netops, observability
jbond closed T232476: LDAP access to the wmf group for Alex Hollender as Resolved.

@alexhollender I have now added wmf access to your LDAP account. Please let me know if you are still unable to access resources.

Thu, Sep 12, 9:17 AM · LDAP-Access-Requests

Wed, Sep 11

jbond triaged T232609: Reset inactive admin of offline-l mailing list as Normal priority.
Wed, Sep 11, 3:02 PM · Operations, Wikimedia-Mailing-lists
jbond triaged T232489: Request access to 'deployment' user group for phedenskog as Normal priority.
Wed, Sep 11, 2:13 PM · Operations, SRE-Access-Requests, Performance-Team
jbond triaged T232617: BGP sessions down on cr2-esams as Normal priority.
Wed, Sep 11, 2:13 PM · Operations, netops
jbond added a comment to T232177: Please create engprod@lists.wikimedia.org.

Hi Greg,

Wed, Sep 11, 12:46 PM · Operations, Wikimedia-Mailing-lists
jbond updated subscribers of T232489: Request access to 'deployment' user group for phedenskog.

@greg are you able to approve this access request?

Wed, Sep 11, 12:40 PM · Operations, SRE-Access-Requests, Performance-Team
jbond assigned T232591: helium array has slot 3 disk failed to Cmjohnson.
Wed, Sep 11, 12:15 PM · ops-eqiad, Operations
jbond triaged T232591: helium array has slot 3 disk failed as Normal priority.
Wed, Sep 11, 12:10 PM · ops-eqiad, Operations

Tue, Sep 10

jbond added a comment to T232476: LDAP access to the wmf group for Alex Hollender.

@alexhollender what is your shell username? I don't see anything obvious.

Tue, Sep 10, 2:43 PM · LDAP-Access-Requests
jbond added a comment to T232178: Please create private "testeng" team mailing list.

@zeljkofilipin I have now set the subscription model to Require approval. I leave it to the admins to change the other privacy settings. @Jrbranaa should have received the admin password via email, so should be able to share that with you.

Tue, Sep 10, 11:59 AM · Release-Engineering-Team-TODO (201909), Operations, Wikimedia-Mailing-lists
jbond triaged T232417: mass Yahoo / AOL bounces mailman as Normal priority.
Tue, Sep 10, 10:10 AM · Mail, Operations, Wikimedia-Mailing-lists
jbond triaged T232362: Massmessages not going through, log looks fine as Normal priority.
Tue, Sep 10, 10:09 AM · Core Platform Team Workboards (Clinic Duty Team), WMF-JobQueue, Operations, MassMessage
jbond triaged T232343: Consider Postfix as MTA for our MXes (and OTRS/Mailman/Phab) as Normal priority.
Tue, Sep 10, 10:08 AM · Mail, Operations

Mon, Sep 9

jbond triaged T232322: labspuppetmaster1001 puppet-merge failing as Normal priority.
Mon, Sep 9, 11:00 AM · Puppet, Operations, cloud-services-team
jbond triaged T232310: Integrate Buster 10.1 point update as Normal priority.
Mon, Sep 9, 10:39 AM · Operations
jbond closed T232178: Please create private "testeng" team mailing list as Resolved.

Hello Jean-Rene,

Mon, Sep 9, 10:38 AM · Release-Engineering-Team-TODO (201909), Operations, Wikimedia-Mailing-lists
jbond closed T232177: Please create engprod@lists.wikimedia.org as Resolved.

Hello Greg,

Mon, Sep 9, 10:34 AM · Operations, Wikimedia-Mailing-lists
jbond triaged T232069: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted as Normal priority.
Mon, Sep 9, 10:21 AM · ops-eqiad, DC-Ops, Analytics, Operations, Analytics-Cluster
jbond added a comment to T232069: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted.

I tried running the following command on the server; however, the Current Cache policy remains as WriteThrough

Mon, Sep 9, 10:21 AM · ops-eqiad, DC-Ops, Analytics, Operations, Analytics-Cluster
jbond triaged T232068: notebook1004 - /srv is full as Normal priority.
Mon, Sep 9, 9:43 AM · Operations, Analytics, Analytics-Cluster
jbond triaged T232006: LDF service does not Vary responses by Accept, sending incorrect cached responses to clients as Normal priority.
Mon, Sep 9, 9:37 AM · Patch-For-Review, Operations, Traffic, Wikidata, Wikidata-Query-Service
jbond triaged T231738: Server side upload failed with "overwriting failed (at recordUpload stage)" as Normal priority.
Mon, Sep 9, 9:30 AM · MediaWiki-Maintenance-scripts, media-storage, Operations
jbond added a comment to T231738: Server side upload failed with "overwriting failed (at recordUpload stage)".

@Urbanecm creating this un-triaged task with the Operations tag should be enough to bring it to the attention of clinic duty; this task must have slipped through the gaps, sorry about that. In relation to your specific issue, I just tried re-running your command and it worked without issue.

Mon, Sep 9, 9:29 AM · MediaWiki-Maintenance-scripts, media-storage, Operations
jbond updated subscribers of T231616: Request access to Analytics cluster for Urbanecm.

@Nuria are you able to approve @Urbanecm access to researchers and analytics-privatedata-users?

Mon, Sep 9, 9:22 AM · Patch-For-Review, Operations, SRE-Access-Requests
jbond triaged T231616: Request access to Analytics cluster for Urbanecm as Normal priority.
Mon, Sep 9, 9:17 AM · Patch-For-Review, Operations, SRE-Access-Requests
jbond triaged T231522: Two user pages on meta can't be rendered: "request has exceeded memory limit" as Normal priority.
Mon, Sep 9, 9:16 AM · MediaWiki-extensions-Babel, Operations
jbond triaged T227541: b6-eqiad pdu refresh (Tuesday 9/10 @11am UTC) as Normal priority.
Mon, Sep 9, 9:15 AM · DC-Ops, Operations, ops-eqiad
jbond triaged T227540: b4-eqiad pdu refresh (Thursday 10/24 @11am UTC) as Normal priority.
Mon, Sep 9, 9:15 AM · DC-Ops, Operations, ops-eqiad
jbond triaged T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC) as Normal priority.
Mon, Sep 9, 9:15 AM · DC-Ops, Operations, ops-eqiad

Tue, Aug 27

jbond added a comment to T102099: Fix IPv6 autoconf issues once and for all, across the fleet..

Hi, I am a bit disconnected from the deployment planning for this. Once all hosts (or all hosts that are planned above) have been migrated, is the puppet line supposed to go on the profile (or role), or on base.pp with some exclusions? It is not clear from the ticket description and comments, or I may have missed it as it is a long ticket :-D.

Tue, Aug 27, 9:25 AM · Patch-For-Review, Traffic, netops, Operations, IPv6

Aug 21 2019

jbond created P8955 celery crash.
Aug 21 2019, 11:51 AM

Aug 20 2019

jbond created P8937 test.
Aug 20 2019, 11:44 AM

Aug 16 2019

jbond created T230600: Investigate the potential benefits of BGPalerter .
Aug 16 2019, 9:04 AM · netops, Operations

Aug 15 2019

jbond added a comment to T230204: Add Clara Andrew-Wani to wmf ldap group.

When I started, I believe someone from OIT created an office wiki page based off a standard template. There is also an Ops-specific page which I think is referenced in the first wiki.

Aug 15 2019, 11:19 AM · LDAP-Access-Requests
jbond added a comment to T229621: Icinga check defined from LVS configuration for cloudelastic are borked.
Aug 15 2019, 11:18 AM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Operations, Traffic

Aug 14 2019

jbond committed rLPRI96bd5c67e0dc: apereo_cas: add some content to the keystore (authored by jbond).
Aug 14 2019, 12:04 PM
jbond committed rLPRId6a9dc18d69f: apereo_cas: add empty keystore file (authored by jbond).
Aug 14 2019, 12:00 PM

Aug 9 2019

jbond added a comment to T229621: Icinga check defined from LVS configuration for cloudelastic are borked.

This alerted today and I ended up going right down the rabbit hole. Anyway, I think I have found out some information.

Aug 9 2019, 2:00 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Operations, Traffic

Aug 8 2019

jbond added a comment to T230002: puppetdb queue size went up since July 30.

The lua hack seems to have worked. I have again updated the config to send some canary servers to puppetmaster1003. So far I have seen no errors in the puppetdb log. Further, via puppetdb I can see that the puppet reports and facts are all refreshing.

Aug 8 2019, 4:00 PM · Patch-For-Review, Operations

Aug 7 2019

jbond added a comment to T230002: puppetdb queue size went up since July 30.

I have disabled puppetmaster1003 for now. Unfortunately, from reading PUP-8901, it seems the advice from puppetlabs is to always upgrade all puppetmasters and the puppetdb in tandem.

Aug 7 2019, 10:35 AM · Patch-For-Review, Operations
jbond added a subtask for T228657: Upgrade Puppet Masters and Puppet DB servers: T230002: puppetdb queue size went up since July 30.
Aug 7 2019, 10:24 AM · Patch-For-Review, Puppet
jbond added a parent task for T230002: puppetdb queue size went up since July 30: T228657: Upgrade Puppet Masters and Puppet DB servers.
Aug 7 2019, 10:24 AM · Patch-For-Review, Operations
jbond triaged T228657: Upgrade Puppet Masters and Puppet DB servers as Normal priority.
Aug 7 2019, 10:23 AM · Patch-For-Review, Puppet
jbond added a comment to T230002: puppetdb queue size went up since July 30.

It seems that the servers using the new puppet master cause the following stack trace when they try to reach the 'store report' phase

Aug 7 2019, 10:11 AM · Patch-For-Review, Operations
jbond added a comment to T230002: puppetdb queue size went up since July 30.

Seems to correlate well with when I enabled the canary puppet master and when I started adding more canary hosts.

Aug 7 2019, 9:51 AM · Patch-For-Review, Operations

Aug 6 2019

jbond created P8872 apache.
Aug 6 2019, 3:19 PM
jbond created T229916: Create a cassandra.service which subsumes casandra-{a,b,c} services using PartsOf=cassandra.service .
Aug 6 2019, 11:15 AM · Cassandra, Operations

Aug 5 2019

jbond added a comment to T229861: Can't reach cloudelastic.wikimedia.org via IPv6.

@EBernhardson I'm not sure where the data above came from, but I suspect a typo somewhere: 2620:0:860::ed1a::1 does not exist anywhere in the puppet repo. However, 2620:0:860:ed1a::1 (notice the missing ':') is valid and is routed to the lvs eqiad servers. That said, it does look like most lvs services are in the 2620:0:860:ed1a:: prefix, and the servers I have found in the 2620:0:861:1:: prefix seem to be real machines, so I suspect that the IPv6 address may be incorrect.

Aug 5 2019, 6:21 PM · Operations, Traffic, Discovery-Search (Current work)
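
The address check described in the comment on T229861 can be reproduced with a short sketch. This is only an illustration using Python's standard `ipaddress` module; the helper names and the /64 prefix length are assumptions made for the example, not something stated in the ticket.

```python
import ipaddress


def is_valid_ipv6(text: str) -> bool:
    """Return True if text parses as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        # An address containing '::' more than once is rejected here.
        return False
    return True


# Prefix quoted in the comment; the /64 length is an assumption for illustration.
LVS_PREFIX = ipaddress.IPv6Network("2620:0:860:ed1a::/64")


def in_lvs_prefix(text: str) -> bool:
    """Return True if text is a valid IPv6 address inside LVS_PREFIX."""
    return is_valid_ipv6(text) and ipaddress.IPv6Address(text) in LVS_PREFIX


if __name__ == "__main__":
    for candidate in ("2620:0:860::ed1a::1", "2620:0:860:ed1a::1"):
        print(candidate, is_valid_ipv6(candidate), in_lvs_prefix(candidate))
```

The first address is rejected because `::` may appear at most once in an IPv6 address, which matches the typo suspected in the comment.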
jbond created T229807: cookbook sre.elasticsearch.rolling-restart failed with cluster relforge.
Aug 5 2019, 11:49 AM · Discovery-Search (Current work), Operations, SRE-tools, Elasticsearch
jbond added a comment to T102099: Fix IPv6 autoconf issues once and for all, across the fleet..

Thanks Brandon, I'll take a look at removing the SLAAC addresses from puppet this week. One of them, at least, was added by me and was what led me down this rabbit hole :)

Aug 5 2019, 10:58 AM · Patch-For-Review, Traffic, netops, Operations, IPv6

Aug 2 2019

jbond added a comment to T102099: Fix IPv6 autoconf issues once and for all, across the fleet..

Our current setup seems to use SLAAC without any of the privacy extensions; as such, the lower 64 bits of the IPv6 address are derived from the MAC address of the interface. It seems it would be fairly simple to script the AAAA records, and then I'm not sure we need to do much configuration. The conversation then turns to what we do with all the hosts that currently have interface::add_ip6_mapped.

Answering my own question: as NICs can and do change in a server's lifetime, using SLAAC like this is undesirable, as we don't want an IP change if we change a NIC.

Aug 2 2019, 10:09 AM · Patch-For-Review, Traffic, netops, Operations, IPv6
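
To make the SLAAC point above concrete, here is a minimal sketch of how a SLAAC address without privacy extensions is built from a MAC address using the modified EUI-64 scheme (RFC 4291, Appendix A). The function names, the documentation prefix 2001:db8::/64, and the example MAC are all hypothetical and chosen only for illustration.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Build the 64-bit modified EUI-64 interface identifier from a MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Flip the universal/local bit of the first octet and insert ff:fe
    # between the two halves of the MAC.
    eui = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    return int.from_bytes(eui, "big")


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the interface identifier derived from mac."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC with EUI-64 expects a /64 prefix")
    return ipaddress.IPv6Address(
        int(network.network_address) | eui64_interface_id(mac)
    )


if __name__ == "__main__":
    # Hypothetical prefix and MAC; swapping the NIC changes the address.
    print(slaac_address("2001:db8::/64", "00:25:96:12:34:56"))
    print(slaac_address("2001:db8::/64", "00:25:96:ab:cd:ef"))
```

Because the interface identifier is a pure function of the MAC, replacing the NIC yields a different address, which is the drawback raised in the comment.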

Aug 1 2019

jbond added a comment to T228854: Use git commit id as "configuration version" for puppet.

I think this is a really good idea. Further, after a bit of investigation, I think any arbitrary string can be used. Later versions of the puppet documentation even use git rev-parse as an example command. Also, r10k has a script in their repo which has the sha1, git server and last git log message as the version.

Aug 1 2019, 5:00 PM · Operations, observability, Puppet
jbond added a comment to T102099: Fix IPv6 autoconf issues once and for all, across the fleet..

Hi all,

Aug 1 2019, 3:55 PM · Patch-For-Review, Traffic, netops, Operations, IPv6
jbond closed T229571: labpuppetmaster1001: puppet catalog error related to canary_host as Resolved.

Sorry for the interruption, this issue was caused by https://gerrit.wikimedia.org/r/c/operations/puppet/+/527064 when I removed the config entry at the common level. I have now added a default of 0 canary_hosts in labs.yaml and common/profile/puppetmaster/frontend.yaml. Please reopen if you still see issues.

Aug 1 2019, 12:01 PM · cloud-services-team (Kanban)
jbond claimed T229571: labpuppetmaster1001: puppet catalog error related to canary_host.
Aug 1 2019, 11:58 AM · cloud-services-team (Kanban)

Jul 31 2019

jbond added a comment to T229397: Puppet: get row/rack info from Netbox.

Have a script that periodically generates a hiera file with that information for each host, to be merged by the puppetmasters with the public tree as we already do with the private repo

Jul 31 2019, 1:12 PM · Patch-For-Review, Puppet, Operations

Jul 30 2019

jbond created P8823 (An Untitled Masterwork).
Jul 30 2019, 10:39 AM
jbond created P8822 (An Untitled Masterwork).
Jul 30 2019, 10:30 AM

Jul 25 2019

jbond added a comment to T208566: puppet.git rake fails with ruby 2.5.

In case this needs rolling back, the issue can be fixed in 4.8 with the following patch https://phabricator.wikimedia.org/P8772

Jul 25 2019, 4:33 PM · Patch-For-Review, Continuous-Integration-Config, Operations, Puppet

Jul 22 2019

jbond closed T226508: Icinga custom checks should follow our HTTP User-Agent policy as Resolved.

yes, closing thanks

Jul 22 2019, 3:14 PM · observability, Operations
jbond added a comment to T227587: upgrade puppet master servers.

puppetmaster1003 has now been built and is running puppet 5.5

Jul 22 2019, 1:06 PM · Patch-For-Review, Packaging, Puppet, Operations
jbond created T228657: Upgrade Puppet Masters and Puppet DB servers.
Jul 22 2019, 1:05 PM · Patch-For-Review, Puppet

Jul 19 2019

jbond added a comment to T224977: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section.

Here is another example. puppetmaster1003.eqiad.wmnet is in both 'Hosts that have no differences' and 'Hosts that fail to compile when the change is applied'

Jul 19 2019, 2:30 PM · Operations, puppet-compiler
jbond edited P8775 example output.
Jul 19 2019, 11:34 AM
jbond created P8775 example output.
Jul 19 2019, 11:25 AM
jbond added a comment to T224977: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section.

After checking https://puppet-compiler.wmflabs.org/compiler1001/16855/ change error/warning logs for hosts marked as "fail to compile when the change is applied" it looks like two warnings are being interpreted as errors:

Warning: Unknown variable: '::restricted_to'. at /srv/jenkins-workspace/puppet-compiler/16855/change/src/modules/profile/manifests/ldap/client/labs.pp:5:72
Warning: Unknown variable: '::restricted_from'. at /srv/jenkins-workspace/puppet-compiler/16855/change/src/modules/profile/manifests/ldap/client/labs.pp:6:76
Jul 19 2019, 10:09 AM · Operations, puppet-compiler
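
The distinction drawn in the comment above, between warnings and real compile errors, can be sketched as a simple classification by line prefix. This is not the puppet-compiler implementation; the function names are hypothetical, and the assumption that real failures appear on lines starting with `Error:` is made for the example.

```python
def split_by_severity(lines):
    """Separate log lines into (errors, warnings) based on their prefix."""
    errors, warnings = [], []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("Error:"):
            errors.append(stripped)
        elif stripped.startswith("Warning:"):
            warnings.append(stripped)
    return errors, warnings


def compile_failed(lines):
    """Treat a compilation as failed only if at least one Error: line exists."""
    errors, _ = split_by_severity(lines)
    return bool(errors)


if __name__ == "__main__":
    # The two warnings quoted in the comment, with paths shortened to the file name.
    log = [
        "Warning: Unknown variable: '::restricted_to'. at labs.pp:5:72",
        "Warning: Unknown variable: '::restricted_from'. at labs.pp:6:76",
    ]
    print(compile_failed(log))
```

With only `Warning:` lines in the log, a check like this would not place the host in the failed section.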
jbond added a subtask for T224977: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section: T228266: PCC always has an ERROR when compiling for servers with profile::redis::slave.
Jul 19 2019, 10:09 AM · Operations, puppet-compiler
jbond added a parent task for T228266: PCC always has an ERROR when compiling for servers with profile::redis::slave: T224977: puppet-catalog-compiler: compilation result randomly places servers in the 'failed' section.
Jul 19 2019, 10:09 AM · Continuous-Integration-Infrastructure, Packaging, Operations

Jul 18 2019

jbond closed T228174: Function Call, wrong number of arguments (4 for 5) when a puppet master is connected to labs puppetmaster as Resolved.
Jul 18 2019, 8:40 PM · cloud-services-team (Kanban), Cloud-Services
jbond added a comment to T225604: log spam from mtail 3.0.0~rc19 on wezen.

This error is triggered more slowly on some machines than on others:

  • cp2026 (cache::upload): takes a bit longer to die ~ 5 minutes
  • cp1085 (cache::text): dies instantly
  • cp1075 (cache::text): dies instantly
  • cp1076 (cache::upload): dies instantly
Jul 18 2019, 2:22 PM · Patch-For-Review, observability
jbond added a comment to T225604: log spam from mtail 3.0.0~rc19 on wezen.

Unfortunately this had to be downgraded again. We saw the following error with the new version

Jul 18 2019, 1:32 PM · Patch-For-Review, observability
jbond updated the task description for T201342: rack/setup/install puppetmaster1003.eqiad.wmnet.
Jul 18 2019, 12:00 PM · Operations
jbond added a comment to T225604: log spam from mtail 3.0.0~rc19 on wezen.

I have now pushed a patch so that mtail uses -logs /dev/stdin instead of -logfds 0. Would you like me to upgrade everything to mtail_3.0.0~rc24.1-1+wmf1_amd64.deb again?

Jul 18 2019, 10:50 AM · Patch-For-Review, observability
jbond added a comment to T228395: puppetdb prometheus metrics per-host metrics.

Below are all the hosts with the puppetdb package installed

Jul 18 2019, 10:29 AM · User-fgiunchedi, Operations, Puppet
jbond created P8772 puppet4.8 with ruby > 2.3 - NoMethodError: undefined method `<<' for nil:NilClass.
Jul 18 2019, 10:09 AM

Jul 17 2019

jbond added a comment to T228266: PCC always has an ERROR when compiling for servers with profile::redis::slave.

Investigating further, this is due to how populate_puppetdb adds entries to the database. `populate_puppetdb` loops through the site.pp manifest, picks one host from every role, and compiles its catalog locally so that the result is sent to the puppetdb. This means that the puppet DB is populated and the other calls to query_resource have enough data for them to succeed.

Jul 17 2019, 3:48 PM · Continuous-Integration-Infrastructure, Packaging, Operations
jbond renamed T228266: PCC always has an ERROR when compiling for servers with profile::redis::slave from Investigate if puppetdbquery::query_resources should work on PCC to PCC always has an ERROR when compiling for servers with profile::redis::slave.
Jul 17 2019, 3:39 PM · Continuous-Integration-Infrastructure, Packaging, Operations
jbond updated the task description for T228266: PCC always has an ERROR when compiling for servers with profile::redis::slave.
Jul 17 2019, 3:23 PM · Continuous-Integration-Infrastructure, Packaging, Operations
jbond closed T227779: Hiera incompatible with newer versions of puppet, a subtask of T227587: upgrade puppet master servers, as Resolved.
Jul 17 2019, 11:23 AM · Patch-For-Review, Packaging, Puppet, Operations
jbond closed T227779: Hiera incompatible with newer versions of puppet as Resolved.
Jul 17 2019, 11:23 AM · Patch-For-Review, Packaging, Puppet, Operations
jbond updated the task description for T228266: PCC always has an ERROR when compiling for servers with profile::redis::slave.
Jul 17 2019, 11:06 AM · Continuous-Integration-Infrastructure, Packaging, Operations