
jbond (John Bond)
User

User Details

User Since
Jan 7 2019, 1:06 PM (121 w, 5 d)
Availability
Available
IRC Nick
jbond42
LDAP User
Jbond
MediaWiki User
JBond (WMF)

Recent Activity

Yesterday

jbond added a comment to T261693: Ensure Puppet checks types as part of the build.

However, this depends heavily on the PS I had to revert, as that takes care of configuring dependencies, hiera custom functions, the private repo and other things required to compile the catalogue correctly.

Also worth noting that I did merge a similar change to this.

Fri, May 7, 1:42 PM · Patch-For-Review, puppet-compiler, Puppet, SRE
jbond added a project to T281369: Additional CFSSL tasks: CFSSL-PKI.
Fri, May 7, 12:58 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond updated the task description for T281369: Additional CFSSL tasks.
Fri, May 7, 12:57 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond updated the task description for T281369: Additional CFSSL tasks.
Fri, May 7, 12:57 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond created P15852 (An Untitled Masterwork).
Fri, May 7, 10:38 AM

Thu, May 6

jbond updated the task description for T281369: Additional CFSSL tasks.
Thu, May 6, 3:29 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond created P15834 pcc.
Thu, May 6, 2:53 PM
jbond created P15831 (An Untitled Masterwork).
Thu, May 6, 2:14 PM
jbond created P15830 (An Untitled Masterwork).
Thu, May 6, 2:10 PM

Wed, May 5

jbond added a comment to T281371: Request Project (cfssl-pki) for pki tasks.

Thanks, updated. Let me know if anything further is required.

Wed, May 5, 2:41 PM · Project-Admins
jbond updated the task description for T281371: Request Project (cfssl-pki) for pki tasks.
Wed, May 5, 2:41 PM · Project-Admins

Tue, May 4

jbond closed T281370: Create a discover CA, a subtask of T281369: Additional CFSSL tasks, as Resolved.
Tue, May 4, 4:30 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond closed T281370: Create a discover CA as Resolved.
Tue, May 4, 4:30 PM · SRE
jbond closed T281366: Revoke debmonitor.discovery.wmnet, a subtask of T281369: Additional CFSSL tasks, as Resolved.
Tue, May 4, 3:46 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond closed T281366: Revoke debmonitor.discovery.wmnet as Resolved.
Tue, May 4, 3:46 PM · SRE
jbond committed rLPRI5c3e4cdb0c15: pki: add discovery fake cert (authored by jbond).
pki: add discovery fake cert
Tue, May 4, 2:04 PM
jbond updated the task description for T281369: Additional CFSSL tasks.
Tue, May 4, 1:39 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond updated the task description for T281369: Additional CFSSL tasks.
Tue, May 4, 1:35 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond updated the task description for T281369: Additional CFSSL tasks.
Tue, May 4, 1:31 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond closed T281376: Add PKI root CA to ca-certificates via puppet, a subtask of T281369: Additional CFSSL tasks, as Resolved.
Tue, May 4, 12:28 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond closed T281376: Add PKI root CA to ca-certificates via puppet as Resolved.
Tue, May 4, 12:28 PM · SRE
jbond added a comment to T281376: Add PKI root CA to ca-certificates via puppet.

This has been completed

Tue, May 4, 12:28 PM · SRE
jbond added a comment to T276459: Mail for Gitlab.

Pretty straightforward, but please review, thanks.

LGTM, thanks.

Tue, May 4, 10:29 AM · Mail, GitLab (Initialization)

Mon, May 3

jbond added a comment to T276148: SSH Access of Git data in GitLab.
Mon, May 3, 4:57 PM · Patch-For-Review, Release-Engineering-Team (Doing), SRE, User-brennen, GitLab (Initialization)
jbond added a comment to T281692: Wipe DNS cache when destroying VM's.

To be extra thorough, a reload of nsdc on bastion-restricted.wmflabs.org would also be useful.

Mon, May 3, 1:45 PM · Cloud-Services
jbond triaged T281700: PDNS in cloud can return inconsistent answers as Low priority.
Mon, May 3, 12:26 PM · Traffic, SRE, DNS, Cloud-Services
jbond triaged T281692: Wipe DNS cache when destroying VM's as Medium priority.
Mon, May 3, 11:31 AM · Cloud-Services
jbond committed rLPRIe7bac6ecc799: move key to common.yaml (authored by jbond).
move key to common.yaml
Mon, May 3, 10:19 AM
jbond committed rLPRI1386c96c223c: add profile::pki::client::auth_key (authored by jbond).
add profile::pki::client::auth_key
Mon, May 3, 10:03 AM

Fri, Apr 30

jbond added a subtask for T281369: Additional CFSSL tasks: T281371: Request Project (cfssl-pki) for pki tasks.
Fri, Apr 30, 2:23 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond added a parent task for T281371: Request Project (cfssl-pki) for pki tasks: T281369: Additional CFSSL tasks.
Fri, Apr 30, 2:23 PM · Project-Admins
jbond updated the task description for T281369: Additional CFSSL tasks.
Fri, Apr 30, 2:22 PM · CFSSL-PKI, Patch-For-Review, SRE

Thu, Apr 29

jbond created P15660 (An Untitled Masterwork).
Thu, Apr 29, 3:24 PM
jbond updated the task description for T279683: Investigate iptables replacements.
Thu, Apr 29, 2:50 PM · User-MoritzMuehlenhoff, Security, SRE
jbond updated the task description for T281369: Additional CFSSL tasks.
Thu, Apr 29, 11:53 AM · CFSSL-PKI, Patch-For-Review, SRE
jbond committed rLPRI00b54f27aede: fix key name (authored by jbond).
fix key name
Thu, Apr 29, 11:34 AM
jbond committed rLPRIf72e6dc5d9a9: rename hiera file (authored by jbond).
rename hiera file
Thu, Apr 29, 11:31 AM
jbond committed rLPRI888756b2e6f5: updte key name (authored by jbond).
updte key name
Thu, Apr 29, 11:31 AM
jbond committed rLPRIc90e7f13f0a3: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/labs/private (authored by jbond).
Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/labs/private
Thu, Apr 29, 11:31 AM
jbond closed T268882: PKI/CFSSL Next steps as Resolved.
Thu, Apr 29, 11:06 AM · Patch-For-Review, User-jbond, SRE
jbond updated the task description for T268882: PKI/CFSSL Next steps.
Thu, Apr 29, 11:06 AM · Patch-For-Review, User-jbond, SRE
ema awarded T280484: debmonitor-client.service stays in failed state in case of server errors a Baby Tequila token.
Thu, Apr 29, 7:22 AM · SRE-tools, SRE

Wed, Apr 28

jbond closed T280484: debmonitor-client.service stays in failed state in case of server errors as Resolved.
Wed, Apr 28, 3:44 PM · SRE-tools, SRE
jbond closed T280892: debmonitor-client.postinst: line 7: systemd-sysusers: command not found on stretch docker images as Resolved.

This should be fixed now, please reopen if you are still seeing issues

Wed, Apr 28, 3:44 PM · SRE, SRE-tools
jbond updated the task description for T281369: Additional CFSSL tasks.
Wed, Apr 28, 3:14 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond created T281377: debmonitor.discovery.wmnet: Generate server cetificate via cfssl.
Wed, Apr 28, 2:50 PM · SRE
jbond created T281376: Add PKI root CA to ca-certificates via puppet.
Wed, Apr 28, 2:48 PM · SRE
jbond created T281371: Request Project (cfssl-pki) for pki tasks.
Wed, Apr 28, 2:30 PM · Project-Admins
jbond triaged T281369: Additional CFSSL tasks as Medium priority.
Wed, Apr 28, 2:25 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond triaged T281370: Create a discover CA as Medium priority.
Wed, Apr 28, 2:25 PM · SRE
jbond created T281370: Create a discover CA.
Wed, Apr 28, 2:25 PM · SRE
jbond added a subtask for T281369: Additional CFSSL tasks: T281366: Revoke debmonitor.discovery.wmnet.
Wed, Apr 28, 2:22 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond added a parent task for T281366: Revoke debmonitor.discovery.wmnet: T281369: Additional CFSSL tasks.
Wed, Apr 28, 2:22 PM · SRE
jbond created T281369: Additional CFSSL tasks.
Wed, Apr 28, 2:22 PM · CFSSL-PKI, Patch-For-Review, SRE
jbond updated the task description for T281366: Revoke debmonitor.discovery.wmnet.
Wed, Apr 28, 2:10 PM · SRE
jbond triaged T281366: Revoke debmonitor.discovery.wmnet as Medium priority.
Wed, Apr 28, 1:53 PM · SRE
jbond committed rLPRIe7d3a4d86ce6: add debmon fake key (authored by jbond).
add debmon fake key
Wed, Apr 28, 1:39 PM
jbond committed rLPRI779d7d3a7d29: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/labs/private (authored by jbond).
Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/labs/private
Wed, Apr 28, 1:39 PM
jbond added a comment to T247364: Forward port Python2 files to Python3 in Puppet Repository.

I merged a couple of changes today. Just a note that subprocess.Popen (and all the functions which in turn call it) does not support the text=True option until Python 3.7 (buster), so we need to use universal_newlines=True instead for any scripts which need to run on Debian < buster.

Wed, Apr 28, 11:36 AM · SRE, Patch-For-Review, User-MoritzMuehlenhoff, User-crusnov, User-jbond, Python3-Porting, SRE-tools, Puppet
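
The subprocess note in the T247364 comment above can be illustrated with a minimal sketch. The helper name `run_text` is hypothetical and not taken from the Puppet repository; the only point shown is that `universal_newlines=True` gives text-mode output on Python 3 releases that predate `text=True`.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as str (hypothetical helper)."""
    # universal_newlines=True is the older spelling of text=True and is
    # accepted by Python 3 releases older than 3.7 as well as newer ones.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    return out


# Use the running interpreter as a portable stand-in for a real command.
output = run_text([sys.executable, "-c", "print('hello')"])
```

On Python 3.7 and later, `text=True` is an alias for the same behaviour, so a script written this way should not need changing once older Debian releases are retired.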
jbond updated the task description for T281277: Request creation of puppet-dev VPS project.
Wed, Apr 28, 11:27 AM · Cloud-VPS (Project-requests)
jbond added a comment to T281277: Request creation of puppet-dev VPS project.

Please use puppet-dev as the name, if possible.

Will update the description.

Wed, Apr 28, 11:27 AM · Cloud-VPS (Project-requests)
jbond added a comment to T265138: OKR: Work required to prepare for puppet 6.

@Ladsgroup This could well be to do with how puppetlabs defines a core type. However, it has definitely been removed from the puppet git repo; you can see that in this commit, which also references PUP-8836 (although there is not much information on the latter). That said, I have created a documentation bug to seek clarification.

Still no update on the ticket. However, it looks like Debian will package resources like cron_core as separate packages.

Wed, Apr 28, 10:00 AM · Patch-For-Review, User-jbond, puppet-compiler, SRE, Puppet

Tue, Apr 27

jbond created T281277: Request creation of puppet-dev VPS project.
Tue, Apr 27, 3:13 PM · Cloud-VPS (Project-requests)
jbond added a comment to T281263: Primary s4 db Incident report review.

BTW, "Replication incident" was a good name when you took control: we didn't know what was going on at first, and when replication breaks it is either the replicas or the primary server :-). We do now, so that is why I renamed it, to avoid confusing it with a "regular" replication error.

Tue, Apr 27, 2:59 PM · SRE-OnFire-Incident-Docs
jbond added projects to T281267: various weekly and daily dumps run from systemd timers are broken: observability, SRE.
Tue, Apr 27, 2:49 PM · wdwb-tech, Wikidata, SRE, observability, Dumps-Generation
jbond updated subscribers of T281267: various weekly and daily dumps run from systemd timers are broken.

@fgiunchedi There is a requirement to forward a subset of icinga alerts to a different set of users, either by sending to an email address or something fancier like push notifications.

Tue, Apr 27, 2:45 PM · wdwb-tech, Wikidata, SRE, observability, Dumps-Generation
jbond closed T281267: various weekly and daily dumps run from systemd timers are broken as Resolved.

This has been fixed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/682922. systemd-timer-mail-wrapper has been updated so that command uses nargs=argparse.REMAINDER, which means that the $command parameter needs to come last.

Tue, Apr 27, 2:19 PM · wdwb-tech, Wikidata, SRE, observability, Dumps-Generation
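
A minimal sketch of why the command has to come last when a positional uses `nargs=argparse.REMAINDER`, as described in the T281267 comment above. The option names `--subject` and `--mail-to` are hypothetical and are not the real flags of systemd-timer-mail-wrapper; only the argparse behaviour is being illustrated.

```python
import argparse


def build_parser():
    """Build a parser shaped like a wrapper script (option names are hypothetical)."""
    parser = argparse.ArgumentParser(prog="mail-wrapper-sketch")
    parser.add_argument("--subject", default="")
    parser.add_argument("--mail-to", default="root")
    # REMAINDER collects everything from the first positional onwards,
    # including strings that look like options.
    parser.add_argument("command", nargs=argparse.REMAINDER)
    return parser


# Wrapper options first, wrapped command last.
args = build_parser().parse_args(
    ["--subject", "dump", "/usr/bin/job", "--verbose", "--subject", "x"]
)
```

Here the second `--subject x` ends up inside `args.command` rather than overriding the wrapper's own option, which is why any wrapper options placed after the command would be passed to the command instead of being parsed.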
jbond updated the task description for T281263: Primary s4 db Incident report review.
Tue, Apr 27, 1:58 PM · SRE-OnFire-Incident-Docs
jbond updated the task description for T281263: Primary s4 db Incident report review.
Tue, Apr 27, 1:58 PM · SRE-OnFire-Incident-Docs
jbond added a comment to T281263: Primary s4 db Incident report review.

Maybe the confusion was with our monitoring infrastructure(?), but that is the name of that mysql component.

The confusion was mine; I thought this fired as an icinga log. This can be ignored now.

Tue, Apr 27, 1:58 PM · SRE-OnFire-Incident-Docs
jbond updated the task description for T281263: Primary s4 db Incident report review.
Tue, Apr 27, 1:57 PM · SRE-OnFire-Incident-Docs
jbond renamed T281263: Primary s4 db Incident report review from MySQL Replication Incident repot review to MySQL Replication Incident report review.
Tue, Apr 27, 1:34 PM · SRE-OnFire-Incident-Docs
jbond created T281263: Primary s4 db Incident report review.
Tue, Apr 27, 1:30 PM · SRE-OnFire-Incident-Docs
jbond renamed T281261: Update grafana link for mediawiki-error-rate-$cluster in icinga check from Ubtade grafana link for mediawiki-error-rate-$cluster check to Update grafana link for mediawiki-error-rate-$cluster in icinga check.
Tue, Apr 27, 1:23 PM · serviceops, SRE
jbond updated subscribers of T281261: Update grafana link for mediawiki-error-rate-$cluster in icinga check.

@jijiki perhaps?

Tue, Apr 27, 1:22 PM · serviceops, SRE
jbond created T281261: Update grafana link for mediawiki-error-rate-$cluster in icinga check.
Tue, Apr 27, 1:21 PM · serviceops, SRE
jbond added a project to T281249: Create or modify an existing tool that quickly shows the db replication status in case of master failure: Sustainability (Incident Followup).
Tue, Apr 27, 12:56 PM · Sustainability (Incident Followup), SRE-tools, DBA
jbond added a comment to T281249: Create or modify an existing tool that quickly shows the db replication status in case of master failure.

Jbond: we already collect those metrics, what we don't have is a way to show them easily.

Tue, Apr 27, 12:55 PM · Sustainability (Incident Followup), SRE-tools, DBA
jbond created T281251: Collect metricts for Exec_Master_Log_Pos.
Tue, Apr 27, 12:52 PM · Sustainability (Incident Followup), SRE, DBA
jbond updated the task description for T281242: ReEnable GlobalUsage.
Tue, Apr 27, 12:25 PM · User-Ladsgroup, Sustainability (Incident Followup), SRE
jbond updated the task description for T281242: ReEnable GlobalUsage.
Tue, Apr 27, 12:25 PM · User-Ladsgroup, Sustainability (Incident Followup), SRE
jbond created T281242: ReEnable GlobalUsage.
Tue, Apr 27, 12:24 PM · User-Ladsgroup, Sustainability (Incident Followup), SRE
jbond created T281240: Ensure Changeprop is disabled when the databases are in read only mode .
Tue, Apr 27, 11:42 AM · ChangeProp, Sustainability (Incident Followup), SRE, serviceops

Mon, Apr 26

jbond added a comment to T281176: Puppet broken on restricted.bastion.wmcloud.org.

Thanks @Dzahn. I have created a CR to update the supported types, as the bastion host passes max_startups: 35:30:60, which should be supported. I'll take this forward tomorrow, but feel free to merge the change yourself if it's blocking.

Mon, Apr 26, 5:14 PM · Cloud-VPS, cloud-services-team (Kanban)
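
As a rough illustration of the two value shapes mentioned in the T281176 comment above, the sketch below accepts either a plain integer or the colon-separated `start:rate:full` form that sshd documents for MaxStartups. The function name and the range checks are assumptions made for illustration; this is not the Puppet type changed in the CR.

```python
import re

# Either a bare integer or three colon-separated integers (start:rate:full).
_MAX_STARTUPS = re.compile(r"\d+(?::\d+:\d+)?")


def is_valid_max_startups(value):
    """Return True if value looks like an acceptable MaxStartups setting."""
    text = str(value)
    if not _MAX_STARTUPS.fullmatch(text):
        return False
    parts = [int(part) for part in text.split(":")]
    if len(parts) == 3:
        start, rate, full = parts
        # Assumed constraints: rate is a percentage and start must not exceed full.
        return 1 <= rate <= 100 and start <= full
    return True


checks = {
    value: is_valid_max_startups(value)
    for value in ["35:30:60", "10", "35:130:60", "60:30:35", "abc"]
}
```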
jbond added a comment to T221388: Test dhcp-option 82.

Just to confirm: after removing use-vlan-id, re-imaging of sretest1002 worked fine.

Mon, Apr 26, 3:29 PM · Patch-For-Review, SRE, netops
jbond added a comment to T221388: Test dhcp-option 82.

I'm probably not up to date on concrete plans built on top of this, but it seems like having the numeric vlan id might be useful metadata here in addition to the abstract name of the vlan (e.g. scenarios where we might do vlan trunking on the main interface of the host and need to see or match that primary-vlan number in some interface setup scripts?)

@BBlack The option on the JunOS side allows to pick either the name or the ID, not both. In terms of assured uniqueness we can surely use the ID if the name can be duplicated (I don't think that Netbox enforces it).

Mon, Apr 26, 1:36 PM · Patch-For-Review, SRE, netops
jbond closed T281004: grafana-rw SSO redirect breaks template parameters due to double encoding as Resolved.

I have applied a patch which seems to have fixed this. I am going to resolve the ticket, but please reopen if you still see issues.

Mon, Apr 26, 12:16 PM · CAS-SSO, SRE, observability
jbond closed T281090: Various debmonitor-client systemdtimer errors starting April 21st as Resolved.

I have now increased the default expiry of certs, deployed the newest version of debmonitor-client and fixed systemd service logging. I'm going to optimistically assume this has fixed all issues and mark this resolved but please reopen if more issues arise

Mon, Apr 26, 11:34 AM · SRE, SRE-tools
jbond added a comment to T281090: Various debmonitor-client systemdtimer errors starting April 21st.

Failed to execute DebMonitor CLI: [SSL] PEM lib (_ssl.c:2947)
'PEM routines', 'get_name', 'no start line'), ('SSL routines', 'use_certificate_chain_file', 'PEM lib'

Mon, Apr 26, 9:27 AM · SRE, SRE-tools
jbond added a comment to T281090: Various debmonitor-client systemdtimer errors starting April 21st.

To confirm, I have just pushed out 0.2.9, which should fix the JSONDecodeError and 'Retry' and 'int' issues.

Mon, Apr 26, 8:31 AM · SRE, SRE-tools

Fri, Apr 23

jbond added a comment to T280622: Determine safe concurrent puppet run batches via cumin.

The result was 1000+ DNS requests per agent run.

Yes, I also hit the same issue in $JOB~1; I will try to find the relevant bugs and check on progress. As mentioned, the pop caches will help here. Further, I have also drafted a change to start using systemd-resolved, which could also help (although it was a Friday afternoon draft and needs much more testing and thought).

Fri, Apr 23, 4:49 PM · SRE, Puppet
jbond created P15514 (An Untitled Masterwork).
Fri, Apr 23, 3:27 PM
jbond added a comment to T280622: Determine safe concurrent puppet run batches via cumin.

@akosiaris thanks for digging into this a bit further, and apologies for not leaving more than a drive-by comment:

Fri, Apr 23, 1:49 PM · SRE, Puppet
jbond created P15513 (An Untitled Masterwork).
Fri, Apr 23, 11:01 AM
jbond added a comment to T265904: Remove SLAAC IPs from Ganeti hosts.

note: I created a bug against facter4 which is related

FYI, this has been resolved. However, I need to create an additional bug report to ask for the attributes to also be exposed by facter.

Fri, Apr 23, 9:31 AM · Patch-For-Review, Traffic, SRE

Thu, Apr 22

jbond added a comment to T280892: debmonitor-client.postinst: line 7: systemd-sysusers: command not found on stretch docker images.

I think this is just an old log entry from before I uploaded the new package (21/04/2021). debmonitor is installed into the docker image by docker-report, so there is no need to rebuild the images. I have just run it myself manually and it is mostly working, although I did get one error; I will check this out tomorrow.

Thu, Apr 22, 9:43 AM · SRE, SRE-tools

Wed, Apr 21

jbond committed rOHPU567b8afe06b5: CR:firewall: remove tangeling term (authored by jbond).
CR:firewall: remove tangeling term
Wed, Apr 21, 12:46 PM
jbond committed rOHPU488c0c142bb0: PKI access: open access to the pki service for analytics and cloud (authored by jbond).
PKI access: open access to the pki service for analytics and cloud
Wed, Apr 21, 12:30 PM
jbond added a comment to T228591: Document how to request installing additional SVG and PDF fonts on Wikimedia servers.

@jbond: Prioritizing a task as "medium" priority but not being able to give an answer on priority confuses me - how does that go together? I guess I wonder why this (and many other tickets) were not set to "low" priority which feels more realistic given the limited resources and creates less expectations.

Wed, Apr 21, 11:58 AM · SRE, Wikimedia-General-or-Unknown, Documentation, Wikimedia-SVG-rendering
jbond added a comment to T228591: Document how to request installing additional SVG and PDF fonts on Wikimedia servers.

@Aklapper I think SRE are only tagged on this ticket in case there are any puppet changes to be made. Reading the ticket, I'm not sure there are, so I'm not sure it makes sense to have SRE tagged. As such, I'm not the one who would be working on this and therefore can't really give an answer on priority. However, I do notice that the request was made by yourself and a response was given in https://phabricator.wikimedia.org/T228591#6221461. Has that response answered your request, or are there still some specifics required?

Wed, Apr 21, 8:53 AM · SRE, Wikimedia-General-or-Unknown, Documentation, Wikimedia-SVG-rendering