herron (Keith Herron)
Ops Engineer

User Details

User Since
May 30 2017, 5:25 PM (103 w, 2 d)
Availability
Available
IRC Nick
herron
LDAP User
Herron
MediaWiki User
Unknown

Recent Activity

Yesterday

herron added a comment to T223493: rack/setup/install kafka-main200[1-5].

Hey @Papaul, I added a raid10-gpt-srv-lvm-ext4-8disks.cfg for the initial installs on these.

Wed, May 22, 7:10 PM · Patch-For-Review, ops-codfw, Operations
herron triaged T224128: Migrate network device syslogs to Kafka logging pipeline as Normal priority.
Wed, May 22, 2:52 PM · Patch-For-Review, User-herron, Operations, netops, Wikimedia-Logstash
herron added a comment to T221969: Puppet catalog compiler - increasing max concurrent jobs.

I thought about this task a little bit. The current instances have 4 vCPUs. The operations-puppet-catalog-compiler-test job runs the compiler with NUM_THREADS=2.

I would suggest:

  • Use x1.large instances (8 vCPUs / 16G RAM / 160G disk). The RAM and disk are a bit overkill since the compiler is mostly CPU bound, iirc.
  • Set the jobs to use NUM_THREADS=6 (or 7? so we at least have one CPU available for the rest).
  • Add a third instance to the pool.
Wed, May 22, 2:25 PM · Release-Engineering-Team (Kanban), puppet-compiler, Continuous-Integration-Infrastructure

Tue, May 21

herron renamed T213902: Implement sensitive logstash access control from Implement sensitive log access control to Implement sensitive logstash access control.
Tue, May 21, 8:38 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron renamed T213902: Implement sensitive logstash access control from [stretch] Implement sensitive log access control, onboard 3 sensitive log producers to Implement sensitive log access control.
Tue, May 21, 8:38 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron added a subtask for T220103: TEC6: Logging infrastructure (Q4 2018/19 goal): T213902: Implement sensitive logstash access control.
Tue, May 21, 8:37 PM · Patch-For-Review, Wikimedia-Logstash, User-fgiunchedi, Operations, Goal
herron added a parent task for T213902: Implement sensitive logstash access control: T220103: TEC6: Logging infrastructure (Q4 2018/19 goal).
Tue, May 21, 8:37 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron updated the task description for T220103: TEC6: Logging infrastructure (Q4 2018/19 goal).
Tue, May 21, 8:37 PM · Patch-For-Review, Wikimedia-Logstash, User-fgiunchedi, Operations, Goal

Fri, May 17

herron added a comment to T223493: rack/setup/install kafka-main200[1-5].

Good point! And if we number from 200[1-5] it should simplify mapping of broker IDs between old and new hosts too. I updated the description to reflect this, but if you think it's best to keep the 200[4-8] suffix, I'm happy to go that route instead.

Fri, May 17, 2:49 PM · Patch-For-Review, ops-codfw, Operations
herron renamed T223493: rack/setup/install kafka-main200[1-5] from rack/setup/install kafka200[4-8] to rack/setup/install kafka-main200[1-5].
Fri, May 17, 2:45 PM · Patch-For-Review, ops-codfw, Operations
herron updated the task description for T223493: rack/setup/install kafka-main200[1-5].
Fri, May 17, 2:44 PM · Patch-For-Review, ops-codfw, Operations

Thu, May 16

herron triaged T223483: Logstash stops processing messages if a single output becomes blocked as Normal priority.
Thu, May 16, 7:38 PM · Operations, Wikimedia-Logstash

Tue, May 14

herron awarded T222800: Requesting quota increase for 'puppet-diffs' project a Party Time token.
Tue, May 14, 8:59 PM · Operations, Cloud-VPS (Quota-requests), puppet-compiler

Mon, May 13

herron moved T187147: Port mediawiki/php/wmerrors to PHP7 and deploy from Backlog to In Dev/Progress on the Wikimedia-Logstash board.
Mon, May 13, 3:25 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)
herron added a project to T187147: Port mediawiki/php/wmerrors to PHP7 and deploy: Wikimedia-Logstash.
Mon, May 13, 3:22 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)
herron added a comment to T221969: Puppet catalog compiler - increasing max concurrent jobs.

I deployed it on May 6th, so puppet compile jobs should hopefully be split equally between compiler1001 and compiler1002. I don't know how to verify that, though :-(

Mon, May 13, 1:56 PM · Release-Engineering-Team (Kanban), puppet-compiler, Continuous-Integration-Infrastructure

Fri, May 10

herron moved T187147: Port mediawiki/php/wmerrors to PHP7 and deploy from Backlog to Working on on the User-herron board.
Fri, May 10, 5:16 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)
herron added projects to T187147: Port mediawiki/php/wmerrors to PHP7 and deploy: Operations, MediaWiki-Logging.
Fri, May 10, 5:16 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)
herron added a project to T187147: Port mediawiki/php/wmerrors to PHP7 and deploy: User-herron.
Fri, May 10, 5:16 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)
herron added a comment to T187147: Port mediawiki/php/wmerrors to PHP7 and deploy.

@herron Yes, I can do that to help avoid this specific instance of the problem. The problem I'd like to solve in this task, however, is to be able to detect it. That is, if there is a significant influx of errors that happen to be too large, this really should show up under type:mediawiki in some kind of channel (e.g. syslog_truncated) with a severity of "ERROR", so that they still get counted and immediately trigger the necessary alarms during a MediaWiki deployment.

For that it's totally fine if the json is not parsed and only stored as raw message text. It would still be picked up at least with a timestamp, type and a bit of context (e.g. which MW server it came from), and the raw text will have to suffice for a MW developer to figure out where it came from and either fix the problem that caused the error to be reported, or make the error message smaller.

But the immediate issue is to be able to at least index them and detect the problem.
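
A minimal Python sketch of the fallback described in this comment, not the actual rsyslog or Logstash configuration. It assumes the JSON payload follows an `@cee:` cookie in the syslog message; the function name `parse_cee` and the field names `level`, `host`, and `message` are hypothetical, chosen only for illustration.

```python
import json

CEE_COOKIE = "@cee:"  # cookie that prefixes the JSON payload in the syslog message


def parse_cee(message, host):
    """Parse an @cee syslog message into an event dict.

    If the JSON payload is truncated or otherwise invalid, keep the raw text
    and tag the event so that it is still indexed and counted as an error.
    """
    idx = message.find(CEE_COOKIE)
    if idx == -1:
        # Assumption: messages without the cookie are passed through as raw text.
        return {"host": host, "message": message}

    payload = message[idx + len(CEE_COOKIE):].strip()
    try:
        event = json.loads(payload)
        if not isinstance(event, dict):
            raise ValueError("payload is not a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Fallback: unparsed raw text, tagged as described in the comment above.
        return {
            "type": "mediawiki",
            "channel": "syslog_truncated",
            "level": "ERROR",
            "host": host,
            "message": payload,
        }

    event.setdefault("host", host)
    return event
```

With this shape, a truncated payload such as `@cee: {"msg": "abc` still yields an event with a type, a channel, a severity and the originating host, which is enough to index it and alert on it.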

Fri, May 10, 5:15 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)

Thu, May 9

herron closed T182819: custom fact interface_primary breaks under newer versions of facter as Resolved.

yup!

Thu, May 9, 8:06 PM · User-herron, Patch-For-Review, Puppet, Operations
herron closed T182819: custom fact interface_primary breaks under newer versions of facter, a subtask of T177254: Upgrade to puppet 4 (4.8 or newer), as Resolved.
Thu, May 9, 8:06 PM · cloud-services-team (FY2017-18), Puppet, User-Joe, Operations
herron moved T213902: Implement sensitive logstash access control from Backlog to Working on on the User-herron board.
Thu, May 9, 8:05 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron moved T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019. from Backlog to Working on on the User-herron board.
Thu, May 9, 8:05 PM · User-herron, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Operations
herron moved T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task) from Backlog to Working on on the User-herron board.
Thu, May 9, 8:05 PM · User-herron, Operations
herron moved T222075: Prevent puppet catalog compiler workers from running out of disk space from Backlog to Working on on the User-herron board.
Thu, May 9, 8:05 PM · observability, User-herron, puppet-compiler, Operations
herron added a comment to T187147: Port mediawiki/php/wmerrors to PHP7 and deploy.

After further testing I'm seeing that these messages arrive at rsyslog with @cee formatting, but truncated. This means the msg field does not contain valid json, specifically within the json-in-json-in-json field msg.fatal_exception.trace. The msg field comes from rsyslog extraction of the json payload prefixed by the @cee cookie in the syslog message.

Thu, May 9, 5:58 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)

Wed, May 8

herron added a comment to T187147: Port mediawiki/php/wmerrors to PHP7 and deploy.

Comparing (in beta) a working mediawiki log message and a log message failing with max_bytes_length_exceeded_exception I'm noticing differences in json formatting as well. For example

Wed, May 8, 8:21 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)
herron added a comment to T221969: Puppet catalog compiler - increasing max concurrent jobs.

@hashar while on the topic, is it possible for Jenkins to dispatch PCC jobs more evenly across the workers? Currently compiler1002 receives the bulk of the work and is at 95% disk full, while compiler1001 is at only 50% disk full.

Wed, May 8, 3:08 PM · Release-Engineering-Team (Kanban), puppet-compiler, Continuous-Integration-Infrastructure
herron triaged T222800: Requesting quota increase for 'puppet-diffs' project as Normal priority.
Wed, May 8, 3:04 PM · Operations, Cloud-VPS (Quota-requests), puppet-compiler
herron closed T221290: wiki-mail DKIM failing as Resolved.
Wed, May 8, 1:57 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
herron added a comment to T221288: Phabricator SPF record contains internal addressing for phab[12]001.

Do those IPv6 addresses actually send any mail?

Wed, May 8, 1:21 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
herron closed T221288: Phabricator SPF record contains internal addressing for phab[12]001 as Resolved.

Ready to resolve afaict!

Wed, May 8, 1:13 PM · Patch-For-Review, Traffic, Operations, DNS, Mail

Tue, May 7

herron updated subscribers of T187147: Port mediawiki/php/wmerrors to PHP7 and deploy.
Tue, May 7, 7:45 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)
herron added a comment to T187147: Port mediawiki/php/wmerrors to PHP7 and deploy.

Seeing errors like this from logstash that appear related. This one specifically originated from logstash1007 /var/log/logstash/logstash-plain.log

Tue, May 7, 7:44 PM · Core Platform Team Kanban (Doing), wmerrors, Wikimedia-Logstash, MediaWiki-Logging, Operations, User-herron, MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, PHP 7.2 support, Core Platform Team (PHP7 (TEC4)), Performance-Team (Radar)

Wed, May 1

herron added a comment to T222072: compiler1002.puppet-diffs.eqiad.wmflabs disk is full.

On paper this use case also would lend itself to a filesystem with transparent compression. Maybe btrfs with compression. The data stored on disk is non-critical, and there are multiple worker nodes should issues arise with one filesystem.

Wed, May 1, 6:41 PM · Patch-For-Review, Operations, puppet-compiler, Jenkins
herron added a comment to T222072: compiler1002.puppet-diffs.eqiad.wmflabs disk is full.
Wed, May 1, 6:29 PM · Patch-For-Review, Operations, puppet-compiler, Jenkins
herron added a comment to T221290: wiki-mail DKIM failing.

Looking better after merging the above. From a password reminder mail:

Wed, May 1, 5:33 PM · Patch-For-Review, Traffic, Operations, DNS, Mail

Tue, Apr 30

herron triaged T222198: Gmail - Multiple destination domains per transaction is unsupported. Please try again. as Normal priority.
Tue, Apr 30, 4:19 PM · Patch-For-Review, Mail, Operations

Mon, Apr 29

herron added a project to T222075: Prevent puppet catalog compiler workers from running out of disk space: observability.
Mon, Apr 29, 3:10 PM · observability, User-herron, puppet-compiler, Operations
herron triaged T222075: Prevent puppet catalog compiler workers from running out of disk space as Normal priority.
Mon, Apr 29, 3:10 PM · observability, User-herron, puppet-compiler, Operations
herron closed T221990: LDAP access to the (nda) wmf group for sukhe as Resolved.

uid=sukhe,ou=people,dc=wikimedia,dc=org has been added to the NDA group. Please re-open if any follow up is needed. Thanks!

Mon, Apr 29, 2:52 PM · LDAP-Access-Requests

Fri, Apr 26

herron triaged T221529: Frequent puppet failures as Normal priority.
Fri, Apr 26, 10:44 PM · Puppet, puppet-compiler, Operations
herron triaged T221904: swift backend decomms / rebalances are noisy as Normal priority.
Fri, Apr 26, 10:44 PM · observability, media-storage, Operations
herron triaged T221939: Investigate use of hp-asrd on HPE servers as Normal priority.
Fri, Apr 26, 10:42 PM · cloud-services-team, Operations
herron triaged T221985: puppet-merge shouldn't fail if `tput` doesn't grok your terminal as Normal priority.
Fri, Apr 26, 10:40 PM · Puppet, Operations
herron added a comment to T189434: Fake email about @tools.wmflabs.org email.

(eg, it says it's to security@tools.wmflabs.org but I somehow got the email).

Fri, Apr 26, 3:23 PM · Mail, cloud-services-team
herron created T221969: Puppet catalog compiler - increasing max concurrent jobs.
Fri, Apr 26, 2:31 PM · Release-Engineering-Team (Kanban), puppet-compiler, Continuous-Integration-Infrastructure

Thu, Apr 25

herron added a comment to T116011: ferm: Log dropped packets.

Looking at cumin1001 I noticed that the log prefix at the end of the input chain is "fw-out-drop" and the output chain is empty with an accept policy. Is "out" indeed the direction in this case? Or would dropped packets logged by the input chain be considered "in"?

Thu, Apr 25, 5:54 PM · Patch-For-Review, Operations
herron added a comment to T220860: access for foks to labweb (in one way or another) (or make changePassword.php work on mwmaint hosts).

Since we're approaching two weeks on this request I've proposed the above patch to move forward using the existing deployment group and trust that caution will be exercised. Happy to see another approach implemented, but at the same time would like to unblock this individual access request.

Thu, Apr 25, 5:08 PM · Patch-For-Review, Operations, SRE-Access-Requests
herron added a comment to T221744: Add Progresslabs to WMF LDAP group for transparency report editing (allow 'nda' users to login on transparency-private).

Hello, I am not seeing an existing account with username Progresslabs. Could you please confirm that the account has already been created, and this is indeed the username? If you know what email was used I could try searching for that.

Thu, Apr 25, 4:24 PM · LDAP-Access-Requests

Wed, Apr 24

herron closed T221143: Kibana breaks during rolling upgrade as Resolved.

The Kibana lvs has been updated to use the source hash scheduler

Wed, Apr 24, 3:57 PM · Patch-For-Review, User-herron, Wikimedia-Logstash, Operations
herron added a comment to T220982: maps hosts have bad permissions under /srv/deployment.

Is there anything left to do before closing this?

Wed, Apr 24, 3:33 PM · Operations
herron closed T212640: logstash stuck on its persistent queue as Resolved.

I think it's safe to resolve this now since we're on logstash 5.6.15, and have disabled the logstash persistent queue.

Wed, Apr 24, 3:30 PM · Operations, Wikimedia-Logstash
herron added a parent task for T221529: Frequent puppet failures : T201247: Sporadic puppet failures.
Wed, Apr 24, 3:18 PM · Puppet, puppet-compiler, Operations
herron added a subtask for T201247: Sporadic puppet failures: T221529: Frequent puppet failures .
Wed, Apr 24, 3:18 PM · cloud-services-team (Kanban), Operations

Apr 23 2019

herron closed T221660: WMF LDAP Access for jfishback as Resolved.

jfishback has been added to the wmf ldap group. If any follow-up is needed please don't hesitate to re-open. Thanks!

Apr 23 2019, 4:35 PM · Patch-For-Review, LDAP-Access-Requests, Security-Team
herron closed T221660: WMF LDAP Access for jfishback, a subtask of T220517: Onboarding James Fishback to Security Team as Privacy Engineer (April 15th), as Resolved.
Apr 23 2019, 4:35 PM · Security-Team
herron closed T220565: offboard tilman bayer as Resolved.

Resolving as checklist in description has been completed

Apr 23 2019, 4:22 PM · Operations, SRE-Access-Requests
herron closed T220887: Allow Bryan Davis to downtime alerts in Icinga as Resolved.

Looks like this is complete, but if any follow up is needed please don't hesitate to re-open!

Apr 23 2019, 4:17 PM · Patch-For-Review, Operations, SRE-Access-Requests, observability
herron triaged T221507: Netbox report to validate network equipment data as Normal priority.
Apr 23 2019, 2:35 PM · Patch-For-Review, netbox, User-crusnov, Operations-Software-Development, Operations, netops
herron added a comment to T221529: Frequent puppet failures .

Here are related puppet agent and puppetmaster1001 apache logs from a sampling of hosts

Apr 23 2019, 2:33 PM · Puppet, puppet-compiler, Operations
herron closed T221450: Broken puppet in the 'logging' project as Resolved.

Deleted the remaining instances

Apr 23 2019, 1:59 PM · Operations
herron updated the task description for T221450: Broken puppet in the 'logging' project.
Apr 23 2019, 1:58 PM · Operations

Apr 19 2019

herron added a comment to T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019..

Today we discussed desired hardware configs and expansion strategies during a meeting with @elukey @mobrovac @Ottomata and myself. Here are the outcomes:

Apr 19 2019, 6:58 PM · User-herron, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Operations
herron awarded T219430: Support targetting WMCS instances with the Jenkins puppet compiler a Party Time token.
Apr 19 2019, 6:31 PM · Patch-For-Review, Puppet-infrastructure-modernization, puppet-compiler, cloud-services-team (Kanban)

Apr 18 2019

herron added a comment to T221290: wiki-mail DKIM failing.

That's interesting, based on the headers in T221290#5123805 it looks like this issue goes back as far as 2015.

Apr 18 2019, 9:20 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
herron added a comment to T221290: wiki-mail DKIM failing.

How did it work until now?

Apr 18 2019, 7:29 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
herron updated subscribers of T221290: wiki-mail DKIM failing.

Indeed I'm able to produce a DKIM issue as well with wiki-mail. Here's an example (seen in headers of message triggered by account preferences change):

Apr 18 2019, 6:17 PM · Patch-For-Review, Traffic, Operations, DNS, Mail
herron added a comment to T141324: Look into shoving gerrit logs into logstash.
  • structured logging from log4j can be exposed in a number of ways. The easiest is probably to continue logging to file, use syslog for transport, but have messages in a structured format. A json layout could be used for that (here is one from Jetbrains, which I haven't tested, but I tend to trust Jetbrains). This would allow outputting the full logging context and not just a few basic fields.
Apr 18 2019, 3:52 PM · observability, Patch-For-Review, Release-Engineering-Team (Backlog), Technical-Debt, Wikimedia-Logstash, Gerrit

Apr 17 2019

herron added a comment to T216088: Mapping of servers to stakeholders.

To give a few real-world examples

Apr 17 2019, 4:51 PM · Operations
herron added a comment to T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch.

I'm for erring on the side of simplicity. Since these logs are useful on the command line of an individual host, on centrallog, and in Kibana, it makes sense to me to stream the ulogd syslogs formatted as shown in the description to logstash and parse them with a grok pattern.

Apr 17 2019, 3:09 PM · Patch-For-Review, Wikimedia-Logstash, Security, Operations

Apr 16 2019

herron added a project to T221143: Kibana breaks during rolling upgrade: User-herron.
Apr 16 2019, 8:45 PM · Patch-For-Review, User-herron, Wikimedia-Logstash, Operations
herron triaged T221143: Kibana breaks during rolling upgrade as Normal priority.
Apr 16 2019, 8:45 PM · Patch-For-Review, User-herron, Wikimedia-Logstash, Operations

Apr 15 2019

herron added a project to T220500: logstash1012 lock up caused central logging stuck: User-herron.
Apr 15 2019, 3:24 PM · User-herron, Wikimedia-Logstash, Operations
herron moved T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch from Backlog to Up next on the Wikimedia-Logstash board.
Apr 15 2019, 3:12 PM · Patch-For-Review, Wikimedia-Logstash, Security, Operations
herron added a project to T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch: Wikimedia-Logstash.
Apr 15 2019, 3:11 PM · Patch-For-Review, Wikimedia-Logstash, Security, Operations
herron added a comment to T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019..

In the SRE spreadsheet I can see that the suggested replacement FY is 20/21, not the upcoming one. Just adding the info; not sure whether these servers are eligible for refresh before 5 years of usage.

Apr 15 2019, 2:58 PM · User-herron, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Operations
herron added a comment to T220987: Ferm: send ferm/iptables/ulogd logs to Kafaka/logstash/elasticsearch.

The intention is that ulogd logs from all servers will be sent to Kafka; as such, it would seem to make sense to move ::profile::rsyslog::kafka_shipper to the ::standard class.

Apr 15 2019, 2:48 PM · Patch-For-Review, Wikimedia-Logstash, Security, Operations
herron added a comment to T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task).

One thing that we didn't discuss for this goal is Zookeeper.

Apr 15 2019, 2:37 PM · User-herron, Operations

Apr 11 2019

herron closed T220711: Hostname settings on phab1001.eqiad.wmnet as Resolved.

Localhost is seen in the mail headers because the Phabricator application relays mail to a local Exim MTA via SMTP. The local MTA provides message queueing and outbound SMTP server failover. This was done intentionally to improve the reliability of email delivery from Phabricator.

Apr 11 2019, 4:52 PM · Mail, Phabricator
herron closed T193408: SPF record for canonical domains as Resolved.
Apr 11 2019, 3:34 PM · Patch-For-Review, Mail, Operations
herron updated the task description for T193408: SPF record for canonical domains.
Apr 11 2019, 3:34 PM · Patch-For-Review, Mail, Operations
herron closed T220412: Identify appropriate SPF record for domain wikimediafoundation.org as Resolved.

The below SPF record is now active

Apr 11 2019, 3:34 PM · Patch-For-Review, fundraising-tech-ops, Fundraising-Backlog, Mail, Operations
herron closed T220412: Identify appropriate SPF record for domain wikimediafoundation.org, a subtask of T193408: SPF record for canonical domains, as Resolved.
Apr 11 2019, 3:33 PM · Patch-For-Review, Mail, Operations

Apr 9 2019

herron created P8378 (An Untitled Masterwork).
Apr 9 2019, 8:06 PM
herron added a comment to T220412: Identify appropriate SPF record for domain wikimediafoundation.org.

As far as I know, fundraising does not send mail using this domain but only from wikimedia.org, so I don't think our mass mail contractor needs to be listed.

Apr 9 2019, 7:34 PM · Patch-For-Review, fundraising-tech-ops, Fundraising-Backlog, Mail, Operations

Apr 8 2019

herron triaged T220412: Identify appropriate SPF record for domain wikimediafoundation.org as Normal priority.
Apr 8 2019, 3:10 PM · Patch-For-Review, fundraising-tech-ops, Fundraising-Backlog, Mail, Operations
herron added subtasks for T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task): T220391: Establish guideline documentation for Kafka cluster use cases (main, jumbo, logging, etc.), T220389: Review current architecture/capacity and establish plan for Kafka main cluster upgrade/refresh to cover needs for next 2-3 years, T220390: Audit existing Kafka main producers/consumers and document their configuration and use cases, T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019..
Apr 8 2019, 2:07 PM · User-herron, Operations
herron added a parent task for T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019.: T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task).
Apr 8 2019, 2:07 PM · User-herron, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Operations
herron added a parent task for T220389: Review current architecture/capacity and establish plan for Kafka main cluster upgrade/refresh to cover needs for next 2-3 years: T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task).
Apr 8 2019, 2:07 PM · Operations
herron added a parent task for T220390: Audit existing Kafka main producers/consumers and document their configuration and use cases: T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task).
Apr 8 2019, 2:07 PM · Operations
herron added a parent task for T220391: Establish guideline documentation for Kafka cluster use cases (main, jumbo, logging, etc.): T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task).
Apr 8 2019, 2:07 PM · Operations
herron updated the task description for T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task).
Apr 8 2019, 2:06 PM · User-herron, Operations
herron created T220391: Establish guideline documentation for Kafka cluster use cases (main, jumbo, logging, etc.).
Apr 8 2019, 2:05 PM · Operations
herron created T220390: Audit existing Kafka main producers/consumers and document their configuration and use cases.
Apr 8 2019, 2:05 PM · Operations
herron created T220389: Review current architecture/capacity and establish plan for Kafka main cluster upgrade/refresh to cover needs for next 2-3 years.
Apr 8 2019, 2:05 PM · Operations
herron triaged T220387: Transition Kafka main ownership from Analytics Engineering to SRE - (2018-2019 Q4 SRE Goal Tracking Task) as Normal priority.
Apr 8 2019, 2:02 PM · User-herron, Operations
herron added a comment to T178575: Add require_package() variant with repository component to wmflib.

Since today we have a mix of package and require_package this would be very nice indeed. Does it need to be homegrown? Seems worthwhile to weigh the pros/cons of using native Puppet ordering as well.

Apr 8 2019, 1:55 PM · User-jijiki, Puppet, Operations

Mar 27 2019

herron updated the task description for T219430: Support targetting WMCS instances with the Jenkins puppet compiler.
Mar 27 2019, 6:40 PM · Patch-For-Review, Puppet-infrastructure-modernization, puppet-compiler, cloud-services-team (Kanban)