herron (Keith Herron)
Ops Engineer

User Details

User Since
May 30 2017, 5:25 PM (95 w, 2 h)
Availability
Available
IRC Nick
herron
LDAP User
Herron
MediaWiki User
Unknown

Recent Activity

Wed, Mar 20

herron updated the task description for T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.
Wed, Mar 20, 6:46 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash

Tue, Mar 19

herron added a comment to T218691: Remove elasticsearch icinga checks from logstash collectors.

Ah, thanks for clarifying! I agree we probably don't need the full suite of checks on the client nodes, but at the same time would like to make sure we continue monitoring elasticsearch client node health on the collectors since Logstash and Kibana depend on them.

Tue, Mar 19, 4:48 PM · Operations, Discovery-Search, Icinga, Elasticsearch, Wikimedia-Logstash
herron added a comment to T218691: Remove elasticsearch icinga checks from logstash collectors.

Why do you say the elasticsearch icinga checks are not needed on the logstash elasticsearch data/master nodes? Is the thinking to monitor cluster status from only the client nodes?

Tue, Mar 19, 4:11 PM · Operations, Discovery-Search, Icinga, Elasticsearch, Wikimedia-Logstash

Mon, Mar 18

herron added a comment to T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019..

According to Netbox, support for hosts kafka[12]00[123] expired in Dec 2018. After discussing a bit with @Ottomata, a server refresh with higher-spec hardware would be a reasonable course of action to address both server age and capacity.

Mon, Mar 18, 2:21 PM · User-herron, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Operations
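
For readers unfamiliar with the bracket shorthand used for host names in this feed (kafka[12]00[123], logstash101[012], logstash101[0-2]), here is a minimal sketch of how such a pattern can be expanded. It assumes the notation follows shell-glob character classes; the function names `expand_hosts` and `_expand_class` are hypothetical and not part of any Wikimedia tooling.

```python
import itertools
import re
from typing import List


def _expand_class(body: str) -> List[str]:
    """Expand the inside of one bracket group, e.g. '123' or '0-2'."""
    chars = []
    i = 0
    while i < len(body):
        # A three-character run like '0-2' is treated as an inclusive range.
        if i + 2 < len(body) and body[i + 1] == "-":
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars


def expand_hosts(pattern: str) -> List[str]:
    """Expand a pattern such as 'kafka[12]00[123]' into individual host names."""
    # re.split with a capture group alternates literal text (even indices)
    # and bracket contents (odd indices).
    parts = re.split(r"\[([^\]]+)\]", pattern)
    choices = [
        [part] if index % 2 == 0 else _expand_class(part)
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_hosts("kafka[12]00[123]"))
```

Under that assumption, kafka[12]00[123] covers six hosts: three numbered 1001 to 1003 and three numbered 2001 to 2003.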

Fri, Mar 15

herron updated the task description for T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.
Fri, Mar 15, 5:49 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron added a comment to T206675: 1.33.0-wmf.21 deployment blockers.

I was not aware of that dashboard. Should it be added to the list of things train conductor should be monitoring?

https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Places_to_Watch_for_Breakage

Fri, Mar 15, 1:52 PM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, User-zeljkofilipin, Release-Engineering-Team (Kanban), Release, Train Deployments

Thu, Mar 14

herron added a comment to T206675: 1.33.0-wmf.21 deployment blockers.

There looks to be a significant increase (about 1.5 million in the past hour) in log messages from the mediawiki "deprecated" channel to the effect of "Use of ParserOutput::getModuleScripts was deprecated in MediaWiki 1.33". Could we squelch these somehow?

Thu, Mar 14, 4:05 PM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, User-zeljkofilipin, Release-Engineering-Team (Kanban), Release, Train Deployments

Wed, Mar 13

herron added a project to T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019.: User-herron.
Wed, Mar 13, 5:54 PM · User-herron, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Operations

Tue, Mar 12

herron updated the task description for T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.
Tue, Mar 12, 5:50 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash

Thu, Mar 7

herron added a comment to T216088: Mapping of servers to stakeholders.

A pretty accurate list of stakeholders for a given host can be gleaned from the users, groups, and sudoers config deployed to it.

Thu, Mar 7, 5:43 PM · Operations

Tue, Mar 5

herron added a comment to T217679: Graphite returning server errors (out of memory?).

Looking here https://grafana.wikimedia.org/d/000000020/graphite-eqiad?refresh=1m&orgId=1&from=now-3h&to=now disk utilization has increased significantly

Tue, Mar 5, 5:14 PM · Patch-For-Review, Operations, Graphite
herron updated subscribers of T217679: Graphite returning server errors (out of memory?).
Tue, Mar 5, 5:13 PM · Patch-For-Review, Operations, Graphite

Mon, Mar 4

herron moved T216172: Set up basic email infra for w.wiki domain from Backlog to Up Next on the Mail board.
Mon, Mar 4, 8:29 PM · Operations, Mail
herron added a comment to T215611: MediaWiki errors overloading logstash.

The screenshot from Grafana does indicate that starting around 19:40 nearly 90% of MW events were dropped. By itself that is not conclusive evidence that MW is the cause, merely that it is a victim.

Mon, Mar 4, 5:49 PM · Core Platform Team Kanban (Done with CPT), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team, Wikimedia-production-error, Wikimedia-Logstash, Operations, MediaWiki-Database, monitoring
herron updated subscribers of T215611: MediaWiki errors overloading logstash.

Thanks @herron, I would like to know more information about what caused the extra logging, but I didn't find it on the incident report, do you know it, or know someone that does?

Mon, Mar 4, 5:00 PM · Core Platform Team Kanban (Done with CPT), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team, Wikimedia-production-error, Wikimedia-Logstash, Operations, MediaWiki-Database, monitoring
herron added a project to T217142: [WIP] [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors: User-herron.
Mon, Mar 4, 4:37 PM · User-herron, Reading-Infrastructure-Team-Backlog, Wikimedia-Logstash
herron moved T215611: MediaWiki errors overloading logstash from Backlog to In Dev/Progress on the Wikimedia-Logstash board.
Mon, Mar 4, 4:35 PM · Core Platform Team Kanban (Done with CPT), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team, Wikimedia-production-error, Wikimedia-Logstash, Operations, MediaWiki-Database, monitoring
herron added a comment to T215611: MediaWiki errors overloading logstash.

Sadly this bit us again last week. Details outlined in https://wikitech.wikimedia.org/wiki/Incident_documentation/20190228-logstash

Mon, Mar 4, 4:28 PM · Core Platform Team Kanban (Done with CPT), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team, Wikimedia-production-error, Wikimedia-Logstash, Operations, MediaWiki-Database, monitoring
herron updated the task description for T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.
Mon, Mar 4, 3:00 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron updated the task description for T213157: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6).
Mon, Mar 4, 2:59 PM · User-fgiunchedi, User-herron, Operations, Wikimedia-Logstash
herron closed T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch as Resolved.

Service migration and OS upgrade work is complete with ES and Kafka services running from logstash101[012], and frontend VMs logstash100[789] upgraded to stretch.

Mon, Mar 4, 2:59 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron closed T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch, a subtask of T213157: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6), as Resolved.
Mon, Mar 4, 2:59 PM · User-fgiunchedi, User-herron, Operations, Wikimedia-Logstash
herron triaged T217556: Decommission old eqiad logstash hardware hosts logstash100[456] as Normal priority.
Mon, Mar 4, 2:56 PM · decommission, DC-Ops, ops-eqiad, User-herron, Operations, Wikimedia-Logstash
herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Mon, Mar 4, 2:52 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash

Fri, Mar 1

herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Fri, Mar 1, 10:37 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Fri, Mar 1, 10:37 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron added a comment to T212327: Beta Cluster mailer not sending emails.
2019-03-01 10:05:46 H=deployment-mediawiki-07.deployment-prep.eqiad.wmflabs [172.16.4.119]:35632 I=[172.16.4.120]:25 F=<wiki-enwiki-50n-pnnvxh-ujky0FUwA22N6N9J@beta.wmflabs.org> temporarily rejected RCPT <etonkovidova@wikimedia.org>: failed to bind the LDAP connection to server ldap-corp.codfw.wikimedia.org:389 - ldap_bind() returned -1

Basically our MX is trying to do the special routing for @wikimedia.org addresses (e.g., looking up against the mirror of the foundation's corp LDAP system to see if the user has a google inbox) that should only be done by a prod MX. Our one should just be sending it on to prod.

Fri, Mar 1, 2:29 PM · User-DannyS712, Cloud-VPS, Beta-Cluster-reproducible, Patch-For-Review, Mail

Thu, Feb 28

herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Thu, Feb 28, 5:52 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron renamed T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch from Replace and expand Elasticsearch storage in eqiad and upgrade the cluster from Debian jessie to stretch to Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Thu, Feb 28, 5:52 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash

Wed, Feb 27

herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Wed, Feb 27, 10:49 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Wed, Feb 27, 10:49 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron added a comment to T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.

The Kafka service on logstash1004 has been migrated to logstash1010, and logstash1004 has now been transitioned to spare::system.

Wed, Feb 27, 10:47 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Wed, Feb 27, 3:23 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash

Mon, Feb 25

herron closed T214608: rack/setup/install logstash101[012].eqiad.wmnet as Resolved.
Mon, Feb 25, 3:45 PM · Patch-For-Review, Operations
herron added a comment to T214608: rack/setup/install logstash101[012].eqiad.wmnet.

Setup of new hosts is complete. Tracking follow up steps in T213898

Mon, Feb 25, 3:45 PM · Patch-For-Review, Operations
herron updated the task description for T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch.
Mon, Feb 25, 3:43 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash

Feb 22 2019

mmodell awarded T216714: gmail considers all Phabricator email to be spam due to missing SPF record an Orange Medal token.
Feb 22 2019, 7:27 PM · Patch-For-Review, Mail, Operations
LarsWirzenius awarded T216714: gmail considers all Phabricator email to be spam due to missing SPF record a Mountain of Wealth token.
Feb 22 2019, 2:59 PM · Patch-For-Review, Mail, Operations
herron closed T216714: gmail considers all Phabricator email to be spam due to missing SPF record as Resolved.

Looking much better now!

Feb 22 2019, 2:53 PM · Patch-For-Review, Mail, Operations

Feb 21 2019

herron closed T216513: compiler1002.puppet-diffs.eqiad.wmflabs instance is down as Resolved.

Test build succeeded https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/14773/console, I think we're in good shape now.

Feb 21 2019, 4:40 PM · Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure, puppet-compiler
herron reassigned T216513: compiler1002.puppet-diffs.eqiad.wmflabs instance is down from herron to hashar.

compiler1002 is ready to be re-enabled at your earliest convenience

Feb 21 2019, 4:33 PM · Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure, puppet-compiler
herron added a comment to T216714: gmail considers all Phabricator email to be spam due to missing SPF record.

Side question, wouldn't it be sufficient to just whitelist the mx1001 relay instead of each individual servers that might send emails?

Feb 21 2019, 4:11 PM · Patch-For-Review, Mail, Operations

Feb 19 2019

herron added a comment to T216513: compiler1002.puppet-diffs.eqiad.wmflabs instance is down.

That finished a bit faster than I was expecting! Ready to re-enable in the morning. And FWIW here's an example of a successful manual run https://puppet-compiler.wmflabs.org/compiler1002/2/.

Feb 19 2019, 9:49 PM · Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure, puppet-compiler
herron added a comment to T214608: rack/setup/install logstash101[012].eqiad.wmnet.

logstash101[0-2] have been added to the logging eqiad elasticsearch cluster, and data is now being relocated from the old logstash100[4-6] hosts onto logstash101[0-2]. This will take some time to complete as there are several TB worth of shards to relocate.

Feb 19 2019, 9:20 PM · Patch-For-Review, Operations
herron added a comment to T216513: compiler1002.puppet-diffs.eqiad.wmflabs instance is down.

Compiler1002 is back online and successfully ran through a few local test catalog compiles. populate-puppetdb is running now so we should be in good shape to re-enable this host tomorrow morning. Will follow up when that completes.

Feb 19 2019, 8:52 PM · Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure, puppet-compiler
herron added a comment to T216513: compiler1002.puppet-diffs.eqiad.wmflabs instance is down.

I've re-created compiler1002 from scratch and am working to bring the puppet compiler service up on the host and validate a few builds locally. I estimate this will take until tonight (eastern time) or tomorrow morning since the local puppetdb takes a while to populate.

Feb 19 2019, 5:34 PM · Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure, puppet-compiler
herron closed T215251: Ban recurrent spam to Wikimedia mailing lists (January 2019) as Resolved.

Great! Glad to hear it. Resolving

Feb 19 2019, 3:44 PM · Patch-For-Review, User-herron, Wikimedia-Mailing-lists, Operations

Feb 15 2019

herron added a comment to T215251: Ban recurrent spam to Wikimedia mailing lists (January 2019).

Hi @MarcoAurelio, has this situation improved for you with the above patches merged?

Feb 15 2019, 6:53 PM · Patch-For-Review, User-herron, Wikimedia-Mailing-lists, Operations

Feb 7 2019

herron added a comment to T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs.

I ran an audit of the producers that sent logs through the three least-used inputs over the last 24h (sorted by increasing volume per input)

Feb 7 2019, 9:06 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash

Feb 6 2019

herron added a comment to T215251: Ban recurrent spam to Wikimedia mailing lists (January 2019).

Progress! (I hope...) https://gerrit.wikimedia.org/r/488602 adds an ACL to detect unknown/untrusted hosts that attempt to issue a MAIL FROM command containing our domain (lists.wikimedia.org in this case). I enabled this briefly in a warn-only mode on lists and it indeed flagged the same IP address from the pastes. From the lists exim log:

Feb 6 2019, 11:03 PM · Patch-For-Review, User-herron, Wikimedia-Mailing-lists, Operations
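
The check described in the comment above can be sketched roughly as follows. This is an illustration of the decision logic only, not Exim ACL syntax and not the content of the Gerrit change; the trusted network, the function names, and the `warn_only` flag are hypothetical placeholders.

```python
import ipaddress

# Hypothetical values for illustration; the real trusted hosts are not given in the source.
OUR_DOMAINS = {"lists.wikimedia.org"}
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_trusted(host_ip: str) -> bool:
    """Return True if the connecting host falls inside a trusted network."""
    address = ipaddress.ip_address(host_ip)
    return any(address in network for network in TRUSTED_NETWORKS)


def check_mail_from(host_ip: str, envelope_sender: str, warn_only: bool = True) -> str:
    """Decide what to do with a MAIL FROM command from the given host.

    An untrusted host claiming a sender address in one of our own domains
    is flagged: logged in warn-only mode, rejected otherwise.
    """
    domain = envelope_sender.rpartition("@")[2].lower()
    if domain in OUR_DOMAINS and not is_trusted(host_ip):
        return "warn" if warn_only else "deny"
    return "accept"


print(check_mail_from("203.0.113.7", "someone@lists.wikimedia.org"))
```

Running in warn-only mode first, as the comment describes, lets the rule be checked against real traffic before it starts rejecting mail.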
herron added a comment to T214608: rack/setup/install logstash101[012].eqiad.wmnet.

Hey @Cmjohnson, sending a friendly ping to see how these builds are going. If there's anything I can do to assist remotely just let me know.

Feb 6 2019, 9:01 PM · Patch-For-Review, Operations
herron added a comment to T116011: ferm: Log dropped packets.

While not necessarily optimal, it is possible to ingest a file with rsyslog. So, if left with no other option we may be able to ingest json this way for forwarding on. At the same time I think having a "human readable" and greppable log file on the host would be useful for quick troubleshooting. Maybe we could settle on more than one output? For example a plain syslog output to populate a greppable local /var/log/firewall.log (or similar) and the central log hosts, and a json file for structured logs that is passed onwards to logstash and friends.

Feb 6 2019, 4:46 PM · Patch-For-Review, Operations
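
The two-output idea in the comment above could look something like this minimal sketch: one dropped-packet event rendered both as a greppable plain-text line and as a JSON line. The field names (`src`, `dst`, `proto`, `dport`), the `DROP` prefix, and the `format_outputs` helper are hypothetical; the source does not specify a log format.

```python
import json


def format_outputs(event: dict) -> tuple:
    """Return (plain_line, json_line) for one dropped-packet event."""
    # Plain key=value text is easy to grep on the host itself.
    plain = "DROP " + " ".join(f"{key}={event[key]}" for key in sorted(event))
    # The JSON form keeps field types intact for a structured log pipeline.
    structured = json.dumps(event, sort_keys=True)
    return plain, structured


# Hypothetical event using documentation-reserved IP addresses.
event = {"src": "198.51.100.4", "dst": "192.0.2.10", "proto": "tcp", "dport": 22}
plain_line, json_line = format_outputs(event)
print(plain_line)
print(json_line)
```

The plain line would suit a local file such as the /var/log/firewall.log mentioned above, while the JSON line is the form that would be forwarded on for structured indexing.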
herron closed T213416: Toolforge outbound root email in eqiad1 as Resolved.

After checking in with @aborrero and @Bstorm via IRC there isn't clear enough evidence of an issue to take action now.

Feb 6 2019, 4:23 PM · Patch-For-Review, cloud-services-team (Kanban)
herron added a comment to T215251: Ban recurrent spam to Wikimedia mailing lists (January 2019).

Sadly I'm seeing unexpected backscatter since merging https://gerrit.wikimedia.org/r/488022. Going to revert this for now while looking closer at the cause.

Feb 6 2019, 3:16 AM · Patch-For-Review, User-herron, Wikimedia-Mailing-lists, Operations

Feb 5 2019

herron moved T215251: Ban recurrent spam to Wikimedia mailing lists (January 2019) from Backlog to Working on on the User-herron board.
Feb 5 2019, 9:05 PM · Patch-For-Review, User-herron, Wikimedia-Mailing-lists, Operations
herron edited projects for T215251: Ban recurrent spam to Wikimedia mailing lists (January 2019), added: User-herron; removed Patch-For-Review.
Feb 5 2019, 9:04 PM · Patch-For-Review, User-herron, Wikimedia-Mailing-lists, Operations
herron triaged T215251: Ban recurrent spam to Wikimedia mailing lists (January 2019) as Normal priority.

Thanks for the patch! As mentioned in https://gerrit.wikimedia.org/r/488022 a reject rule is now in place based on this subject. But let's keep tuning this to reject based on multiple criteria and try to find a reliable long-term filter. Do you have one or more example messages with full headers that could be shared? FWIW I've seen a few instances of this on the ops list as well, but have already deleted the messages.

Feb 5 2019, 6:16 PM · Patch-For-Review, User-herron, Wikimedia-Mailing-lists, Operations

Jan 23 2019

herron added a comment to T214489: Improve LDAP logging.

http://www.openldap.org/doc/admin24/overlays.html#Password%20Policies (specifically sections 12.2 and 12.10) outline some possibilities for audit logging and password policy that could be useful here

Jan 23 2019, 5:10 PM · LDAP, Security-Team, Operations
herron triaged T214489: Improve LDAP logging as Normal priority.
Jan 23 2019, 4:43 PM · LDAP, Security-Team, Operations

Jan 22 2019

herron moved T213157: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) from Backlog to Working on on the User-herron board.
Jan 22 2019, 9:24 PM · User-fgiunchedi, User-herron, Operations, Wikimedia-Logstash
herron moved T213898: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch from Backlog to Working on on the User-herron board.
Jan 22 2019, 9:24 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron moved T213899: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs from Backlog to Working on on the User-herron board.
Jan 22 2019, 9:23 PM · Patch-For-Review, User-herron, Operations, Wikimedia-Logstash
herron reopened Unknown Object (Task), a subtask of T213157: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6), as Open.
Jan 22 2019, 3:03 PM · User-fgiunchedi, User-herron, Operations, Wikimedia-Logstash

Jan 11 2019

herron triaged T37611: Remove port 29418 from cloning process as Normal priority.
Jan 11 2019, 7:48 PM · serviceops, Developer-Advocacy, Operations, Gerrit
herron triaged T213506: Grafana alerting broken after upgrade to 5.0.0 as High priority.
Jan 11 2019, 7:47 PM · Core Platform Team Kanban (Done with CPT), monitoring, Patch-For-Review, Operations, Services (watching), WMF-JobQueue, TCB-Team
herron triaged T213527: Prepare our base system layer for Debian buster as Normal priority.
Jan 11 2019, 7:47 PM · Patch-For-Review, Operations
herron triaged T213546: Prepare puppet infrastructure for Debian buster as Normal priority.
Jan 11 2019, 7:46 PM · Patch-For-Review, Packaging, Puppet, Operations
herron moved T213569: add Greg Grossmeier to Phabricator admins group from Untriaged to SRE Meeting Review Required on the SRE-Access-Requests board.
Jan 11 2019, 6:35 PM · Operations, SRE-Access-Requests
herron added a comment to T213569: add Greg Grossmeier to Phabricator admins group.

Adding the usual checklist, even though it's nearly all done. Since this involves sudo privs it's been flagged for review/approval during the next SRE meeting which happens on Monday the 14th. The on-duty SRE will follow up after then. Thanks!

Jan 11 2019, 6:35 PM · Operations, SRE-Access-Requests
herron updated the task description for T213569: add Greg Grossmeier to Phabricator admins group.
Jan 11 2019, 6:34 PM · Operations, SRE-Access-Requests
herron added a comment to T213397: Request to be added to the ldap/wmde group.

Thanks for the update @noarave

Jan 11 2019, 5:17 PM · WMF-Legal, LDAP-Access-Requests, Operations, WMF-NDA-Requests

Jan 10 2019

herron triaged T213371: Document and possibly fine-tune how Proton interacts with Varnish as Normal priority.
Jan 10 2019, 10:24 PM · Readers-Web-Backlog (Tracking), Services (watching), serviceops, Traffic, Reading-Infrastructure-Team-Backlog, Operations, Proton
herron triaged T213475: Wikimedia varnish rules no longer exempt all Cloud VPS/Toolforge IPs from rate limits (HTTP 429 response) as Normal priority.
Jan 10 2019, 10:24 PM · Patch-For-Review, Toolforge, Traffic, Operations, Cloud-VPS
herron triaged T213305: upgrade prometheus-blazegraph-exporter to python3 as Normal priority.
Jan 10 2019, 10:23 PM · Patch-For-Review, Discovery-Search (Current work), monitoring, Operations, Wikidata-Query-Service, Wikidata
herron updated subscribers of T213366: [2 hrs] Decide on handling system updates for Proton.
Jan 10 2019, 10:22 PM · Reading-Infrastructure-Team-Backlog (Kanban), Security-Team, Operations, Proton
herron triaged T213366: [2 hrs] Decide on handling system updates for Proton as Normal priority.
Jan 10 2019, 10:21 PM · Reading-Infrastructure-Team-Backlog (Kanban), Security-Team, Operations, Proton
herron triaged T213191: Some queries causes wdqs-blazegraph on wdqs1006 to crash and restart as Normal priority.
Jan 10 2019, 10:18 PM · Wikidata, Wikidata-Query-Service, Operations
herron moved T213397: Request to be added to the ldap/wmde group from Backlog to NDA Pending on the LDAP-Access-Requests board.
Jan 10 2019, 9:34 PM · WMF-Legal, LDAP-Access-Requests, Operations, WMF-NDA-Requests
herron added a comment to T212946: Stream Thumbor logs to logstash.

Sure, sounds good!

Jan 10 2019, 8:41 PM · Wikimedia-Logstash, User-jijiki, serviceops, Operations, Thumbor
herron added a comment to T185222: lists.wikimedia.org reporting "You must GET the form before submitting it" for all list subscription attempts.

@Tomthirteen which OS and browser version did this occur on? Also is it reproducible using different browsers, hosts, etc.? Thanks in advance!

Jan 10 2019, 7:15 PM · Operations, Wikimedia-Mailing-lists
herron triaged T185222: lists.wikimedia.org reporting "You must GET the form before submitting it" for all list subscription attempts as Normal priority.
Jan 10 2019, 5:59 PM · Operations, Wikimedia-Mailing-lists
herron triaged T213401: Create a cookbook to copy data between WDQS servers as Normal priority.
Jan 10 2019, 5:56 PM · Patch-For-Review, Discovery-Search (Current work), Operations-Software-Development, Wikidata-Query-Service, Wikidata, Operations
herron triaged T213397: Request to be added to the ldap/wmde group as Normal priority.
Jan 10 2019, 5:55 PM · WMF-Legal, LDAP-Access-Requests, Operations, WMF-NDA-Requests
herron triaged T213214: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200/Loading failed for the <script> with source ...) as Normal priority.
Jan 10 2019, 5:53 PM · User-Ryasmeen, Traffic, Wikimedia-Apache-configuration, Operations, VisualEditor
herron added a comment to T213416: Toolforge outbound root email in eqiad1.

Actually upon closer inspection I'm not understanding the issue with the current config. The production MX hosts accept mail for valid @wikimedia.org addresses regardless of the originating IP address (unless it is in a dnsbl). It is relaying for other remote domains where the relay whitelist comes into play, and I had misunderstood the description thinking that tools-mail was attempting to use the prod MX as a smarthost relay for other domains. I also understand now that the current configuration is what I had described as option 3 in T213416#4869863.

Jan 10 2019, 5:38 PM · Patch-For-Review, cloud-services-team (Kanban)
herron triaged T213427: Add kchapman@wikimedia.org to performance-team@wikimedia.org as Normal priority.

Hi @kchapman, I wasn't able to find a mailman list with this name, nor an email server alias. As @Reedy suggests we'll need follow-up from Office-IT or a current list member in Performance-Team. I've added some tags, hopefully that will give the task enough visibility to move forward.

Jan 10 2019, 4:02 PM · Performance-Team, Office-IT, Operations
herron added a comment to T213416: Toolforge outbound root email in eqiad1.

We can't guarantee tools will be good citizens so there might be an impact even if our MX is following best practices. Tools authors may have to fix their code in that case.

Jan 10 2019, 3:19 PM · Patch-For-Review, cloud-services-team (Kanban)

Jan 9 2019

herron updated the task description for T213269: Requesting access to Citoid/Zotero production servers for MVOLZ.
Jan 9 2019, 6:02 PM · Operations, SRE-Access-Requests, Citoid
herron added a comment to T213269: Requesting access to Citoid/Zotero production servers for MVOLZ.

The zotero-admin group is effectively defunct. The groups should be citoid-admin, deployment and deploy-service.

Jan 9 2019, 6:02 PM · Operations, SRE-Access-Requests, Citoid
herron moved T213269: Requesting access to Citoid/Zotero production servers for MVOLZ from Untriaged to SRE Meeting Review Required on the SRE-Access-Requests board.
Jan 9 2019, 5:57 PM · SRE-Access-Requests, Operations, Citoid
herron closed T213015: Add krinkle to contint-docker group as Resolved.
Jan 9 2019, 5:56 PM · Patch-For-Review, Performance-Team (Radar), SRE-Access-Requests, Operations, Continuous-Integration-Infrastructure
herron added a comment to T213015: Add krinkle to contint-docker group.

Proceeding with this

Jan 9 2019, 4:36 PM · Patch-For-Review, Performance-Team (Radar), SRE-Access-Requests, Operations, Continuous-Integration-Infrastructure
herron triaged T213288: TEC6: Upgrade metrics monitoring infrastructure core components (Q3 2018/19 goal) as Normal priority.
Jan 9 2019, 4:32 PM · User-fgiunchedi, Goal, monitoring, Operations
herron triaged T213269: Requesting access to Citoid/Zotero production servers for MVOLZ as Normal priority.
Jan 9 2019, 2:58 PM · SRE-Access-Requests, Operations, Citoid
herron added a comment to T213269: Requesting access to Citoid/Zotero production servers for MVOLZ.

Great! It would be fine to paste the public ssh key here in the task.

Jan 9 2019, 2:57 PM · SRE-Access-Requests, Operations, Citoid
herron updated the task description for T213269: Requesting access to Citoid/Zotero production servers for MVOLZ.
Jan 9 2019, 2:51 PM · SRE-Access-Requests, Operations, Citoid

Jan 8 2019

herron closed T212747: Create mailing list for Wikimedia's Google Code-in mentors as Resolved.

This list has been created and an initial password emailed by the system to aklapper@wm. The list has been set to "confirm and approve" subscription mode, with archives set to private. With that said, please double check the settings in the list admin interface to ensure they are as expected. Thanks!

Jan 8 2019, 8:37 PM · Operations, Wikimedia-Mailing-lists
herron moved T212957: Adminship of MediaWiki-India Mailing List from Backlog to List maintenance on the Wikimedia-Mailing-lists board.
Jan 8 2019, 8:32 PM · Operations, Wikimedia-Mailing-lists
herron closed T210223: Post hold because of "invalid headers" in wikimediacz-l as Resolved.

Please consider this a soft-close and reopen if any follow up is needed. Thanks!

Jan 8 2019, 8:30 PM · User-Urbanecm, Operations, Wikimedia-Mailing-lists
herron closed T212920: recovering wikimedia-mx mailing list password as Resolved.

Hello, this list password has been reset and the new value automatically sent to the owner by the system. Please don't hesitate to re-open if any follow up is needed. Thanks!

Jan 8 2019, 8:29 PM · Operations, Wikimedia-Mailing-lists