jhathaway (Jesse Hathaway)
User

User Details

User Since
Nov 22 2021, 10:00 PM (26 w, 2 d)
Availability
Available
LDAP User
JHathaway
MediaWiki User
JHathaway (WMF)

Recent Activity

Yesterday

jhathaway added a comment to T309237: Determine why Exim BDAT error messages are not counted as bounces by mtail.

Bounces relating to these 503s should originate from the sending mail system rather than the Wikimedia MX. These sessions errored out midway through the incoming SMTP session, and the connection was closed before the message was accepted into the MX's queue.

Wed, May 25, 8:51 PM · Sustainability (Incident Followup), Infrastructure-Foundations, Mail
jhathaway updated the task description for T309238: 2022-05-09 Exim BDAT Errors incident.
Wed, May 25, 7:25 PM · Infrastructure-Foundations, SRE, Mail, Wikimedia-Incident
jhathaway added a parent task for T307873: [mitigated] Google returning 503 error when delivering to mx1001 and mx2001: T309238: 2022-05-09 Exim BDAT Errors incident.
Wed, May 25, 7:22 PM · SRE, Mail, Infrastructure-Foundations
jhathaway added a subtask for T309238: 2022-05-09 Exim BDAT Errors incident: T307873: [mitigated] Google returning 503 error when delivering to mx1001 and mx2001.
Wed, May 25, 7:22 PM · Infrastructure-Foundations, SRE, Mail, Wikimedia-Incident
jhathaway added a subtask for T309238: 2022-05-09 Exim BDAT Errors incident: T309237: Determine why Exim BDAT error messages are not counted as bounces by mtail.
Wed, May 25, 7:21 PM · Infrastructure-Foundations, SRE, Mail, Wikimedia-Incident
jhathaway added a parent task for T309237: Determine why Exim BDAT error messages are not counted as bounces by mtail: T309238: 2022-05-09 Exim BDAT Errors incident.
Wed, May 25, 7:21 PM · Sustainability (Incident Followup), Infrastructure-Foundations, Mail
jhathaway created T309238: 2022-05-09 Exim BDAT Errors incident.
Wed, May 25, 7:21 PM · Infrastructure-Foundations, SRE, Mail, Wikimedia-Incident
jhathaway claimed T309237: Determine why Exim BDAT error messages are not counted as bounces by mtail.
Wed, May 25, 7:03 PM · Sustainability (Incident Followup), Infrastructure-Foundations, Mail
jhathaway updated the task description for T309237: Determine why Exim BDAT error messages are not counted as bounces by mtail.
Wed, May 25, 7:02 PM · Sustainability (Incident Followup), Infrastructure-Foundations, Mail
jhathaway created T309237: Determine why Exim BDAT error messages are not counted as bounces by mtail.
Wed, May 25, 7:01 PM · Sustainability (Incident Followup), Infrastructure-Foundations, Mail

Mon, May 9

jhathaway added a comment to T307873: [mitigated] Google returning 503 error when delivering to mx1001 and mx2001.
  • Please first remove the Google servers from the callout cache, and you may also consider examining what caused the callout failure on that host. It definitely shouldn't be there. An unexpected rejection may mess up Google's SMTP side and may invalidate the already-seen chunking status, which would in turn invalidate BDATs. (In fact it might be our own server involved in the callout, if it was a wikimedia.org source!)
Mon, May 9, 8:54 PM · SRE, Mail, Infrastructure-Foundations
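Some background on the BDAT side of this, as a hedged sketch: with the SMTP CHUNKING extension (RFC 3030) the client sends the body as one or more `BDAT <size>` commands, each followed by exactly that many octets, with `LAST` on the final chunk. A server that does not consider CHUNKING to be in effect for the session may reject BDAT with a 5xx such as 503 (bad sequence of commands). The Python below only builds the command sequence a client would write; `bdat_commands` and the chunk size are illustrative assumptions, not Exim or Gmail code.

```python
from typing import List


def bdat_commands(message: bytes, chunk_size: int = 4096) -> List[bytes]:
    """Split a message into RFC 3030 BDAT chunks.

    Each element is one BDAT command line immediately followed by its raw
    chunk data, ready to be written to the SMTP connection.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    if not chunks:
        # An empty body is still terminated by a zero-length LAST chunk.
        chunks = [b""]
    out = []
    for index, chunk in enumerate(chunks):
        last = index == len(chunks) - 1
        header = f"BDAT {len(chunk)}{' LAST' if last else ''}\r\n".encode("ascii")
        # Unlike DATA, the chunk is sent verbatim: no dot-stuffing and no
        # terminating "." line, because the length prefix delimits the data.
        out.append(header + chunk)
    return out
```

For a 22-byte message and `chunk_size=10` this yields `BDAT 10`, `BDAT 10` and `BDAT 2 LAST`; if the server loses track of the chunking state partway through, the later commands are the ones that would be refused.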
jhathaway added a comment to T307873: [mitigated] Google returning 503 error when delivering to mx1001 and mx2001.

First messages in the logs appeared on May 4th:

Mon, May 9, 3:17 PM · SRE, Mail, Infrastructure-Foundations
jhathaway added a comment to T307873: [mitigated] Google returning 503 error when delivering to mx1001 and mx2001.

@jbond I can't think of any recent changes that would have introduced this behavior. The boxes were rebooted on Friday to catch the latest kernel update. I'll start investigating.

Mon, May 9, 1:46 PM · SRE, Mail, Infrastructure-Foundations

Fri, May 6

jhathaway closed T307573: Request to add user fkaelin to analytics-platform-eng-admins group as Resolved.

Commit has been merged in, please reopen if there are any problems, thanks!

Fri, May 6, 3:24 PM · SRE, Generated Data Platform, SRE-Access-Requests

Thu, May 5

jhathaway updated the task description for T307573: Request to add user fkaelin to analytics-platform-eng-admins group.
Thu, May 5, 10:14 PM · SRE, Generated Data Platform, SRE-Access-Requests
jhathaway added a comment to T307737: Grant Access to PII in Superset for HMonroy and Dmaza.

Thanks @HMonroy. From my read of T296161, it is ultimately the same as T283190: both add the user to the analytics-privatedata-users group. If you could use the template twice, that would be appreciated, as I can then ensure the boxes are checked for both accounts.

Thu, May 5, 10:05 PM · SRE-Access-Requests, SRE, Community-Tech
jhathaway moved T307737: Grant Access to PII in Superset for HMonroy and Dmaza from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Thu, May 5, 8:07 PM · SRE-Access-Requests, SRE, Community-Tech
jhathaway claimed T307573: Request to add user fkaelin to analytics-platform-eng-admins group.
Thu, May 5, 8:05 PM · SRE, Generated Data Platform, SRE-Access-Requests
jhathaway claimed T307737: Grant Access to PII in Superset for HMonroy and Dmaza.
Thu, May 5, 8:04 PM · SRE-Access-Requests, SRE, Community-Tech
jhathaway added a comment to T307737: Grant Access to PII in Superset for HMonroy and Dmaza.

@HMonroy happy to help grant Superset access, but I am not sure exactly how to do that. This ticket, T283190, appears similar; is that the group you all need? Also, would you kindly use the access template so as to keep these requests consistent: https://phabricator.wikimedia.org/maniphest/task/edit/form/8/

Thu, May 5, 8:04 PM · SRE-Access-Requests, SRE, Community-Tech
jhathaway moved T307573: Request to add user fkaelin to analytics-platform-eng-admins group from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Thu, May 5, 7:49 PM · SRE, Generated Data Platform, SRE-Access-Requests
jhathaway added a comment to T307573: Request to add user fkaelin to analytics-platform-eng-admins group.

@WDoranWMF patch cut. If you could explicitly approve in a comment, that would be appreciated, though I take your authoring of the ticket as tacit approval.

Thu, May 5, 7:48 PM · SRE, Generated Data Platform, SRE-Access-Requests
jhathaway updated the task description for T307573: Request to add user fkaelin to analytics-platform-eng-admins group.
Thu, May 5, 7:44 PM · SRE, Generated Data Platform, SRE-Access-Requests
jhathaway closed T307582: Grant Access to ldap/wmf for Ariel Gutman as Resolved.

@AGutman-WMF you have been added to the wmf group, please reopen if there are any issues!

Thu, May 5, 7:17 PM · SRE, LDAP-Access-Requests
jhathaway added a member for WMF-NDA: AGutman-WMF.
Thu, May 5, 7:16 PM

Wed, May 4

jhathaway added a comment to T307573: Request to add user fkaelin to analytics-platform-eng-admins group.

@WDoranWMF happy to help on this access request. Would you be so kind as to update this ticket with the access request form details: https://phabricator.wikimedia.org/maniphest/task/edit/form/8/

Wed, May 4, 10:29 PM · SRE, Generated Data Platform, SRE-Access-Requests
jhathaway added a comment to T307582: Grant Access to ldap/wmf for Ariel Gutman.

@AGutman-WMF I assume you don't need shell access?

Wed, May 4, 9:34 PM · SRE, LDAP-Access-Requests

Tue, May 3

jhathaway added a comment to T303464: Disable GeoIP Legacy Download / Identify all users of legacy (v1) GeoIP datasets and inform them of the need to switch to GeoIP2 dataset.

@Dzahn that makes sense, so I assume it is okay that we also received a notice saying the invoice has lapsed, since these downloads are no longer needed?

Tue, May 3, 3:45 PM · Trust-and-Safety, serviceops, Traffic, SRE, Data-Engineering
jhathaway added a comment to T303464: Disable GeoIP Legacy Download / Identify all users of legacy (v1) GeoIP datasets and inform them of the need to switch to GeoIP2 dataset.

@Dzahn I mentioned this over email, but I thought I would add a note here as well: we are still receiving alerts (the last one was on May 2) stating that we are downloading the legacy database. Is that expected?

Tue, May 3, 3:04 PM · Trust-and-Safety, serviceops, Traffic, SRE, Data-Engineering

Tue, Apr 26

jhathaway added a comment to T306860: Videoscalers fail health checks while CPU is maxed.

Another option would be to use CPU pinning via taskset(1), where ffmpeg is assigned to CPUs 1-N and CPU 0 is left free to service health checks.

Tue, Apr 26, 8:12 PM · Sustainability (Incident Followup), WMF-JobQueue, serviceops, SRE
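taskset(1) is the usual CLI for this; the same affinity call is available from Python through `os.sched_setaffinity` (Linux only). A minimal sketch, assuming CPU 0 is the one reserved for health checks as suggested above; `worker_cpus`, `run_pinned` and the ffmpeg arguments in the usage note are hypothetical.

```python
import os
import subprocess
from typing import List, Set

RESERVED_CPU = 0  # assumption: CPU 0 is kept free for health checks


def worker_cpus(available: Set[int], reserved: int = RESERVED_CPU) -> Set[int]:
    """Return the CPUs a transcoding job may use, leaving `reserved` free.

    On a host whose only CPU is the reserved one, fall back to that CPU
    rather than returning an empty (invalid) affinity mask.
    """
    cpus = set(available) - {reserved}
    return cpus or set(available)


def run_pinned(cmd: List[str]) -> subprocess.Popen:
    """Start `cmd` restricted to worker_cpus(), roughly `taskset -c 1-N cmd`."""
    cpus = worker_cpus(os.sched_getaffinity(0))
    # preexec_fn runs in the child between fork() and exec(), so only the
    # child (and anything it spawns) inherits the restricted mask.
    return subprocess.Popen(cmd, preexec_fn=lambda: os.sched_setaffinity(0, cpus))
```

Usage would look like `run_pinned(["ffmpeg", "-i", "input.webm", "output.mp4"])`. Note that this only keeps ffmpeg off CPU 0; other processes, including the health-check service itself, remain free to run anywhere.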

Apr 21 2022

jhathaway claimed T305567: MX: increasing disk space.
Apr 21 2022, 2:16 PM · Mail, SRE, Infrastructure-Foundations
jhathaway triaged T305567: MX: increasing disk space as Medium priority.
Apr 21 2022, 2:16 PM · Mail, SRE, Infrastructure-Foundations

Apr 18 2022

jhathaway added a comment to T211750: Introduce Python code formatters usage.

Thanks @Volans for the additional detail; I am happy to see that folks have been persistently chipping away at some of these blockers.

Apr 18 2022, 6:46 PM · Infrastructure-Foundations, User-Kormat, tox-wikimedia, Patch-For-Review, SRE, SRE-tools

Apr 14 2022

jhathaway added a comment to T211750: Introduce Python code formatters usage.

Although there is no doubt that an automatic formatter is of great help, there are also a number of issues to take into account, for example:

Apr 14 2022, 9:02 PM · Infrastructure-Foundations, User-Kormat, tox-wikimedia, Patch-For-Review, SRE, SRE-tools
jhathaway added a comment to T305676: Validate all yaml files in puppet.git.

I think a syntax validity check would be a great start. Using yamllint, a Ruby script or a short Python script would work well:

Apr 14 2022, 8:47 PM · Puppet, SRE, Infrastructure-Foundations
jhathaway added a comment to T294564: Migrate Foundations Prometheus alerts to AlertManager.

+1 to the exclude list, and thank you for digging out the root cause. Just as a historical/contextual note, I can't remember at the moment why we went with our own implementation of smartmon.py (it is possible it wasn't available at the time). Having said that, nowadays it might make sense to move to upstream's smartmon (100% out of scope for this task, but putting it out there).

Apr 14 2022, 7:24 PM · Observability-Alerting

Apr 13 2022

jhathaway closed T305962: Exim emitting warnings about tainted filenames as Resolved.

merged!

Apr 13 2022, 7:00 PM · SRE, Mail, Infrastructure-Foundations
jhathaway added a comment to T211750: Introduce Python code formatters usage.

Are we ready to consider running black on our puppet repo?

Apr 13 2022, 6:58 PM · Infrastructure-Foundations, User-Kormat, tox-wikimedia, Patch-For-Review, SRE, SRE-tools

Apr 12 2022

jhathaway added a comment to T305962: Exim emitting warnings about tainted filenames.

Mailing list discussion, https://www.mail-archive.com/exim-users@exim.org/msg57122.html

Apr 12 2022, 7:28 PM · SRE, Mail, Infrastructure-Foundations
jhathaway created T305962: Exim emitting warnings about tainted filenames.
Apr 12 2022, 4:16 PM · SRE, Mail, Infrastructure-Foundations
jhathaway closed T280472: Figure out if we can remove legacy domain support for mailing lists as Resolved.

Emails are now being rejected; Gmail presents the rejections like this:

2022-04-11-162926_1181x740_scrot.png (screenshot, 74 KB)

Apr 12 2022, 2:32 PM · SRE, Wikimedia-Mailing-lists

Apr 8 2022

jhathaway added a comment to T294564: Migrate Foundations Prometheus alerts to AlertManager.

@jhathaway I saw the alert firing today, looks like it is working as expected so that's great! I believe the old icinga alert can be removed now (i.e. both are firing e.g. for aqs1007 ATM)

Apr 8 2022, 7:49 PM · Observability-Alerting
jhathaway added a comment to T305567: MX: increasing disk space.

OK, thanks. I'll rotate it manually and plan on embiggening the existing hosts.

Apr 8 2022, 1:41 PM · Mail, SRE, Infrastructure-Foundations

Apr 6 2022

jhathaway added a comment to T305567: MX: increasing disk space.

I rotated the log file and then compressed it on another host for this specific incident, but it was cumbersome. I think we should definitely embiggen the disks for the new Postfix based hosts. I am less sure if it is worth the effort for these hosts.

Apr 6 2022, 9:33 PM · Mail, SRE, Infrastructure-Foundations

Mar 31 2022

jhathaway claimed T280472: Figure out if we can remove legacy domain support for mailing lists.
Mar 31 2022, 7:10 PM · SRE, Wikimedia-Mailing-lists
jhathaway added a comment to T280472: Figure out if we can remove legacy domain support for mailing lists.

I did some quick analysis, and it appears that the vast majority of traffic to the old addresses is spam. For example, all of the messages sent on March 29th appear to have been spam:

Mar 31 2022, 7:10 PM · SRE, Wikimedia-Mailing-lists

Mar 29 2022

jhathaway added a comment to T304965: Security Issue Access Request for jhathaway.

@Dsharpe thanks!

Mar 29 2022, 7:07 PM · SecTeam-Processed, Security-Team, Security
jhathaway created T304965: Security Issue Access Request for jhathaway.
Mar 29 2022, 3:36 PM · SecTeam-Processed, Security-Team, Security

Mar 25 2022

jhathaway claimed T236954: Hieradata yaml style checking.
Mar 25 2022, 2:28 PM · Infrastructure-Foundations, Patch-For-Review, Puppet, SRE, User-jbond

Mar 16 2022

jhathaway closed T302423: Where to Put Community Modules? as Resolved.

Community modules have now been moved to vendor_modules, thanks everyone for the discussion & feedback.

Mar 16 2022, 4:53 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway closed T302423: Where to Put Community Modules?, a subtask of T265138: Work required to prepare for puppet 6, as Resolved.
Mar 16 2022, 4:52 PM · Infrastructure-Foundations, Patch-For-Review, User-jbond, SRE, Puppet

Mar 14 2022

jhathaway added a comment to T302423: Where to Put Community Modules?.

There seems to be some coalescing around moving vendored modules into their own directory. Here is a patch that does just that; feedback is very much appreciated: https://gerrit.wikimedia.org/r/770099

Mar 14 2022, 3:27 AM · Patch-For-Review, Puppet, Infrastructure-Foundations

Mar 10 2022

jhathaway closed T286898: Setup new mirror server (mirror1001.wikimedia.org) as Resolved.
Mar 10 2022, 5:42 PM · Infrastructure-Foundations, SRE
jhathaway added a comment to T297906: Change physical label from copernicum.wikimedia.org to mirror1001.wikimedia.org.

thanks!

Mar 10 2022, 5:41 PM · ops-eqiad, Infrastructure-Foundations, DC-Ops
jhathaway claimed T302423: Where to Put Community Modules?.
Mar 10 2022, 5:18 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway claimed T302639: How should we monitor for faulty memory modules?.
Mar 10 2022, 5:18 PM · SRE Observability, Infrastructure-Foundations
jhathaway closed T299107: mx1001.wikimedia.org mail delivery timeouts as Resolved.

We are no longer seeing the timeouts after setting the net.ipv4.tcp_fastopen_blackhole_timeout_sec sysctl to 3600, which restores the value it had prior to kernel version v5.10.54.

Mar 10 2022, 5:09 PM · Infrastructure-Foundations, Mail, SRE
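A quick way to confirm the setting on a host, as a hedged sketch: sysctl keys map onto files under /proc/sys with the dots replaced by slashes, so the value can be read without the sysctl(8) binary. The helper names are hypothetical, and a persistent setting would normally live in a file under /etc/sysctl.d (how it was applied here is not stated).

```python
from pathlib import Path

BLACKHOLE_KEY = "net.ipv4.tcp_fastopen_blackhole_timeout_sec"
EXPECTED = 3600  # value in effect before kernel v5.10.54, per the comment above


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key to its file under /proc/sys.

    Caveat: keys containing dotted components (e.g. VLAN interface names
    such as eth0.100) cannot be split this way.
    """
    return Path(root, *key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl value."""
    return int(sysctl_path(key, root).read_text().strip())


def blackhole_timeout_ok(root: str = "/proc/sys") -> bool:
    """True if the TFO blackhole timeout matches the pre-v5.10.54 value."""
    return read_sysctl(BLACKHOLE_KEY, root) == EXPECTED
```

The `root` parameter exists only so the check can be exercised against a scratch directory; on a real MX it is left at /proc/sys.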
jhathaway closed T299107: mx1001.wikimedia.org mail delivery timeouts, a subtask of T297127: Incident: 2021-12-03 mx2001->Gmail delivery issues, as Resolved.
Mar 10 2022, 5:09 PM · SRE-OnFire (FY2021/2022-Q2), Sustainability (Incident Followup), SRE
jhathaway closed T298110: Provide an easier way to drop spam mail as Resolved.
Mar 10 2022, 5:07 PM · Infrastructure-Foundations, Mail
jhathaway closed T298727: decom sodium, a subtask of T286898: Setup new mirror server (mirror1001.wikimedia.org), as Resolved.
Mar 10 2022, 5:06 PM · Infrastructure-Foundations, SRE
jhathaway closed T298727: decom sodium as Resolved.
Mar 10 2022, 5:06 PM · Infrastructure-Foundations, SRE
jhathaway added a project to T297906: Change physical label from copernicum.wikimedia.org to mirror1001.wikimedia.org: ops-eqiad.
Mar 10 2022, 5:04 PM · ops-eqiad, Infrastructure-Foundations, DC-Ops
Legoktm awarded T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time an Orange Medal token.
Mar 10 2022, 4:09 PM · SRE
aborrero awarded T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time a Like token.
Mar 10 2022, 10:33 AM · SRE

Mar 9 2022

jhathaway closed T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time as Resolved.

@aborrero the mirrors server has now been switched to apache2 and I am unable to reproduce the error with my tests. Please reopen if you experience the issue again, thanks!

Mar 9 2022, 9:37 PM · SRE

Mar 1 2022

jhathaway added a comment to T302423: Where to Put Community Modules?.

Based on the discussion so far, my inclination is that we stick with our current method of vendoring Community modules in ./modules. Though not a perfect solution, it seems to have worked well for us and the downsides are small. My personal experience with a similar setup mirrors the foundation's experience. As @akosiaris mentioned, git submodules for just Community modules could be an interesting route as well, but given the scars of the last submodules experience, I don't think the upsides are worth exploring at this time.

Mar 1 2022, 4:22 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway added a comment to T302423: Where to Put Community Modules?.

On a side note, I see there is a proposal of using /vendor/modules. It seems interesting and I've never tried it; I am wondering what technical hurdles we'd meet. Any ideas?

Mar 1 2022, 3:57 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway closed T302687: Don't move filesystem_avail_bigger_than_size icinga check to alert manager? as Resolved.
Mar 1 2022, 3:11 PM · Patch-For-Review, Observability-Alerting
jhathaway closed T302687: Don't move filesystem_avail_bigger_than_size icinga check to alert manager?, a subtask of T294564: Migrate Foundations Prometheus alerts to AlertManager, as Resolved.
Mar 1 2022, 3:10 PM · Observability-Alerting

Feb 28 2022

jhathaway added a comment to T302423: Where to Put Community Modules?.

@CDanis I had not, here is my attempt at a comparison between git submodules and subtrees

Feb 28 2022, 5:57 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway added a comment to T302687: Don't move filesystem_avail_bigger_than_size icinga check to alert manager?.

Thank you for digging up the details/history for this! I'm +1 on leaving the check in icinga, and possibly behind a conditional based on the distribution

Feb 28 2022, 2:21 PM · Patch-For-Review, Observability-Alerting

Feb 27 2022

jhathaway created T302687: Don't move filesystem_avail_bigger_than_size icinga check to alert manager?.
Feb 27 2022, 11:49 PM · Patch-For-Review, Observability-Alerting
jhathaway added a subtask for T294564: Migrate Foundations Prometheus alerts to AlertManager: T302639: How should we monitor for faulty memory modules?.
Feb 27 2022, 11:25 PM · Observability-Alerting
jhathaway added a parent task for T302639: How should we monitor for faulty memory modules?: T294564: Migrate Foundations Prometheus alerts to AlertManager.
Feb 27 2022, 11:25 PM · SRE Observability, Infrastructure-Foundations

Feb 25 2022

jhathaway created T302639: How should we monitor for faulty memory modules?.
Feb 25 2022, 10:25 PM · SRE Observability, Infrastructure-Foundations

Feb 24 2022

jhathaway added a comment to T302481: Where to put puppetlabs Core Modules.

As all the modules we need have already been packaged by Debian, my view is that we just go with the Debian packages and close this ticket.

Feb 24 2022, 3:31 PM · Infrastructure-Foundations, SRE, Puppet

Feb 23 2022

jhathaway added a comment to T302423: Where to Put Community Modules?.

Before commenting, I would say that in my mind we have four types of modules:

  • in-house modules
  • third-party modules
  • built-in types, which are all the types included in the puppet source tree
  • core types, which are types that Puppet Labs packages with the puppet-agent and labels puppetlabs-$foo_core on Puppet Forge

We can ignore the built-in types, as they will be shipped with the puppet agent code.

The puppet core types are modules that used to be part of the puppet source tree but were split out when Puppet 6 was released. Puppet Labs upstream packages these modules as part of the puppet-agent package; however, it seems that Debian will split these out as separate packages. Even though they are separate repos at Puppet Labs, the fact that Puppet Labs decided to bundle them with the puppet agent makes me think we should follow suit and package these, possibly even trying to keep parity with the versions that Puppet Labs ship in their puppet-agent packages. Further, I have never needed to submit a patch to these core types, and I would even argue that any such changes should be scrutinised by both us and upstream; we shouldn't allow users to so easily change something that is stable. For context, I think this set would currently only include cron and mailalias.

Feb 23 2022, 8:24 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway added a comment to T302423: Where to Put Community Modules?.

To start with, I would just like to add a bit of info: we have a history of using git submodules inside the puppet repo, not liking them, and then moving away from them again, which was kind of a big deal. So maybe not that one.

Feb 23 2022, 6:45 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway added a comment to T302423: Where to Put Community Modules?.

Feel free to edit the description and add advantages and disadvantages.

Feb 23 2022, 5:45 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway created T302423: Where to Put Community Modules?.
Feb 23 2022, 5:43 PM · Patch-For-Review, Puppet, Infrastructure-Foundations
jhathaway added a comment to T302372: prometheus-statsd-exporter failure to start due to invalid yaml config.

@fgiunchedi very sorry about the breakage; I wish I had caught that in the review.

Feb 23 2022, 2:53 PM · SRE Observability, Infrastructure-Foundations, Puppet

Feb 9 2022

jhathaway added a comment to T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time.

Good catch! It seems a little mysterious, though, that this problem isn't more widely reported, given that nginx is a popular HTTP server and apt a very common client for a deb repo. Even if this were fixed on the client side, the fix would trickle into older releases too slowly to be effective, so serving mirrors via Apache seems like the adequate workaround. It might still be worth looping in the nginx mailing list for upstream's assessment, if only to make them aware that there's a popular broken client not working with a stock nginx setup (given how simple our mirror nginx site is).

Feb 9 2022, 5:52 PM · SRE

Feb 8 2022

jhathaway added a comment to T293198: Alerts "instance" label and port number.

Would it be worth asking a version of this question on the prometheus mailing list? It seems like folks there have asked similar questions and have often received helpful replies:

Feb 8 2022, 9:10 PM · Observability-Alerting, User-fgiunchedi
jhathaway added a comment to T293198: Alerts "instance" label and port number.

AIUI option 1 is not really an option, because we would no longer be able to distinguish co-hosted versions of the same software, and that is something we do/need. I think option 2 is out for the same reason as option 1; option 4 (patching Karma) does not look promising given the response from upstream; and option 5 needs a bunch of changes without us actually needing the host label anywhere other than in Karma (if I'm not misunderstanding).

Feb 8 2022, 9:01 PM · Observability-Alerting, User-fgiunchedi
jhathaway added a comment to T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time.

Two things/tests here which came to my mind:

Feb 8 2022, 7:30 PM · SRE
jhathaway added a comment to T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time.

I was able to confirm that the problem is due to https://salsa.debian.org/apt-team/apt/-/commit/fa375493c5a4ed9c10d4e5257ac82c6e687862d3, which involves HTTP/1.1 pipelining, as mentioned in this bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973581. I was also able to confirm that this problem is specific to nginx. My proposal is to switch our mirror to apache2 until the bug is resolved.

Feb 8 2022, 7:26 PM · SRE
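For anyone wanting to poke at pipelining behaviour by hand, a hedged sketch: with HTTP/1.1 pipelining the client writes several requests on one connection before reading any response, and the server must answer them in order. The raw-socket Python below assumes plain HTTP on port 80; the function names and any host or paths used with it are placeholders, and it is not the reproduction script referenced elsewhere on this page.

```python
import socket
from typing import List


def pipelined_requests(host: str, paths: List[str]) -> bytes:
    """Build several GET requests to be written back-to-back on one connection."""
    reqs = []
    for i, path in enumerate(paths):
        headers = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if i == len(paths) - 1:
            # Ask the server to close after the final response, so reading
            # until EOF collects every reply.
            headers.append("Connection: close")
        reqs.append("\r\n".join(headers) + "\r\n\r\n")
    return "".join(reqs).encode("ascii")


def send_pipelined(host: str, paths: List[str], port: int = 80,
                   timeout: float = 10.0) -> bytes:
    """Send all requests before reading anything, then return the raw replies."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(pipelined_requests(host, paths))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

The design choice here is to write every request with a single `sendall` before the first `recv`, which is what distinguishes pipelining from ordinary keep-alive request/response turns.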

Feb 4 2022

jhathaway added a comment to T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time.

I can reliably reproduce the issue with the following script running on sretest1002:

Feb 4 2022, 11:42 PM · SRE
jhathaway claimed T300985: mirrors.wikimedia.org debian repository fails to serve packages from time to time.
Feb 4 2022, 5:14 PM · SRE

Feb 3 2022

jhathaway added a comment to T293198: Alerts "instance" label and port number.

@fgiunchedi, for number (1), is the stripping option all or none? You mention it might not be wanted where a host has multiple instances of the same service. Would it be possible to choose when we change the instance label? For example, on physical hosts we would change it to the hostname, whereas for Kubernetes containers we would change it to the pod name. This way the instance label always refers to the unique identity we want to monitor.

Feb 3 2022, 9:47 PM · Observability-Alerting, User-fgiunchedi
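A hedged sketch of the mapping proposed above: hostname for physical hosts, pod name for Kubernetes targets. In Prometheus itself this would normally be expressed with relabel_configs (for Kubernetes service discovery the pod name is exposed as `__meta_kubernetes_pod_name`); the Python below only models the intended result, and `strip_port` and `instance_label` are hypothetical names.

```python
from typing import Optional


def strip_port(instance: str) -> str:
    """Drop a trailing :port from a Prometheus-style instance label.

    Handles bracketed IPv6 literals such as "[2001:db8::1]:9100". A bare,
    unbracketed IPv6 address is not handled and would be mangled.
    """
    if instance.startswith("["):
        end = instance.find("]")
        return instance[1:end] if end != -1 else instance
    host, sep, port = instance.rpartition(":")
    return host if sep and port.isdigit() else instance


def instance_label(target: str, pod_name: Optional[str] = None) -> str:
    """Pick the identity exposed as `instance`.

    Kubernetes targets use the pod name; physical hosts use the bare host.
    """
    return pod_name if pod_name else strip_port(target)
```

This keeps the trade-off discussed above in view: once ports are stripped, two exporters of the same kind on one physical host collapse to the same `instance` value, so some other label would have to tell them apart.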

Feb 2 2022

jhathaway added a comment to T286898: Setup new mirror server (mirror1001.wikimedia.org).
Feb 2 2022, 4:37 PM · Infrastructure-Foundations, SRE

Jan 31 2022

jhathaway closed T299919: Access to analytics-privatedata-users for Research intern AniketArs as Resolved.

Great, marking as resolved; please reopen if you discover any new issues.

Jan 31 2022, 6:01 PM · SRE, Research, SRE-Access-Requests
jhathaway added a comment to T299919: Access to analytics-privatedata-users for Research intern AniketArs.

@Miriam & @AniketArs, they were not part of the NDA group; they have been added now, so please try again.

Jan 31 2022, 4:26 PM · SRE, Research, SRE-Access-Requests
jhathaway updated the task description for T300383: Requesting access to Analytics Private Data Users for Tanja Andic.
Jan 31 2022, 2:49 PM · SRE, SRE-Access-Requests

Jan 28 2022

jhathaway added a comment to T293198: Alerts "instance" label and port number.

This is my first experience with a Prometheus setup, so please take all my suggestions with a grain of salt :).

Jan 28 2022, 10:29 PM · Observability-Alerting, User-fgiunchedi
jhathaway added a comment to T300383: Requesting access to Analytics Private Data Users for Tanja Andic.

@TAndic your account and Kerberos credentials should be set up; please look out for an email. Let me know if everything works!

Jan 28 2022, 8:49 PM · SRE, SRE-Access-Requests
jhathaway moved T299919: Access to analytics-privatedata-users for Research intern AniketArs from Ready To Go to Awaiting User Input on the SRE-Access-Requests board.
Jan 28 2022, 7:24 PM · SRE, Research, SRE-Access-Requests
jhathaway moved T300383: Requesting access to Analytics Private Data Users for Tanja Andic from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Jan 28 2022, 7:24 PM · SRE, SRE-Access-Requests
jhathaway updated subscribers of T300383: Requesting access to Analytics Private Data Users for Tanja Andic.

@JAnstee_WMF & @Ottomata please approve

Jan 28 2022, 5:34 PM · SRE, SRE-Access-Requests
jhathaway updated the task description for T300383: Requesting access to Analytics Private Data Users for Tanja Andic.
Jan 28 2022, 5:30 PM · SRE, SRE-Access-Requests
jhathaway added a comment to T299919: Access to analytics-privatedata-users for Research intern AniketArs.

@AniketArs the change has been submitted, including your Kerberos access; please give it a try!

Jan 28 2022, 5:19 PM · SRE, Research, SRE-Access-Requests