@akosiaris thanks for the link to the sql exporter, I'll try playing around with one and see what type of metrics I can expose.
Thu, Jun 16
This is stalled until 4.96 is available in Debian.
This has now been fixed upstream, https://git.exim.org/exim.git/commit/462e2cd30. We will keep chunking disabled until we upgrade to 4.96, which will include the fix.
Tue, Jun 7
Mon, Jun 6
Jun 2 2022
May 26 2022
@herron I did a little log analysis and from what I can tell the errors from this incident were drowned out by the continual errors we get that match the mtail regex. I am not sure we can improve our detection of this type of event by tweaking these rules, so I am going to close this ticket for now, as we at least understand the cause.
I sent a message to the Exim mailing list, https://email@example.com/msg57216.html.
May 25 2022
Bounces relating to these 503s should originate from the sending mail system rather than the Wikimedia MX, since these errored out midway through the incoming SMTP session and the connection was closed before the message was accepted into the MX's queue.
May 9 2022
- Please first remove the google servers from the callout cache, and you may also consider examining what caused callout failure on that host. It definitely shouldn't be there. An unexpected rejection may mess up google's SMTP side and may invalidate already seen chunking status, which would in turn invalidate BDATs. (In fact it might be our own server involved in the callout, if it was a wikimedia.org source!)
First messages in the logs appeared on May 4th:
@jbond I can't think of any recent changes that would have introduced this behavior. The boxes were rebooted on Friday to catch the latest kernel update. I'll start investigating.
May 6 2022
Commit has been merged in, please reopen if there are any problems, thanks!
May 5 2022
Thanks @HMonroy, from my read of T296161 it is ultimately the same as T283190; both add the user to the analytics-privatedata-users group. If you could use the template twice, that would be appreciated, as I can then ensure the boxes are checked for both accounts.
@HMonroy happy to help grant superset access, but I am not sure exactly how to do that? This ticket, T283190, appears similar, is that what group you all need? Also, would you kindly use the access template so as to keep these requests consistent, https://phabricator.wikimedia.org/maniphest/task/edit/form/8/
@WDoranWMF patch cut, if you could explicitly approve as a comment that would be appreciated, though I take your authoring of the ticket as tacit approval.
@AGutman-WMF you have been added to the wmf group, please reopen if there are any issues!
May 4 2022
@WDoranWMF happy to help on this access request. Would you be so kind as to update this ticket with the access request form details, https://phabricator.wikimedia.org/maniphest/task/edit/form/8/
@AGutman-WMF I assume you don't need shell access?
May 3 2022
@Dzahn that makes sense, so I assume it is okay that we also received a notice saying the invoice has lapsed, since these downloads are no longer needed?
@Dzahn I mentioned over email, but I thought I would add a note here as well, we are still receiving alerts, last one was on May 2, stating that we are downloading the legacy database, is that expected?
Apr 26 2022
Another option would be to use cpu pinning via taskset(1), where ffmpeg is assigned to cpus 1-N and cpu 0 is left free to service health checks.
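A rough sketch of what that pinning could look like, assuming a Linux host with taskset from util-linux. The helper name `build_pinned_command` and the ffmpeg arguments are hypothetical, purely for illustration; it only builds the command line and does not launch anything:

```python
import os


def build_pinned_command(cpu_count, command):
    """Prefix `command` with taskset so it may only run on CPUs 1..cpu_count-1.

    CPU 0 is deliberately left out so it stays free for health checks.
    """
    if cpu_count < 2:
        raise ValueError("need at least two CPUs to leave CPU 0 free")
    last = cpu_count - 1
    # taskset -c takes a CPU list such as "1" or "1-7".
    cpu_list = "1" if last == 1 else f"1-{last}"
    return ["taskset", "-c", cpu_list, *command]


if __name__ == "__main__":
    # Hypothetical ffmpeg invocation, not taken from the real service.
    cmd = build_pinned_command(os.cpu_count() or 1,
                               ["ffmpeg", "-i", "input.webm", "output.mp4"])
    print(" ".join(cmd))
```

Whether leaving a whole CPU idle is worth it would depend on how many cores the hosts have.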
Apr 21 2022
Apr 18 2022
thanks @Volans for the additional detail and I am happy to see that folks have been persistently chipping away at some of these blockers.
Apr 14 2022
Although there is no doubt that an automatic formatter is of great help, there are also a bunch of issues to take into account, for example:
I think having a syntax validity check would be a great first start. I think using yamllint, a ruby script or a short python script would work well:
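As one possible shape for the short python script option, here is a minimal sketch, assuming PyYAML is available wherever the check runs. The function name `yaml_syntax_errors` is made up for this example, and it only checks that each file parses, not style or schema:

```python
import sys

import yaml  # PyYAML, a third-party package (assumed to be installed)


def yaml_syntax_errors(paths):
    """Return (path, message) pairs for files that fail to parse as YAML."""
    errors = []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as handle:
                # safe_load_all covers multi-document files; iterating forces the parse.
                for _ in yaml.safe_load_all(handle):
                    pass
        except yaml.YAMLError as exc:
            errors.append((path, str(exc)))
    return errors


if __name__ == "__main__":
    problems = yaml_syntax_errors(sys.argv[1:])
    for path, message in problems:
        print(f"{path}: {message}", file=sys.stderr)
    # Non-zero exit so a CI job or commit hook can fail on invalid files.
    sys.exit(1 if problems else 0)
```

It would be run with the files to check as arguments, and exits non-zero if any of them fail to parse.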
+1 to exclude list, thank you for digging out the root cause. Just as a historical/contextual note, I can't remember at the minute why we went with our implementation of smartmon.py (it is possible it wasn't available at the time though). Having said that, nowadays it might make sense to move to upstream's smartmon (100% out of scope for this task, but putting it out there)
Apr 13 2022
Are we ready to consider running black on our puppet repo?
Apr 12 2022
Mailing list discussion, https://firstname.lastname@example.org/msg57122.html
Apr 8 2022
@jhathaway I saw the alert firing today, looks like it is working as expected so that's great! I believe the old icinga alert can be removed now (i.e. both are firing e.g. for aqs1007 ATM)
ok, thanks, I'll rotate it manually and plan on embiggening the existing
Apr 6 2022
I rotated the log file and then compressed it on another host for this specific incident, but it was cumbersome. I think we should definitely embiggen the disks for the new Postfix based hosts. I am less sure if it is worth the effort for these hosts.
Mar 31 2022
I did some quick analysis and it appears like the vast majority of traffic to the old addresses is spam. As an example, for the messages sent on March 29th, it appears all of them were spam:
Mar 29 2022
Mar 25 2022
Mar 16 2022
Community modules have now been moved to vendor_modules, thanks everyone for the discussion & feedback.
Mar 14 2022
There seems to be some coalescing around moving vendored modules into their own directory, here is a patch that does just that, feedback very much appreciated, https://gerrit.wikimedia.org/r/770099
Mar 10 2022
We are no longer seeing the timeouts after setting the net.ipv4.tcp_fastopen_blackhole_timeout_sec sysctl to 3600, which restores the setting to the same value as prior to kernel
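For anyone wanting to confirm the value on a host, a small sketch that reads the sysctl from /proc/sys and compares it with 3600. The helper names are hypothetical, and the dotted-name to path mapping is the simple case only (it would not handle names whose components themselves contain dots):

```python
from pathlib import Path

EXPECTED_TIMEOUT_SEC = 3600  # the value mentioned in the comment above


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root, *name.split("."))


def read_sysctl_int(name, root="/proc/sys"):
    """Read an integer-valued sysctl; `root` is overridable for testing."""
    return int(sysctl_path(name, root).read_text().strip())


if __name__ == "__main__":
    name = "net.ipv4.tcp_fastopen_blackhole_timeout_sec"
    value = read_sysctl_int(name)
    status = "ok" if value == EXPECTED_TIMEOUT_SEC else "unexpected"
    print(f"{name} = {value} ({status})")
```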
Mar 9 2022
@aborrero the mirrors server has now been switched to apache2 and I am unable to reproduce the error with my tests. Please reopen if you experience the issue again, thanks!
Mar 1 2022
Based on the discussion so far my inclination is that we stick with our current method of vendoring Community modules in ./modules. Though not a perfect solution, it seems to have worked well for us and the downsides are small. My personal experience with a similar setup mirrors the foundation's experience. As @akosiaris mentioned git submodules for just Community modules could be an interesting route as well, but given the scars of the last submodules experience I don't think the upsides are worth exploring at this time.
On a side note, I see there is a proposal of using /vendor/modules. It seems interesting and I've never tried it; I am wondering what technical hurdles we'd meet. Any ideas?
Feb 28 2022
@CDanis I had not, here is my attempt at a comparison between git submodules and subtrees
Thank you for digging up the details/history for this! I'm +1 on leaving the check in icinga, and possibly behind a conditional based on the distribution
Feb 27 2022
Feb 25 2022
Feb 24 2022
As all the packages we need have already been packaged by debian, my view is we just go with the debian packages and close this ticket down.
Feb 23 2022
Before commenting I would say that in my mind we have four types of modules:
- in house modules
- third-party modules
- built in types which are all the types included in the puppet source tree.
- core types which are types that puppet labs packages with the puppet-agent and labeled puppetlabs-core-$foo on puppet forge
We can ignore the built in types as they will be shipped with the puppet agent code.
The puppet core types are modules that used to be part of the puppet source tree but got split out when puppet 6 was released. Upstream, puppetlabs packages these modules as part of the puppet-agent package, however it seems that Debian will split these out as separate packages. Even though they are separate repos in puppet labs, the fact that puppet labs have decided to bundle them with the puppet agent makes me think we should follow suit and package these, possibly even trying to keep parity with the versions that puppetlabs ship in their puppet-agent packages. Further, I have never needed to submit a patch to these core types and would even argue that any such changes should be scrutinised by both us and upstream, and we shouldn't allow users to so easily change something that is stable. To keep context, currently this set would only include, I think, cron and mailalias.
To start with I would just like to add a bit of info that we have a history of using git submodules inside the puppet repo and not liking them and then moving away from them again, which was kind of a bigger deal. So maybe not that one.
Feel free to edit the description and add advantages and disadvantages.