fgiunchedi (Filippo Giunchedi)
Awesome

Projects (19)

User Details

User Since
Oct 3 2014, 8:06 AM (134 w, 1 d)
Availability
Available
IRC Nick
godog
LDAP User
Filippo Giunchedi
MediaWiki User
Filippo Giunchedi

Recent Activity

Yesterday

fgiunchedi closed T163385: upgrade memory in prometheus100[34] as "Resolved".

Both machines upgraded and back with 96GB, thanks @Cmjohnson!

Fri, Apr 28, 4:13 PM · ops-eqiad, User-fgiunchedi, Operations
fgiunchedi added a comment to T162796: Delete non-used and/or non-requested thumbnail sizes periodically.

Some more frequency distributions of size vs. number of requests, generated with bitly's data_hacks

Fri, Apr 28, 3:15 PM · User-fgiunchedi, Operations
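A rough idea of what such a frequency distribution looks like, as a minimal Python sketch rather than data_hacks itself. It assumes per-size request counts for April have already been pulled out of Hive into a dict (the function name and the example sizes/counts are hypothetical), and buckets sizes by the order of magnitude of their request count.

```python
from collections import Counter


def log10_histogram(counts):
    """Bucket sizes by the order of magnitude of their request count.

    `counts` maps a thumbnail size (hypothetical keys, e.g. "220px") to the
    number of requests it received. Sizes with zero requests are skipped.
    Returns {bucket_floor: number_of_sizes}; e.g. bucket 1000 holds the sizes
    requested between 1000 and 9999 times.
    """
    hist = Counter()
    for n in counts.values():
        if n <= 0:
            continue
        # For a positive int, 10 ** (digits - 1) is its order of magnitude;
        # this avoids floating point log10 rounding issues.
        hist[10 ** (len(str(int(n))) - 1)] += 1
    return dict(sorted(hist.items()))
```

For example, log10_histogram({"120px": 5, "220px": 50, "320px": 999, "640px": 1000}) returns {1: 1, 10: 1, 100: 1, 1000: 1}.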
fgiunchedi created P5347 Hive and data_hacks fun.
Fri, Apr 28, 3:14 PM
fgiunchedi added a comment to T162796: Delete non-used and/or non-requested thumbnail sizes periodically.

And a rough estimate of the long tail: note that ~60% of sizes were requested fewer than 1000 times in April. Only 4% of sizes are requested more than once per second (on average, in April).

Fri, Apr 28, 2:55 PM · User-fgiunchedi, Operations
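For reference, "more than once per second on average in April" means more than 30 * 86,400 = 2,592,000 requests over the month. A minimal Python sketch of the two long-tail fractions quoted above, assuming the same hypothetical size -> April request count dict as input; it does not reproduce the actual Hive query.

```python
APRIL_SECONDS = 30 * 24 * 60 * 60  # April has 30 days -> 2,592,000 seconds


def long_tail_summary(counts, low_threshold=1000):
    """Return (fraction of sizes requested fewer than `low_threshold` times,
    fraction of sizes requested more than once per second on average).

    `counts` is a hypothetical mapping of size -> requests in April.
    """
    total = len(counts)
    if total == 0:
        return 0.0, 0.0
    rare = sum(1 for n in counts.values() if n < low_threshold)
    hot = sum(1 for n in counts.values() if n > APRIL_SECONDS)
    return rare / total, hot / total
```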
fgiunchedi added a comment to T162796: Delete non-used and/or non-requested thumbnail sizes periodically.

I started doing some analytics with Hive on webrequest data for upload; I'm reporting the queries here for reference. Note that running a query over a month of data took ~1h, so writing the query results into another table allows for faster querying/processing later.

Fri, Apr 28, 2:44 PM · User-fgiunchedi, Operations
fgiunchedi added a comment to T162247: Migrate beta cluster Swift cluster from Trusty to Jessie.

For reference, the configuration for switching mw to talk to deployment-ms-fe02 is here: https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-deployment-cache-upload for the varnish bits, and here: https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep for swift

Fri, Apr 28, 2:31 PM · Patch-For-Review, User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
fgiunchedi closed T127762: Update Debian Package for Scap3 as "Resolved".

@thcipriani yep, all done!

Fri, Apr 28, 1:53 PM · Patch-For-Review, Deployment-Systems, Scap
fgiunchedi removed a project from T163673: Some swift disks wrongly mounted on 5 ms-be hosts: ops-eqiad.

@Cmjohnson not ATM; initially I thought it was a HW RAID config issue, but it doesn't look like it, thanks!

Fri, Apr 28, 1:44 PM · User-fgiunchedi, Operations
fgiunchedi added a comment to T163385: upgrade memory in prometheus100[34].

@Cmjohnson yeah, today at 10AM your time works for me; if not, Monday works too

Fri, Apr 28, 12:51 PM · ops-eqiad, User-fgiunchedi, Operations

Thu, Apr 27

fgiunchedi created T163998: check_hpssacli should report on battery failures and cache disabled.
Thu, Apr 27, 2:37 PM · Operations, Monitoring
fgiunchedi added a comment to T163777: HP RAID icinga alert on ms-be1021.

It looks like the battery count is now reported as zero, and "Cache Status: Permanently Disabled" plus "Cache Status Details: Cable Error" are still active, though the HP RAID check reports OK.

Thu, Apr 27, 2:22 PM · User-fgiunchedi, ops-eqiad, Operations
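As an illustration of what T163998 asks for, a minimal Python sketch of a Nagios-style check that flags a zero battery count or a non-OK cache status. This is not the existing check_hpssacli; the field names ("Battery/Capacitor Count", "Cache Status", "Cache Status Details") and the "key: value" input format are assumptions modelled on the output quoted above.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_controller(detail_text):
    """Return (exit_code, message) from controller detail text.

    Expects "key: value" lines; the field names used below are assumed,
    not taken from hpssacli documentation.
    """
    fields = {}
    for line in detail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    if "Cache Status" not in fields and "Battery/Capacitor Count" not in fields:
        return UNKNOWN, "UNKNOWN: no cache/battery fields found"

    problems = []
    if fields.get("Battery/Capacitor Count") == "0":
        problems.append("battery/capacitor count is 0")
    cache = fields.get("Cache Status")
    if cache is not None and cache != "OK":
        detail = fields.get("Cache Status Details")
        problems.append(f"cache status {cache}" + (f" ({detail})" if detail else ""))

    if problems:
        return CRITICAL, "CRITICAL: " + "; ".join(problems)
    return OK, "OK: cache and battery status look fine"
```

A wrapper would feed it the controller detail output and exit with the returned code, so icinga alerts even when the plain RAID status is OK.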
fgiunchedi moved T163690: Degraded RAID on ms-be1039 from Backlog to Blocked on the User-fgiunchedi board.
Thu, Apr 27, 2:17 PM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi created T163996: Icinga check for ipv6 host reachability.
Thu, Apr 27, 2:06 PM · Operations, Monitoring
fgiunchedi added a project to T163690: Degraded RAID on ms-be1039: User-fgiunchedi.
Thu, Apr 27, 1:37 PM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi updated subscribers of T163716: Include service and tmpfiles.d files into keyholder package.

@mmodell thanks! I think shipping the systemd service file would also be useful: dh-systemd makes that easy, and we could basically reduce the puppet module to installing the package plus a few other things

Thu, Apr 27, 11:20 AM · Scap, Release-Engineering-Team

Wed, Apr 26

fgiunchedi renamed T150479: Prometheus varnish metric churn due to VCL reloads from "Error collecting metrics from varnish_exporter on some misc hosts" to "Prometheus varnish metric churn due to VCL reloads".
Wed, Apr 26, 5:54 PM · User-fgiunchedi, Patch-For-Review, Traffic, Prometheus-metrics-monitoring, Operations
fgiunchedi added a project to T151065: Implement DC-local cache failure limiter in Thumbor: User-fgiunchedi.
Wed, Apr 26, 5:07 PM · User-fgiunchedi, Patch-For-Review, Operations, Performance-Team, Thumbor
fgiunchedi added a comment to T162247: Migrate beta cluster Swift cluster from Trusty to Jessie.

ms-be0[34] and ms-fe02 are up and running with swift 2.10, next steps:

Wed, Apr 26, 3:21 PM · Patch-For-Review, User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
fgiunchedi moved T162247: Migrate beta cluster Swift cluster from Trusty to Jessie from Backlog to Doing on the User-fgiunchedi board.
Wed, Apr 26, 9:24 AM · Patch-For-Review, User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
fgiunchedi added a comment to T162247: Migrate beta cluster Swift cluster from Trusty to Jessie.

@hashar for ms-be the used RAM seems to be on the order of ~12GB, so m1.large would be tight. I'll go with m1.xlarge for now; we can revisit if resources get tighter

Wed, Apr 26, 9:18 AM · Patch-For-Review, User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
fgiunchedi added a project to T162247: Migrate beta cluster Swift cluster from Trusty to Jessie: User-fgiunchedi.
Wed, Apr 26, 9:11 AM · Patch-For-Review, User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
fgiunchedi added a comment to T162247: Migrate beta cluster Swift cluster from Trusty to Jessie.

Indeed, we'll have to do this also because production will no longer have trusty "soon" (cf. T162609). I'll start by provisioning a jessie ms-fe since that's the easiest and will allow us to test swift 2.10 too.

Wed, Apr 26, 8:43 AM · Patch-For-Review, User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
fgiunchedi added a comment to T161836: 404 error while accessing some images files (e.g. djvu, jpg, png, webm) on Commons and other sites.

@Revent thanks for your report! It looks like those files were moved by steinsplitterbot (ogg -> ogv), which I suspect is an instance of another bug related to moving files (e.g. T64057)

Wed, Apr 26, 8:28 AM · User-fgiunchedi, media-storage, Operations, Multimedia

Tue, Apr 25

fgiunchedi created T163795: Nutcracker doesn't start at boot.
Tue, Apr 25, 3:41 PM · HHVM, Operations
fgiunchedi added a comment to T151648: Consider storage policies for swift.

WRT minimum swift version: we're running 2.2 and 2.10 is on the cards (https://phabricator.wikimedia.org/T162609). Here are the relevant changelog entries between 2.2 and 2.10:

Tue, Apr 25, 2:41 PM · User-fgiunchedi, media-storage, Operations
fgiunchedi added a project to T163777: HP RAID icinga alert on ms-be1021: User-fgiunchedi.
Tue, Apr 25, 10:50 AM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi created T163777: HP RAID icinga alert on ms-be1021.
Tue, Apr 25, 10:49 AM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi added a project to T151648: Consider storage policies for swift: User-fgiunchedi.
Tue, Apr 25, 10:29 AM · User-fgiunchedi, media-storage, Operations
fgiunchedi moved T162792: Reduce Swift technical debt from Backlog to Doing on the User-fgiunchedi board.
Tue, Apr 25, 10:29 AM · User-fgiunchedi, Operations
fgiunchedi added a project to T152791: Improvements to Ganglia-equivalent Prometheus dashboards: User-fgiunchedi.
Tue, Apr 25, 8:42 AM · User-fgiunchedi, Prometheus-metrics-monitoring, Operations
fgiunchedi moved T150206: ms-be1016 controller cache failure from Backlog to Blocked on the User-fgiunchedi board.
Tue, Apr 25, 8:11 AM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi added a project to T150206: ms-be1016 controller cache failure: User-fgiunchedi.
Tue, Apr 25, 8:10 AM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi closed T162348: swift-object-server 1.13.1: Wrong Content-Type returned on 304 Not Modified responses as "Resolved".

Resolving as the swift upgrade is complete and varnish bandaids have been reverted.

Tue, Apr 25, 8:08 AM · media-storage, Operations, Traffic
fgiunchedi added a comment to T127762: Update Debian Package for Scap3.

@thcipriani package built and updated in reprepro

Tue, Apr 25, 8:05 AM · Patch-For-Review, Deployment-Systems, Scap
fgiunchedi added a comment to T163743: New ganeti VM for MW release pipeline work.

I'd suggest we use the hostname jenkins-jobrunner[12]001 since it's job running in patch and tar building?

Yay, naming! How about jenkins-mw-builder[12]001?

Tue, Apr 25, 7:55 AM · Operations, Security-General, Release-Engineering-Team, vm-requests
fgiunchedi moved T141704: Storage backend errors on commons when deleting/restoring pages from Backlog to Radar on the User-fgiunchedi board.
Tue, Apr 25, 7:47 AM · User-fgiunchedi, Multimedia, Commons, Operations, media-storage
fgiunchedi added a project to T141704: Storage backend errors on commons when deleting/restoring pages: User-fgiunchedi.
Tue, Apr 25, 7:47 AM · User-fgiunchedi, Multimedia, Commons, Operations, media-storage
fgiunchedi moved T161836: 404 error while accessing some images files (e.g. djvu, jpg, png, webm) on Commons and other sites from Backlog to Doing on the User-fgiunchedi board.
Tue, Apr 25, 7:36 AM · User-fgiunchedi, media-storage, Operations, Multimedia
fgiunchedi added a project to T161836: 404 error while accessing some images files (e.g. djvu, jpg, png, webm) on Commons and other sites: User-fgiunchedi.
Tue, Apr 25, 7:36 AM · User-fgiunchedi, media-storage, Operations, Multimedia

Mon, Apr 24

fgiunchedi closed T163386: upgrade memory in prometheus200[34] as "Resolved".

Both machines up at 96GB, thanks @Papaul!

Mon, Apr 24, 5:36 PM · ops-codfw, User-fgiunchedi, Operations
fgiunchedi created T163716: Include service and tmpfiles.d files into keyholder package.
Mon, Apr 24, 5:05 PM · Scap, Release-Engineering-Team
fgiunchedi closed T163209: Degraded RAID on ms-be1002 as "Resolved".

@Cmjohnson the disk in slot 7 was marked as 'foreign config' and it looks like it contained a previous filesystem, maybe from another swift box? These disks should be wiped when used as spares. I've put the disk back in service and it is rebuilding.

Mon, Apr 24, 3:39 PM · media-storage, ops-eqiad, Operations
fgiunchedi claimed T163209: Degraded RAID on ms-be1002.

Indeed megacli doesn't seem happy

Mon, Apr 24, 2:39 PM · media-storage, ops-eqiad, Operations
fgiunchedi created T163692: Have puppet create Prometheus LVs.
Mon, Apr 24, 2:10 PM · User-fgiunchedi, Prometheus-metrics-monitoring
fgiunchedi assigned T163690: Degraded RAID on ms-be1039 to Cmjohnson.

This is one of the new machines in this batch; I tried burning in the disks before production but clearly it wasn't enough :(
Note that the disk is fine according to hpssacli.

Mon, Apr 24, 1:43 PM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi edited the description of T163690: Degraded RAID on ms-be1039.
Mon, Apr 24, 1:41 PM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi added a comment to T163673: Some swift disks wrongly mounted on 5 ms-be hosts.

I've tried rebooting ms-be1036, though that didn't change anything. I think the issue is a combination of these factors:

Mon, Apr 24, 1:12 PM · User-fgiunchedi, Operations
fgiunchedi moved T163385: upgrade memory in prometheus100[34] from Backlog to Doing on the User-fgiunchedi board.
Mon, Apr 24, 11:22 AM · ops-eqiad, User-fgiunchedi, Operations
fgiunchedi moved T163386: upgrade memory in prometheus200[34] from Backlog to Doing on the User-fgiunchedi board.
Mon, Apr 24, 11:22 AM · ops-codfw, User-fgiunchedi, Operations
fgiunchedi moved T163673: Some swift disks wrongly mounted on 5 ms-be hosts from Backlog to Doing on the User-fgiunchedi board.
Mon, Apr 24, 11:22 AM · User-fgiunchedi, Operations
fgiunchedi added a comment to T163385: upgrade memory in prometheus100[34].

@Cmjohnson LMK when you can do this; we can depool one machine at a time for maintenance

Mon, Apr 24, 11:22 AM · ops-eqiad, User-fgiunchedi, Operations
fgiunchedi added a comment to T163386: upgrade memory in prometheus200[34].

@Papaul LMK when you can do this; we can depool one machine at a time for maintenance

Mon, Apr 24, 11:22 AM · ops-codfw, User-fgiunchedi, Operations
fgiunchedi added a comment to T150206: ms-be1016 controller cache failure.

@Cmjohnson I'm OK to do this today; LMK when it is a good time for you

Mon, Apr 24, 11:19 AM · User-fgiunchedi, ops-eqiad, Operations
fgiunchedi added a comment to T162859: Swap NIC on mira.

naos is online and in use; I think we should fix mira's NIC and deprovision it / allocate it to spares now (or decom it altogether)

Mon, Apr 24, 11:13 AM · Operations, ops-codfw
fgiunchedi edited the description of T163667: Fix UIDs for deployment server users.
Mon, Apr 24, 11:00 AM · Operations
fgiunchedi added a comment to T163610: Some thumbnails / fullscreen images on Commons show either HTTP 503 errors or other issues.

Known issue with cache_upload, see T145661

Mon, Apr 24, 10:56 AM · Commons, Multimedia, Operations
fgiunchedi merged T163610: Some thumbnails / fullscreen images on Commons show either HTTP 503 errors or other issues into T145661: varnish backends start returning 503s after ~6 days uptime.
Mon, Apr 24, 10:56 AM · Patch-For-Review, Operations, Traffic
fgiunchedi merged task T163610: Some thumbnails / fullscreen images on Commons show either HTTP 503 errors or other issues into T145661: varnish backends start returning 503s after ~6 days uptime.
Mon, Apr 24, 10:56 AM · Commons, Multimedia, Operations
fgiunchedi created T163673: Some swift disks wrongly mounted on 5 ms-be hosts.
Mon, Apr 24, 10:35 AM · User-fgiunchedi, Operations
fgiunchedi added a comment to T162900: setup naos/WMF6406 as new codfw deployment server.

Followup for trebuchet/mwdeploy fixed uid/gid: https://phabricator.wikimedia.org/T163667

Mon, Apr 24, 8:56 AM · ops-codfw, Operations
fgiunchedi created T163667: Fix UIDs for deployment server users.
Mon, Apr 24, 8:55 AM · Operations

Wed, Apr 19

fgiunchedi edited the description of T162900: setup naos/WMF6406 as new codfw deployment server.
Wed, Apr 19, 1:04 PM · ops-codfw, Operations
fgiunchedi edited the description of T162900: setup naos/WMF6406 as new codfw deployment server.
Wed, Apr 19, 11:24 AM · ops-codfw, Operations
fgiunchedi added a comment to T158583: Restructure our internal repositories further.

A related issue discovered in T163278 is APT priority between components (and/or distros, if multiple), so that packages are picked up from the right place in all cases (most commonly a reimage vs. adding experimental to a machine with packages already installed)

Wed, Apr 19, 11:03 AM · Operations
fgiunchedi added a comment to T163278: Four different PHP/HHVM versions on the cluster.

I've downgraded hhvm-related packages back to their non-experimental version.

Wed, Apr 19, 10:59 AM · Operations
fgiunchedi added a comment to T160156: Add node_exporter ipvs ipv6 support.

Upstream issue: https://github.com/prometheus/procfs/issues/40

Wed, Apr 19, 10:47 AM · Traffic, Operations, Monitoring
fgiunchedi added a comment to T163278: Four different PHP/HHVM versions on the cluster.

Looking at the situation on naos, this seems to be an accidental upgrade via hhvm-dbg

Wed, Apr 19, 10:46 AM · Operations

Tue, Apr 18

fgiunchedi added a comment to T162900: setup naos/WMF6406 as new codfw deployment server.

I've merged @RobH's patch and run puppet on naos; issues I've encountered so far:

Tue, Apr 18, 5:49 PM · ops-codfw, Operations
fgiunchedi closed T163158: acpi_pad consuming 100% CPU on tin as "Resolved".

tin rebooted; I've enabled HT and set the performance profile to "performance per watt (OS)". See also the icinga task for alerting on this, and the parent task

Tue, Apr 18, 5:21 PM · Operations
fgiunchedi closed T163158: acpi_pad consuming 100% CPU on tin, a subtask of T162850: acpi_pad issues, as "Resolved".
Tue, Apr 18, 5:21 PM · Patch-For-Review, Operations
fgiunchedi assigned T163209: Degraded RAID on ms-be1002 to Cmjohnson.

Confirmed sdh isn't well. @Cmjohnson do you have spares onsite?

Tue, Apr 18, 2:54 PM · media-storage, ops-eqiad, Operations
fgiunchedi closed T148408: Put prometheus baremetal servers in service as "Resolved".

This is completed; baremetal is in service

Tue, Apr 18, 2:02 PM · User-fgiunchedi, Patch-For-Review, Prometheus-metrics-monitoring, Operations
fgiunchedi closed T158337: codfw: ms-be2028-ms-be2039 rack/setup as "Resolved".

This is completed; decom for the equivalent old hw is T162785: Decomission ms-be2001 - ms-be2012

Tue, Apr 18, 1:59 PM · User-fgiunchedi, Patch-For-Review, ops-codfw, Operations
fgiunchedi moved T162814: Ensure deployment_server is global from Backlog to Radar on the User-fgiunchedi board.
Tue, Apr 18, 1:57 PM · User-fgiunchedi, Patch-For-Review, Scap
fgiunchedi added projects to T163194: Backfill restored coal whisper files with current data: User-fgiunchedi, Operations, Performance-Team.
Tue, Apr 18, 9:42 AM · Performance-Team, Operations, User-fgiunchedi
fgiunchedi renamed T163194: Backfill restored coal whisper files with current data from "Merge old coal data with new" to "Backfill restored coal whisper files with current data".
Tue, Apr 18, 9:42 AM · Performance-Team, Operations, User-fgiunchedi
fgiunchedi added a comment to T161538: Plug in ex-graphite2001 SSDs to recover coal data.

@Krinkle on graphite2001, I've opened T163194: Backfill restored coal whisper files with current data to follow up on the actual backfill. Note I won't have time to work on it this week, though if you want to take a stab at it, all files should be readable

Tue, Apr 18, 9:41 AM · User-fgiunchedi, Performance-Team, Operations, ops-codfw
fgiunchedi created T163194: Backfill restored coal whisper files with current data.
Tue, Apr 18, 9:40 AM · Performance-Team, Operations, User-fgiunchedi
fgiunchedi added a comment to T162348: swift-object-server 1.13.1: Wrong Content-Type returned on 304 Not Modified responses.

FWIW the swift 2.2.0 upgrade is complete (from T162609)

Tue, Apr 18, 9:26 AM · media-storage, Operations, Traffic
fgiunchedi added a comment to T162949: hosts with puppet compiler failures on every run.

I believe at least bast* and prometheus* are due to T150456: puppet compiler fails with modules using puppetdb

Tue, Apr 18, 9:24 AM · puppet-compiler, Operations
fgiunchedi added a comment to T133852: analytics hosts frequently tripping 'port utilization threshold' librenms alerts.

@ayounsi sounds good to me! I think for the longer period we can start with 3x (or 2x) the current 5min and see if that helps. The usual cases I've seen are analytics hosts, db hosts (during reimage) and swift hosts tripping the alert. The latter have sometimes had real, heavy swift usage by external clients that bypass varnish (i.e. with a cache-busting query string), but I don't think it'll be a problem in practice.

Tue, Apr 18, 9:23 AM · Patch-For-Review, netops, Operations
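The idea of alerting only when utilization stays high for longer (e.g. 3x the current 5min, i.e. 15 minutes) can be sketched as below. This is a minimal Python illustration assuming evenly spaced utilization samples, not how LibreNMS implements its alert rules; the class name, threshold and sample interval are hypothetical.

```python
from collections import deque


class SustainedThresholdAlert:
    """Fire only when every sample in the window exceeds the threshold."""

    def __init__(self, threshold, window_minutes, sample_minutes=5):
        if window_minutes % sample_minutes:
            raise ValueError("window must be a multiple of the sample interval")
        self.threshold = threshold
        # Keep only the samples that fall inside the window.
        self.samples = deque(maxlen=window_minutes // sample_minutes)

    def observe(self, utilization):
        """Record one sample (0.0-1.0) and return True if the alert should fire."""
        self.samples.append(utilization)
        return (len(self.samples) == self.samples.maxlen
                and all(u > self.threshold for u in self.samples))
```

With a 15-minute window and 5-minute samples, a single dip below the threshold resets the alert, so short bursts (e.g. during a reimage) no longer trip it.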
fgiunchedi closed T162712: Decommission prometheus ganeti VMs as "Resolved".

Hosts are gone now from servermon

Tue, Apr 18, 8:50 AM · Patch-For-Review, Prometheus-metrics-monitoring, User-fgiunchedi
fgiunchedi triaged T162712: Decommission prometheus ganeti VMs as "Normal" priority.
Tue, Apr 18, 8:47 AM · Patch-For-Review, Prometheus-metrics-monitoring, User-fgiunchedi
fgiunchedi added a comment to T162712: Decommission prometheus ganeti VMs.

Odd, I've run puppet node clean and puppet node deactivate again just in case

Tue, Apr 18, 8:47 AM · Patch-For-Review, Prometheus-metrics-monitoring, User-fgiunchedi

Thu, Apr 13

fgiunchedi added a project to T162814: Ensure deployment_server is global: User-fgiunchedi.
Thu, Apr 13, 8:07 AM · User-fgiunchedi, Patch-For-Review, Scap
fgiunchedi awarded T162822: remove/fix jenkins icinga monitoring on contint2001 a Like token.
Thu, Apr 13, 8:00 AM · Patch-For-Review, Icinga, Operations, Release-Engineering-Team, Continuous-Integration-Infrastructure
fgiunchedi closed T161703: Add performance-team contact group to private.git as "Resolved".

Completed! Emails to the performance-team ML should be going out now. Note that, for consistency with the rest, the actual icinga contact name is team-performance

Thu, Apr 13, 7:50 AM · Patch-For-Review, Operations, Performance-Team
fgiunchedi closed T161703: Add performance-team contact group to private.git, a subtask of T156245: Create Nagios Grafana alert checks, as "Resolved".
Thu, Apr 13, 7:50 AM · Performance-Team

Wed, Apr 12

fgiunchedi added a comment to T162789: Create less overhead on bacula jobs when dumping production databases.

For context: fixing this would also alleviate a current problem where long-running backup jobs stall both other backup jobs and restore jobs (e.g. the dbstore1001 backup job takes several hours to complete ATM)

Wed, Apr 12, 12:56 PM · Operations, DBA
fgiunchedi created T162796: Delete non-used and/or non-requested thumbnail sizes periodically.
Wed, Apr 12, 11:33 AM · User-fgiunchedi, Operations
fgiunchedi created T162793: Rate limit swift operations.
Wed, Apr 12, 11:24 AM · Patch-For-Review, User-fgiunchedi, Operations
fgiunchedi claimed T162609: Swift version and distro upgrade.
Wed, Apr 12, 11:20 AM · User-fgiunchedi, media-storage, Operations
fgiunchedi added a parent task for T162609: Swift version and distro upgrade: T162792: Reduce Swift technical debt.
Wed, Apr 12, 11:20 AM · User-fgiunchedi, media-storage, Operations
fgiunchedi added a parent task for T151648: Consider storage policies for swift: T162792: Reduce Swift technical debt.
Wed, Apr 12, 11:20 AM · User-fgiunchedi, media-storage, Operations
fgiunchedi added subtasks for T162792: Reduce Swift technical debt: T151648: Consider storage policies for swift, T162609: Swift version and distro upgrade.
Wed, Apr 12, 11:20 AM · User-fgiunchedi, Operations
fgiunchedi created T162792: Reduce Swift technical debt.
Wed, Apr 12, 11:19 AM · User-fgiunchedi, Operations
fgiunchedi moved T162785: Decomission ms-be2001 - ms-be2012 from Backlog to Doing on the User-fgiunchedi board.
Wed, Apr 12, 9:52 AM · User-fgiunchedi, Operations
fgiunchedi created T162785: Decomission ms-be2001 - ms-be2012.
Wed, Apr 12, 9:50 AM · User-fgiunchedi, Operations
fgiunchedi added a comment to T151999: Create script to monitor db dumps for backups are successful (and if not, old backups are not deleted).

So, somehow we missed the error there. My guess is that one of the child processes returned that error while the others did not, and the wait in the last line somehow masked that. I'll conduct some tests in case that's it.

Wed, Apr 12, 9:23 AM · Patch-For-Review, DBA, Monitoring, Operations
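For what it's worth, in bash a bare `wait` with no arguments waits for all children and returns 0 regardless of their exit statuses, which would fit the symptom above. A minimal Python sketch of the alternative, waiting on each child individually and collecting failures; the actual dump script isn't shown here, and the helper names and placeholder commands are hypothetical.

```python
import subprocess
import sys


def run_all(commands):
    """Start every command concurrently, then wait on each one individually.

    Returns a list of (command, returncode) for the children that failed,
    so a single failing child cannot be hidden behind an aggregate wait.
    """
    procs = [(cmd, subprocess.Popen(cmd)) for cmd in commands]
    failures = []
    for cmd, proc in procs:
        rc = proc.wait()
        if rc != 0:
            failures.append((cmd, rc))
    return failures


def python_exit(code):
    """Placeholder child command that just exits with `code` (for demos)."""
    return [sys.executable, "-c", f"raise SystemExit({code})"]
```

E.g. run_all([python_exit(0), python_exit(3)]) reports only the second child, with return code 3; the caller can then exit non-zero so the monitoring notices.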
fgiunchedi moved T160640: Rack and Setup ms-be1028-ms-1039 from Blocked to Doing on the User-fgiunchedi board.
Wed, Apr 12, 8:33 AM · Patch-For-Review, User-fgiunchedi, Operations