Gehel (Guillaume Lederrey)
Operations Engineer - Discovery

User Details

User Since
Nov 9 2015, 9:18 PM (149 w, 4 d)
Availability
Available
IRC Nick
gehel
LDAP User
Gehel
MediaWiki User
GLederrey (WMF) [ Global Accounts ]

Recent Activity

Thu, Sep 20

Gehel added a comment to T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.

It looks like it was the master re-election during the cluster restart. Sadly I don't think there is much we can do about this.

Thu, Sep 20, 7:39 PM · Discovery-Search (Current work), Discovery
Gehel added a comment to T147505: [Recurring task] CirrusSearch: what is updated during re-indexing.

This correlates with a spike in pool counter rejections, which in turn coincides with 3 nodes restarting in the cluster (a full cluster restart is in progress on codfw). No such spike happened for the last ~50 servers I restarted, so I'm not sure whether it is a coincidence. Looking...

Thu, Sep 20, 7:36 PM · Discovery-Search (Current work), Discovery
Gehel updated subscribers of T204980: add onimisionipe to restricted group.

I can vouch for @Mathew.onipe, but his manager is officially @EBjune.

Thu, Sep 20, 7:10 PM · Patch-For-Review, Operations, Elasticsearch, SRE-Access-Requests, Discovery-Search
Gehel moved T204364: Rate limit wdqs logs from All WDQS-related tasks to Operations on the Wikidata-Query-Service board.
Thu, Sep 20, 5:31 PM · Patch-For-Review, Wikimedia-Logstash, Operations, Wikidata-Query-Service, Wikidata
Gehel added a comment to T204960: add onimisionipe to maps-admin.

I confirm that allowing @Mathew.onipe to access maps servers as a member of the maps-admins team makes sense and is reasonable.

Thu, Sep 20, 3:29 PM · Patch-For-Review, Discovery-Search (Current work), SRE-Access-Requests, Operations

Wed, Sep 19

Gehel added a comment to T204776: Investigate brief CirrusSearch outage (MW exception spike for api.php).

To summarize some of the discussion I had with @dcausse:

Wed, Sep 19, 12:57 PM · Discovery-Search (Current work), MW-1.32-release-notes (WMF-deploy-2018-09-25 (1.32.0-wmf.23)), Patch-For-Review, CirrusSearch, Wikimedia-Incident, Wikimedia-production-error

Tue, Sep 18

Gehel moved T202708: Onboarding Mathew Onipe from In progress to Done on the Discovery-Search (Current work) board.

@Mathew.onipe has access to the elastic and wdqs clusters, which is what we need at the moment. We'll reopen specific tasks for specific access as needed.

Tue, Sep 18, 3:31 PM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations

Fri, Sep 14

Gehel renamed T204361: Raise alert level on disk space for old elasticsearch servers from Raise alert level for old elasticsearch servers to Raise alert level on disk space for old elasticsearch servers.
Fri, Sep 14, 10:32 PM · Patch-For-Review, Operations, Discovery-Search (Current work), Elasticsearch
Gehel added a project to T204361: Raise alert level on disk space for old elasticsearch servers: Operations.

The current config for that check is in hiera. We want to identify the old servers and change that config just for them. The easiest way to do that is probably via regex.yaml.

Fri, Sep 14, 5:12 PM · Patch-For-Review, Operations, Discovery-Search (Current work), Elasticsearch
Gehel created T204364: Rate limit wdqs logs.
Fri, Sep 14, 4:11 PM · Patch-For-Review, Wikimedia-Logstash, Operations, Wikidata-Query-Service, Wikidata

Thu, Sep 13

Gehel triaged T204240: Cleanup rspec tests for tilerator and wdqs puppet modules as Low priority.
Thu, Sep 13, 1:53 PM · Discovery-Search
Gehel created T204240: Cleanup rspec tests for tilerator and wdqs puppet modules.
Thu, Sep 13, 1:52 PM · Discovery-Search

Wed, Sep 12

Gehel added a comment to T204135: Warn when CirrusSearch is not configured to use local DC for an extended time.

I'm not too familiar with the order in which MW config files are included, but I was wondering whether we could expose that via siteinfo, as we do with the etcd-based data.

Wed, Sep 12, 9:18 PM · Patch-For-Review, Discovery-Search (Current work), Datacenter-Switchover-2018, Operations
Gehel assigned T202898: Decommission maps-test cluster to RobH.
Wed, Sep 12, 7:38 PM · Patch-For-Review, ops-codfw, decommission, Operations, Maps, Maps-Sprint
Gehel added a project to T204135: Warn when CirrusSearch is not configured to use local DC for an extended time: Datacenter-Switchover-2018.
Wed, Sep 12, 5:02 PM · Patch-For-Review, Discovery-Search (Current work), Datacenter-Switchover-2018, Operations
Gehel created T204135: Warn when CirrusSearch is not configured to use local DC for an extended time.
Wed, Sep 12, 4:58 PM · Patch-For-Review, Discovery-Search (Current work), Datacenter-Switchover-2018, Operations
Gehel updated the task description for T202898: Decommission maps-test cluster.
Wed, Sep 12, 1:35 PM · Patch-For-Review, ops-codfw, decommission, Operations, Maps, Maps-Sprint
Gehel renamed T204106: Log slow queries on postgresql / maps from Log slow queries on to Log slow queries on postgresql / maps.
Wed, Sep 12, 12:37 PM · Patch-For-Review, Discovery-Search (Current work), Maps-Sprint, Operations, Maps (Tilerator)
Gehel updated the task description for T202898: Decommission maps-test cluster.
Wed, Sep 12, 12:24 PM · Patch-For-Review, ops-codfw, decommission, Operations, Maps, Maps-Sprint
Gehel added a comment to T204106: Log slow queries on postgresql / maps.

@Mathew.onipe: if you start looking into this task, a few pointers:

Wed, Sep 12, 9:02 AM · Patch-For-Review, Discovery-Search (Current work), Maps-Sprint, Operations, Maps (Tilerator)
Gehel triaged T204106: Log slow queries on postgresql / maps as High priority.
Wed, Sep 12, 8:58 AM · Patch-For-Review, Discovery-Search (Current work), Maps-Sprint, Operations, Maps (Tilerator)
Gehel added a comment to T204047: investigate tilerator crash on maps eqiad.

Oh, I completely forgot about populate_admin()! This might have been generating lock contention. Not exactly sure what we can improve on that side. If there isn't anything we can improve in populate_admin(), we might need to ensure that tilerator is more robust to contention (also not sure how to do that). It is an async process, so we don't really care if it lags a bit at some point, we just don't want it to crash and alert.

Wed, Sep 12, 8:44 AM · Reading-Infrastructure-Team-Backlog (Kanban), Maps-Sprint, Operations, Maps (Tilerator)
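One common way to make an asynchronous job tolerate transient lock contention instead of crashing is to retry with exponential backoff. The sketch below is illustrative only: tilerator itself is a Node.js service, and `LockContentionError`, `with_backoff` and the retry parameters are hypothetical names, not part of tilerator.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LockContentionError(Exception):
    """Hypothetical error for a query that failed because of lock contention."""


def with_backoff(
    task: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task(), retrying on LockContentionError with exponential backoff.

    For an asynchronous job, lagging behind is acceptable, so we wait and
    retry instead of crashing. Any other exception propagates immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return task()
        except LockContentionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Double the delay on each attempt (capped), with full jitter so
            # that concurrent workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


# Usage: a task that hits contention twice, then succeeds.
attempts = {"n": 0}


def flaky_render() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LockContentionError()
    return "rendered"


print(with_backoff(flaky_render, sleep=lambda s: None))  # prints "rendered"
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` applies. The job still fails loudly once `max_attempts` is exhausted, so persistent contention would not be hidden.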

Tue, Sep 11

Gehel added a comment to T202764: Wikidata produces a lot of failed requests for recentchanges API.

It looks like there is a correlation between bot activity on wikidata query service (T202765) and the rate of those errors. This would tend to indicate that the cause of this issue is load on wdqs and not a slowdown on wikidata. I have no explanation of the causal chain beyond the correlation, so this might be completely wrong.

Tue, Sep 11, 6:16 PM · Datacenter-Switchover-2018, Performance-Team (Radar), DBA, Patch-For-Review, User-Addshore, Operations, Wikidata-Query-Service, Wikidata
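To put a number on such a correlation, Pearson's coefficient over time-aligned samples is a simple first check. This is a sketch: the sample series below are made-up placeholders, not real WDQS or Wikidata measurements, and, as noted in the comment, a high coefficient alone does not establish causality.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two time-aligned series.

    Returns a value in [-1, 1]. A value close to 1 only says the series move
    together; it says nothing about which one (if either) causes the other.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two aligned series with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-minute samples (not real measurements), aligned on timestamps.
bot_queries = [120.0, 340.0, 80.0, 410.0, 95.0, 380.0]
failed_recentchanges = [3.0, 11.0, 2.0, 14.0, 2.0, 12.0]
print(f"r = {pearson(bot_queries, failed_recentchanges):.2f}")
```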
Gehel added a comment to T202764: Wikidata produces a lot of failed requests for recentchanges API.

The issue as seen from WDQS can be followed on logstash.

Tue, Sep 11, 5:59 PM · Datacenter-Switchover-2018, Performance-Team (Radar), DBA, Patch-For-Review, User-Addshore, Operations, Wikidata-Query-Service, Wikidata
Gehel triaged T204047: investigate tilerator crash on maps eqiad as High priority.
Tue, Sep 11, 12:57 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps-Sprint, Operations, Maps (Tilerator)
Gehel created T204047: investigate tilerator crash on maps eqiad.
Tue, Sep 11, 12:51 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps-Sprint, Operations, Maps (Tilerator)

Mon, Sep 10

Gehel added a comment to T202708: Onboarding Mathew Onipe.

Shell access and membership in elasticsearch-roots and wdqs-admins have been approved in the weekly SRE meeting.

Mon, Sep 10, 4:53 PM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations
Gehel added a comment to T202639: Create dashboards for beta cluster maps instances.

Minor comment on the dashboard: the "Cassandra memory usage" graph does not name the different pools in the legend (we see multiple entries for "deployment-maps03.memory_pool_usages").

Mon, Sep 10, 11:37 AM · Maps-Sprint, Reading-Infrastructure-Team-Backlog (Kanban), Maps

Fri, Sep 7

Gehel added a comment to T202708: Onboarding Mathew Onipe.

Thanks @Dzahn for moving this forward! I had been stalling on this for too long.

Fri, Sep 7, 7:50 PM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations

Thu, Sep 6

Gehel added a comment to T203546: Alert when elasticsearch has shards larger than a maximum size.

For reference, https://github.com/wikimedia/puppet/blob/production/modules/elasticsearch/files/nagios/check_elasticsearch.py is a similar check, which could be used as a base for this new check.

Thu, Sep 6, 9:27 AM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Operations
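A minimal sketch of what such a shard-size check could look like, following the usual Nagios plugin exit-code convention (0 OK, 2 CRITICAL, 3 UNKNOWN) and assuming the shard list comes from the Elasticsearch `_cat/shards` API with `format=json&bytes=b`. It is not derived from the actual `check_elasticsearch.py` code; the host and size limit are placeholders.

```python
import json
import urllib.request
from typing import Iterable, List, Mapping, Optional

# Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def oversized_shards(
    shards: Iterable[Mapping[str, Optional[str]]], max_bytes: int
) -> List[str]:
    """Describe every shard whose on-disk size exceeds max_bytes.

    `shards` is the parsed JSON of `GET _cat/shards?format=json&bytes=b`, in
    which "store" is the size in bytes, or null for unassigned shards.
    """
    too_big = []
    for shard in shards:
        store = shard.get("store")
        if store is None:
            continue  # unassigned shard: nothing on disk to measure
        size = int(store)
        if size > max_bytes:
            too_big.append(
                f"{shard['index']}[{shard['shard']}]{shard['prirep']}: {size} bytes"
            )
    return too_big


def check(base_url: str, max_bytes: int) -> int:
    """Print a Nagios-style status line and return the matching exit code."""
    url = f"{base_url}/_cat/shards?format=json&bytes=b"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            shards = json.load(resp)
    except (OSError, ValueError) as exc:  # network errors, bad JSON
        print(f"UNKNOWN - cannot fetch shard sizes: {exc}")
        return UNKNOWN
    too_big = oversized_shards(shards, max_bytes)
    if too_big:
        print(f"CRITICAL - {len(too_big)} shard(s) over {max_bytes} bytes: "
              + ", ".join(too_big))
        return CRITICAL
    print(f"OK - no shard over {max_bytes} bytes")
    return OK
```

A wrapper script would call something like `sys.exit(check("http://localhost:9200", 50 * 1024**3))`; the 50 GiB limit there is an arbitrary example, not an agreed threshold.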
Gehel added a comment to T202779: add SSDs to wdqs100[45].

New SSD in place, server reimaged and data reimported. We're all good!

Thu, Sep 6, 7:53 AM · ops-eqiad, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Gehel moved T202777: add SSDs to wdqs200[12] from Backlog to Done on the Discovery-Wikidata-Query-Service-Sprint board.

New SSD in place, server reimaged and data reimported. We're all good!

Thu, Sep 6, 7:53 AM · ops-codfw, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Gehel added a comment to T202778: add ssds to wdqs2003.

New SSD in place, server reimaged and data reimported. We're all good!

Thu, Sep 6, 7:53 AM · ops-codfw, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Gehel closed T196485: WDQS diskspace is low as Resolved.

New SSD in place, server reimaged and data reimported. We're all good!

Thu, Sep 6, 7:52 AM · Discovery, Operations, Wikidata, Wikidata-Query-Service
Gehel moved T202780: add SSDs to wdqs1003 from Backlog to Done on the Discovery-Wikidata-Query-Service-Sprint board.

New SSD in place, server reimaged and data reimported. We're all good!

Thu, Sep 6, 7:52 AM · ops-eqiad, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata

Wed, Sep 5

Gehel triaged T203546: Alert when elasticsearch has shards larger than a maximum size as Normal priority.
Wed, Sep 5, 8:18 AM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Operations
Gehel created T203546: Alert when elasticsearch has shards larger than a maximum size.
Wed, Sep 5, 8:18 AM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Operations

Tue, Sep 4

Gehel added a project to T194186: rack/setup/install cloudelastic100[1-4].eqiad.wmnet systems: Discovery-Search (Current work).
Tue, Sep 4, 5:39 PM · Discovery-Search (Current work), cloud-services-team, Cloud-VPS, Operations
Gehel assigned T202885: Migrate elasticsearch scripts to spicerack cookbooks to Mathew.onipe.
Tue, Sep 4, 5:23 PM · Patch-For-Review, Discovery-Search (Current work), Operations
Gehel moved T202885: Migrate elasticsearch scripts to spicerack cookbooks from Backlog to In progress on the Discovery-Search (Current work) board.
Tue, Sep 4, 5:23 PM · Patch-For-Review, Discovery-Search (Current work), Operations

Mon, Sep 3

Gehel closed T203404: Degraded RAID on elastic2012 as Resolved.

elastic2012 is scheduled to be replaced soon (see T198169), so let's not do anything for now and not waste DC ops time.

Mon, Sep 3, 4:18 PM · Operations, ops-codfw

Fri, Aug 31

Gehel added a comment to T202785: Federation request to https://ld.stadt-zuerich.ch/query fails.

Looking a bit into this, Blazegraph does not seem to have a deep integration with Jetty (why it depends on Jetty at all is a mystery to me). So repackaging it with a more recent jetty-http (or the whole Jetty stack) might not be that hard: upgrading is trivial, testing is not.

Fri, Aug 31, 7:58 PM · Wikidata, Wikidata-Query-Service
Gehel committed rCUMIN9338431e94e8: extract reporting from BaseEventHandler (authored by Gehel).
extract reporting from BaseEventHandler
Fri, Aug 31, 2:45 PM
Gehel committed rCUMIN8a8fcb75e212: extract reporting from BaseEventHandler (authored by Gehel).
extract reporting from BaseEventHandler
Fri, Aug 31, 2:45 PM

Wed, Aug 29

Gehel added a comment to T202777: add SSDs to wdqs200[12].

error during reimage of wdqs2001:

Wed, Aug 29, 4:10 PM · ops-codfw, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Gehel added a comment to T202708: Onboarding Mathew Onipe.

Summarizing a few back channel conversations here:

Wed, Aug 29, 1:49 PM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations
Gehel added a comment to T202779: add SSDs to wdqs100[45].

@Cmjohnson wdqs1004 is back into rotation, ping me when you have time for the next one (we also have T202780)

Wed, Aug 29, 7:02 AM · ops-eqiad, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata

Tue, Aug 28

Gehel added a project to T202898: Decommission maps-test cluster: Operations.
Tue, Aug 28, 3:07 PM · Patch-For-Review, ops-codfw, decommission, Operations, Maps, Maps-Sprint
Gehel added a comment to T202777: add SSDs to wdqs200[12].

@Papaul: I'm ready to reimage wdqs2002 today. Ping me when you're around and I'll shut it down.

Tue, Aug 28, 12:55 PM · ops-codfw, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata

Mon, Aug 27

Gehel added a comment to T202764: Wikidata produces a lot of failed requests for recentchanges API.

Digging into this a bit more from the WDQS side, we see a few interesting things:

Mon, Aug 27, 7:58 PM · Datacenter-Switchover-2018, Performance-Team (Radar), DBA, Patch-For-Review, User-Addshore, Operations, Wikidata-Query-Service, Wikidata
Gehel added a comment to T202708: Onboarding Mathew Onipe.

It is not entirely clear what access we want to give @Mathew.onipe at this point.

Mon, Aug 27, 4:51 PM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations
Gehel triaged T202898: Decommission maps-test cluster as Low priority.
Mon, Aug 27, 3:08 PM · Patch-For-Review, ops-codfw, decommission, Operations, Maps, Maps-Sprint
Gehel updated subscribers of T192639: Upgrade Archiva (meitnerium) to Debian Stretch.
  • Elasticsearch plugins have no direct dependency on Archiva.
  • Logstash plugins only upload to Archiva. @fgiunchedi has been notified; there is no reason this should break, and I don't want to upload an intermediate version just to test.
  • WDQS still needs to be validated by @Smalyshev.
Mon, Aug 27, 3:00 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
Gehel added a comment to T202888: tilerator / tileratorui crashed on maps-test2003.

I can't find the task about decommissioning maps-test servers. But if we are ready to get rid of maps-test*, we should close this task and work on removing the servers completely!

Mon, Aug 27, 2:56 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps-Sprint, Maps (Tilerator)
Gehel closed T202892: Requesting access to RESOURCE for USER[S] as Declined.

Actually, this will be tracked as part of T202708.

Mon, Aug 27, 1:50 PM · Operations, SRE-Access-Requests
Gehel created T202888: tilerator / tileratorui crashed on maps-test2003.
Mon, Aug 27, 12:30 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps-Sprint, Maps (Tilerator)
Gehel added a comment to T196485: WDQS diskspace is low.

Note that data import after reimage can be done by copying over data from wdqs1010, which has been reimported recently. The procedure is documented at https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Data_transfer_procedure.

Mon, Aug 27, 12:18 PM · Discovery, Operations, Wikidata, Wikidata-Query-Service
Gehel updated subscribers of T202778: add ssds to wdqs2003.

@Papaul: we'll start by reimaging wdqs2003 (wdqs200[12] to follow). We'll reimage them one by one, to ensure that we have at most 1 host down in the cluster at any time.

Mon, Aug 27, 12:16 PM · ops-codfw, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
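The "at most 1 host down" rule amounts to a strictly sequential loop that only moves on once the current host is healthy again. A sketch in Python; `rolling_reimage`, `depool`, `reimage`, `wait_healthy` and `repool` are hypothetical names standing in for the real reimage and pooling tooling.

```python
from typing import Callable, Iterable


def rolling_reimage(
    hosts: Iterable[str],
    depool: Callable[[str], None],
    reimage: Callable[[str], None],
    wait_healthy: Callable[[str], bool],
    repool: Callable[[str], None],
) -> None:
    """Reimage hosts strictly one at a time.

    A host is repooled, and the next one started, only once the current host
    reports healthy, so at most one host is out of the cluster at any time.
    """
    for host in hosts:
        depool(host)
        reimage(host)
        if not wait_healthy(host):
            # Stop here: leave the broken host depooled rather than taking
            # a second host out of the cluster.
            raise RuntimeError(f"{host} did not become healthy; aborting")
        repool(host)
```

Called with the order from the comment, e.g. `rolling_reimage(["wdqs2003", "wdqs2001", "wdqs2002"], ...)`. Aborting on an unhealthy host, rather than continuing, is what preserves the one-host-down guarantee.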
Gehel updated subscribers of T196485: WDQS diskspace is low.

To avoid duplicating information on each of the child tasks, I'll add anything common to all of them on this task.

Mon, Aug 27, 12:14 PM · Discovery, Operations, Wikidata, Wikidata-Query-Service
Gehel created T202885: Migrate elasticsearch scripts to spicerack cookbooks.
Mon, Aug 27, 11:48 AM · Patch-For-Review, Discovery-Search (Current work), Operations
Gehel moved T202708: Onboarding Mathew Onipe from Backlog to In progress on the Discovery-Search (Current work) board.
Mon, Aug 27, 7:31 AM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations
Gehel added a project to T202708: Onboarding Mathew Onipe: Discovery-Search (Current work).
Mon, Aug 27, 7:31 AM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations

Fri, Aug 24

Gehel added a comment to T202708: Onboarding Mathew Onipe.

Note: @Mathew.onipe does not have an @wikimedia.org email yet. Some of the checklist items above would make more sense with an @wikimedia.org email (like exim email aliases), so those might be delayed a bit.

Fri, Aug 24, 9:10 AM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations
Gehel updated the task description for T202708: Onboarding Mathew Onipe.
Fri, Aug 24, 9:04 AM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations
Gehel renamed T202708: Onboarding Mathew Onipe from Add Mathew.onipe to #wmf-nda to Onboarding Mathew Onipe.
Fri, Aug 24, 7:58 AM · Patch-For-Review, SRE-Access-Requests, Discovery-Search (Current work), Operations

Aug 23 2018

Gehel added a comment to T202476: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2).
  • I don't know what "WMCS" stands for, despite working with Wikimedia infrastructure since 2015.
Aug 23 2018, 9:10 AM · Patch-For-Review, Operations, SRE-Access-Requests, User-Addshore, wikidiff2

Aug 22 2018

Gehel added a comment to T186732: Decide on Cache-Control headers for map tiles.

For reference, the MediaWiki implementation of cache invalidation: https://github.com/wikimedia/mediawiki/blob/0ac1ee63e8b131576c8e9b703ed01ee5f9a377d1/includes/deferred/CdnCacheUpdate.php#L162

Aug 22 2018, 4:26 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps, Operations, Traffic, Maps-Sprint

Aug 21 2018

Gehel claimed T198351: Refactor puppet to support multiple elasticsearch instances on same node.

I'm taking over this task to review and deploy the remaining patches.

Aug 21 2018, 5:37 PM · Patch-For-Review, Discovery-Search (Current work)
Gehel added a comment to T186732: Decide on Cache-Control headers for map tiles.

Reducing the Varnish-level TTLs seems counter-productive for efficiency at all levels, in the long run. The right answer is probably to have tilerator purge tiles from Varnish when they're updated, and in the shorter term we could work on a simple methodology for banning "bad" content when isolated events happen.

Aug 21 2018, 4:39 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps, Operations, Traffic, Maps-Sprint
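A rough sketch of the purge-on-update idea, in Python for illustration (tilerator itself is Node.js). It assumes a `{base}/{style}/{z}/{x}/{y}.png` URL layout with an optional `@2x` variant, and a cache that accepts HTTP `PURGE` requests from the caller; the base URL, the style name and both helpers are hypothetical, not the actual maps or Varnish purge setup.

```python
import http.client
from typing import Iterable, List, Sequence, Tuple
from urllib.parse import urlsplit


def tile_urls(
    base: str,
    style: str,
    tiles: Iterable[Tuple[int, int, int]],
    scales: Sequence[str] = ("", "@2x"),
) -> List[str]:
    """Build the public URL of every (z, x, y) tile for each scale variant."""
    urls = []
    for z, x, y in tiles:
        n = 1 << z  # number of tiles per axis at zoom level z
        if not (0 <= x < n and 0 <= y < n):
            raise ValueError(f"tile {z}/{x}/{y} is out of range")
        for scale in scales:
            urls.append(f"{base}/{style}/{z}/{x}/{y}{scale}.png")
    return urls


def send_purge(url: str, timeout: float = 5.0) -> int:
    """Send an HTTP PURGE for url and return the response status.

    Whether the cache honours PURGE, and from which clients, is a
    deployment-specific policy assumed here.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("PURGE", parts.path)
        return conn.getresponse().status
    finally:
        conn.close()
```

With purges sent right after tiles are regenerated, Varnish TTLs could stay long, which matches the efficiency argument above.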
Gehel moved T184933: Use <maplink> or <mapframe> to view coordinates on Wikidata from All map-related tasks to Tracking on the Maps board.
Aug 21 2018, 3:37 PM · Wikidata-Campsite-Iteration-∞, Wikidata-Campsite, Patch-For-Review, Maps, Wikidata, Wikidata-Gadgets

Aug 20 2018

Gehel committed rDPOM5bfcc7741cc1: [maven-release-plugin] prepare for next development iteration (authored by Gehel).
[maven-release-plugin] prepare for next development iteration
Aug 20 2018, 4:50 PM
Gehel committed rDPOM396f44fb3ab8: [maven-release-plugin] prepare release discovery-parent-pom-1.18 (authored by Gehel).
[maven-release-plugin] prepare release discovery-parent-pom-1.18
Aug 20 2018, 4:50 PM
Gehel moved T193649: migrate elasticsearch to stretch (from jessie) from In progress to Done on the Discovery-Search (Current work) board.
Aug 20 2018, 4:38 PM · Patch-For-Review, Discovery-Search (Current work), Operations
Gehel moved T198391: migrate elasticsearch cirrus cluster to RAID0 from In progress to Done on the Discovery-Search (Current work) board.
Aug 20 2018, 4:37 PM · Discovery-Search (Current work), Patch-For-Review, Operations, Discovery
Gehel committed rDPOM79b24dfcd545: remove checksum plugin entirely (authored by Gehel).
remove checksum plugin entirely
Aug 20 2018, 4:14 PM
Gehel committed rDPOMb282f79e32b9: [maven-release-plugin] prepare for next development iteration (authored by Gehel).
[maven-release-plugin] prepare for next development iteration
Aug 20 2018, 3:26 PM
Gehel committed rDPOMf83630dff05e: [maven-release-plugin] prepare release discovery-parent-pom-1.17 (authored by Gehel).
[maven-release-plugin] prepare release discovery-parent-pom-1.17
Aug 20 2018, 3:26 PM
Gehel triaged T202297: Tests maps with new nodejs as High priority.
Aug 20 2018, 3:19 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps-Sprint, Maps (Tilerator)
Gehel added projects to T202297: Tests maps with new nodejs: Maps (Tilerator), Maps-Sprint.
Aug 20 2018, 3:19 PM · Reading-Infrastructure-Team-Backlog (Kanban), Maps-Sprint, Maps (Tilerator)
Gehel committed rDPOMfd95223f17f0: checksums should be generated before GPG, so that they are signed (authored by Gehel).
checksums should be generated before GPG, so that they are signed
Aug 20 2018, 3:09 PM
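The ordering matters because a signature only covers files that exist when signing runs. A sketch of the idea in Python, independent of the actual Maven plugin configuration in discovery-parent-pom (`write_sha512` and `files_to_sign` are hypothetical helpers): write a `.sha512` file next to each artifact first, then hand artifacts and checksum files together to the signing step.

```python
import hashlib
from pathlib import Path
from typing import List


def write_sha512(artifact: Path) -> Path:
    """Write <artifact>.sha512 containing the hex SHA-512 digest of artifact."""
    digest = hashlib.sha512()
    with artifact.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    checksum_file = artifact.with_name(artifact.name + ".sha512")
    checksum_file.write_text(digest.hexdigest())
    return checksum_file


def files_to_sign(artifacts: List[Path]) -> List[Path]:
    """Generate checksums first, then return everything that must be signed.

    Because the .sha512 files exist before signing, they get signatures of
    their own; signing first would leave the checksum files unsigned.
    """
    checksums = [write_sha512(a) for a in artifacts]
    return list(artifacts) + checksums
```

Each returned path would then be signed individually, for example with `gpg --armor --detach-sign`.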
Gehel committed rDPOMa876ed664653: [maven-release-plugin] prepare for next development iteration (authored by Gehel).
[maven-release-plugin] prepare for next development iteration
Aug 20 2018, 2:31 PM
Gehel committed rDPOM7fdc959e82d8: [maven-release-plugin] prepare release discovery-parent-pom-1.16 (authored by Gehel).
[maven-release-plugin] prepare release discovery-parent-pom-1.16
Aug 20 2018, 2:31 PM
Gehel committed rDPOM513af963b9ac: checksums should be attached as build artifacts and published to central (authored by Gehel).
checksums should be attached as build artifacts and published to central
Aug 20 2018, 2:19 PM
Gehel added a comment to T202120: mjolnir-kafka-bulk-daemon failed on all elastic / eqiad nodes.

Restart=always on the systemd unit should fix the immediate issue. This has been deployed. I'm keeping this task open for a few more days, until we can validate that the issue is not reproduced.

Aug 20 2018, 1:51 PM · Patch-For-Review, Operations, Discovery-Search (Current work)
Gehel committed rDPOM189a8bee758b: [maven-release-plugin] prepare for next development iteration (authored by Gehel).
[maven-release-plugin] prepare for next development iteration
Aug 20 2018, 12:48 PM
Gehel committed rDPOM70b0856ef4cb: Add SHA-512 checksums (authored by dcausse).
Add SHA-512 checksums
Aug 20 2018, 12:48 PM
Gehel committed rDPOMd33987e60a83: [maven-release-plugin] prepare release discovery-parent-pom-1.15 (authored by Gehel).
[maven-release-plugin] prepare release discovery-parent-pom-1.15
Aug 20 2018, 12:48 PM
Gehel committed rDPOMaa604793576d: Update dependencies to latest. (authored by Gehel).
Update dependencies to latest.
Aug 20 2018, 12:48 PM
Gehel updated the task description for T198622: migrate maps servers to stretch with the current style.
Aug 20 2018, 12:13 PM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, Maps-Sprint, Operations, Maps
Gehel committed rDMTC58566ce54e0f: [maven-release-plugin] prepare release discovery-maven-tool-configs-1.6 (authored by Gehel).
[maven-release-plugin] prepare release discovery-maven-tool-configs-1.6
Aug 20 2018, 10:51 AM
Gehel committed rDMTC4a225126c156: [maven-release-plugin] prepare for next development iteration (authored by Gehel).
[maven-release-plugin] prepare for next development iteration
Aug 20 2018, 10:51 AM
Gehel closed T177631: check elastic1022 power supply redundancy as Resolved.

It looks like a reset of the management interface fixed the reporting issue in ipmi-sensors:

Aug 20 2018, 7:36 AM · Elasticsearch, ops-eqiad, Discovery-Search, Discovery, Operations

Aug 17 2018

Gehel added a comment to T202120: mjolnir-kafka-bulk-daemon failed on all elastic / eqiad nodes.

There seems to be some correlation with a high number of failed relocations that happened just before mjolnir failed (see logstash). I have no idea whether there is any causality here.

Aug 17 2018, 8:35 AM · Patch-For-Review, Operations, Discovery-Search (Current work)
Gehel triaged T202120: mjolnir-kafka-bulk-daemon failed on all elastic / eqiad nodes as High priority.
Aug 17 2018, 8:09 AM · Patch-For-Review, Operations, Discovery-Search (Current work)
Gehel created T202120: mjolnir-kafka-bulk-daemon failed on all elastic / eqiad nodes.
Aug 17 2018, 8:09 AM · Patch-For-Review, Operations, Discovery-Search (Current work)

Aug 16 2018

Gehel created P7464 (An Untitled Masterwork).
Aug 16 2018, 6:37 PM
Gehel added a comment to T177631: check elastic1022 power supply redundancy.

@Cmjohnson confirms that there is still nothing in the H/W logs and the PSUs seem to work correctly. IPMI reporting a false positive is still the most likely explanation; we still need to understand why.

Aug 16 2018, 5:45 PM · Elasticsearch, Discovery-Search, ops-eqiad, Operations, Discovery
Gehel closed T201991: Broken memory on elastic1029 as Resolved.

Looking good!

Aug 16 2018, 5:39 PM · Operations, ops-eqiad
Gehel created P7462 (An Untitled Masterwork).
Aug 16 2018, 9:04 AM

Aug 15 2018

Gehel created T201986: cassandra-a instance on aqs1007 is not starting.
Aug 15 2018, 8:13 AM · Cassandra, Operations