Mathew.onipe (onimisionipe)
User

User Details

User Since
Aug 22 2018, 7:21 AM (60 w, 2 d)
Availability
Available
LDAP User
Mathew.onipe
MediaWiki User
Unknown

Recent Activity

Wed, Oct 16

Mathew.onipe added a comment to T235540: SPARQL query causes StackOverflowError and fails to execute.
Wed, Oct 16, 12:09 PM · Wikidata, Wikidata-Query-Service

Thu, Oct 10

Mathew.onipe triaged T235159: Enable write access for Mathew.onipe(onimisionipe) and gehel on wikidata gui repo as Normal priority.
Thu, Oct 10, 9:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata, Gerrit-Privilege-Requests
Mathew.onipe created T235159: Enable write access for Mathew.onipe(onimisionipe) and gehel on wikidata gui repo.
Thu, Oct 10, 9:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata, Gerrit-Privilege-Requests

Wed, Oct 9

Mathew.onipe committed rMSKA217ddd64b9d3: allow npm install devDeps (authored by Mathew.onipe).
allow npm install devDeps
Wed, Oct 9, 10:35 PM
Mathew.onipe committed rMSKAa6bafe419667: Add copies directive to build stage (authored by Mathew.onipe).
Add copies directive to build stage
Wed, Oct 9, 10:35 PM
Mathew.onipe added a comment to T233316: Deployment Pipeline fails with CPS error for Kartotherian.

@dduvall Thanks! I removed the test stage and also forced devDeps to install. We should definitely look at a better way to handle this later, but it's fine as it is.
Currently, the build is passing but not publishing yet. Do we need to enable the CI publish stage for the repo?

Wed, Oct 9, 10:31 PM · Release-Engineering-Team-TODO (201910), Maps (Kartotherian), Release Pipeline, Release-Engineering-Team (Pipeline)
Mathew.onipe added a comment to T233316: Deployment Pipeline fails with CPS error for Kartotherian.

@dduvall Thanks. I will implement this.

Wed, Oct 9, 11:24 AM · Release-Engineering-Team-TODO (201910), Maps (Kartotherian), Release Pipeline, Release-Engineering-Team (Pipeline)

Thu, Oct 3

Mathew.onipe added a comment to T233403: Unassigned shards in eqiad.

This issue has come up again. Currently, only enwiki_content_1546970425 is unassigned, and _cluster/allocation/explain reports the error: too many shards [1] allocated to this node for index [enwiki_content_1546970425], index setting [index.routing.allocation.total_shards_per_node=1].

Thu, Oct 3, 1:05 PM · Discovery-Search, Operations, Elasticsearch
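
For readers following the unassigned-shard report above, here is a minimal sketch of how one might pull the blocking deciders out of a `_cluster/allocation/explain` response. The sample payload is hypothetical and trimmed: the node name, shard number, and `primary` value are invented for illustration, and only the field names (`node_allocation_decisions`, `deciders`, `decision`, `explanation`) are assumed to follow the Elasticsearch response format.

```python
import json

# Hypothetical, trimmed response; node name and shard number are made up.
SAMPLE = json.loads("""
{
  "index": "enwiki_content_1546970425",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "node_allocation_decisions": [
    {
      "node_name": "example-node-1",
      "node_decision": "no",
      "deciders": [
        {
          "decider": "shards_limit",
          "decision": "NO",
          "explanation": "too many shards [1] allocated to this node for index [enwiki_content_1546970425], index setting [index.routing.allocation.total_shards_per_node=1]"
        }
      ]
    }
  ]
}
""")


def blocking_deciders(explain):
    """Return (node, decider, explanation) for every decider that said NO."""
    blocked = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blocked.append(
                    (
                        node.get("node_name"),
                        decider.get("decider"),
                        decider.get("explanation"),
                    )
                )
    return blocked


if __name__ == "__main__":
    for node, decider, why in blocking_deciders(SAMPLE):
        print(f"{node}: {decider}: {why}")
```

Listing only the NO deciders makes it easier to see whether a per-node shard limit, rather than disk space or node availability, is what keeps the shard unassigned.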

Tue, Oct 1

Mathew.onipe added a comment to T233316: Deployment Pipeline fails with CPS error for Kartotherian.

Post-merge builds seem to fail.
https://gerrit.wikimedia.org/r/c/mediawiki/services/kartotherian/+/539209

Tue, Oct 1, 12:31 AM · Release-Engineering-Team-TODO (201910), Maps (Kartotherian), Release Pipeline, Release-Engineering-Team (Pipeline)

Tue, Sep 24

Mathew.onipe added a comment to T225125: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline.

We should talk to elastic to see how we can move this forward.
Currently, we require jackson-databind 2.8.11 and jackson-annotation 2.8.11 for JsonLayout to work when using SyslogAppender. Debian provides version 2.8.6 of these packages. We should use the correct version to make sure everything works as expected.

Tue, Sep 24, 5:49 PM · Patch-For-Review, Discovery-Search (Current work), observability, Elasticsearch, Operations, Wikimedia-Logstash
Mathew.onipe moved T225125: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline from Needs review to Blocked on the Discovery-Search (Current work) board.
Tue, Sep 24, 5:45 PM · Patch-For-Review, Discovery-Search (Current work), observability, Elasticsearch, Operations, Wikimedia-Logstash
Mathew.onipe moved T232184: MIgrate WDQS to new logging pipeline from Waiting to Done on the Discovery-Search (Current work) board.
Tue, Sep 24, 5:40 PM · Discovery-Search (Current work), Wikimedia-Logstash, Operations, observability, Product-Analytics, Discovery-Analysis (Current work), Wikidata-Query-Service, Wikidata

Mon, Sep 23

Mathew.onipe triaged T233578: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet as Normal priority.
Mon, Sep 23, 8:06 AM · Operations, ops-eqiad, DC-Ops
Mathew.onipe created T233578: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet.
Mon, Sep 23, 8:05 AM · Operations, ops-eqiad, DC-Ops

Fri, Sep 20

Mathew.onipe created T233403: Unassigned shards in eqiad.
Fri, Sep 20, 12:51 PM · Discovery-Search, Operations, Elasticsearch

Sep 18 2019

Mathew.onipe moved T232184: MIgrate WDQS to new logging pipeline from in progress to Waiting on the Discovery-Search (Current work) board.
Sep 18 2019, 1:23 PM · Discovery-Search (Current work), Wikimedia-Logstash, Operations, observability, Product-Analytics, Discovery-Analysis (Current work), Wikidata-Query-Service, Wikidata

Sep 16 2019

Mathew.onipe created T233039: hw troubleshooting: <type of hardware failre> for <fqhn of server>.
Sep 16 2019, 5:53 PM · DC-Ops
Mathew.onipe closed T201991: Broken memory on elastic1029 as Resolved.
Sep 16 2019, 5:50 PM · Operations, ops-eqiad
Mathew.onipe reopened T201991: Broken memory on elastic1029 as "Open".
Sep 16 2019, 5:46 PM · Operations, ops-eqiad

Sep 12 2019

Mathew.onipe added a comment to T176875: Allow access to wdqs.svc.eqiad.wmnet on port 8888.

@Ladsgroup there's no TLS termination on that port for now. We should have it, and I will work on it in the near future. Please use HTTP for now.

Sep 12 2019, 10:24 AM · Patch-For-Review, Traffic, Wikidata-Query-Service, Operations, WMDE-Analytics-Engineering, User-Addshore, Discovery, Wikidata
Mathew.onipe added a comment to T176875: Allow access to wdqs.svc.eqiad.wmnet on port 8888.

@Addshore @Ladsgroup @WMDE-leszek, can you test that you can reach wdqs.svc.eqiad.wmnet on port 8888? LVS and other appropriate changes have been merged, and it should work. Thanks.

Sep 12 2019, 8:54 AM · Patch-For-Review, Traffic, Wikidata-Query-Service, Operations, WMDE-Analytics-Engineering, User-Addshore, Discovery, Wikidata
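
A reachability test like the one requested above could be sketched as a plain TCP connect. This is only an illustration, not the procedure used on the task; the helper name `can_connect` and the timeout value are assumptions, while the host and port are those named in the comment.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Run this from a host that is expected to have access to the service.
    print(can_connect("wdqs.svc.eqiad.wmnet", 8888))
```

A successful connect only shows that the port is reachable; it does not check that the service answers queries correctly.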

Sep 11 2019

Mathew.onipe updated the task description for T232297: Create puppet configs for SDC query.
Sep 11 2019, 6:45 AM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata

Sep 10 2019

Mathew.onipe moved T225125: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline from in progress to Needs review on the Discovery-Search (Current work) board.
Sep 10 2019, 1:50 PM · Patch-For-Review, Discovery-Search (Current work), observability, Elasticsearch, Operations, Wikimedia-Logstash
Mathew.onipe added a project to T232184: MIgrate WDQS to new logging pipeline: Discovery-Search (Current work).
Sep 10 2019, 3:12 AM · Discovery-Search (Current work), Wikimedia-Logstash, Operations, observability, Product-Analytics, Discovery-Analysis (Current work), Wikidata-Query-Service, Wikidata

Sep 9 2019

Mathew.onipe updated the task description for T232297: Create puppet configs for SDC query.
Sep 9 2019, 2:49 PM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata
Mathew.onipe updated the task description for T232297: Create puppet configs for SDC query.
Sep 9 2019, 2:46 PM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata
Mathew.onipe updated the task description for T232297: Create puppet configs for SDC query.
Sep 9 2019, 2:42 PM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata
Mathew.onipe added a project to T232297: Create puppet configs for SDC query: Operations.
Sep 9 2019, 2:36 PM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata
Mathew.onipe updated the task description for T232297: Create puppet configs for SDC query.
Sep 9 2019, 2:21 PM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata
Mathew.onipe triaged T232297: Create puppet configs for SDC query as Normal priority.
Sep 9 2019, 7:42 AM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata
Mathew.onipe created T232297: Create puppet configs for SDC query.
Sep 9 2019, 7:41 AM · Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, Operations, Discovery-Search (Current work), SDC General, Wikidata

Sep 6 2019

Mathew.onipe added a comment to T232224: September 2019 DoS attacks [Public].

This is a known issue. The SRE team is working on a quick solution to restore these services. Thanks.

Sep 6 2019, 6:22 PM · Wikimedia-Incident, Operations
Mathew.onipe added a comment to T225125: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline.

JsonLayout requires other dependencies for log4j. These include jackson-databind. See https://logging.apache.org/log4j/2.x/runtime-dependencies.html.
Two options:

  1. Rebuild log4j with these dependencies.
  2. Fall back to shipping logs with PatternLayout.
Sep 6 2019, 11:19 AM · Patch-For-Review, Discovery-Search (Current work), observability, Elasticsearch, Operations, Wikimedia-Logstash
Mathew.onipe updated subscribers of T228483: Delete (rather than archive) the maps/kartotherian and maps/tilerator repos.

Let's wait for @MSantos's or @Mholloway's opinion before deleting those repos, please.

Sep 6 2019, 7:40 AM · User-MarcoAurelio, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services), Repository-Admins, Maps, Cleanup
Mathew.onipe triaged T232184: MIgrate WDQS to new logging pipeline as Normal priority.
Sep 6 2019, 7:36 AM · Discovery-Search (Current work), Wikimedia-Logstash, Operations, observability, Product-Analytics, Discovery-Analysis (Current work), Wikidata-Query-Service, Wikidata
Mathew.onipe created T232184: MIgrate WDQS to new logging pipeline.
Sep 6 2019, 7:36 AM · Discovery-Search (Current work), Wikimedia-Logstash, Operations, observability, Product-Analytics, Discovery-Analysis (Current work), Wikidata-Query-Service, Wikidata

Sep 4 2019

Mathew.onipe added a comment to T231928: CI service-pipeline-test-and-publish job assumes blubber config has a single production image.

Not sure, but it seems we are missing some configs in our config.yaml patch.

Sep 4 2019, 11:20 AM · Release-Engineering-Team (Pipeline), Release-Engineering-Team-TODO, Maps (Kartotherian), Product-Infrastructure-Team-Backlog

Sep 3 2019

Mathew.onipe added a comment to T225125: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline.

With profile::rsyslog::udp_localhost_compat, rsyslog JSON parsing requires the @cee token, which the standard says must be provided. Let's use profile::rsyslog::udp_json_logback_compat instead, as it permits parsing JSON from log4j without the token.

Sep 3 2019, 3:50 PM · Patch-For-Review, Discovery-Search (Current work), observability, Elasticsearch, Operations, Wikimedia-Logstash
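
To make the token requirement in the comment above concrete, here is a minimal sketch of the difference between a strict parser that insists on the `@cee:` cookie and a lenient one that also accepts bare JSON. It is not rsyslog's implementation and says nothing about how the two Puppet profiles are actually built; the function names and the sample event are hypothetical.

```python
import json

CEE_COOKIE = "@cee:"


def with_cee_cookie(event):
    """Serialize an event as JSON and prefix it with the @cee: cookie."""
    return f"{CEE_COOKIE} {json.dumps(event, sort_keys=True)}"


def parse_message(msg, require_cookie=True):
    """Parse the JSON payload of a syslog message body.

    With require_cookie=True, a message without the cookie is rejected (None).
    With require_cookie=False, bare JSON is accepted as well.
    """
    body = msg.lstrip()
    if body.startswith(CEE_COOKIE):
        body = body[len(CEE_COOKIE):]
    elif require_cookie:
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    event = {"level": "INFO", "message": "started"}  # hypothetical log event
    print(parse_message(with_cee_cookie(event)))
    print(parse_message(json.dumps(event)))
    print(parse_message(json.dumps(event), require_cookie=False))
```

The lenient mode corresponds to the behaviour the comment wants: JSON emitted by log4j without the token is still parsed.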

Sep 2 2019

Mathew.onipe moved T231516: Alert when a jvm hits more than 100 old gc ops/hour from Needs review to Done on the Discovery-Search (Current work) board.
Sep 2 2019, 9:33 AM · observability, Operations, Discovery-Search (Current work)

Aug 29 2019

Mathew.onipe added a comment to T231516: Alert when a jvm hits more than 100 old gc ops/hour.

On another note, I think this check makes sense for other clusters as well.

Aug 29 2019, 11:10 AM · observability, Operations, Discovery-Search (Current work)
Mathew.onipe edited projects for T231516: Alert when a jvm hits more than 100 old gc ops/hour, added: Discovery-Search (Current work), Operations; removed Discovery-Search.
Aug 29 2019, 8:32 AM · observability, Operations, Discovery-Search (Current work)
Mathew.onipe claimed T231516: Alert when a jvm hits more than 100 old gc ops/hour.
Aug 29 2019, 8:31 AM · observability, Operations, Discovery-Search (Current work)
Mathew.onipe reopened T214283: Memory correctable errors -EDAC- elastic1029 as "Open".

elastic1029 is back on icinga showing memory errors. See https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=elastic1029&service=Memory+correctable+errors+-EDAC-

Aug 29 2019, 3:03 AM · Discovery-Search (Current work), ops-eqiad, Discovery, DC-Ops, Operations

Aug 28 2019

Mathew.onipe added a comment to P8995 Khmer samples.

My screenshot from Windows 10/Version 76.0.3809.100 (Official Build) (64-bit)

Aug 28 2019, 2:44 PM · Discovery-Search
Mathew.onipe triaged T231446: Reindex commonswiki as shards have grown beyond critical threshold as Normal priority.
Aug 28 2019, 1:03 PM · Discovery-Search, Patch-For-Review, Operations, Elasticsearch
Mathew.onipe created T231446: Reindex commonswiki as shards have grown beyond critical threshold.
Aug 28 2019, 1:03 PM · Discovery-Search, Patch-For-Review, Operations, Elasticsearch
Mathew.onipe added a comment to T230774: Run jstack / jmap / etc... with PrivateTmp=true.

@Gehel I think you meant: https://wikitech.wikimedia.org/wiki/Search#Using_jstack_or_jmap_or_other_similar_tools_to_view_logs

Aug 28 2019, 3:47 AM · Discovery-Search (Current work), Operations
Mathew.onipe added a comment to T229980: Need help to create and deploy Debian-packaged Python 3 app.

I changed the priority of this to normal. Feel free to change it as you see fit.

Aug 28 2019, 3:27 AM · serviceops, Operations, Packaging, CPT Initiatives (Session Management Service (CDP2))
Mathew.onipe triaged T229980: Need help to create and deploy Debian-packaged Python 3 app as Normal priority.
Aug 28 2019, 3:26 AM · serviceops, Operations, Packaging, CPT Initiatives (Session Management Service (CDP2))
Mathew.onipe updated subscribers of T231274: Have a link to the alert in the icinga alert email.
Aug 28 2019, 3:25 AM · Icinga, observability, Operations
Mathew.onipe triaged T231274: Have a link to the alert in the icinga alert email as Normal priority.
Aug 28 2019, 3:25 AM · Icinga, observability, Operations

Aug 27 2019

Mathew.onipe claimed T225125: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline.
Aug 27 2019, 5:18 PM · Patch-For-Review, Discovery-Search (Current work), observability, Elasticsearch, Operations, Wikimedia-Logstash
Mathew.onipe triaged T231010: Change partitioning scheme for elasticsearch from RAID to JBOD as Normal priority.
Aug 27 2019, 2:51 AM · Operations, Discovery-Search, Elasticsearch
Mathew.onipe updated subscribers of T230994: labweb100[12]: Search backend error during get of .[array] after 0: unknown: No enabled connection.
Aug 27 2019, 2:45 AM · Operations, Discovery-Search (Current work), CirrusSearch
Mathew.onipe assigned T230994: labweb100[12]: Search backend error during get of .[array] after 0: unknown: No enabled connection to dcausse.
Aug 27 2019, 2:43 AM · Operations, Discovery-Search (Current work), CirrusSearch
Mathew.onipe added a comment to T227529: Request rename of "waldir" to "waldyrious" on LDAP.

@thcipriani any update on this? It seems stalled or partially resolved.

Aug 27 2019, 2:42 AM · LDAP-Access-Requests
Mathew.onipe added a comment to T227695: Requesting access to analytics-privatedata-users for mbsantos.

@MSantos
What's the latest on this? Do you want to follow up with Nuria?

Aug 27 2019, 2:35 AM · Operations, SRE-Access-Requests
Mathew.onipe closed T231111: Access to HUE for cchen as Resolved.

I'm guessing everyone is happy so I'm going to close this.

Aug 27 2019, 2:32 AM · Analytics-Kanban, SRE-Access-Requests, Operations, Analytics

Aug 26 2019

Mathew.onipe edited projects for T225125: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline, added: Discovery-Search (Current work); removed Discovery-Search.
Aug 26 2019, 2:26 PM · Patch-For-Review, Discovery-Search (Current work), observability, Elasticsearch, Operations, Wikimedia-Logstash
Mathew.onipe added a comment to T231009: Make jobprocessor's test not depend on external files.

@Joe here is the error: https://integration.wikimedia.org/ci/blue/organizations/jenkins/service-pipeline-test/detail/service-pipeline-test-and-publish/3100/pipeline/
Thanks!

Aug 26 2019, 9:20 AM · Release Pipeline, Operations, Maps (Kartotherian)

Aug 22 2019

Mathew.onipe created T231010: Change partitioning scheme for elasticsearch from RAID to JBOD.
Aug 22 2019, 2:05 PM · Operations, Discovery-Search, Elasticsearch
Mathew.onipe created T231009: Make jobprocessor's test not depend on external files.
Aug 22 2019, 1:52 PM · Release Pipeline, Operations, Maps (Kartotherian)
Mathew.onipe created T231006: Create helm chart for kartotherian k8s deployment.
Aug 22 2019, 1:31 PM · Patch-For-Review, Operations, Maps (Kartotherian)

Aug 16 2019

Mathew.onipe added a comment to T230597: can't SSH to elastic2050.mgmt .

@Papaul On second thought, we have other servers and losing one elastic node is OK, so this should be set to normal.

Aug 16 2019, 3:48 PM · ops-codfw, DC-Ops, Discovery-Search (Current work), Operations
Mathew.onipe lowered the priority of T230597: can't SSH to elastic2050.mgmt from High to Normal.
Aug 16 2019, 3:47 PM · ops-codfw, DC-Ops, Discovery-Search (Current work), Operations
Mathew.onipe triaged T230597: can't SSH to elastic2050.mgmt as High priority.
Aug 16 2019, 10:12 AM · ops-codfw, DC-Ops, Discovery-Search (Current work), Operations
Mathew.onipe added a project to T230597: can't SSH to elastic2050.mgmt : DC-Ops.
Aug 16 2019, 10:12 AM · ops-codfw, DC-Ops, Discovery-Search (Current work), Operations
Mathew.onipe updated the task description for T230597: can't SSH to elastic2050.mgmt .
Aug 16 2019, 8:08 AM · ops-codfw, DC-Ops, Discovery-Search (Current work), Operations
Mathew.onipe created T230597: can't SSH to elastic2050.mgmt .
Aug 16 2019, 8:06 AM · ops-codfw, DC-Ops, Discovery-Search (Current work), Operations

Aug 14 2019

Mathew.onipe closed T220946: Create cookbook for postgres initialization on maps cluster, a subtask of T203943: Spicerack cookbooks TODO list, as Resolved.
Aug 14 2019, 2:49 PM · SRE-tools, User-jijiki, User-Joe, Operations
Mathew.onipe closed T220946: Create cookbook for postgres initialization on maps cluster as Resolved.
Aug 14 2019, 2:49 PM · Maps, SRE-tools, User-jijiki, User-Joe, Operations
Mathew.onipe closed T224874: Maps2004 ran into disk space issues again after reimaging with new partitioning scheme, a subtask of T224395: Maps[12]004 /srv disk space is critical, as Resolved.
Aug 14 2019, 2:48 PM · Operations, Maps
Mathew.onipe closed T224874: Maps2004 ran into disk space issues again after reimaging with new partitioning scheme as Resolved.

This was traced to some initial problems during osm-initial-script and was resolved by reinitializing OSM.

Aug 14 2019, 2:48 PM · Operations, Maps
Mathew.onipe closed T226161: Change maps codfw replication factor for v4 keyspace as Resolved.
Aug 14 2019, 2:45 PM · Operations, Maps
Mathew.onipe added a comment to T230366: Icinga reports read time out error for some checks on cloudelastic cluster.

After some conversation with @EBernhardson, we discovered that dumps are currently being loaded into the cloudelastic cluster (https://phabricator.wikimedia.org/T220625), and this might be related to the slow response time. There's heavy indexing going on in this cluster (9200), which causes icinga alert requests to time out.
Also, we think this slow response time should not impact users.

Aug 14 2019, 2:27 PM · Operations, Discovery-Search (Current work), Elasticsearch
Mathew.onipe added a project to T230366: Icinga reports read time out error for some checks on cloudelastic cluster: Operations.
Aug 14 2019, 2:13 PM · Operations, Discovery-Search (Current work), Elasticsearch
Mathew.onipe edited projects for T230366: Icinga reports read time out error for some checks on cloudelastic cluster, added: Discovery-Search (Current work); removed Discovery-Search.
Aug 14 2019, 2:12 PM · Operations, Discovery-Search (Current work), Elasticsearch
Mathew.onipe triaged T230366: Icinga reports read time out error for some checks on cloudelastic cluster as Normal priority.
Aug 14 2019, 2:12 PM · Operations, Discovery-Search (Current work), Elasticsearch
Mathew.onipe moved T229621: Icinga check defined from LVS configuration for cloudelastic are borked from in progress to Done on the Discovery-Search (Current work) board.
Aug 14 2019, 2:12 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations
Mathew.onipe added a comment to T229621: Icinga check defined from LVS configuration for cloudelastic are borked.

This issue is solved for now, and cloudelastic checks for all ports have been generated on icinga. However, only IPv4 checks were generated, which is OK for now. If there's a need to generate IPv6 checks, we can always reopen this task.

Aug 14 2019, 2:11 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations

Aug 13 2019

Mathew.onipe created T230409: Spicerack: extend elasticsearch_cluster module by allowing us to wait for write queue to go empty.
Aug 13 2019, 11:08 AM · Discovery-Search, Elasticsearch

Aug 12 2019

Mathew.onipe created T230366: Icinga reports read time out error for some checks on cloudelastic cluster.
Aug 12 2019, 4:36 PM · Operations, Discovery-Search (Current work), Elasticsearch

Aug 9 2019

Mathew.onipe added a comment to T229621: Icinga check defined from LVS configuration for cloudelastic are borked.

@jbond Thank you!
Your fix is way better than mine. I will look at the patch now.

Aug 9 2019, 2:36 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations

Aug 8 2019

Mathew.onipe triaged T230088: cloudelastic1002: SMART/disk error as Normal priority.
Aug 8 2019, 2:53 AM · ops-eqiad, DC-Ops, Operations, cloud-services-team (Kanban)
Mathew.onipe created T230088: cloudelastic1002: SMART/disk error.
Aug 8 2019, 2:52 AM · ops-eqiad, DC-Ops, Operations, cloud-services-team (Kanban)

Aug 6 2019

Mathew.onipe claimed T229621: Icinga check defined from LVS configuration for cloudelastic are borked.
Aug 6 2019, 5:44 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations
Mathew.onipe added a comment to T223275: Create blubberfile for deploying kartotherian into docker environment..

@MSantos Thank you!

Aug 6 2019, 2:57 PM · Release Pipeline, Operations, Maps (Kartotherian)
Mathew.onipe edited projects for T229621: Icinga check defined from LVS configuration for cloudelastic are borked, added: Discovery-Search (Current work); removed Discovery-Search.
Aug 6 2019, 7:24 AM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations
Mathew.onipe closed T225904: Mjolnir bulk update failure check - eqiad as Resolved.
Aug 6 2019, 7:22 AM · Discovery-Search
Mathew.onipe closed T229788: postgresql replication issues on maps1001 as Resolved.

Postgres reinitialization was performed to bring this slave back up. I'll close this task for now and investigate more if it re-occurs.

Aug 6 2019, 7:18 AM · Maps, Operations
Mathew.onipe closed T229861: Can't reach cloudelastic.wikimedia.org via IPv6 as Resolved.
Aug 6 2019, 7:07 AM · Operations, Traffic, Discovery-Search (Current work)

Aug 5 2019

Mathew.onipe triaged T229788: postgresql replication issues on maps1001 as High priority.
Aug 5 2019, 5:56 PM · Maps, Operations
Mathew.onipe added a comment to T229788: postgresql replication issues on maps1001.

Running select * from pg_stat_wal_receiver; on maps1001 returns no rows. This means the postgres slave is not receiving updates from the master. Also, the master only shows two nodes connected instead of three:

Aug 5 2019, 5:56 PM · Maps, Operations
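
The reasoning in the comment above can be sketched as a small check over already-fetched query results. This is an illustration only: the function name and arguments are hypothetical, and it assumes the count of connected standbys was read on the master (for example from the `pg_stat_replication` view, which the comment does not name).

```python
def replication_problems(wal_receiver_rows, connected_standbys, expected_standbys):
    """Summarize replication problems from already-fetched query results.

    wal_receiver_rows: rows returned by `select * from pg_stat_wal_receiver;`
        when run on the replica.
    connected_standbys: number of standbys the master reports as connected.
    expected_standbys: number of standbys that should be connected.
    """
    problems = []
    if not wal_receiver_rows:
        # An empty result means no WAL receiver process is running.
        problems.append("replica has no WAL receiver: it is not streaming from the primary")
    if connected_standbys < expected_standbys:
        problems.append(
            f"primary reports {connected_standbys} of {expected_standbys} standbys connected"
        )
    return problems


if __name__ == "__main__":
    # Mirrors the situation described in the comment: empty view, 2 of 3 nodes.
    for problem in replication_problems([], 2, 3):
        print(problem)
```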
Mathew.onipe updated subscribers of T229861: Can't reach cloudelastic.wikimedia.org via IPv6.
Aug 5 2019, 5:43 PM · Operations, Traffic, Discovery-Search (Current work)
Mathew.onipe triaged T229861: Can't reach cloudelastic.wikimedia.org via IPv6 as Normal priority.
Aug 5 2019, 5:30 PM · Operations, Traffic, Discovery-Search (Current work)
Mathew.onipe created T229861: Can't reach cloudelastic.wikimedia.org via IPv6.
Aug 5 2019, 5:30 PM · Operations, Traffic, Discovery-Search (Current work)
Mathew.onipe added a comment to T229621: Icinga check defined from LVS configuration for cloudelastic are borked.

Sadly, I don't think this will work as the host param will not be unique and icinga does not seem to handle that well. Another option might be to create more CNAMEs or more A-records like we have for git and git-ssh here: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dns/+/master/templates/wikimedia.org#336

Aug 5 2019, 1:26 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations
Mathew.onipe triaged T229621: Icinga check defined from LVS configuration for cloudelastic are borked as Normal priority.
Aug 5 2019, 12:56 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations
Mathew.onipe added a comment to T229621: Icinga check defined from LVS configuration for cloudelastic are borked.

@BBlack yea yea.. I've missed your musings on complex systems. Thanks. I will make a patch.

Aug 5 2019, 12:56 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations
Mathew.onipe added a comment to T229621: Icinga check defined from LVS configuration for cloudelastic are borked.

About cloudelastic resolving to icinga1001, I had jbond help me see where cloudelastic.wikimedia.org resolves to, and it seems to be resolving to the correct IP.
@Vgutierrez we could remove the icinga part of the configuration in the configuration.yaml file and define the checks in lvs::monitor_services instead. I think that should work.

Aug 5 2019, 12:35 PM · Patch-For-Review, Discovery-Search (Current work), Elasticsearch, Traffic, Operations