elukey (Luca Toscano)
User

User Details

User Since
Jan 5 2016, 9:54 PM (106 w, 4 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
LToscano (WMF)

Recent Activity

Fri, Jan 19

elukey added a comment to T181036: Pull netflow data in realtime from Kafka via Tranquillity/Spark.

HMMM. If this is JSON data, and the schema is consistent, we could use JSONRefine to build the table, rather than doing all those Hive table/oozie job steps.

Fri, Jan 19, 5:56 PM · Analytics-Kanban, User-Elukey, monitoring, netops, Operations
elukey added a comment to T181036: Pull netflow data in realtime from Kafka via Tranquillity/Spark.

Thanks for the explanation!

Fri, Jan 19, 5:48 PM · Analytics-Kanban, User-Elukey, monitoring, netops, Operations
elukey closed T184788: mw2140 unresponsive, mgmt not accessible as Resolved.

Pooled and working correctly, closing!

Fri, Jan 19, 5:30 PM · Patch-For-Review, ops-codfw, Operations
elukey added a comment to T166248: Upgrade Analytics Cluster to Java 8.

Andrew and Joseph completed a test in labs to verify that Druid running on Java 7 would still work fine with Hadoop running Java 8, and no surprises came up.

Fri, Jan 19, 3:53 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster
elukey claimed T185291: Verify duplicate entry warnings logged by the m4 mysql consumer.
Fri, Jan 19, 12:21 PM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
elukey moved T185291: Verify duplicate entry warnings logged by the m4 mysql consumer from Next Up to In Progress on the Analytics-Kanban board.
Fri, Jan 19, 12:21 PM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
elukey edited projects for T185291: Verify duplicate entry warnings logged by the m4 mysql consumer, added: Analytics-Kanban; removed Analytics.
Fri, Jan 19, 12:21 PM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
elukey added a comment to T185291: Verify duplicate entry warnings logged by the m4 mysql consumer.

Just tested the use case in the description on stat1004 with:

Fri, Jan 19, 12:20 PM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
elukey added a comment to T185291: Verify duplicate entry warnings logged by the m4 mysql consumer.

So the processor does event['uuid'] = capsule_uuid(event) that is defined like this:

Fri, Jan 19, 12:10 PM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
elukey added a comment to T166248: Upgrade Analytics Cluster to Java 8.

For Spark I believe that update-java-alternatives is enough to force it to pick up java8. I found this in one of the scripts called by spark(-2)*-shell:

Fri, Jan 19, 11:08 AM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster
elukey added a comment to T166248: Upgrade Analytics Cluster to Java 8.

Draft of the upgrade plan in https://etherpad.wikimedia.org/p/analytics-hadoop-java8

Fri, Jan 19, 9:42 AM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster
elukey added a comment to T185291: Verify duplicate entry warnings logged by the m4 mysql consumer.

One thing that I noticed is that when a burst of warnings happens, the following is logged around the same time in syslog:

Fri, Jan 19, 8:21 AM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
elukey triaged T185291: Verify duplicate entry warnings logged by the m4 mysql consumer as Normal priority.
Fri, Jan 19, 8:08 AM · Analytics-Kanban, User-Elukey, Analytics-EventLogging

Thu, Jan 18

elukey updated subscribers of T181036: Pull netflow data in realtime from Kafka via Tranquillity/Spark.

Had an interesting chat with @ayounsi and for the moment it seems that the only format expected in the netflow topic will be: tag,dst_as,as_path,peer_dst_as

Thu, Jan 18, 6:52 PM · Analytics-Kanban, User-Elukey, monitoring, netops, Operations
elukey set the point value for T108850: Set up auto-purging after 90 days {tick} to 0.
Thu, Jan 18, 5:02 PM · User-Elukey, Analytics, Patch-For-Review, DBA
elukey closed T108850: Set up auto-purging after 90 days {tick} as Resolved.
Thu, Jan 18, 5:02 PM · User-Elukey, Analytics, Patch-For-Review, DBA
elukey closed T108850: Set up auto-purging after 90 days {tick}, a subtask of T104877: Enforce policy for each schema: Sanitize {tick} [8 pts], as Resolved.
Thu, Jan 18, 5:02 PM · Analytics-Kanban
elukey closed T108850: Set up auto-purging after 90 days {tick}, a subtask of T102224: {tick} Schema Audit, as Resolved.
Thu, Jan 18, 5:02 PM · Analytics-EventLogging, Analytics-Kanban
elukey closed T184482: analytics VPS project puppet errors as Resolved.

Closing the task since puppet should be ok now, please re-open otherwise!

Thu, Jan 18, 3:42 PM · Analytics-Kanban, User-Elukey, Puppet
elukey moved T184482: analytics VPS project puppet errors from Analytics Backlog to In Progress on the User-Elukey board.
Thu, Jan 18, 3:12 PM · Analytics-Kanban, User-Elukey, Puppet
elukey added a comment to T184482: analytics VPS project puppet errors.

Fixed all except j1.analytics.eqiad.wmflabs - @Ottomata do we still need this? It seems to be running superset, and puppet is broken there.

Thu, Jan 18, 2:32 PM · Analytics-Kanban, User-Elukey, Puppet
elukey added a comment to T170740: PuppetDB misbehaving on 2017-07-15.

The puppetdb grafana dashboard (and its related monitoring config for nitrogen/nihal) were added in https://phabricator.wikimedia.org/T184796

Thu, Jan 18, 11:54 AM · Patch-For-Review, Puppet, Operations
elukey closed T184796: Configure puppetdb to export metrics via Prometheus JMX Agent as Resolved.

Closing task since https://grafana.wikimedia.org/dashboard/db/puppetdb is almost a replica of the puppetdb localhost one.

Thu, Jan 18, 11:53 AM · User-Elukey, Patch-For-Review, monitoring, Operations
elukey updated subscribers of T181036: Pull netflow data in realtime from Kafka via Tranquillity/Spark.

@faidon whenever you have time, would you mind explaining a bit what data is currently pushed to the netflow topic in Kafka Jumbo and how to read it? I am planning to work on this task soon (with Joseph's supervision) but I know very little about the subject :)

Thu, Jan 18, 10:19 AM · Analytics-Kanban, User-Elukey, monitoring, netops, Operations
elukey added a subtask for T132324: Tracking and Reducing cron-spam from root@ : T185195: Sporadic logrotate issue for stretch mediawiki appservers.
Thu, Jan 18, 10:17 AM · Patch-For-Review, Operations
elukey added a parent task for T185195: Sporadic logrotate issue for stretch mediawiki appservers: T132324: Tracking and Reducing cron-spam from root@ .
Thu, Jan 18, 10:17 AM · Operations, User-Elukey
elukey triaged T185195: Sporadic logrotate issue for stretch mediawiki appservers as Normal priority.
Thu, Jan 18, 10:17 AM · Operations, User-Elukey

Wed, Jan 17

elukey moved T168414: Purge all old data from EventLogging master from Ready to Deploy to Done on the Analytics-Kanban board.
Wed, Jan 17, 2:58 PM · Analytics-Kanban, DBA
elukey added a comment to T168414: Purge all old data from EventLogging master.

The first run completed without any errors, and then another one (cleaning up only daily data) ran as well, setting the following:

Wed, Jan 17, 2:58 PM · Analytics-Kanban, DBA
elukey set the point value for T168414: Purge all old data from EventLogging master to 13.
Wed, Jan 17, 2:56 PM · Analytics-Kanban, DBA
elukey closed T179640: mw1191 ipmi-sel cpu errors as Resolved.

Host decommed in https://phabricator.wikimedia.org/T183895

Wed, Jan 17, 9:36 AM · Operations, ops-eqiad

Tue, Jan 16

elukey created P6590 (An Untitled Masterwork).
Tue, Jan 16, 10:00 AM
elukey triaged T184796: Configure puppetdb to export metrics via Prometheus JMX Agent as Normal priority.
Tue, Jan 16, 9:57 AM · User-Elukey, Patch-For-Review, monitoring, Operations
elukey added a project to T184796: Configure puppetdb to export metrics via Prometheus JMX Agent: User-Elukey.
Tue, Jan 16, 9:57 AM · User-Elukey, Patch-For-Review, monitoring, Operations
elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

metastore.sh uses HIVE_METASTORE_HADOOP_OPTS, which works fine (just tested), but there seems to be no equivalent for Hive Server (https://issues.apache.org/jira/browse/HIVE-12582).

Tue, Jan 16, 8:39 AM · Analytics-Kanban, User-Elukey

Mon, Jan 15

elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

I added a timeout 3 bash command and it worked fine, but then a similar issue happened again when I tried to restart the metastore service. Hive is clearly not able to run the jmx_agent at the moment, and oozie is in a similar boat. I am a bit worried about the other hadoop daemons though: everything has gone fine up to now, but I am not sure whether the confusing hadoop init.d scripts have some bug waiting to bite in the future.

Mon, Jan 15, 4:04 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

So with a better ps what happens is clear:

Mon, Jan 15, 3:43 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

About Hive, I tried to re-apply the changes to the metastore and this is the difference in ps:

Mon, Jan 15, 2:59 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184796: Configure puppetdb to export metrics via Prometheus JMX Agent.

Started a dashboard in https://grafana-admin.wikimedia.org/dashboard/db/puppetdb

Mon, Jan 15, 11:32 AM · User-Elukey, Patch-For-Review, monitoring, Operations
elukey added a comment to T184796: Configure puppetdb to export metrics via Prometheus JMX Agent.

Given that it isn't that many metrics, I think it might be simpler to keep the standard jmx exporter configuration on the puppetdb side and drop the metrics we don't want at scrape time in prometheus config instead

Mon, Jan 15, 10:12 AM · User-Elukey, Patch-For-Review, monitoring, Operations

Fri, Jan 12

elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

Tried to open https://community.cloudera.com/t5/CDH-Manual-Installation/Oozie-duplicates-CATALINA-OPTS-variables-in-oozie-env-sh/m-p/63654#M1607, not sure if it is the best place but let's see if anybody answers.

Fri, Jan 12, 6:00 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

The problem seems to be in the oozie debian package itself:

Fri, Jan 12, 5:35 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

Finally found the root cause. Each time oozied.sh runs start/stop from the init.d script, it starts with a clean environment. Then the duplication happens in oozie-sys.sh due to the symlink pointed out above and these:

Fri, Jan 12, 4:14 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

This is the journalctl snippet of oozie sourcing various files:

Fri, Jan 12, 3:47 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.

In https://oozie.apache.org/docs/4.1.0/AG_Install.html -> Advanced/Custom Environment Settings I can't see any CATALINA_OPTS listed for oozie-env.sh, so we might need to set it elsewhere.

Fri, Jan 12, 3:19 PM · Analytics-Kanban, User-Elukey
elukey moved T143892: Update and add suggestions to the Infrastructure Overview graph from Ops Backlog to Stalled on the User-Elukey board.
Fri, Jan 12, 3:17 PM · User-Elukey, Documentation, Wikimedia-General-or-Unknown
elukey moved T175344: Move away from jmxtrans in favor of prometheus jmx_exporter from Analytics Backlog to Keep an eye on it on the User-Elukey board.
Fri, Jan 12, 3:16 PM · Analytics-Kanban, Patch-For-Review, User-Elukey
elukey moved T177458: Add the prometheus jmx exporter to all the Hadoop daemons from In Progress to Done on the User-Elukey board.
Fri, Jan 12, 3:16 PM · Patch-For-Review, Analytics-Kanban, User-Elukey
elukey moved T182993: TLS security review of the Kafka stack from In Progress to Stalled on the User-Elukey board.
Fri, Jan 12, 3:16 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Operations, Analytics-Cluster
elukey set the point value for T177458: Add the prometheus jmx exporter to all the Hadoop daemons to 21.
Fri, Jan 12, 3:15 PM · Patch-For-Review, Analytics-Kanban, User-Elukey
elukey closed T155129: Create prometheus nutcracker exporter as Resolved.
Fri, Jan 12, 2:30 PM · User-Elukey, Operations, Prometheus-metrics-monitoring
elukey moved T177460: Add the prometheus jmx exporter to all the Zookeeper daemons from Ops Backlog to Analytics Backlog on the User-Elukey board.
Fri, Jan 12, 2:29 PM · User-Elukey, Analytics
elukey moved T171203: Run eventlogging purging script on beta labs to avoid disk getting full from Backlog to Analytics Backlog on the User-Elukey board.
Fri, Jan 12, 2:29 PM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
elukey moved T159584: Secure hue and other private data access sites with 2FA from Backlog to Analytics Backlog on the User-Elukey board.
Fri, Jan 12, 2:29 PM · User-Elukey, Analytics
elukey moved T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie from Backlog to In Progress on the User-Elukey board.
Fri, Jan 12, 2:29 PM · Analytics-Kanban, User-Elukey
elukey moved T184482: analytics VPS project puppet errors from Backlog to Analytics Backlog on the User-Elukey board.
Fri, Jan 12, 2:29 PM · Analytics-Kanban, User-Elukey, Puppet
elukey moved T184795: Add the prometheus jmx agent to AQS Cassandra from Backlog to Analytics Backlog on the User-Elukey board.
Fri, Jan 12, 2:29 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T184796: Configure puppetdb to export metrics via Prometheus JMX Agent.

Since rates and other things like stdev are MBean attributes, I cannot easily blacklist them; only rewrite rules are needed (in which we can explicitly select which attributes to render). This is what I came up with:

Fri, Jan 12, 1:48 PM · User-Elukey, Patch-For-Review, monitoring, Operations
elukey added a comment to T184796: Configure puppetdb to export metrics via Prometheus JMX Agent.

Current status on nitrogen (no jvm metrics displayed since they should already be ok):

Fri, Jan 12, 12:16 PM · User-Elukey, Patch-For-Review, monitoring, Operations
elukey created T184796: Configure puppetdb to export metrics via Prometheus JMX Agent.
Fri, Jan 12, 12:14 PM · User-Elukey, Patch-For-Review, monitoring, Operations
elukey renamed T184795: Add the prometheus jmx agent to AQS Cassandra from Move AQS Cassandra daemons to use the Prometheus JMX agent to Add the prometheus jmx agent to AQS Cassandra.
Fri, Jan 12, 12:00 PM · Analytics-Kanban, User-Elukey
elukey created T184795: Add the prometheus jmx agent to AQS Cassandra.
Fri, Jan 12, 11:59 AM · Analytics-Kanban, User-Elukey
elukey moved T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie from Next Up to In Progress on the Analytics-Kanban board.
Fri, Jan 12, 11:56 AM · Analytics-Kanban, User-Elukey
elukey claimed T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.
Fri, Jan 12, 11:55 AM · Analytics-Kanban, User-Elukey
elukey moved T177458: Add the prometheus jmx exporter to all the Hadoop daemons from In Progress to Done on the Analytics-Kanban board.
Fri, Jan 12, 11:48 AM · Patch-For-Review, Analytics-Kanban, User-Elukey
elukey added a comment to T177458: Add the prometheus jmx exporter to all the Hadoop daemons.

I opened https://phabricator.wikimedia.org/T184794 to track down and fix Oozie/Hive bugs, I am inclined to close this task since:

Fri, Jan 12, 11:48 AM · Patch-For-Review, Analytics-Kanban, User-Elukey
elukey created T184794: Fix outstanding bugs preventing the use of prometheus jmx agent for Hive/Oozie.
Fri, Jan 12, 11:46 AM · Analytics-Kanban, User-Elukey

Thu, Jan 11

elukey added a comment to T184482: analytics VPS project puppet errors.

Just seen that there are more instances to fix. Some of them are under experiment at the moment, will try to fix them asap though.

Thu, Jan 11, 5:56 PM · Analytics-Kanban, User-Elukey, Puppet
elukey reopened T184482: analytics VPS project puppet errors as "Open".
Thu, Jan 11, 5:56 PM · Analytics-Kanban, User-Elukey, Puppet
elukey closed T184482: analytics VPS project puppet errors as Resolved.

Instance deleted!

Thu, Jan 11, 5:56 PM · Analytics-Kanban, User-Elukey, Puppet
elukey added a comment to T166248: Upgrade Analytics Cluster to Java 8.

it does the following (maybe I am reading the code in the wrong way):

Thu, Jan 11, 5:39 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster
elukey added a comment to T166248: Upgrade Analytics Cluster to Java 8.

Tested in labs the procedure outlined above (install + update-java-alternatives to java8) and everything went fine. The following errors are ok (double checked with Moritz):

Thu, Jan 11, 4:56 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster
elukey committed rOSDEae1a1300404d: README.md: fix virtual env suggestions (authored by elukey).
README.md: fix virtual env suggestions
Thu, Jan 11, 10:49 AM

Wed, Jan 10

elukey added a comment to T177458: Add the prometheus jmx exporter to all the Hadoop daemons.

The hive server/metastore issue is more subtle: everything starts, the jmx agent returns metrics correctly but the daemons do not bind to their ports (so they are not working):

Wed, Jan 10, 4:14 PM · Patch-For-Review, Analytics-Kanban, User-Elukey
elukey added a comment to T177458: Add the prometheus jmx exporter to all the Hadoop daemons.

I am testing in labs why oozie/hive daemons are not starting up with the -javaagent.

Wed, Jan 10, 2:29 PM · Patch-For-Review, Analytics-Kanban, User-Elukey
elukey added a comment to T177458: Add the prometheus jmx exporter to all the Hadoop daemons.

Created https://grafana.wikimedia.org/dashboard/db/prometheus-analytics-hadoop as 1:1 replica of its graphite alter ego https://grafana.wikimedia.org/dashboard/db/analytics-hadoop

Wed, Jan 10, 11:31 AM · Patch-For-Review, Analytics-Kanban, User-Elukey

Tue, Jan 9

elukey added a comment to T183771: dbstore1002 possibly MEMORY issues.

Maintenance done, the mgmt interface is now up and running (Chris also did a reseat of the DIMM banks).

Tue, Jan 9, 5:05 PM · ops-eqiad, Operations, Analytics-Kanban
elukey added a comment to P6528 puppetdb mbeans.

Further refinement to exclude jvm metrics duplication:

Tue, Jan 9, 10:39 AM
elukey added a comment to P6528 puppetdb mbeans.

Refined list:

Tue, Jan 9, 9:16 AM

Mon, Jan 8

elukey added a comment to T183771: dbstore1002 possibly MEMORY issues.

downtime announced to engineering@ and analytics@

Mon, Jan 8, 1:53 PM · ops-eqiad, Operations, Analytics-Kanban
elukey added a comment to T183771: dbstore1002 possibly MEMORY issues.

@Cmjohnson Would it be fine tomorrow around this time? Or whenever you prefer, I'd need to send an email and announce the downtime, better to alert people :)

Mon, Jan 8, 1:46 PM · ops-eqiad, Operations, Analytics-Kanban
elukey added a comment to T183771: dbstore1002 possibly MEMORY issues.

Now the BMC/IPMI doesn't seem to be happy:

Mon, Jan 8, 12:54 PM · ops-eqiad, Operations, Analytics-Kanban

Sun, Jan 7

elukey added a comment to T168414: Purge all old data from EventLogging master.

Wanted to sanity check the data on db1107 (el master) after the mysql consumers added the missing data from the past days:

Sun, Jan 7, 8:57 AM · Analytics-Kanban, DBA

Sat, Jan 6

Liuxinyu970226 awarded T166248: Upgrade Analytics Cluster to Java 8 a Like token.
Sat, Jan 6, 12:29 AM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster

Fri, Jan 5

dcausse awarded T166248: Upgrade Analytics Cluster to Java 8 a Like token.
Fri, Jan 5, 2:26 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster
dcausse awarded T166248: Upgrade Analytics Cluster to Java 8 a Like token.
Fri, Jan 5, 10:44 AM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster

Thu, Jan 4

elukey created P6528 puppetdb mbeans.
Thu, Jan 4, 12:59 PM

Wed, Jan 3

Slaporte awarded T183291: Requesting account expiration extension a Like token.
Wed, Jan 3, 7:37 PM · Analytics, Analytics-Cluster
elukey closed T181263: mw2251 failed memory dimm as Resolved.

Did a scap pull, set the host to pooled=yes and checked apache metrics. Everything looks good! Closing the task, let's re-open if it gives problems again.

Wed, Jan 3, 5:09 PM · Operations, ops-codfw
elukey moved T179943: Restart Analytics JVM daemons for open-jdk security updates from Paused to Done on the Analytics-Kanban board.
Wed, Jan 3, 3:16 PM · Analytics-Kanban, User-Elukey
elukey set the point value for T179943: Restart Analytics JVM daemons for open-jdk security updates to 13.
Wed, Jan 3, 3:16 PM · Analytics-Kanban, User-Elukey
elukey added a comment to T179943: Restart Analytics JVM daemons for open-jdk security updates.

We'll have to do another round of reboots probably next week, so the remaining kafka hosts will be done later on.

Wed, Jan 3, 3:15 PM · Analytics-Kanban, User-Elukey
elukey set the point value for T183273: Druid Woes to 5.
Wed, Jan 3, 3:12 PM · Patch-For-Review, Analytics-Kanban
elukey added a comment to T168414: Purge all old data from EventLogging master.

Maintenance is ongoing and it will probably last for a couple of days.

Wed, Jan 3, 2:44 PM · Analytics-Kanban, DBA
elukey added a comment to T177458: Add the prometheus jmx exporter to all the Hadoop daemons.

New metrics:

Wed, Jan 3, 1:41 PM · Patch-For-Review, Analytics-Kanban, User-Elukey
elukey moved T165519: rack and setup mw1307-1348 from In Progress to Stalled on the User-Elukey board.
Wed, Jan 3, 1:30 PM · Patch-For-Review, User-Elukey, User-Joe, Operations, ops-eqiad
elukey added a comment to T165519: rack and setup mw1307-1348 .

Next steps:

  1. image all the hosts in https://gerrit.wikimedia.org/r/397749 and put them in production (January)
  2. decom old row C appservers mw118[0-9]
  3. rack / image / productionize mw13[38-48] (10 api + 1 vs)
Wed, Jan 3, 11:05 AM · Patch-For-Review, User-Elukey, User-Joe, Operations, ops-eqiad
elukey added a comment to T183771: dbstore1002 possibly MEMORY issues.

I would consider fixing mgmt the first thing to address here. If the server breaks, even with an OOM, we would need to wait for Chris to reboot it, for instance.

Wed, Jan 3, 10:42 AM · ops-eqiad, Operations, Analytics-Kanban

Tue, Jan 2

elukey added a comment to T183771: dbstore1002 possibly MEMORY issues.

I've discussed this task with my team and a couple of things came up:

Tue, Jan 2, 5:25 PM · ops-eqiad, Operations, Analytics-Kanban
elukey changed the status of T179192: Check analytics1037 power supply status from Open to Stalled.
Tue, Jan 2, 4:07 PM · ops-eqiad, Operations, User-Elukey, Analytics