
elukey (Luca Toscano)
User

User Details

User Since
Jan 5 2016, 9:54 PM (211 w, 2 d)
Availability
Busy until Jan 26.
LDAP User
Unknown
MediaWiki User
LToscano (WMF) [ Global Accounts ]

Recent Activity

Wed, Jan 22

Masumrezarock100 awarded T242712: Deprecation (if possible) of the #central channel on irc.wikimedia.org a Heartbreak token.
Wed, Jan 22, 5:54 PM · Tool-stewardbots, User-Elukey, Analytics

Tue, Jan 21

elukey added a comment to T203693: Update to CDH 6 or other up-to-date Hadoop distribution.

Summary of current thoughts:

Tue, Jan 21, 4:46 PM · User-Elukey, Analytics-Cluster, Analytics
elukey closed T243239: Unable to access Hive from notebook1003 as Resolved.
Tue, Jan 21, 4:19 PM · Analytics
elukey added a comment to T203693: Update to CDH 6 or other up-to-date Hadoop distribution.

I had a very productive chat with the BigTop committers; for reference: https://lists.apache.org/thread.html/r9b588c1c9f693bd78549e7f3251004bc114c754b8d16f4edd796b828%40%3Cuser.bigtop.apache.org%3E

Tue, Jan 21, 4:16 PM · User-Elukey, Analytics-Cluster, Analytics
elukey added a comment to T239571: Check home leftovers of dfoy.

@Milimetric did you get any response?

Tue, Jan 21, 3:22 PM · Product-Analytics, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

> this delicate change should probably be applied before all hands to avoid headaches

You mean after all hands? :)

Tue, Jan 21, 2:44 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey moved T242306: No queries run in Hue from Incoming to Operational Excellence on the Analytics board.
Tue, Jan 21, 10:49 AM · Analytics-Kanban, User-Elukey, Analytics
elukey moved T242306: No queries run in Hue from Next Up to Done on the Analytics-Kanban board.
Tue, Jan 21, 10:24 AM · Analytics-Kanban, User-Elukey, Analytics
elukey added a project to T242306: No queries run in Hue: Analytics-Kanban.
Tue, Jan 21, 10:24 AM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T242306: No queries run in Hue.

I have created https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue#Hive_query_errors_with_Kerberos to help users find the workaround.

Tue, Jan 21, 10:23 AM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T242712: Deprecation (if possible) of the #central channel on irc.wikimedia.org.

@MarcoAurelio thanks a lot for the feedback! I am trying to find use cases for the IRC channel, since we don't want to disrupt important bots. Do you have more info about how consumers of #cvn-unifications use the account creation report? I am wondering whether the info is actively used to fight vandalism, since, as you were saying, some bots are old and may no longer be up to date (the info could also come from other places, etc.).

Tue, Jan 21, 10:07 AM · Tool-stewardbots, User-Elukey, Analytics
elukey moved T240934: Enable encryption in Spark 2.4 by default from In Progress to In Code Review on the Analytics-Kanban board.
Tue, Jan 21, 7:49 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Status: from my tests everything seems to work fine, but this delicate change should probably be applied before all hands to avoid headaches :)

Tue, Jan 21, 7:49 AM · Patch-For-Review, Analytics-Kanban, Analytics

Mon, Jan 20

elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Interestingly, the heisenbug now seems to trigger only a warning rather than stopping pyspark:

Mon, Jan 20, 3:53 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

OK, so today I found a warning in the debug logs indicating a failure to load OpenSSL's crypto libraries and a fallback to the standard JCE crypto. After a bit of digging I found this: https://issues.apache.org/jira/browse/HADOOP-12845
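To confirm this kind of fallback on a host, Hadoop ships `hadoop checknative -a`, which reports whether its native openssl support could be loaded. The Python sketch below is a narrower, hypothetical check: it only tests whether the dynamic loader can open a given libcrypto name. It assumes (not confirmed in this task) that the warning comes from a library lookup/naming problem, and the candidate names are illustrative.

```python
import ctypes
import ctypes.util
from typing import Optional


def can_dlopen(libname: Optional[str]) -> bool:
    """Return True if the dynamic loader can open `libname`."""
    if not libname:
        # find_library() returns None when nothing matches; CDLL(None) would
        # open the running process itself, so treat it as "not found".
        return False
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical comparison: the unversioned name vs. whatever versioned
    # libcrypto the system resolves for "crypto".
    for name in ("libcrypto.so", ctypes.util.find_library("crypto")):
        print(name, can_dlopen(name))
```

If the unversioned name fails while a versioned one loads, that would point at a missing symlink rather than a missing library, but `hadoop checknative -a` remains the authoritative check for what Hadoop itself sees.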

Mon, Jan 20, 3:32 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey closed T243109: request access to Hue as Resolved.

Done! :)

Mon, Jan 20, 11:49 AM · Analytics, Product-Analytics
elukey triaged T243149: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020) as Medium priority.
Mon, Jan 20, 7:49 AM · Performance-Team (Radar), serviceops, Operations

Sun, Jan 19

elukey updated subscribers of T243149: Increased latency in CODFW API and APP monitoring urls (~07:20 UTC 19 Jan 2020).

This seems to be related to T243148: db2085 was overwhelmed, which explains the high latency (Special:Blank page health checks were taking a very long time to complete).

Sun, Jan 19, 3:20 PM · Performance-Team (Radar), serviceops, Operations

Fri, Jan 17

elukey moved T242712: Deprecation (if possible) of the #central channel on irc.wikimedia.org from Backlog to Waiting for others on the User-Elukey board.
Fri, Jan 17, 3:39 PM · Tool-stewardbots, User-Elukey, Analytics
elukey moved T242306: No queries run in Hue from Waiting for others to Done on the User-Elukey board.
Fri, Jan 17, 3:39 PM · Analytics-Kanban, User-Elukey, Analytics
elukey moved T226035: Dropping data from druid takes down aqs hosts from Stalled to In Progress on the User-Elukey board.
Fri, Jan 17, 3:39 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey moved T219928: Move AQS logging to new logging pipeline from Stalled to Done on the User-Elukey board.
Fri, Jan 17, 3:39 PM · Analytics-Kanban, User-Elukey, Patch-For-Review, observability, Analytics, Core Platform Team Legacy (Watching / External), Services (watching), service-runner, Wikimedia-Logstash, Operations
elukey moved T234229: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing/hadooping the dump hosts from Stalled to Done on the User-Elukey board.
Fri, Jan 17, 3:39 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey moved T242870: Upgrade to Superset 0.35.2 from Backlog to Done on the User-Elukey board.
Fri, Jan 17, 3:38 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey changed Final Story Points from 21 to 13 on T234229: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing/hadooping the dump hosts.
Fri, Jan 17, 2:17 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey set Final Story Points to 21 on T234229: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing/hadooping the dump hosts.
Fri, Jan 17, 2:16 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey moved T234229: Shorten the time it takes to move files from hadoop to dump hosts by Kerberizing/hadooping the dump hosts from In Code Review to Done on the Analytics-Kanban board.
Fri, Jan 17, 2:16 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey updated the task description for T211706: Superset Updates.
Fri, Jan 17, 7:42 AM · Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey added a project to T242998: VPN access to superset/turnilo instead of LDAP: User-Elukey.
Fri, Jan 17, 7:38 AM · SecTeam Discussion, User-Elukey, Security-Team, Analytics
elukey added a comment to T242998: VPN access to superset/turnilo instead of LDAP.

Do we have any "approved" VPN apps/protocols for production?
What auth is the VPN going to use?
Is VPN authentication going to be the only auth? If so, will the app itself have no auth? Or will LDAP auth still sit behind the VPN?
This sounds like it's some workaround/replacement for having https://turnilo.wikimedia.org/ and https://superset.wikimedia.org/ exposed to the public internet?

Fri, Jan 17, 7:38 AM · SecTeam Discussion, User-Elukey, Security-Team, Analytics
elukey added a comment to T242306: No queries run in Hue.

@MMiller_WMF any recurrence of the issue?

Fri, Jan 17, 6:45 AM · Analytics-Kanban, User-Elukey, Analytics
elukey updated subscribers of T242306: No queries run in Hue.

Adding @Seddon to this task since he reported the issue in #wikimedia-analytics :)

Fri, Jan 17, 6:45 AM · Analytics-Kanban, User-Elukey, Analytics

Thu, Jan 16

elukey set Final Story Points to 5 on T242870: Upgrade to Superset 0.35.2.
Thu, Jan 16, 5:04 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey moved T242870: Upgrade to Superset 0.35.2 from In Code Review to Done on the Analytics-Kanban board.
Thu, Jan 16, 5:04 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey added a comment to T242870: Upgrade to Superset 0.35.2.

@mforns Superset upgraded! Can you check superset.wikimedia.org and see if you spot any anomalies?

Thu, Jan 16, 3:20 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey added a comment to T242870: Upgrade to Superset 0.35.2.

Checked a lot of charts and everything seems to render fine; all good from my side!

Thu, Jan 16, 8:06 AM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics

Wed, Jan 15

elukey moved T242870: Upgrade to Superset 0.35.2 from Next Up to In Code Review on the Analytics-Kanban board.
Wed, Jan 15, 3:36 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey claimed T242870: Upgrade to Superset 0.35.2.
Wed, Jan 15, 3:36 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey added a comment to T242870: Upgrade to Superset 0.35.2.

Deployed https://gerrit.wikimedia.org/r/#/c/analytics/superset/deploy/+/565037/ to an-tool1005; anybody can test it via:

Wed, Jan 15, 3:36 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey created T242870: Upgrade to Superset 0.35.2.
Wed, Jan 15, 3:35 PM · User-Elukey, Better Use Of Data, Analytics-Kanban, Product-Analytics
elukey moved T242754: Removed not used CDH packages from Hadoop nodes from Next Up to Done on the Analytics-Kanban board.
Wed, Jan 15, 10:45 AM · Analytics-Kanban, Analytics
elukey triaged T242754: Removed not used CDH packages from Hadoop nodes as Medium priority.
Wed, Jan 15, 10:44 AM · Analytics-Kanban, Analytics
elukey reopened T226035: Dropping data from druid takes down aqs hosts as "Open".

It happened again this morning :(

Wed, Jan 15, 9:39 AM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey changed the status of T219928: Move AQS logging to new logging pipeline, a subtask of T211125: Move service-runner to new logging infrastructure, from Stalled to Open.
Wed, Jan 15, 8:42 AM · observability, Core Platform Team Legacy (Watching / External), Patch-For-Review, service-runner, Wikimedia-Logstash, Operations
elukey changed the status of T219928: Move AQS logging to new logging pipeline from Stalled to Open.

Dan deployed the new version of service-runner for AQS; I applied the Puppet patch and verified that the new settings work in Logstash:

Wed, Jan 15, 8:42 AM · Analytics-Kanban, User-Elukey, Patch-For-Review, observability, Analytics, Core Platform Team Legacy (Watching / External), Services (watching), service-runner, Wikimedia-Logstash, Operations
elukey set Final Story Points to 5 on T219928: Move AQS logging to new logging pipeline.
Wed, Jan 15, 8:41 AM · Analytics-Kanban, User-Elukey, Patch-For-Review, observability, Analytics, Core Platform Team Legacy (Watching / External), Services (watching), service-runner, Wikimedia-Logstash, Operations
elukey moved T219928: Move AQS logging to new logging pipeline from Paused to Done on the Analytics-Kanban board.
Wed, Jan 15, 8:41 AM · Analytics-Kanban, User-Elukey, Patch-For-Review, observability, Analytics, Core Platform Team Legacy (Watching / External), Services (watching), service-runner, Wikimedia-Logstash, Operations
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Sent an email to users@spark.apache.org, let's see if anybody comes back with suggestions!

Wed, Jan 15, 8:16 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey updated subscribers of T242819: ores.wmflabs.org - 503 icinga alerts.
Wed, Jan 15, 7:45 AM · Scoring-platform-team (Current), Operations, ORES
elukey added a comment to T237752: Make stats.wikimedia.org point to wikistats2 by default.

I love the power mod_rewrite gives me, but after 6 months I always scratch my head again...

Wed, Jan 15, 7:39 AM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Jan 14

elukey added a comment to T242754: Removed not used CDH packages from Hadoop nodes.

It turned out that, because of many reverse dependencies, the only packages I was able to remove from the Hadoop test workers were flume-ng, spark-core and spark-python. I'll wait a day and then apply the same change to Hadoop Analytics as well.
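The reverse-dependency constraint above can be phrased as a fixed point: a candidate package is safe to drop only if every installed package that depends on it is also being dropped. A minimal sketch, assuming the reverse-dependency map has already been collected (for example from `apt-cache rdepends --installed <pkg>` on each host); the package names in the test are hypothetical placeholders, not the real CDH dependency graph.

```python
from typing import Dict, Set


def removable(candidates: Set[str], rdeps: Dict[str, Set[str]]) -> Set[str]:
    """Return the largest subset of `candidates` that can be removed together.

    `rdeps[pkg]` is the set of installed packages that depend on `pkg`.
    A package is blocked if any of its reverse deps stays installed.
    """
    keep = set(candidates)
    changed = True
    while changed:
        changed = False
        for pkg in sorted(keep):  # iterate over a snapshot, safe to mutate `keep`
            if rdeps.get(pkg, set()) - keep:
                # Something outside the removal set still needs this package.
                keep.discard(pkg)
                changed = True
    return keep
```

Dropping one blocked package can block others that it depended on, which is why the loop repeats until nothing changes; this mirrors why only a few packages survived as removable.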

Tue, Jan 14, 6:10 PM · Analytics-Kanban, Analytics
elukey added a comment to T242754: Removed not used CDH packages from Hadoop nodes.
elukey@an-master1001:~$ dpkg -l | grep cdh
ii  avro-libs                            1.7.6+cdh5.16.1+143-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Data serialization system
ii  bigtop-jsvc                          0.6.0+cdh5.16.1+934-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  amd64        Application to launch java daemon
ii  bigtop-utils                         0.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3~jessie-cdh5.16.1    all          Collection of useful tools for Bigtop
ii  hadoop                               2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          A software platform for processing vast amounts of data
ii  hadoop-0.20-mapreduce                2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 amd64        A software platform for processing vast amounts of data
ii  hadoop-client                        2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Hadoop client side dependencies
ii  hadoop-hdfs                          2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop Distributed File System
ii  hadoop-hdfs-namenode                 2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Name Node for Hadoop
ii  hadoop-hdfs-zkfc                     2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Hadoop HDFS failover controller
ii  hadoop-mapreduce                     2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop MapReduce (MRv2)
ii  hadoop-mapreduce-historyserver       2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          MapReduce History Server
ii  hadoop-yarn                          2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop NextGen MapReduce (YARN)
ii  hadoop-yarn-resourcemanager          2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Resource manager for Hadoop
ii  libhdfs0                             2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 amd64        JNI Bindings to access Hadoop HDFS from C
ii  parquet                              1.5.0+cdh5.16.1+200-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A columnar storage format for Hadoop.
ii  parquet-format                       2.1.0+cdh5.16.1+22-1.cdh5.16.1.p0.3~jessie-cdh5.16.1   all          Format definitions for Parquet
ii  zookeeper                            3.4.5+cdh5.16.1+155-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A high-performance coordination service for distributed applications.
Tue, Jan 14, 5:16 PM · Analytics-Kanban, Analytics
elukey updated subscribers of T242754: Removed not used CDH packages from Hadoop nodes.
Tue, Jan 14, 5:03 PM · Analytics-Kanban, Analytics
elukey added a comment to T242754: Removed not used CDH packages from Hadoop nodes.
elukey@an-coord1001:~$ dpkg -l | grep cdh
ii  avro-libs                                 1.7.6+cdh5.16.1+143-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Data serialization system
ii  bigtop-jsvc                               0.6.0+cdh5.16.1+934-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  amd64        Application to launch java daemon
ii  bigtop-tomcat                             0.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3~jessie-cdh5.16.1    all          Apache Tomcat
ii  bigtop-utils                              0.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3~jessie-cdh5.16.1    all          Collection of useful tools for Bigtop
ii  flume-ng                                  1.6.0+cdh5.16.1+192-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Flume is a reliable, scalable, and manageable distributed log collection application for collecting data such as logs and delivering it to data stores such as Hadoop's HDFS.
ii  hadoop                                    2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          A software platform for processing vast amounts of data
ii  hadoop-0.20-mapreduce                     2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 amd64        A software platform for processing vast amounts of data
ii  hadoop-client                             2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Hadoop client side dependencies
ii  hadoop-hdfs                               2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop Distributed File System
ii  hadoop-hdfs-fuse                          2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 amd64        HDFS exposed over a Filesystem in Userspace
ii  hadoop-mapreduce                          2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop MapReduce (MRv2)
ii  hadoop-yarn                               2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop NextGen MapReduce (YARN)
ii  hbase                                     1.2.0+cdh5.16.1+482-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
ii  hive                                      1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Hive is a data warehouse infrastructure built on top of Hadoop
ii  hive-hcatalog                             1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Apache HCatalog is a table and storage management service.
ii  hive-jdbc                                 1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Provides libraries necessary to connect to Apache Hive via JDBC
ii  hive-metastore                            1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Shared metadata repository for Hive
ii  hive-server2                              1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Provides a Hive Thrift service with improved concurrency support.
ii  hive-webhcat                              1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          WebHcat provides a REST-like web API for HCatalog and related Hadoop components.
ii  kite                                      1.0.0+cdh5.16.1+151-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Kite Software Development Kit.
ii  libhdfs0                                  2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 amd64        JNI Bindings to access Hadoop HDFS from C
ii  mahout                                    0.9+cdh5.16.1+38-1.cdh5.16.1.p0.3~jessie-cdh5.16.1     all          A set of Java libraries for scalable machine learning.
ii  oozie                                     4.1.0+cdh5.16.1+503-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Oozie is a system that runs workflows of Hadoop jobs.
ii  oozie-client                              4.1.0+cdh5.16.1+503-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Client for Oozie Workflow Engine
ii  parquet                                   1.5.0+cdh5.16.1+200-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A columnar storage format for Hadoop.
ii  parquet-format                            2.1.0+cdh5.16.1+22-1.cdh5.16.1.p0.3~jessie-cdh5.16.1   all          Format definitions for Parquet
ii  pig                                       0.12.0+cdh5.16.1+117-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Pig is a platform for analyzing large data sets
ii  pig-udf-datafu                            1.1.0+cdh5.16.1+29-1.cdh5.16.1.p0.3~jessie-cdh5.16.1   all          A collection of user-defined functions for Hadoop and Pig.
ii  sentry                                    1.5.1+cdh5.16.1+559-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A system for enforcing fine grained role based authorization to data and metadata stored on a Hadoop cluster.
ii  solr                                      4.10.3+cdh5.16.1+532-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Apache Solr is the popular, blazing fast open source enterprise search platform
ii  spark-core                                1.6.0+cdh5.16.1+577-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Lightning-Fast Cluster Computing
ii  sqoop                                     1.4.6+cdh5.16.1+140-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Tool for easy imports and exports of data sets between databases and HDFS
ii  zookeeper                                 3.4.5+cdh5.16.1+155-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A high-performance coordination service for distributed applications.
Tue, Jan 14, 5:03 PM · Analytics-Kanban, Analytics
elukey added a comment to T242754: Removed not used CDH packages from Hadoop nodes.
elukey@an-worker1080:~$ dpkg -l | grep cdh
ii  avro-libs                             1.7.6+cdh5.16.1+143-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Data serialization system
ii  bigtop-jsvc                           0.6.0+cdh5.16.1+934-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  amd64        Application to launch java daemon
ii  bigtop-tomcat                         0.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3~jessie-cdh5.16.1    all          Apache Tomcat
ii  bigtop-utils                          0.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3~jessie-cdh5.16.1    all          Collection of useful tools for Bigtop
ii  flume-ng                              1.6.0+cdh5.16.1+192-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Flume is a reliable, scalable, and manageable distributed log collection application for collecting data such as logs and delivering it to data stores such as Hadoop's HDFS.
ii  hadoop                                2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          A software platform for processing vast amounts of data
ii  hadoop-0.20-mapreduce                 2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 amd64        A software platform for processing vast amounts of data
ii  hadoop-client                         2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Hadoop client side dependencies
ii  hadoop-hdfs                           2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop Distributed File System
ii  hadoop-hdfs-datanode                  2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Data Node for Hadoop
ii  hadoop-mapreduce                      2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop MapReduce (MRv2)
ii  hadoop-yarn                           2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          The Hadoop NextGen MapReduce (YARN)
ii  hadoop-yarn-nodemanager               2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Node manager for Hadoop
ii  hive                                  1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Hive is a data warehouse infrastructure built on top of Hadoop
ii  hive-hcatalog                         1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Apache HCatalog is a table and storage management service.
ii  hive-jdbc                             1.1.0+cdh5.16.1+1431-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Provides libraries necessary to connect to Apache Hive via JDBC
ii  kite                                  1.0.0+cdh5.16.1+151-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Kite Software Development Kit.
ii  libhdfs0                              2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 amd64        JNI Bindings to access Hadoop HDFS from C
ii  parquet                               1.5.0+cdh5.16.1+200-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A columnar storage format for Hadoop.
ii  parquet-format                        2.1.0+cdh5.16.1+22-1.cdh5.16.1.p0.3~jessie-cdh5.16.1   all          Format definitions for Parquet
ii  sentry                                1.5.1+cdh5.16.1+559-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A system for enforcing fine grained role based authorization to data and metadata stored on a Hadoop cluster.
ii  solr                                  4.10.3+cdh5.16.1+532-1.cdh5.16.1.p0.3~jessie-cdh5.16.1 all          Apache Solr is the popular, blazing fast open source enterprise search platform
ii  spark-core                            1.6.0+cdh5.16.1+577-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Lightning-Fast Cluster Computing
ii  sqoop                                 1.4.6+cdh5.16.1+140-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          Tool for easy imports and exports of data sets between databases and HDFS
ii  zookeeper                             3.4.5+cdh5.16.1+155-1.cdh5.16.1.p0.3~jessie-cdh5.16.1  all          A high-performance coordination service for distributed applications.
Tue, Jan 14, 2:43 PM · Analytics-Kanban, Analytics
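The three `dpkg -l | grep cdh` listings above (an-master1001, an-coord1001, an-worker1080) differ per host role, and comparing them is a quick way to spot candidates for removal. A small sketch, assuming each listing has been saved as plain text; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_dpkg_list(lines: Iterable[str]) -> Dict[str, str]:
    """Map package name -> version for installed ("ii") rows of `dpkg -l` output."""
    pkgs: Dict[str, str] = {}
    for line in lines:
        # Columns: status flags, name, version, architecture, description.
        fields = line.split(None, 4)
        # Skip shell prompts, headers and rows not in the installed state.
        if len(fields) >= 3 and fields[0] == "ii":
            pkgs[fields[1]] = fields[2]
    return pkgs


def only_in(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Package names present on host `a` but not on host `b`, sorted."""
    return sorted(set(a) - set(b))
```

For instance, comparing the an-coord1001 listing against the an-master1001 one would surface packages such as flume-ng and hbase, which are installed on the coordinator but not on the master.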
elukey created T242754: Removed not used CDH packages from Hadoop nodes.
Tue, Jan 14, 2:42 PM · Analytics-Kanban, Analytics
elukey moved T241649: Investigate Hue alarms from Paused to Done on the Analytics-Kanban board.
Tue, Jan 14, 1:39 PM · Analytics-Kanban, User-Elukey, Analytics
elukey moved T241650: Investigate sporadic failures in oozie hive actions due to Kerberos auth from Paused to Done on the Analytics-Kanban board.
Tue, Jan 14, 1:39 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
elukey updated the task description for T203693: Update to CDH 6 or other up-to-date Hadoop distribution.
Tue, Jan 14, 1:02 PM · User-Elukey, Analytics-Cluster, Analytics
elukey moved T242306: No queries run in Hue from Backlog to Waiting for others on the User-Elukey board.
Tue, Jan 14, 12:45 PM · Analytics-Kanban, User-Elukey, Analytics
elukey moved T241649: Investigate Hue alarms from Waiting for others to Done on the User-Elukey board.
Tue, Jan 14, 12:45 PM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T242306: No queries run in Hue.

Yesterday I restarted Hue to add (again) some Hive query limits to avoid out-of-memory issues; let me know if the error reappears.

Tue, Jan 14, 12:44 PM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T241795: (Need By: Jan 10) rack/setup/install mc-gp100[123].eqiad.wmnet.

Hi @Jclark-ctr, any timeline for these hosts to be racked?

Tue, Jan 14, 11:31 AM · serviceops, ops-eqiad, Operations
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Example of app logs without the option set:

Tue, Jan 14, 11:04 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Comparing the logs of the same application ID, where one container successfully registers with the AM and another does not (causing the failure):

Tue, Jan 14, 10:39 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Today I can't repro anymore: pyspark --master yarn and spark-submit both work fine. Could it be some weird capacity issue with dynamic allocation that only happens in Hadoop test under certain conditions? I'd be inclined to test this in Hadoop Analytics and see if it works; rollback is very quick and we'd get more data points to debug further.

Tue, Jan 14, 9:29 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240182: Create EventStream's equivalent to irc.wikimedia.org's #central channel.

Created T242712

Tue, Jan 14, 8:28 AM · Event-Platform, User-Elukey, Analytics
elukey added a project to T242712: Deprecation (if possible) of the #central channel on irc.wikimedia.org: Tool-stewardbots.
Tue, Jan 14, 8:28 AM · Tool-stewardbots, User-Elukey, Analytics
elukey created T242712: Deprecation (if possible) of the #central channel on irc.wikimedia.org.
Tue, Jan 14, 8:28 AM · Tool-stewardbots, User-Elukey, Analytics
elukey added a comment to T241190: New Hadoop hardware. Refreshes and hosts with space for GPUs.

@Nuria I added some comments to T238587#5800760, please let me know if it makes sense or not.

Tue, Jan 14, 8:23 AM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T242461: restrouter.svc.{eqiad,codfw}.wmnet in a failed state.

I just acked two Icinga LVS alerts for restrouter; please let me know if they were about something different :)

Tue, Jan 14, 8:17 AM · serviceops, Core Platform Team Workboards (Clinic Duty Team)
elukey added a comment to T241187: Refresh stat1004 with a new host and GPU.

This is currently being discussed in T238587; once we have a final procurement task I'll add more info.

Tue, Jan 14, 7:51 AM · Analytics
elukey renamed T241187: Refresh stat1004 with a new host and GPU from Refresh 1004 with a new host and GPU to Refresh stat1004 with a new host and GPU.
Tue, Jan 14, 7:49 AM · Analytics
elukey added a comment to T241192: Purchase of GPUs to help support the open source software stack on top of AMD GPUs (donation to Debian).

@faidon @MoritzMuehlenhoff hi :) How should we proceed?

Tue, Jan 14, 7:49 AM · Analytics
elukey triaged T242705: Ores celery OOM event in codfw as High priority.
Tue, Jan 14, 7:31 AM · Scoring-platform-team (Current), Operations, ORES

Mon, Jan 13

elukey added a comment to T242525: Kerberos credentials for musikanimal.
elukey@krb1001:~$ sudo manage_principals.py create musikanimal --email_address=lziemba@wikimedia.org
Principal successfully created. Make sure to update data.yaml in Puppet.
Successfully sent email to lziemba@wikimedia.org
Mon, Jan 13, 5:43 PM · Analytics
elukey moved T240934: Enable encryption in Spark 2.4 by default from Next Up to In Progress on the Analytics-Kanban board.
Mon, Jan 13, 4:05 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240852: CloudVPS: horizon giving http/500 intermitently.

It just happened to me while trying to create a VM in the analytics project. I see intermittent 500s with the generic error message "The server encountered an internal error or misconfiguration and was unable to complete your request", and various errors when populating the dropdown lists while creating a VM.

Mon, Jan 13, 12:41 PM · Horizon, cloud-services-team (Kanban)
elukey reopened T240852: CloudVPS: horizon giving http/500 intermitently as "Open".
Mon, Jan 13, 12:40 PM · Horizon, cloud-services-team (Kanban)
elukey closed T239249: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet, a subtask of T240684: Upgrade and improve our application object caching service (memcached), as Resolved.
Mon, Jan 13, 11:17 AM · Operations, serviceops
elukey closed T239249: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet as Resolved.
Mon, Jan 13, 11:17 AM · Operations, ops-codfw
elukey updated the task description for T239249: (Need By: Jan 15) codfw: rack/setup/install mc-gp200[123].codfw.wmnet.
Mon, Jan 13, 11:17 AM · Operations, ops-codfw
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

commons-crypto 1.0.0 is contained in the spark-assembly jar on HDFS, but possibly this is only a Python issue with crypto libraries?

Mon, Jan 13, 9:22 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Today I started with spark2-submit --conf spark.io.encryption.enabled=false --conf=spark.network.crypto.enabled=false --conf spark.dynamicAllocation.enabled=false --conf spark.shuffle.service.enabled=false --master yarn /home/joal/test_spark_submit/spark-2.4.4-bin-hadoop2.6/examples/src/main/python/pi.py 100 and removed one option at a time until the problem came up. I narrowed the issue down to spark.io.encryption.enabled, the option that enables AES encryption of local disk I/O (shuffle and spill files). RPC encryption is controlled separately by spark.network.crypto.enabled, which switches it from SASL to AES.
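The one-option-at-a-time procedure above can be sketched as a small command generator. This is a minimal sketch assuming Python 3.8+ (for shlex.join). The helper names build_command and bisect_commands are hypothetical and not part of any existing tooling. The script only prints the successive spark2-submit command lines for manual runs. Each step drops one more of the =false overrides, so the cluster default for that option applies again. It uses the "--conf key=value" form throughout, which is equivalent to the "--conf=key=value" form in the command above.

```python
import shlex

# Overrides from the comment above; each forces a feature off on top of the cluster defaults.
OVERRIDES = [
    "spark.io.encryption.enabled=false",
    "spark.network.crypto.enabled=false",
    "spark.dynamicAllocation.enabled=false",
    "spark.shuffle.service.enabled=false",
]

# Test job from the comment (the pi.py example bundled with Spark 2.4.4).
JOB = [
    "/home/joal/test_spark_submit/spark-2.4.4-bin-hadoop2.6/examples/src/main/python/pi.py",
    "100",
]


def build_command(overrides):
    """Return a spark2-submit argv with one --conf per override (hypothetical helper)."""
    argv = ["spark2-submit"]
    for conf in overrides:
        argv += ["--conf", conf]
    argv += ["--master", "yarn"]
    return argv + JOB


def bisect_commands(overrides):
    """Yield (dropped_override, argv) pairs, removing one more override per step."""
    remaining = list(overrides)
    # First step: all overrides in place (the known-good baseline).
    yield None, build_command(remaining)
    while remaining:
        dropped = remaining.pop(0)
        yield dropped, build_command(remaining)


if __name__ == "__main__":
    for dropped, argv in bisect_commands(OVERRIDES):
        label = "baseline" if dropped is None else f"without {dropped}"
        print(f"# {label}")
        print(shlex.join(argv))
```

Running each printed line in order and stopping at the first failure identifies the override whose removal triggers the problem. In the case above, that was the step that dropped spark.io.encryption.enabled=false.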

Mon, Jan 13, 9:14 AM · Patch-For-Review, Analytics-Kanban, Analytics

Sat, Jan 11

elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Really interesting: after leaving only 'spark.authenticate=true' in YARN's config, Spark refine started failing with:

Sat, Jan 11, 9:43 AM · Patch-For-Review, Analytics-Kanban, Analytics

Fri, Jan 10

elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

All the symptoms of the last issue to solve are highlighted in https://issues.apache.org/jira/browse/SPARK-19528

Fri, Jan 10, 6:34 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T242306: No queries run in Hue.

@elukey -- clicking "recreate" makes it work! Thank you.

Fri, Jan 10, 6:16 PM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Just restarted Oozie. It picked up the new config, so in theory we won't need to change any workflows when enabling Spark encryption.

Fri, Jan 10, 2:42 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T242306: No queries run in Hue.

@MMiller_WMF I found a way in Hue's UI to force the re-creation of the Hive session, and it seemed to work for me (I was able to repro):

Fri, Jan 10, 10:33 AM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T242306: No queries run in Hue.

I can see the following in an-coord1001's db:

Fri, Jan 10, 10:06 AM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T241170: Access to DataGrip refused.

Added a note to https://wikitech.wikimedia.org/w/index.php?title=Analytics%2FSystems%2FCluster%2FAccess&type=revision&diff=1850233&oldid=1838749

Fri, Jan 10, 7:47 AM · User-Elukey, Analytics, GLOW

Thu, Jan 9

elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

Applied manually on analytics1030 in Hadoop test:

Thu, Jan 9, 4:10 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey claimed T240934: Enable encryption in Spark 2.4 by default.
Thu, Jan 9, 4:03 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T240934: Enable encryption in Spark 2.4 by default.

@joal I found https://www.ericlin.me/2018/06/oozie-spark-action-not-loading-spark-configurations/ today, there is an option listed that seems good to test:

Thu, Jan 9, 4:02 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey updated subscribers of T240934: Enable encryption in Spark 2.4 by default.

@EBernhardson hi! I am looping you in since you are our top spark user :D

Thu, Jan 9, 2:38 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a project to T240934: Enable encryption in Spark 2.4 by default: Analytics-Kanban.
Thu, Jan 9, 2:25 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey closed T242046: Requesting kerberos access for snowick as Resolved.
Thu, Jan 9, 2:24 PM · Analytics
elukey closed T242222: Kerberos password for user mepps as Resolved.
Thu, Jan 9, 2:23 PM · Analytics
elukey moved T241649: Investigate Hue alarms from Done to Paused on the Analytics-Kanban board.
Thu, Jan 9, 2:22 PM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T242046: Requesting kerberos access for snowick .
elukey@krb1001:~$ sudo manage_principals.py create snowick --email_address=snowick@wikimedia.org
Principal successfully created. Make sure to update data.yaml in Puppet.
Successfully sent email to snowick@wikimedia.org
Thu, Jan 9, 11:17 AM · Analytics