
Upgrade Druid clusters to 0.11
Closed, Resolved, Public, 13 Estimated Story Points

Description

We upgraded to 0.10 in T164008, and now the next step is 0.11. We are not inclined to go up to 0.12 (the current latest stable release), since its documentation prohibits a rollback to any version prior to 0.11.

https://github.com/druid-io/druid/releases/tag/druid-0.11.0

We'd need to:

  • package Druid 0.11 - not uploaded to apt/reprepro yet
  • deploy it in labs and verify the upgrade procedure from 0.10
  • verify that pivot works
  • verify that real time ingestion works (a fix in Tranquility is needed)
  • verify that regular indexing works
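
As a rough aid for the labs verification steps above, here is a minimal sketch of a version check. It assumes the JSON body of a Druid node's `/status` endpoint has already been fetched (the host and port to query are not given in this task), and the helper name `druid_version_matches` is hypothetical.

```python
import json


def druid_version_matches(status_body: str, expected_prefix: str = "0.11") -> bool:
    """Return True if a /status response body reports the expected Druid version.

    Hypothetical helper: it only inspects the "version" field of the JSON body.
    """
    status = json.loads(status_body)
    version = str(status.get("version", ""))
    # Accept "0.11" itself or any patch release such as "0.11.0".
    return version == expected_prefix or version.startswith(expected_prefix + ".")


# Trimmed, made-up response body; a real one carries more fields.
sample = '{"version": "0.11.0", "modules": []}'
print(druid_version_matches(sample))
```

Running the same check against every daemon after the upgrade would catch a host that was left on the 0.10 package.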

Event Timeline

elukey triaged this task as High priority. May 3 2018, 6:23 AM
elukey created this task.

Change 431522 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::druid::common: set java 8 as default

https://gerrit.wikimedia.org/r/431522

Change 431522 merged by Elukey:
[operations/puppet@production] profile::druid::common: set java 8 as default

https://gerrit.wikimedia.org/r/431522

I found an interesting thing in the druid logs today (/var/lib/druid/indexing-logs on d-1):

Error: com.google.inject.CreationException: Unable to create injector, see the following errors:

1) Error in custom provider, java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
  at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:46)
  at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:46)
  while locating com.fasterxml.jackson.databind.ObjectMapper annotated with interface io.druid.guice.annotations.Json
  while locating com.fasterxml.jackson.databind.ObjectMapper
    for the 1st parameter of io.druid.guice.JsonConfigurator.<init>(JsonConfigurator.java:67)
  at io.druid.guice.ConfigModule.configure(ConfigModule.java:40)
  while locating io.druid.guice.JsonConfigurator
    for the 2nd parameter of io.druid.guice.JsonConfigProvider.inject(JsonConfigProvider.java:188)
  at io.druid.guice.JsonConfigProvider.bind(JsonConfigProvider.java:133)

[..]

In http://druid.io/docs/0.11.0/operations/other-hadoop.html there is a section about CDH users experiencing the same problem.

mforns set the point value for this task to 13. May 7 2018, 3:49 PM

The release notes for 0.10.1 at https://github.com/druid-io/druid/releases mention this:

Deprecation of support for Hadoop versions < 2.6.0
To add support for Amazon's S3A filesystem, Druid is now built against Hadoop 2.7.3 libraries, and we are deprecating support for Hadoop versions older than 2.6.0.

For users running a Hadoop version older than 2.6.0, it is possible to continue running Druid 0.10.1 with the older Hadoop version using a workaround.

The user would need to downgrade hadoop.compile.version in the main Druid pom.xml, remove the hadoop-aws dependency from pom.xml in the druid-hdfs-storage core extension, and then rebuild Druid.

Users are strongly encouraged to upgrade their Hadoop clusters to a 2.6.0+ version as of this release, as support for Hadoop <2.6.0 may be dropped completely in future releases.

If users wish to use Hadoop 2.7.3 as default for ingestion tasks, users should double check any existing druid.indexer.task.defaultHadoopCoordinates configurations.

So the indexing error seems to be caused by a Jackson version pulled in indirectly by Druid 0.10.1+, because hadoop-client was bumped from 2.3 to 2.7.3.

We tried adding "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:cdh"] to the labs middlemanagers, and the error now looks like this (reported by Yarn):

 /user/druid/deep-storage-analytics_test0/webrequest/2018-05-08T00:00:00.000Z_2018-05-08T01:00:00.000Z/2018-05-08T11:47:44.005Z/0/index.zip.1 from
hdfs://analytics-hadoop-labs/user/druid/deep-storage-analytics_test0/webrequest/2018-05-08T00:00:00.000Z_2018-05-08T01:00:00.000Z/2018-05-08T11:47:44.005Z/0/index.zip.1 is not a valid DFS filename

There are no more Jackson errors in the indexing logs, but we don't know what this error means yet.
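
For reference, the override tried above can be sketched as follows. This assumes the coordinates are set at the top level of an `index_hadoop` task payload, which is one place Druid accepts `hadoopDependencyCoordinates`; the thread does not show the actual spec used in labs, so the task body here is a placeholder and the helper name `with_hadoop_coordinates` is hypothetical.

```python
import copy
import json


def with_hadoop_coordinates(task: dict, coordinates: list) -> dict:
    """Return a copy of a Hadoop index task with hadoopDependencyCoordinates set."""
    if task.get("type") != "index_hadoop":
        raise ValueError("expected an index_hadoop task")
    # Work on a deep copy so the caller's task definition is left untouched.
    patched = copy.deepcopy(task)
    patched["hadoopDependencyCoordinates"] = list(coordinates)
    return patched


# Placeholder task; the real ingestion spec is omitted here.
task = {"type": "index_hadoop", "spec": {}}
patched = with_hadoop_coordinates(task, ["org.apache.hadoop:hadoop-client:cdh"])
print(json.dumps(patched, indent=2))
```

Keeping the override per task, rather than cluster-wide, makes it easy to compare a run with the CDH client against a run with the default one.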

While testing, I opened a GitHub issue: https://github.com/druid-io/druid/issues/5763

One of the options described in http://druid.io/docs/0.11.0/operations/other-hadoop.html is to build Druid with Maven after lowering the Hadoop compile version to the one needed, in our case 2.6.0. This does not seem to work because of a Guava dependency issue (see the GitHub issue for more info).

We finally figured out the issue: it was our own version of the druid-hdfs-storage extension. It worked fine with hadoop-client:2.3.0 but apparently no longer does with hadoop-client:2.7.3. Using the default extension seems to work fine.

Tranquility does not seem to work because of https://github.com/druid-io/tranquility/blob/master/core/src/main/scala/com/metamx/tranquility/druid/DruidEnvironment.scala#L28, which is no longer needed in Druid 0.11 since the logic for naming services changed.

Change 432571 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/druid@debian] Release version 0.11.0-1

https://gerrit.wikimedia.org/r/432571

Change 432571 merged by Elukey:
[operations/debs/druid@debian] Release version 0.11.0-1

https://gerrit.wikimedia.org/r/432571

Change 432582 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::(analytics|public)::worker: upgrade to Druid 0.11.0

https://gerrit.wikimedia.org/r/432582

@JAllemandou I've pushed all the changes for the druid package to the Druid git repo, then rebuilt and deployed the packages in labs. I have not yet uploaded the new debs to the Debian APT repository, since I'd like to do that as the last step.

Things to do:

  1. Sanity check that labs still works with the latest debs.
  2. Find a solution for Tranquility/KIS/etc.
  3. Schedule the deploy (will need https://gerrit.wikimedia.org/r/#/c/432582/ to skip the druid-hdfs-storage-cdh extension).

Change 433131 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/druid@debian] Add the community extension for Parquet

https://gerrit.wikimedia.org/r/433131

Change 434350 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] druid: set the default druid.storage.storageDirectory even without cdh

https://gerrit.wikimedia.org/r/434350

Change 434350 merged by Elukey:
[operations/puppet@production] druid: set the default druid.storage.storageDirectory even without cdh

https://gerrit.wikimedia.org/r/434350

Change 434363 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::druid::common: add the use_cdh_hadoop_config parameter

https://gerrit.wikimedia.org/r/434363

Change 434363 merged by Elukey:
[operations/puppet@production] profile::druid::common: add the use_cdh_hadoop_config parameter

https://gerrit.wikimedia.org/r/434363

Change 434493 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_cluster::coordinator: remove refinery-relaunch-banner-streaming

https://gerrit.wikimedia.org/r/434493

Change 434493 merged by Elukey:
[operations/puppet@production] role::analytics_cluster::coordinator: remove refinery-relaunch-banner-streaming

https://gerrit.wikimedia.org/r/434493

Change 432582 merged by Elukey:
[operations/puppet@production] role::druid::analytics::worker: upgrade to Druid 0.11.0

https://gerrit.wikimedia.org/r/432582

Change 434503 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::analytics::worker: correct hdfs extension

https://gerrit.wikimedia.org/r/434503

Change 434503 merged by Elukey:
[operations/puppet@production] role::druid::analytics::worker: correct hdfs extension

https://gerrit.wikimedia.org/r/434503

Change 433131 merged by Elukey:
[operations/debs/druid@debian] Add the community extension for Parquet

https://gerrit.wikimedia.org/r/433131

Change 434520 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/druid@debian] Remove mysql-metadata-storage-0.10.0.jar

https://gerrit.wikimedia.org/r/434520

Change 434520 merged by Elukey:
[operations/debs/druid@debian] Remove mysql-metadata-storage-0.10.0.jar

https://gerrit.wikimedia.org/r/434520

Change 434533 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::prometheus::alerts: disable druid realtime alert

https://gerrit.wikimedia.org/r/434533

Change 434533 merged by Elukey:
[operations/puppet@production] profile::prometheus::alerts: disable druid realtime alert

https://gerrit.wikimedia.org/r/434533

Change 434649 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::public:worker: upgrade settings for Druid 0.11

https://gerrit.wikimedia.org/r/434649

Change 434649 merged by Elukey:
[operations/puppet@production] role::druid::public:worker: upgrade settings for Druid 0.11

https://gerrit.wikimedia.org/r/434649

Both clusters upgraded to 0.11!

Vvjjkkii renamed this task from Upgrade Druid clusters to 0.11 to fqdaaaaaaa. Jul 1 2018, 1:12 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed elukey as the assignee of this task.
Vvjjkkii updated the task description.
Vvjjkkii removed the point value for this task.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Community_Tech_bot renamed this task from fqdaaaaaaa to Upgrade Druid clusters to 0.11. Jul 1 2018, 6:05 AM
Community_Tech_bot closed this task as Resolved.
Community_Tech_bot assigned this task to elukey.
Community_Tech_bot set the point value for this task to 13.
Community_Tech_bot updated the task description.