
Upgrade Druid to its latest upstream version (currently 0.19.0)
Closed, Resolved, Public

Description

https://github.com/apache/druid/releases/tag/druid-0.19.0

Druid is currently at version 0.19.0 upstream, and our version 0.12.3 is starting to lag too far behind. This means more painful upgrades and more things to check/consider as new releases come out. Two major motivations to upgrade:

  • Having the latest security auth/encryption features (useful to integrate Kerberos, for example)
  • Hopefully solving problems like T226035

Things to do/keep-in-mind:

  • Druid is now a top-level Apache project (it graduated from the incubator), so some APIs etc. might have changed.
  • Check the Prometheus exporter before calling it done, since it will surely require adjustments.
  • Check the full list of changes in the changelog to figure out if any change will affect our current settings in a negative way.
  • Follow up on https://groups.google.com/forum/#!msg/druid-user/Lvkhcj6M1C4/2-drOFLWCAAJ if brokers lock up again

Event Timeline

fdans triaged this task as Medium priority. Feb 6 2020, 5:58 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
elukey renamed this task from Upgrade Druid to its latest upstream version (currently 0.17) to Upgrade Druid to its latest upstream version (currently 0.17.1). Apr 15 2020, 10:10 AM
elukey added a project: User-Elukey.
elukey updated the task description. (Show Details)
elukey renamed this task from Upgrade Druid to its latest upstream version (currently 0.17.1) to Upgrade Druid to its latest upstream version (currently 0.18.1). May 18 2020, 4:21 PM
elukey updated the task description. (Show Details)

Highlights for release 0.13: https://github.com/apache/druid/releases/tag/druid-0.13.0-incubating

Things to consider:

  • Automatic compaction of segments has been added; IIUC this may mean that we could automatically compact hourly segments into daily ones, daily into monthly, etc.
  • The MySQL driver is no longer shipped with Druid, which may mean we need to add it manually (but IIRC we already do something similar; a sketch follows this list).
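
If we do end up needing to add the MySQL metadata-storage extension manually, a minimal sketch of what it might look like in common.runtime.properties (the property name is from the Druid docs; the extension list here is only an illustrative example, not our actual config):

druid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "mysql-metadata-storage"]

Since the MySQL connector JAR is not bundled for licensing reasons, it would also need to be dropped into the mysql-metadata-storage extension directory (or onto the classpath) separately.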

Highlights for release 0.14: https://github.com/apache/druid/releases/tag/druid-0.14.0-incubating

  • New console (merges coordinator and historical ones) https://druid.apache.org/docs/0.14.0-incubating/operations/management-uis.html
  • The new console seems to need a new daemon, the router; we need to understand whether we actually need it.
  • Interesting change about the parquet extension (will review it with @JAllemandou just to be sure)
  • The Hadoop version supported is now 2.8.3, so the docs suggest recompiling Druid against a different version if needed. I hope we won't need to, but it's worth keeping in mind (it could cause issues).

Highlights for 0.15: https://github.com/apache/druid/releases/tag/druid-0.15.0-incubating

  • New Druid data load UI; seems interesting for spec ingestion.

Given the following note about automatic compaction, let's skip trying to use it for now:

Note: This is the initial implementation and has limitations on interoperability with realtime ingestion tasks. Indexing tasks currently require acquisition of a lock on the portion of the timeline 
they will be modifying to prevent inconsistencies from concurrent operations.

In https://github.com/apache/druid/releases/tag/druid-0.16.0-incubating I see that MiddleManagers/Peons can be replaced with the "indexer", a new multi-threaded daemon.

In https://github.com/apache/druid/releases/tag/druid-0.17.0 the batch ingestion code has been reworked; we might need to change our JSON configs (to be checked).

In https://github.com/apache/druid/releases/tag/druid-0.18.0 there is full support for join operators and initial support for Java 11.

Change 613084 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::test_analytics::worker: use CDH hadoop client

https://gerrit.wikimedia.org/r/613084

Change 613084 merged by Elukey:
[operations/puppet@production] role::druid::test_analytics::worker: use CDH hadoop client

https://gerrit.wikimedia.org/r/613084

I tried 0.19.0 with the -Dhadoop.mapreduce.job.classloader=true option for the middlemanager, and I was finally able to get indexing working with the Hadoop 2.8.5 drivers (so no need for the CDH ones anymore).
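
For reference, a minimal sketch of how that option could be passed down to the peons via the middlemanager's runtime.properties (the other JVM flags here are placeholders, not our actual values):

druid.indexer.runner.javaOpts=-server -Xmx2g -Dhadoop.mapreduce.job.classloader=true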

Change 615759 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] druid: allow different package/class prefixes for logging/alarming

https://gerrit.wikimedia.org/r/615759

Change 615759 merged by Elukey:
[operations/puppet@production] druid: allow different package/class prefixes for logging/alarming

https://gerrit.wikimedia.org/r/615759

Change 615762 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::test_analytics::worker: set middlemanager java opts

https://gerrit.wikimedia.org/r/615762

Change 615762 merged by Elukey:
[operations/puppet@production] role::druid::test_analytics::worker: set middlemanager java opts

https://gerrit.wikimedia.org/r/615762

Tested some indexing jobs:

  • navtiming hive2druid
  • webrequest json
  • webrequest parquet

The last two worked thanks to Joseph's help :). Everything works as expected; I also tried dropping a datasource etc. from the coordinator's UI and it works nicely.

I noticed, though, the following strange behavior from the middlemanager when the Hadoop map-reduce job fails for some reason:

2020-07-24T06:44:34,700 INFO org.apache.druid.indexer.JobHelper: Deleting path[/tmp/druid-indexing/test_webrequest_new_hadoop/2020-07-24T064311.660Z_5d3b1f48fbc244be97c7dbe42c49bef5]
2020-07-24T06:44:34,999 INFO org.apache.druid.indexing.worker.executor.ExecutorLifecycle: Task completed with status: {
  "id" : "index_hadoop_test_webrequest_new_hadoop_cldagnim_2020-07-24T06:43:11.653Z",
  "status" : "FAILED",
  "duration" : 74867,
  "errorMsg" : "{\"attempt_1594733405064_2112_m_000009_1\":\"Error: org.apache.druid.java.util.common.RE: Failure on ro...",
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
2020-07-24T06:44:35,008 INFO org.apache.druid.java.util.common.lifecycle.Lifecycle: Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2020-07-24T06:44:35,011 INFO org.apache.druid.java.util.common.lifecycle.Lifecycle: Stopping lifecycle [module] stage [SERVER]
2020-07-24T06:44:35,016 INFO org.eclipse.jetty.server.AbstractConnector: Stopped ServerConnector@36f40d72{HTTP/1.1,[http/1.1]}{0.0.0.0:8200}
2020-07-24T06:44:35,016 INFO org.eclipse.jetty.server.session: node0 Stopped scavenging
2020-07-24T06:44:35,019 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@3bc18fec{/,null,UNAVAILABLE}
2020-07-24T06:44:35,021 INFO org.apache.druid.java.util.common.lifecycle.Lifecycle: Stopping lifecycle [module] stage [NORMAL]
2020-07-24T06:44:35,022 INFO org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner: Starting graceful shutdown of task[index_hadoop_test_webrequest_new_hadoop_cldagnim_2020-07-24T06:43:11.653Z].

[..]

2020-07-24T06:44:39,457 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2020-07-24T06:44:39,464 INFO org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider: Failing over to analytics1029-eqiad-wmnet
2020-07-24T06:44:39,466 WARN org.apache.hadoop.ipc.Client: Failed to connect to server: analytics1029.eqiad.wmnet/10.64.36.129:8032: retries get failed due to exceeded maximum allowed retries number: 0
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_252]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_252]
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1381) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.Client.call(Client.java:1345) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) ~[hadoop-common-2.8.5.jar:?]
	at com.sun.proxy.$Proxy318.getApplicationReport(Unknown Source) ~[?:?]
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:228) ~[hadoop-yarn-common-2.8.5.jar:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_252]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_252]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346) ~[hadoop-common-2.8.5.jar:?]
	at com.sun.proxy.$Proxy319.getApplicationReport(Unknown Source) ~[?:?]
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:480) ~[hadoop-yarn-client-2.8.5.jar:?]
	at org.apache.hadoop.mapred.ResourceMgrDelegate.getApplicationReport(ResourceMgrDelegate.java:314) ~[hadoop-mapreduce-client-jobclient-2.8.5.jar:?]
	at org.apache.hadoop.mapred.ClientServiceDelegate.getProxy(ClientServiceDelegate.java:155) ~[hadoop-mapreduce-client-jobclient-2.8.5.jar:?]
	at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:324) ~[hadoop-mapreduce-client-jobclient-2.8.5.jar:?]
	at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:429) ~[hadoop-mapreduce-client-jobclient-2.8.5.jar:?]
	at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:617) ~[hadoop-mapreduce-client-jobclient-2.8.5.jar:?]
	at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:207) ~[hadoop-mapreduce-client-core-2.8.5.jar:?]
	at org.apache.hadoop.mapreduce.tools.CLI.getJob(CLI.java:547) ~[hadoop-mapreduce-client-core-2.8.5.jar:?]
	at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:304) ~[hadoop-mapreduce-client-core-2.8.5.jar:?]
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.druid.indexing.common.task.HadoopIndexTask$HadoopKillMRJobIdProcessingRunner.runTask(HadoopIndexTask.java:768) ~[druid-indexing-service-0.19.0.jar:0.19.0]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_252]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_252]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
	at org.apache.druid.indexing.common.task.HadoopIndexTask.killHadoopJob(HadoopIndexTask.java:492) ~[druid-indexing-service-0.19.0.jar:0.19.0]
	at org.apache.druid.indexing.common.task.HadoopIndexTask.lambda$runInternal$0(HadoopIndexTask.java:311) ~[druid-indexing-service-0.19.0.jar:0.19.0]
	at org.apache.druid.indexing.common.task.TaskResourceCleaner.clean(TaskResourceCleaner.java:50) [druid-indexing-service-0.19.0.jar:0.19.0]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.stopGracefully(AbstractBatchIndexTask.java:132) [druid-indexing-service-0.19.0.jar:0.19.0]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner.stop(SingleTaskBackgroundRunner.java:186) [druid-indexing-service-0.19.0.jar:0.19.0]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_252]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_252]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.stop(Lifecycle.java:465) [druid-core-0.19.0.jar:0.19.0]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle.stop(Lifecycle.java:368) [druid-core-0.19.0.jar:0.19.0]
	at org.apache.druid.cli.CliPeon.run(CliPeon.java:306) [druid-services-0.19.0.jar:0.19.0]
	at org.apache.druid.cli.Main.main(Main.java:113) [druid-services-0.19.0.jar:0.19.0]

Eventually it stops logging errors, but it seems as if the shutdown workflow for the failed peon runs without Kerberos credentials.

Tried a Kafka supervisor for netflow, plus some queries to the broker; everything looks good. I'll file a GitHub issue upstream with the above description, but it doesn't seem to be something that would block the upgrade.

Next thing to do is to create a config for the Druid Prometheus exporter; there are surely new/old metrics to check/drop/add, etc.

Change 616025 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] druid: add metrics for version 0.19 and update Druid test's config

https://gerrit.wikimedia.org/r/616025

Change 616025 merged by Elukey:
[operations/puppet@production] druid: add metrics for version 0.19 and update Druid test's config

https://gerrit.wikimedia.org/r/616025

So current status: I'd love to get somebody else to test the new Druid version, but from my point of view we should be ready to upgrade the Analytics cluster.

Tests that I did:

  • one-off parquet ingestion with:
curl -X 'POST' -H 'Content-Type:application/json' -d '{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "inputFormat": "org.apache.druid.data.input.parquet.DruidParquetInputFormat",
        "paths" : "hdfs://analytics-test-hadoop/wmf/data/wmf/webrequest/webrequest_source=test_text/year=2020/month=6/day=24/hour=6"
      }
    },
    "dataSchema" : {
      "dataSource" : "test_webrequest_new_hadoop",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "hour",
        "queryGranularity" : "second",
        "intervals" : [ "2020-06-24/2020-06-25" ]
      },
      "parser" : {
        "type" : "parquet",
        "parseSpec" : {
          "format" : "timeAndDims",
          "dimensionsSpec" : {
            "dimensions" : [
                "webrequest_source",
                "hostname",
                "time_firstbyte",
                "ip",
                "http_status",
                "response_size",
                "http_method",
                "uri_host",
                "uri_path",
                "uri_query",
                "content_type",
                "referer",
                "user_agent",
                "x_cache",
                "continent",
                "country_code",
                "isp",
                "as_number",
                "is_pageview",
                "tls_version",
                "tls_key_exchange",
                "tls_auth",
                "tls_cipher"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "dt"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "hits",
          "type" : "longSum",
          "fieldName" : "hits"
        },
        {
          "name" : "aggregated_response_size",
          "type" : "longSum",
          "fieldName" : "response_size"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "overwriteFiles": true,
      "ignoreInvalidRows" : false,
      "partitionsSpec" : {
        "type" : "hashed",
        "numShards" : 1
      },
      "jobProperties" : {
        "mapreduce.output.fileoutputformat.compress": "org.apache.hadoop.io.compress.GzipCodec",
        "mapreduce.job.queuename": "default"
      }
    }
  }
}' http://analytics1041:8090/druid/indexer/v1/task
  • netflow realtime ingestion (Kafka supervisor):
curl -X 'POST' -H 'Content-Type:application/json' -d '{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "wmf_netflow",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "flattenSpec": {
          "useFieldDiscovery": false,
          "fields": [
              "as_dst","as_path","peer_as_dst","as_src","ip_dst","ip_proto","ip_src","peer_as_src","port_dst","port_src","country_ip_src","country_ip_dst","tag2","tcp_flags","packets","bytes","peer_ip_src"
          ]
        },
        "timestampSpec": {
          "column": "stamp_inserted",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
              "as_dst","as_path","peer_as_dst","as_src","ip_dst","ip_proto","ip_src","peer_as_src","port_dst","port_src","country_ip_src","country_ip_dst","tag2","tcp_flags","peer_ip_src"
          ]
        }
      }
    },
    "transformSpec": {},
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      },
      {
        "name": "bytes",
        "type": "doubleSum",
        "fieldName": "bytes"
      },
      {
        "name": "packets",
        "type": "doubleSum",
        "fieldName": "packets"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "SECOND"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "netflow",
    "consumerProperties": {
      "bootstrap.servers": "kafka-jumbo1001.eqiad.wmnet:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}' http://analytics1041:8090/druid/indexer/v1/supervisor
  • one-off webrequest ingestion (JSON, set up by Joseph before leaving for holidays):
curl -X 'POST' -H 'Content-Type:application/json' -d '{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "hdfs://analytics-test-hadoop/user/joal/webrequest_druid_test"
      }
    },
    "dataSchema" : {
      "dataSource" : "test_webrequest_new_hadoop",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "hour",
        "queryGranularity" : "second",
        "intervals" : [ "2020-06-25/2020-06-26" ]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
                "webrequest_source",
                "hostname",
                "time_firstbyte",
                "ip",
                "http_status",
                "response_size",
                "http_method",
                "uri_host",
                "uri_path",
                "uri_query",
                "content_type",
                "referer",
                "user_agent",
                "x_cache",
                "continent",
                "country_code",
                "isp",
                "as_number",
                "is_pageview",
                "tls_version",
                "tls_key_exchange",
                "tls_auth",
                "tls_cipher"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "dt"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "hits",
          "type" : "longSum",
          "fieldName" : "hits"
        },
        {
          "name" : "aggregated_response_size",
          "type" : "longSum",
          "fieldName" : "response_size"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "overwriteFiles": true,
      "ignoreInvalidRows" : false,
      "partitionsSpec" : {
        "type" : "hashed",
        "numShards" : 1
      },
      "jobProperties" : {
        "mapreduce.output.fileoutputformat.compress": "org.apache.hadoop.io.compress.GzipCodec",
        "mapreduce.job.queuename": "default"
      }
    }
  },
  "hadoopDependencyCoordinates" : [ "org.apache.hadoop:hadoop-client:cdh" ]
}' http://analytics1041:8090/druid/indexer/v1/task

After a couple of trials I managed to test it properly, and it looks like it's working!
This ingestion spec re-compacts already indexed hourly data into daily segments for event_navigationtiming.

curl -X 'POST' -H 'Content-Type:application/json' -d '{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "dataSource",
        "ingestionSpec" : {
          "dataSource": "event_navigationtiming",
          "intervals": [ "2020-07-25/2020-07-26" ]
        }
      }
    },
    "dataSchema" : {
      "dataSource" : "event_navigationtiming",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "second",
        "intervals" : [ "2020-07-25/2020-07-26" ]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "event_action",
              "event_isAnon",
              "event_isOversample",
              "event_mediaWikiVersion",
              "event_mobileMode",
              "event_namespaceId",
              "event_netinfoEffectiveConnectionType",
              "event_originCountry",
              "recvFrom",
              "revision",
              "useragent_browser_family",
              "useragent_browser_major",
              "useragent_device_family",
              "useragent_is_bot",
              "useragent_os_family",
              "useragent_os_major",
              "wiki",
              "event_connectEnd_buckets",
              "event_connectStart_buckets",
              "event_dnsLookup_buckets",
              "event_domComplete_buckets",
              "event_domInteractive_buckets",
              "event_fetchStart_buckets",
              "event_firstPaint_buckets",
              "event_loadEventEnd_buckets",
              "event_loadEventStart_buckets",
              "event_redirecting_buckets",
              "event_requestStart_buckets",
              "event_responseEnd_buckets",
              "event_responseStart_buckets",
              "event_secureConnectionStart_buckets",
              "event_unload_buckets",
              "event_gaps_buckets",
              "event_mediaWikiLoadEnd_buckets",
              "event_RSI_buckets"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "dt"
          }
        }
      },
      "metricsSpec" : [
        {
          "type": "count",
          "name": "count"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "overwriteFiles": true,
      "ignoreInvalidRows" : false,
      "partitionsSpec" : {
        "type" : "hashed",
        "numShards" : 1
      },
      "jobProperties" : {
        "mapreduce.output.fileoutputformat.compress": "org.apache.hadoop.io.compress.GzipCodec",
        "mapreduce.job.queuename": "default"
      }
    }
  }
}' http://analytics1041:8090/druid/indexer/v1/task

Tomorrow I will do a couple more tests (hopefully faster).

Sanitization by reloading data (or re-compacting) while leaving out some fields works as well.
The test I did used the same ingestion spec as above with some fields removed.

With these tests and the three Luca did in the first place, I think we're good.

Change 617382 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::public::worker: update settings for 0.19

https://gerrit.wikimedia.org/r/617382

While reading https://druid.apache.org/docs/latest/operations/rolling-updates.html I noticed that the coordinator and the overlord daemons can now be merged into one (namely, the coordinator takes over the overlord's functionality). It is not mandatory, but I'd try to do it after this upgrade; it seems like a nice step forward (see the sketch below).
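
A minimal sketch of what merging the two might look like in the coordinator's runtime.properties, assuming the standard druid/overlord service name (to be double-checked against our puppet templates before actually doing it):

druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord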

Change 617394 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Update Druid Parquet ingestion format class after cluster upgrade

https://gerrit.wikimedia.org/r/617394

Change 617382 merged by Elukey:
[operations/puppet@production] role::druid::public::worker: update settings for 0.19

https://gerrit.wikimedia.org/r/617382

Change 617421 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::druid::historical: set class prefixes like the other daemons

https://gerrit.wikimedia.org/r/617421

Change 617421 merged by Elukey:
[operations/puppet@production] profile::druid::historical: set class prefixes like the other daemons

https://gerrit.wikimedia.org/r/617421

Change 617394 merged by Elukey:
[analytics/refinery@master] Update Druid Parquet ingestion format class after cluster upgrade

https://gerrit.wikimedia.org/r/617394

Change 617604 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] druid: add cache monitoring for 0.19 clusters

https://gerrit.wikimedia.org/r/617604

Change 617604 merged by Elukey:
[operations/puppet@production] druid: add cache monitoring for 0.19 clusters

https://gerrit.wikimedia.org/r/617604

The upgrade of the public cluster went fine; I think that next week we'll be able to upgrade the Analytics one without too many problems.

Change 618005 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::druid::analytics::worker: upgrade druid to 0.19

https://gerrit.wikimedia.org/r/618005

elukey renamed this task from Upgrade Druid to its latest upstream version (currently 0.18.1) to Upgrade Druid to its latest upstream version (currently 0.19.0). Aug 3 2020, 9:46 AM
elukey updated the task description. (Show Details)

Change 618005 merged by Elukey:
[operations/puppet@production] role::druid::analytics::worker: upgrade druid to 0.19

https://gerrit.wikimedia.org/r/618005

Change 618244 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] prometheus::druid_exporter: adjust metric list for Druid 0.19

https://gerrit.wikimedia.org/r/618244

Change 618244 merged by Elukey:
[operations/puppet@production] prometheus::druid_exporter: adjust metric list for Druid 0.19

https://gerrit.wikimedia.org/r/618244

Change 618506 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] druid: puppet cleanup after upgrading all clusters to 0.19

https://gerrit.wikimedia.org/r/618506

Change 618506 merged by Elukey:
[operations/puppet@production] druid: puppet cleanup after upgrading all clusters to 0.19

https://gerrit.wikimedia.org/r/618506

elukey added a project: Analytics-Kanban.
elukey set Final Story Points to 13.
elukey moved this task from Next Up to Done on the Analytics-Kanban board.

Change 618705 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] druid: fix monitoring configuration

https://gerrit.wikimedia.org/r/618705

Change 618705 merged by Elukey:
[operations/puppet@production] druid: fix monitoring configuration

https://gerrit.wikimedia.org/r/618705

Change 619993 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/debs/druid@debian] Update README.Debian after the 0.19 release

https://gerrit.wikimedia.org/r/619993

Change 619993 merged by Elukey:
[operations/debs/druid@debian] Update README.Debian after the 0.19 release

https://gerrit.wikimedia.org/r/619993