Druid 0.9.2 has several performance improvements and the ability to group by month granularity. This will be very useful for planning and prototyping the Wikistats 2.0 backend, so we decided to upgrade Druid twice: first to 0.9.2, which should be quick, and later to 0.10, which requires the whole cluster to be upgraded to Java 8.
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Duplicate | | Ottomata | T157977 Upgrade druid |
| Resolved | | Ottomata | T170590 Upgrade Druid to 0.9.2 as a temporary measure |
Event Timeline
Hm, building this deb was kind of annoying, because I've already imported 0.10 into the git repo. Here's what I did:
git clone ssh://otto@gerrit.wikimedia.org:29418/operations/debs/druid
# make a new master / upstream branch to use starting from before the 0.10 import
git checkout -b master-0.9.2 0fc8f987982698a458f62f850c6ac617df1c002c
# Make a special debian branch for this
git checkout -b debian-0.9.2
# import the tarball
git-import-orig -u 0.9.2 --upstream-branch=master-0.9.2 --debian-branch=debian-0.9.2 ../druid-0.9.2-bin.tar.gz
# WTH, the debian/ dir was deleted in ^ merge. Resurrect it
git checkout 0fc8f987982698a458f62f850c6ac617df1c002c -- debian
git add debian && git commit
# edit gbp.conf and change upstream-branch and debian-branch to the -0.9.2 versions
vim debian/gbp.conf
# update include/binaries
find {debian,extensions,hadoop-dependencies,lib} -name "*.jar" > debian/source/include-binaries
# increment changelog
dch -i
git commit -a
GIT_PBUILDER_AUTOCONF=no DIST=jessie WIKIMEDIA=yes git-buildpackage -sa -us -uc --git-builder=git-pbuilder --source-option="--include-removal"
Change 369432 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Change default druid extension load for 0.9.2 upgrade
Change 369432 merged by Ottomata:
[operations/puppet@production] Change default druid extension load for 0.9.2 upgrade
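The gbp.conf edit in the build steps above amounts to pointing both branch options at the new -0.9.2 branches. A hedged sketch of the relevant stanza (the exact values here are inferred from the commands above, not copied from the actual file):

```ini
# debian/gbp.conf — point git-buildpackage at the 0.9.2-specific branches
[DEFAULT]
upstream-branch = master-0.9.2
debian-branch = debian-0.9.2
```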
Hm, welp, I had to roll back. I had only restarted the historical node on druid1001. After it finished loading all its historical indexes, a few of them failed:
2017-08-01T17:37:54,127 ERROR io.druid.server.coordination.ZkCoordinator: Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[unique-devices-per-domain-daily_2017-06-27T00:00:00.000Z_2017-06-28T00:00:00.000Z_2017-06-28T01:21:33.295Z], segment=DataSegment{size=604086, shardSpec=HashBasedNumberedShardSpec{partitionNum=0, partitions=1, partitionDimensions=[]}, metrics=[uniques_underestimate, uniques_offset, uniques_estimate], dimensions=[domain, country, country_code], version='2017-06-28T01:21:33.295Z', loadSpec={type=hdfs, path=hdfs://analytics-hadoop/user/druid/deep-storage/unique-devices-per-domain-daily/20170627T000000.000Z_20170628T000000.000Z/2017-06-28T01_21_33.295Z/0/index.zip}, interval=2017-06-27T00:00:00.000Z/2017-06-28T00:00:00.000Z, dataSource='unique-devices-per-domain-daily', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[unique-devices-per-domain-daily_2017-06-27T00:00:00.000Z_2017-06-28T00:00:00.000Z_2017-06-28T01:21:33.295Z]
at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:310) ~[druid-server-0.9.2.jar:0.9.2]
at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:351) [druid-server-0.9.2.jar:0.9.2]
at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.2.jar:0.9.2]
at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:153) [druid-server-0.9.2.jar:0.9.2]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.11.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.11.0.jar:?]
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.11.0.jar:?]
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.11.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:513) [curator-recipes-2.11.0.jar:?]
at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.11.0.jar:?]
at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773) [curator-recipes-2.11.0.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473) [?:1.7.0_131]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_131]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473) [?:1.7.0_131]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_131]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_131]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_131]
Caused by: java.lang.IllegalArgumentException: Could not resolve type id 'hdfs' into a subtype of [simple type, class io.druid.segment.loading.LoadSpec]
at [Source: N/A; line: -1, column: -1]
at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2774) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700) ~[jackson-databind-2.4.6.jar:2.4.6]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:142) ~[druid-server-0.9.2.jar:0.9.2]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.2.jar:0.9.2]
at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.2.jar:0.9.2]
at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:306) ~[druid-server-0.9.2.jar:0.9.2]
... 18 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type id 'hdfs' into a subtype of [simple type, class io.druid.segment.loading.LoadSpec]
at [Source: N/A; line: -1, column: -1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:862) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:167) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:99) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:84) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:132) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:41) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2769) ~[jackson-databind-2.4.6.jar:2.4.6]
at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700) ~[jackson-databind-2.4.6.jar:2.4.6]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:142) ~[druid-server-0.9.2.jar:0.9.2]
at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:95) ~[druid-server-0.9.2.jar:0.9.2]
at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.2.jar:0.9.2]
at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:306) ~[druid-server-0.9.2.jar:0.9.2]
... 18 more
@elukey, let's hold off on this for now. Upgrading is going to be more delicate than we hoped.
I think we did it! Yesterday I was juggling just too much stuff to realize what I had done wrong.
https://gerrit.wikimedia.org/r/#/c/355469/ had not yet been merged. That patch automates recreation of the hdfs extension .jar symlinks if the extension doesn't exist. (Previously, this was only done during a Druid node's first provisioning puppet run.) The hdfs extension symlinks need to be recreated after every upgrade; I didn't do this, so the historical node no longer knew how to load segments out of HDFS.
Today we upgraded more properly, and everything looks fine!
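Since missing or dangling extension symlinks were the root cause here, a quick sanity check before restarting a node could look like this. This is a hedged helper of my own, not something from the task; the extension path in the example comment is the one used elsewhere in this task.

```shell
# Report dangling symlinks in a directory (e.g. a Druid extension dir after
# an upgrade). With find -L, symlinks are followed, so only broken links
# still match -type l.
broken_links() {
  find -L "$1" -maxdepth 1 -type l
}
# Example: broken_links /usr/share/druid/extensions/druid-hdfs-storage-cdh
```

If this prints anything, the extension jars need to be re-linked before the node is restarted.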
Just in case, I haven't yet added the 0.9.2 .debs to our apt repo, so that it will be easier to roll back if something goes wrong. Let's wait a day or two and then add them to apt.
Here are rollback instructions:
- Install previous version
sudo apt-get install druid-common=0.9.0-2~jessie1 druid-historical=0.9.0-2~jessie1 druid-coordinator=0.9.0-2~jessie1 druid-broker=0.9.0-2~jessie1 druid-middlemanager=0.9.0-2~jessie1 druid-overlord=0.9.0-2~jessie1
- Stop puppet
sudo puppet agent --disable
- Edit /etc/druid/common.runtime.properties and revert the druid.extensions.loadList to:
druid.extensions.loadList=["druid-datasketches","druid-hdfs-storage-cdh","druid-histogram","druid-namespace-lookup","mysql-metadata-storage"] # (Or revert https://gerrit.wikimedia.org/r/#/c/369432/ and run puppet)
- Recreate the hdfs extension symlinks:
sudo /usr/local/bin/druid-hdfs-storage-cdh-link /usr/share/druid/extensions/druid-hdfs-storage /usr/share/druid/extensions/druid-hdfs-storage-cdh /usr/lib/hadoop/client
- Follow restart instructions at http://druid.io/docs/latest/operations/rolling-updates.html
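The restart order from that doc (historical nodes first, coordinator last) can be sketched as below. The druid-* service names are an assumption based on the package names in this task; DRY_RUN=1 (the default here) only prints the plan instead of restarting anything.

```shell
# Hedged sketch of a rolling restart in the documented update order.
DRY_RUN=${DRY_RUN:-1}
plan=""
for svc in druid-historical druid-overlord druid-middlemanager druid-broker druid-coordinator; do
  plan="$plan $svc"
  if [ "$DRY_RUN" = 1 ]; then
    echo "would restart: $svc"
  else
    sudo service "$svc" restart
  fi
done
```

In practice each historical node should finish reloading its segments before the next one is restarted.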
Change 369997 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use newer guava jar in druid hdfs cdh storage extension
Change 369997 merged by Ottomata:
[operations/puppet@production] Use newer guava jar in druid hdfs cdh storage extension
I don't want this task to disappear yet! I still need to upload the 0.9.2 .debs to our apt repo. I hadn't done that yet because I wanted to keep rollback easy. I'll keep this open until it's done.
Alright! 0.9.2 pushed to gerrit (in branch debian-0.9.2) and added to apt: https://apt.wikimedia.org/wikimedia/pool/main/d/druid/